A Comprehensive Survey On Graph Summarization With Graph Neural Networks

Nasrin Shabani, Jia Wu, Amin Beheshti, Quan Z. Sheng, Venus Haghighi, Ambreen Hanif, and Maryam Shahabikargar
Abstract—As large-scale graphs become more widespread, more and more computational challenges with extracting, processing, and interpreting large graph data are being exposed. It is therefore natural to search for ways to summarize these expansive graphs while preserving their key characteristics. In the past, most graph summarization techniques sought to capture the most important part of a graph statistically. However, today, the high dimensionality and complexity of modern graph data are making deep learning techniques more popular. Hence, this paper presents a comprehensive survey of progress in deep learning summarization techniques that rely on graph neural networks (GNNs). Our investigation includes a review of the current state-of-the-art approaches, including recurrent GNNs, convolutional GNNs, graph autoencoders, and graph attention networks. A new burgeoning line of research is also discussed where graph reinforcement learning is being used to evaluate and improve the quality of graph summaries. Additionally, the survey provides details of benchmark datasets, evaluation metrics, and open-source tools that are often employed in experimentation settings, along with a detailed comparison, discussion, and takeaways for the research community focused on graph summarization. Finally, the survey concludes with a number of open research challenges to motivate further study in this area.

Impact Statement—Graph summarization is a key task in managing large graphs, which are ubiquitous in modern applications. In this article, we summarize the latest developments in graph summarization methods, offer a more profound understanding of these methods, and list source codes and available resources. The study covers a broad range of techniques, including both conventional and deep learning-based approaches, with a particular emphasis on GNNs. We aim to help researchers develop a basic understanding of GNN-based methods for graph summarization, benefit from useful resources, and think about future directions.

Index Terms—Deep Learning, Graph Neural Networks, Graph Summarization

We acknowledge the Centre for Applied Artificial Intelligence at Macquarie University for funding this study. Nasrin Shabani, Jia Wu, Amin Beheshti, Quan Z. Sheng, Venus Haghighi, Ambreen Hanif, and Maryam Shahabikargar are with the School of Computing, Macquarie University, NSW 2109, Australia. E-mails: {nasrin.shabani@hdr, jia.wu, amin.beheshti, michael.sheng, eujin.foo@hdr, venus.haghighi@hdr, ambreen.hanif@hdr, maryam.shahabikargar@hdr}mq.edu.au. Corresponding authors: Nasrin Shabani and Amin Beheshti.

I. INTRODUCTION

LARGE graphs are becoming increasingly ubiquitous. With the increasing amount of data being generated, large graphs are becoming more prevalent in modelling a variety of domains, such as social networks, proteins, the World Wide Web, user actions, and beyond. However, as these graphs grow in size, understanding and analyzing them is becoming more challenging. Additionally, performing fast computations with large graphs and visualizing the knowledge they can yield is also becoming more difficult. Many claim that faster and more effective algorithms are needed to overcome these obstacles [1], [2]. However, a growing cohort of researchers believe that summarization might hold the answer to this unyielding problem. Summarization not only helps existing algorithms to parse the data faster, it can also compress the data, reduce storage requirements, and assist with graph visualization and sense-making [3].

Graph summarization is the process of finding a condensed representation of a graph while preserving its key properties [2]. A toy example of a typical graph summarization process is shown in Figure 1. The process includes removing the original graph's objects and replacing them with fewer objects of the same type to produce a condensed representation of the original graph.

Most traditional approaches to graph summarization involve using a conventional machine learning method or a graph-structured query, such as degree, adjacency, or eigenvector centrality, to find a condensed graphical representation of the graph [2]. A popular summarization technique is to group structures in the input graph by aggregating the densest subgraphs [4]. For example, the GraSS model [5] focuses on accurate query handling and incorporates formal semantics for answering queries on graph structure summaries based on a random walk model, while Graph Cube [6] is a data warehousing model that integrates both network structure summarization and attribute aggregation. This model also supports OLAP queries on large multidimensional networks.

Notably, clustering methods follow a similar approach to summarization, partitioning a graph into groups of nodes that can be further summarized. Most traditional graph clustering methods use conventional machine learning and statistical inference to measure the closeness of nodes based on their connectivity and structural similarities [7]. For instance, Karrer et al. [8] used a stochastic block model to detect clusters or communities in large sparse graphs. However, another method of graph summarization focuses more on node selection and identifying sparse graphs that can be used to derive smaller graphs [9]. As an example, Doerr et al. [10] introduced a sampling method based on traversing a graph that begins with a collection of starting points, e.g., nodes or edges, and then adds to the sample pool depending on recent information about the graph objects. However, despite the popularity of these approaches in the past, they are very computationally intensive. They also require a great deal of memory to store,
B. Contributions
II. DEFINITIONS AND BACKGROUND

This section provides an overview of the key definitions and background information on graph summarization techniques.

• Definition 1 (Graph): A graph G can be represented as a tuple (V, E), where V denotes the set of nodes or vertices {v_1, v_2, ..., v_n}, and E = {e_ij}_{i,j=1}^{n} represents the set of edges or links connecting node pairs. The graph is represented by an n × n dimensional adjacency matrix A = [a_ij], with a_ij being 1 if the edge e_ij is present in E and 0 otherwise. If a_ij is not equal to a_ji, the graph is directed; otherwise, it is undirected. When edges are associated with weights from the set W, the graph is called a weighted network; otherwise, it is an unweighted network. G is considered labeled if every edge e ∈ E has an associated label. Additionally, if each node v ∈ V has a unique label, the nodes are also labeled; otherwise, G is considered unlabelled.

• Definition 2 (Graph Summary): Given a graph G, a summary G(V_S, E_S) is a condensed representation of G that preserves its key properties. Graph summarization techniques involve either aggregation, selection, or transformation on a given graph and produce a graph summary as the output.

As outlined in Definition 2, graph summarization approaches fall into three main categories: aggregation, selection, and transformation. While selection approaches make graphs sparser by simply removing objects without replacing them, aggregation approaches replace those removed objects with similar objects, only with fewer of them. For example, a supernode might replace a group of nodes. Similar to selection and aggregation, the transformation approaches also involve removing objects from the graph, but this time the objects removed are transformed into a different type of object, such as an embedding vector [23].

Aggregation. Aggregation is one of the most extensively employed techniques of graph summarization. Aggregation methods can be divided into two main groups: those that involve node grouping and those that involve edge grouping. Node grouping methods group nodes into supernodes, whereas edge grouping methods reduce the number of edges in a graph by aggregating them into virtual nodes. Clustering and community detection are examples of a grouping-based approach. Although summarizing graphs is not explicitly the primary objective of these processes, the outputs can be modified into non-application-specific summaries [2].

Selection. There are two main groups of selection techniques: sampling and simplification. While sampling methods focus on picking subsets of nodes and edges from the input graph [24], simplification or sparsification methods involve removing less important edges or nodes. In this way, they tend to resemble solutions to dimensionality reduction problems [9].

Transformation. Graph projection and graph embedding are two categories of this method. Generally, graph projection refers to the summarization techniques that transform bipartite graphs with various inter-layer nodes and edges into simple (single-layer) summarized graphs. Conversely, graph embedding refers to the techniques that transform a graph into a lower dimensional representation while preserving the original graph's topology [25].
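To make the three summary types in Definition 2 concrete, the following is a minimal, illustrative NumPy sketch (a toy example of our own, not an algorithm from the surveyed literature): selection drops low-degree nodes, aggregation merges node groups into supernodes, and transformation embeds nodes into a low-dimensional space. The graph, grouping matrix, and thresholds are arbitrary assumptions.

```python
# Toy illustration of the three summarization operations from Definition 2.
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

# Selection (sparsification): keep only nodes whose degree exceeds a threshold.
degree = A.sum(axis=1)
keep = degree >= 2
A_selected = A[np.ix_(keep, keep)]

# Aggregation: merge node groups into supernodes; S[i, g] = 1 if node i is in group g.
S = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
A_super = (S.T @ A @ S > 0).astype(float)   # supergraph over the two supernodes

# Transformation (embedding): project nodes to low-dimensional vectors,
# here via the top-2 eigenvectors of the adjacency matrix (a spectral embedding).
eigvals, eigvecs = np.linalg.eigh(A)
Z = eigvecs[:, -2:]                          # each row is a 2-d node embedding

print(A_selected.shape, A_super.shape, Z.shape)
```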
III. GRAPH SUMMARIZATION: AN EVOLUTION

Graph summarization has been playing an important role in areas such as network analysis, data mining, machine learning, and visualization for some time. The evolution of graph summarization is illustrated in Figure 2, which shows how it has progressed from traditional computing methods to multi-layer GNNs. This section briefly overviews the three different traditional methods within this field and explains the advantages of GNN techniques over traditional ones.

A. Clustering-based Approaches

Graph clustering can be thought of as a graph summarization technique since it involves grouping together nodes in a graph that are similar or related and, in so doing, the complexity and size of the original graph are reduced. In simpler terms, graph clustering provides a way to compress or summarize a large and complex graph into a smaller set of clusters, each of which captures some aspect of the structure or function of the original graph [1]. Graph summarization techniques using clustering can be classified into three main categories: structural-based, attribute-based, and structural-attribute-based approaches. The latter, combining both structural and attribute information, is considered the most effective [26]. For example, Boobalan et al. [27] proposed a method called k-Neighborhood Attribute Structural Similarity (k-NASS) that incorporates both structural and attribute similarities of graph nodes. This method improves clustering accuracy for complex graphs with rich attributes. However, clustering large graphs with many attributes remains challenging due to high memory and computational requirements.

B. Statistical Inference

Statistical inference techniques for graph summarization simplify the complexity of the original graph while preserving its significant characteristics. These techniques fall into two groups: pattern mining and sampling. Pattern mining identifies representative patterns or subgraphs in the graph to create a condensed summary. On the other hand, sampling randomly selects a subset of nodes or edges from the graph and estimates the properties of the entire graph based on this subset. One example of a sampling technique is Node2vec [28], which generates random sequences of nodes within a graph, known as walks, to create a graph summary. Various sampling techniques, such as random sampling [28], stratified sampling [29], and snowball sampling [30], can be used for graph summarization. Each technique has its advantages and disadvantages, and the choice depends on the specific problem and data being addressed.

C. Goal-driven

Goal-driven techniques for graph summarization involve constructing a graph summary that is tailored to a specific application or task. They are a powerful tool for capturing specific features or relationships in a graph that are relevant to a specific application or task.
By optimizing the graph summary to a specific goal, it is possible to create a more effective and efficient summary that can be used to derive better insights and make better decisions [31]. Significant goal-driven techniques for graph summarization include utility-driven and query-driven techniques. Utility-driven techniques aim to summarize large graphs while preserving their essential properties and structure to maximize their usefulness for downstream tasks. Human reviewers evaluate the utility of the summary against specific tasks like node classification and link prediction [32]. Query-driven techniques summarize graphs by identifying relevant subgraphs or patterns using queries in a query language. The resulting subgraph that matches the query becomes a building block for the graph summary, supporting the target downstream task [33]. The choice of the goal-driven summarization technique depends on the specific goals of the analysis, as some techniques may preserve global properties, while others may capture local structures. It also depends on available computational resources and the complexity and size of the original graph.

D. Why GNNs for Graph Summarization?

In recent times, deep learning has gained significant prominence and is now considered one of the most effective forms of AI due to its high accuracy. Conventional deep learning methods have shown that they perform extremely well with Euclidean data (e.g., images, signals, and text), and now there are a growing number of applications in non-Euclidean domains (e.g., graphs and manifold structures). As a deep learning approach, GNNs are multi-layer neural networks that learn on graph structures to ultimately perform graph-related tasks like classification, clustering, pattern mining, and summarization [34].

As mentioned, traditional graph summarization approaches are mostly based on conventional machine learning or graph-structured queries, such as degree, adjacency, and eigenvector centrality, where the aim is to find a condensed graphical representation of the whole graph [6]. However, the pairwise similarity calculations involved in these approaches demand a considerably high level of computational power. The explicit learning capabilities of GNNs skirt this problem. Additionally, powerful models can be built from even low-dimensional representations of attributed graphs [36]. Unlike standard machine learning algorithms, with a GNN, there is no need to traverse all possible orders of the nodes to represent a graph. Instead, GNNs consider each node separately without taking the order of the input nodes into account. This avoids redundant computations.

The major advantage of GNN models for graph summarization over traditional methods is the ability to use low-dimensional vectors to represent the features of large graphs [37]. Additionally, the message-passing mechanism used by GNNs to communicate information from one node to the next has been the most successful learning framework for learning the patterns and neighbours of nodes and the sub-graphs in large graphs [11]. It is also easy to train a GNN in a semi- or unsupervised way to aggregate, select, or transform graphs into low dimensional representations [38].
Fig. 3. Four basic GNN architectures: a RecGNN, in which node v recurrently aggregates the hidden states of its neighbours u_1, u_2, u_3; a ConvGNN, which applies layer-wise convolutional updates; a GAE, which encodes the graph into a latent code Z and decodes it back; and a GAT, which weights neighbours h_1, ..., h_4 with attention coefficients a_11, ..., a_14. Each panel is annotated with its node update rule.
In this regard, recent successes in graph summarization with GNNs point to promising directions for new research. For example, the GNN models developed by Brody et al. [39] and Goyal et al. [40] represent dynamic graphs in low dimensions, providing a good foundation for popularizing GNNs into more complex dynamic graphs.

IV. GRAPH SUMMARIZATION WITH GNNS

This section provides an overview of recent research into graph summarization with GNNs. Each subsection covers one of four main categories of approach, these being: Recurrent Graph Neural Networks (RecGNNs), Convolutional Graph Neural Networks (ConvGNNs), Graph Autoencoders (GAEs), and Graph Attention Networks (GATs). The four different types of GNN models are shown in Figure 3. Within each subsection, we first provide a brief introduction about the architecture of the GNN model and then review the most notable approaches and the contributions each has made to the field. At the end of each subsection, we provide a comprehensive summary of the key features of representative GNN-based approaches. We present a comparative analysis in Tables I, II, III, and IV for RecGNN, ConvGNN, GAE, and GAT architectures, respectively. The tables include comparisons of evaluation methods, performance metrics, training data, advantages, and limitations across the different models.

A. RecGNN-based Approaches

RecGNNs are early implementations of GNNs, designed to acquire node representations through a generalized recurrent neural network (RNN) architecture. Within these frameworks, information is transmitted between nodes and their neighbours to reach a stable equilibrium [34], [41]. A node's hidden state is continuously updated from its node or edge information and its previous state:

h_v^t = f\big(\sum_{u \in N(v)} W h_u^{t-1}\big) \qquad (1)

where h_v^t is the hidden state of node v at time t, which represents the information learned by the RecGNN about node v at a specific time step in the dynamic graph sequence. N(v) denotes the set of neighboring nodes of node v in the graph, providing the context and connectivity information for node v within the graph structure. h_u^{t-1} is the hidden state of neighboring node u at time t − 1, which contributes to the update of node v's hidden state at time t, reflecting the influence of neighboring nodes on v's representation. W is the weight matrix used for aggregating the hidden states of neighboring nodes.

Here, f(·) is usually a simple element-wise activation like ReLU, tanh, or sigmoid. This simple activation is typically used to introduce non-linearity and capture complex patterns in the node representations. However, it can be replaced by recurrent update functions, which use gated mechanisms like LSTM (Long Short-Term Memory) [42] or GRU (Gated Recurrent Unit) [43] cells. In this case, each node's hidden state update would be computed using an LSTM or GRU cell, which is more complex and sophisticated than a simple element-wise activation function. These cells determine how the hidden state of node v at time t is updated based on its current and previous hidden states and the information from its neighboring nodes, enabling the model to capture temporal dependencies in dynamic graph-structured data [44].

RecGNN-based approaches for graph summarization mostly focus on graph sampling and embedding by generating sequences from graphs and embedding those sequences into a continuous vector space at lower dimensions. In the following, we will first briefly introduce LSTM and GRU architectures and then delve into the graph summarization approaches that are built upon their respective structures.

1) LSTM-based Approaches: LSTMs are a class of RNNs that use a sequence of gates to concurrently model long- and short-term dependencies in sequential data [54]. The modified architecture of an LSTM to handle large graph-structured data is known as a GraphLSTM [55]. Typically, the input to the model consists of a sequence of either graph nodes or edges, which are processed in order using LSTM units. At each time step, the model updates its internal state based on the input, as shown in Equation 2.

h_v^t = \mathrm{LSTM}\big(\sum_{u \in N(v)} W h_u^{t-1}\big) \qquad (2)

The cell is composed of multiple gates, and its operation can be described as follows [55]:

f_v^t = \sigma\big(W_f [h_v^{t-1}, x_v^t] + b_f\big) \qquad (3)

i_v^t = \sigma\big(W_i [h_v^{t-1}, x_v^t] + b_i\big) \qquad (4)
TABLE I
Comparative analysis of selected RecGNN-based approaches for graph summarization.
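As a hedged illustration of the gated update in Equations 2–4 (the remaining gate equations are not reproduced above), the sketch below sums neighbour hidden states through the adjacency matrix and feeds them, together with the node inputs, through a standard PyTorch LSTM cell. It is a simplified assumption of how a GraphLSTM-style step can be wired, not the implementation used in any of the surveyed papers.

```python
# Illustrative GraphLSTM-style step: neighbour hidden states act as the recurrent
# state of a standard LSTM cell, whose internal gates correspond to Equations 3-4.
import torch
import torch.nn as nn

class GraphLSTMStep(nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.cell = nn.LSTMCell(in_dim, hidden_dim)

    def forward(self, A, X, h, c):
        """A: (n, n) adjacency; X: (n, in_dim) node inputs x_v^t;
        h, c: (n, hidden_dim) previous hidden and cell states."""
        msg = A @ h                            # sum of neighbours' hidden states
        h_new, c_new = self.cell(X, (msg, c))  # gates see x_v^t and the aggregate
        return h_new, c_new

n, d_in, d_h = 6, 4, 16
A = torch.bernoulli(torch.full((n, n), 0.3))   # random toy adjacency
step = GraphLSTMStep(d_in, d_h)
h = c = torch.zeros(n, d_h)
for _ in range(3):                             # unroll a short node/edge sequence
    h, c = step(A, torch.randn(n, d_in), h, c)
print(h.shape)                                 # torch.Size([6, 16])
```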
Jin et al. [46] also developed an approach to learning representations of graphs based on a graph LSTM. Here, graph representations of diverse sizes are encoded into low-dimensional vectors. Li et al. [57] proposed a graph summarization technique that uses a graph LSTM and a ConvGNN to improve question answering with knowledge graphs. In this approach, the questions, entities, and relations are represented as vectors with very few dimensions, but the key properties of the relations are well preserved.

Several studies have also focused on evolving node patterns in dynamic graphs. For instance, Zhang et al. [56] introduced an LSTM-based approach, a one-stage model called DynGNN. The model embeds an RNN into a GNN model to produce a representation in compact form. Khoshraftar et al. [50] presented a dynamic graph embedding method via LSTM to convert a large graph into a low-dimensional representation. The model captures temporal changes with LSTM using temporal walks and then transfers the learned parameters into node2vec [28] to incorporate the local structure of each graph. Similarly, Ma et al. [49] introduced a dynamic RecGNN model that relies on a graph LSTM to model the dynamic information in an evolving graph while reducing the graph's dimensionality and learning manifold structures. Node information is continuously updated by: recording the time intervals between edges; recording the sequence of edges; and coherently transmitting information between nodes. Another work by Goyal et al. [40] also presents a method for learning temporal transitions in dynamic graphs. This framework is based on a deep architecture that mainly consists of dense and recurrent layers. Model size and the number of weights to be trained can be a problem during training, but the authors overcome this issue with a uniform sampling of nodes.

2) GRU-based Approaches: GRUs are a variant of graph LSTMs that include a gated RNN structure and have fewer training parameters than a standard graph LSTM. The key distinction between a GRU and an LSTM is the number of gates in each model. GRU units are less complex with only two gates, "reset" and "update" [58].

r_v^t = \sigma\big(W_r [h_v^{t-1}, x_v^t] + b_r\big) \qquad (11)

z_v^t = \sigma\big(W_z [h_v^{t-1}, x_v^t] + b_z\big) \qquad (12)

C_v^t = \tanh\big(W_C [r_v^t \times h_v^{t-1}, x_v^t] + b_C\big) \qquad (13)

h_v^t = (1 - z_v^t) \times h_v^{t-1} + z_v^t \times C_v^t \qquad (14)

In these equations, r_v^t is the reset gate and z_v^t is the update gate. h_v^{t-1} is the output of the model at the previous time step. Similar to the LSTM, C_v^t computes the new candidate value, and h_v^t is the updated hidden state for node v at time t, computed using the update gate and the new candidate value. b_x and W_x are the biases and weights for the respective gates.

Again, by repeating this process over several time steps, the model learns the dependencies that exist between the nodes in the graph, allowing it to construct a final hidden state that summarizes all the graph's information. The adaptability of GRU cells to capture temporal dependencies within graph-structured data allows for effective information aggregation and context modeling, making GRU-based methods a promising choice for summarizing complex graph-structured information.
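The following is a minimal, hedged NumPy sketch of the gated update in Equations 11–14 for a single node; the weight shapes and the way the previous state h_v^{t-1} enters are illustrative assumptions rather than a specific published design.

```python
# Toy GRU-style node update following Equations 11-14 (illustrative only).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_node_update(h_prev, x_t, params):
    """h_prev: (d_h,) previous hidden state; x_t: (d_x,) current node input;
    params: weight matrices W_* of shape (d_h, d_h + d_x) and biases b_* of shape (d_h,)."""
    concat = np.concatenate([h_prev, x_t])
    r = sigmoid(params["W_r"] @ concat + params["b_r"])                               # reset gate  (11)
    z = sigmoid(params["W_z"] @ concat + params["b_z"])                               # update gate (12)
    c = np.tanh(params["W_C"] @ np.concatenate([r * h_prev, x_t]) + params["b_C"])    # candidate   (13)
    return (1.0 - z) * h_prev + z * c                                                 # new state   (14)

d_h, d_x = 8, 4
rng = np.random.default_rng(0)
params = {k: rng.normal(scale=0.1, size=(d_h, d_h + d_x)) for k in ("W_r", "W_z", "W_C")}
params.update({b: np.zeros(d_h) for b in ("b_r", "b_z", "b_C")})
h = gru_node_update(np.zeros(d_h), rng.normal(size=d_x), params)
print(h.shape)  # (8,)
```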
For instance, Taheri et al. [52] proposed the DyGrAE model, which is able to learn the structure and temporal dynamics of a dynamic graph while condensing its dimensions. A GRU model captures the graph's topology, while an LSTM model learns the graph's dynamics. Ge et al. [53] developed a gated recursive algorithm that not only solves some node aggregation problems but also extracts deeply dependent features between nodes. The resulting model, called GR-GNN, is based on a GRU, which performs the aggregation and structuring. Li et al.'s GRU model [51] encodes an input graph into a fixed-size vector representation, which is then fed into a sequence decoder to generate the summary as the output. The model effectively captures the structural information and dependencies among the nodes and edges in the input graph, which is crucial for producing a coherent and informative graph summary.

B. ConvGNN-based Approaches

The general idea of ConvGNN-based approaches is to generalize CNNs to graph-structured data [34]. The primary distinction between a ConvGNN and a RecGNN is the way information is propagated. While ConvGNNs apply various weights at each timestep, RecGNNs apply the same weight matrices in an iterative manner until an equilibrium is reached [69].

In other words, ConvGNN models are a form of neural network architecture that supports graph structures and aggregates node information from the neighbourhood of each node in a convolutional manner. ConvGNN models have demonstrated a strong expressive capacity for learning graph representations, resulting in superior performance with graph summarization [69].

ConvGNN-based approaches fall into two categories: spectral-based and spatial-based methods [34].

1) Spectral-based Approaches: Spectral-based methods describe graph convolutions based on spectral graph theory and graph signal filtering. In spectral graph theory, the multiplication of the graph with a filter (the convolution) is defined in a Fourier domain [70].

Although the computation contains well-defined translational properties, it is relatively expensive, and the filters are not generally localized. Since the level of complexity grows with the scale of the graphs, one solution is to only check a limited number of neighbours using Chebyshev's theory [71]. The Chebyshev polynomials T_k(x) are defined recursively as:

T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x) \qquad (15)

where T_0(x) = 1 and T_1(x) = x. Here, x represents the variable of the Chebyshev polynomial, and k is a non-negative integer giving the degree (order) of the polynomial; the value of T_k(x) depends on x and k through the recurrence relation in Equation 15.
TABLE II
Comparative analysis of selected ConvGNN-based approaches for graph summarization.
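As a small illustration of how this recurrence yields localized spectral filters (in the spirit of the Chebyshev-based approximation discussed next), the hedged NumPy sketch below builds T_k terms from a scaled graph Laplacian and combines them into a filtered signal. The graph, feature sizes, and filter coefficients are arbitrary toy choices, not values from the surveyed works.

```python
# Hedged sketch: Chebyshev polynomial filtering on a toy graph.
# T_k is built with T_k(x) = 2 x T_{k-1}(x) - T_{k-2}(x) applied to a scaled
# Laplacian, and the output is Z = sum_k theta_k * T_k(L_scaled) @ X.
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt      # normalized graph Laplacian

lam_max = np.linalg.eigvalsh(L).max()
L_scaled = 2.0 * L / lam_max - np.eye(4)         # rescale the spectrum to [-1, 1]

X = np.random.randn(4, 3)                        # toy node features
theta = np.array([0.5, 0.3, 0.2])                # toy filter coefficients, K = 3

T_prev, T_curr = np.eye(4), L_scaled             # T_0 and T_1
Z = theta[0] * (T_prev @ X) + theta[1] * (T_curr @ X)
for k in range(2, len(theta)):
    T_next = 2 * L_scaled @ T_curr - T_prev      # Chebyshev recurrence (Equation 15)
    Z += theta[k] * (T_next @ X)
    T_prev, T_curr = T_curr, T_next
print(Z.shape)                                   # (4, 3)
```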
This theory has led to several studies that explore the idea of applying approximation to the spectral graph convolution. For example, Defferrard et al. [72] generalized CNNs to graphs by designing localised convolutional filters on graphs. Their main idea was to leverage the spectral domain of graphs and use Chebyshev polynomial approximation to efficiently compute localized filters as follows:

Z = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L}) X \qquad (16)

where L represents the graph Laplacian, X is the node feature matrix, θ_k are the learnable filter coefficients of the graph convolutional operator, \tilde{L} is a scaled Laplacian defined as \tilde{L} = 2L/\lambda_{max} − I, and K is the order of the Chebyshev polynomial approximation. They also introduced a graph summarization procedure that groups similar vertices together and a graph pooling procedure that focuses on producing a higher filter resolution. This work has been used as the basis of several studies that draw on Chebyshev polynomials to speed up convolution computations.

As a variant, Kipf et al. [59] introduced several simplifications to the original framework to improve the model's classification performance and scalability to large networks. These simplifications formed the basis of the GCN (Graph Convolutional Network) model, which has since become a popular choice for various graph-related tasks. Given an undirected graph with an adjacency matrix A, they computed the normalized graph Laplacian \tilde{L} as follows:

\tilde{L} = I - D^{-1/2} A D^{-1/2} \qquad (17)

where I is the identity matrix, and D is the diagonal degree matrix, with D_{ii} representing the sum of the weights of the edges connected to node i.

In this work, instead of directly computing graph convolutions using high-order Chebyshev polynomials, as done in the previous work by Defferrard et al. [72], Kipf et al. proposed using a simple first-order approximation of graph filters. They defined the graph convolution operation as [59]:

H^{(l+1)} = \sigma\big(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\big) \qquad (18)

where H^{(l)} represents the hidden node features at layer l, W^{(l)} is the weight matrix for layer l, \tilde{A} = A + I is the adjacency matrix with self-loops added, and \tilde{D} is the diagonal degree matrix of \tilde{A}. Here, the normalized term \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} is used to aggregate information from neighboring nodes, and the propagation is applied layer by layer in this normalized form.
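A minimal, hedged NumPy sketch of the propagation rule in Equation 18 follows; it uses random, untrained weights on a toy graph and is only meant to make the aggregation step concrete, not to reproduce the reference GCN implementation.

```python
# Toy GCN layer implementing Equation 18: H' = sigma(D~^{-1/2} A~ D~^{-1/2} H W).
import numpy as np

def gcn_layer(A, H, W):
    A_tilde = A + np.eye(A.shape[0])                 # add self-loops
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt        # normalized aggregation
    return np.maximum(A_hat @ H @ W, 0.0)            # ReLU activation

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = np.random.randn(4, 5)                            # input node features
W1, W2 = np.random.randn(5, 8), np.random.randn(8, 2)
H1 = gcn_layer(A, H, W1)                             # first layer
H2 = gcn_layer(A, H1, W2)                            # second layer: (4, 2) embeddings
print(H2.shape)
```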
a novel approach called GroupINN, which enhances ConvGNN through summarization, leading to faster training, data denoising, and improved interpretability. They employed an end-to-end neural network model with a specialized node grouping layer, which effectively summarizes the graph by reducing its dimensionality. Hu et al. [76] took structural similarity into consideration, aggregating nodes that share a similar structure into hypernodes. The summarized graph is then refined to restore each node's representation. To this end, a deep hierarchical ConvGNN (H-GCN) architecture with several coarsening operation layers followed by several refinement operation layers performs semi-supervised node classification. The refinement layers are designed to restore the structure of the original graph for graph classification tasks.

The most recent developments in ConvGNNs demonstrate the exciting potential graph summarization holds for a range of applications in healthcare and human motion analysis [68], [77]. Wen et al. [68], for example, presented a promising approach to diagnosing autism spectrum disorder by parsing brain structure information through a multi-view graph convolution network. Dang et al. [77] introduced a new type of graph convolution network, called a multi-scale residual graph convolution network, that shows superior performance in predicting human motion compared to other state-of-the-art models.

C. GAE-based Approaches

An autoencoder is a neural network that consists of an encoder and a decoder. Generally, the encoder transforms the input data into a condensed representation, while the decoder reconstructs the actual input data from the encoder's output [85]. Graph autoencoders, or GAEs, are a type of GNN that can be applied over graph structures, allowing the model to learn a compact and informative representation of a graph. Lately, GAEs have garnered increasing interest for their ability to summarize graphs due to their significant potential for dimensionality reduction [36].

The structure of the encoder and decoder in a GAE can vary depending on the specific implementation and the complexity of the graph data. Generally, both the encoder and decoder are neural network architectures that are designed to process graph data efficiently [86]. The architecture of the encoder may include multiple layers of graph convolutions or aggregations, followed by non-linear activation functions. The output of the encoder is a compact and informative representation of the graph in the latent space. On the other hand, the decoder takes the latent representation obtained from the encoder and reconstructs the original graph structure from it. The decoder's architecture should mirror the encoder's architecture in reverse. It transforms the latent representation back into a graph structure [11].

The goal of a GAE is to learn an encoder and decoder that reduce the reconstruction error between the original graph and the decoded graph, while also encouraging the latent representation to capture meaningful information about the graph's structure [87]:

Z = f_E(A, X), \qquad G' = f_D(A, Z) \qquad (23)

where G′ represents the reconstructed graph, which can consist of either reconstructed features, graph structure, or both. A is the adjacency matrix, and X is the input node feature matrix. f_E serves as the graph encoder, responsible for transforming the graph and node features into a condensed representation. Conversely, f_D acts as the graph decoder, responsible for reconstructing the original graph or its components from the latent representation.

GAEs can be trained using various loss functions, such as mean squared error (MSE) or binary cross-entropy (BCE). They can also be extended to incorporate additional constraints or regularization techniques to improve the quality of the learned graph representation [88]. For example, for graph reconstruction, the goal is to minimize the difference between the original adjacency matrix A and the reconstructed adjacency matrix Â. The MSE loss is calculated as follows [78]:

L_{MSE} = \frac{1}{N \times N} \sum_{i,j} (A_{ij} - \hat{A}_{ij})^2 \qquad (24)

where N is the number of nodes in the graph, and A_{ij} and \hat{A}_{ij} are the elements of the original and reconstructed adjacency matrices, respectively.

The majority of GAE-based approaches for graph summarization use combined architectures that include ConvGNNs or RecGNNs [78], [89], [80]. For example, Kipf et al. [78] proposed a variational graph autoencoder (VGAE) for undirected graphs based on their previous work on spectral convolutions [59]. VGAE incorporates a two-layer ConvGNN model based on the variational autoencoder in [89]. The main concept of VGAE is to represent the input graph data not as a single point in the latent space but as a probability distribution. This distribution captures the uncertainty and variability in the graph's latent representation. Instead of directly obtaining a fixed latent representation from the encoder, VGAE samples a random point from the learned distribution. The encoder in VGAE typically consists of two or more graph convolutional layers that process the input graph data and produce latent node representations. Each graph convolutional layer can be defined as follows [78]:

Z^{(l+1)} = \sigma\big(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} Z^{(l)} W^{(l)}\big) \qquad (25)

where Z^{(l)} represents the latent node representations at layer l of the encoder, \tilde{A} is the adjacency matrix of the graph with added self-loops, \tilde{D} is the diagonal degree matrix of \tilde{A}, W^{(l)} is the weight matrix for the l-th layer, and σ(·) is the activation function (e.g., ReLU) applied element-wise.

The VGAE introduces stochasticity to GAEs by sampling the latent representation Z from a Gaussian distribution in the latent space. The mean µ and log-variance log σ² of the distribution are obtained from the output of the last graph convolutional layer:

\mu = Z^{(L)} \cdot W^{(\mu)} \qquad (26)

\log \sigma^2 = Z^{(L)} \cdot W^{(\sigma)} \qquad (27)

Here, L represents the last layer of the encoder. W^{(\mu)} and W^{(\sigma)} are learnable weight matrices for obtaining the mean and log-variance, respectively.
TABLE III
Comparative analysis of GAE-based approaches for graph summarization.
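The hedged NumPy sketch below wires these pieces together for a toy graph: a two-layer GCN-style encoder (Equation 25), linear heads for µ and log σ² (Equations 26–27), followed by the sampling and inner-product decoding steps described next in the text. It is a simplified, untrained illustration of the VGAE idea, not the authors' reference code; the training loop and loss are omitted.

```python
# Simplified VGAE-style forward pass (illustrative assumptions, untrained weights).
import numpy as np

rng = np.random.default_rng(0)

def normalize(A):
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    return d_inv_sqrt @ A_tilde @ d_inv_sqrt

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = rng.normal(size=(3, 4))                       # node features
A_hat = normalize(A)

W0, W1 = rng.normal(size=(4, 8)), rng.normal(size=(8, 8))
W_mu, W_sigma = rng.normal(size=(8, 2)), rng.normal(size=(8, 2))

Z0 = np.maximum(A_hat @ X @ W0, 0.0)              # encoder layer (Equation 25)
Z1 = np.maximum(A_hat @ Z0 @ W1, 0.0)             # second encoder layer
mu = Z1 @ W_mu                                    # mean head (Equation 26)
log_sigma2 = Z1 @ W_sigma                         # log-variance head (Equation 27)

eps = rng.normal(size=mu.shape)                   # reparameterization trick
Z = mu + eps * np.exp(0.5 * log_sigma2)

A_rec = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))          # inner-product decoder with sigmoid
print(A_rec.shape)                                # (3, 3) reconstructed adjacency
```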
To sample from the Gaussian distribution, the reparameterization trick [90] is used. A random noise vector ϵ is sampled from a standard Gaussian distribution (ϵ ∼ N(0, 1)). The sampled latent representation Z is then computed as:

Z = \mu + \epsilon \cdot \exp\big(\tfrac{1}{2} \log \sigma^2\big) \qquad (28)

Finally, the decoder maps the sampled latent representation Z back into the graph space. In VGAE, the reconstruction is typically performed using an inner product between the latent node representations to predict the adjacency matrix Â:

\hat{A} = \sigma(Z \cdot Z^T) \qquad (29)

Here, σ(·) is the sigmoid activation function, which ensures that the predicted adjacency matrix Â is within the range [0, 1].

The loss function in VGAE consists of two terms: a reconstruction loss and a Kullback–Leibler (KL) divergence loss [91]. The reconstruction loss measures the difference between the predicted adjacency matrix Â and the actual adjacency matrix A. The KL divergence loss penalizes the deviation of the learned latent distribution from the standard Gaussian distribution. The overall loss function combines these two terms as follows:

L = \mathbb{E}_{q(Z|X,A)}[\log p(A|Z)] - KL[\,q(Z|X,A)\,\|\,p(Z)\,] \qquad (30)

As an extension to VGAE, Hajiramezanali et al. [80] constructed a variational graph RNN by integrating a RecGNN and a GAE to model the dynamics of the node attributes and the topological dependencies. The aim of the model is to learn an interpretable latent graph representation as well as to model sparse dynamic graphs.

There are also several aggregation-based approaches built on GAEs. These are generally designed to formulate the challenges with graph clustering tasks as a summarization problem [82], [79], [81], [84]. For example, Cai et al. [82] suggested a graph recurrent autoencoder model for use in clustering attributed multi-view graphs. The fundamental concept behind the approach is to consider both the characteristics that all views have in common and those that make each graph view special. To this end, the framework includes two separate models: the Global Graph Autoencoder (GGAE) and the Partial Graph Autoencoder (PGAE). The purpose of the GGAE is to learn the characteristics shared by all views, while the PGAE captures the distinct features. The cells are grouped into clusters using a soft K-means clustering algorithm after the output is obtained. Fan et al. [81] introduced the One2Multi Graph Autoencoder (OMGAE) for multi-view graph clustering. OMGAE leverages a shared encoder to learn a common representation from multiple views of a graph and uses multiple decoders to reconstruct each view separately. Additionally, OMGAE introduces a new attention mechanism that assigns different weights to each view during the clustering process based on their importance. The model is trained to minimize a joint loss function that considers both the reconstruction error and the clustering performance. Mrabah et al. [84] devised a new graph autoencoder model for attributed graph clustering called GAE-KL. The model uses a new formulation of the objective function, which includes a KL-divergence term, to learn a disentangled representation of the graph structure and the node attributes. The disentangled representation is then used to cluster the nodes based on their similarity in terms of both structure and attributes. The authors also introduced a new evaluation metric called cluster-based classification accuracy (CCA) to measure clustering performance.

Recently, Salha et al. [83] proposed a graph autoencoder architecture that uses one-hop linear models to encode and decode graph information. The approach simplifies the model while still achieving high performance with graph summarization tasks, such as node clustering and graph classification.
Uniquely, this paper presents a direction for designing graph autoencoder models that balances performance with simplicity.

D. GAT-based Approaches

The idea of an attention mechanism was first proposed by Bahdanau and his colleagues in 2014 [98]. The goal was to allow for modelling long-term dependencies in sequential data and to improve the performance of autoencoders. Essentially, attention allows the decoder to concentrate on the most relevant parts of the input sequence, with the most relevant vectors receiving the highest weights. Graph attention networks, or GATs [92], are based on the same idea. They use attention-based neighborhood aggregation, assigning different weights to the nodes in a neighborhood. This type of model is one of the most popular GNN models for node aggregation, largely because it reduces storage complexity along with the number of nodes and edges. The key formulation for a GAT is:

h_i = \sigma\big(\sum_{j \in N(i)} \alpha_{ij} W h_j\big) \qquad (31)

where h_i is the hidden feature vector of node u_i, N(i) is the set of neighbouring nodes of u_i, h_j is the hidden state of neighbouring node u_j, W is a weight matrix, and α_{ij} is the attention coefficient that measures the importance of node u_j to node u_i. The attention coefficients are computed as:

\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in N(i)} \exp(e_{ik})} \qquad (32)

where e_{ij} is a scalar energy value computed as:

e_{ij} = \mathrm{LeakyReLU}\big(a^T [W h_i \,\|\, W h_j]\big) \qquad (33)

where a is a learnable parameter vector, and || denotes concatenation. The LeakyReLU function introduces non-linearity into the model and helps prevent vanishing gradients. The softmax function normalizes the energy values across all neighboring nodes so that the attention coefficients sum to one.

By computing attention coefficients for neighboring nodes, GATs are able to selectively focus on the most important parts of the graph for each node. This allows the model to adaptively adjust to different graph structures and tasks. The attention mechanism also means GATs can incorporate node and edge features into the model, making them well-suited to summarization tasks, such as node classification with complex graphs [99].

Today, GATs are considered to be one of the most advanced models for learning with large-scale graphs. However, recently Brody et al. [39] argued that GATs do not actually compute dynamic attention; rather, they only compute a restricted form of static attention. To support their claim, they introduced GATv2, a new version of this type of attention network, which is capable of expressing problems that require computing dynamic attention. Focusing on the importance of dynamic weights, these authors argue that the problem of only supporting static attention can be fixed by changing the sequence of internal processes in the GAT equations, as shown in Equation 34.

e_{ij} = a^T \mathrm{LeakyReLU}\big(W \cdot [h_i \,\|\, h_j]\big) \qquad (34)

As another variation of GAT, Xie et al. [93] proposed a novel multi-view graph attention network named MGAT, to support low-dimensional representation learning based on an attention mechanism in a multi-view manner. The authors focus on a view-based attention approach that not only aggregates view-based node representations but also integrates various types of relationships into multiple views.

Tu et al. [95] explored the benefits of using graph summarization and refining bipartite user-item graphs for recommendation tasks. They applied a conditional attention mechanism to task-based sub-graphs to determine user preferences, which emphasizes the potential of summarizing and enhancing knowledge graphs to support recommender systems. Salehi et al. [94] defined a model based on an autoencoder architecture with a graph attention mechanism that learns low-dimensional representations of graphs. The model compresses the information in the input graph into a fixed-size latent vector, which serves as a summary of the entire graph. Through the use of attention, the model is able to discern and prioritize critical nodes and edges within the graph, making it more effective at capturing the graph's structural and semantic properties.

More recent works on GATs conducted by Chen et al. [96] and Li et al. [97] demonstrate the potential of graph attention networks for summarizing and analyzing complex graph data in various domains. Chen et al. proposed a multi-view graph attention network for travel recommendations. The model takes several different types of user behaviors into account, such as making hotel reservations, booking flights, and leaving restaurant reviews, and, in the process, learns an attention mechanism to weigh the importance of different views for a recommendation. Li et al. developed a multi-relational graph attention network for knowledge graph completion. The model integrates an attention mechanism and edge-type embeddings to capture the complex semantic relations between entities in a knowledge graph.
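Returning to the single-head formulation in Equations 31–33, the following hedged NumPy sketch computes attention coefficients and the attended update for one node of a toy graph; the shapes, initialisation, and activation are illustrative assumptions only, not a surveyed implementation.

```python
# Toy single-head GAT update for one node (Equations 31-33); weights are random
# and untrained, purely to illustrate how the attention coefficients are formed.
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

rng = np.random.default_rng(1)
d_in, d_out = 4, 6
W = rng.normal(size=(d_out, d_in))      # shared linear transform
a = rng.normal(size=2 * d_out)          # attention parameter vector

h_i = rng.normal(size=d_in)             # features of node i
neighbours = [rng.normal(size=d_in) for _ in range(3)]  # features of N(i)

Wh_i = W @ h_i
e = np.array([leaky_relu(a @ np.concatenate([Wh_i, W @ h_j]))   # Equation 33
              for h_j in neighbours])
alpha = np.exp(e) / np.exp(e).sum()                             # Equation 32 (softmax)
h_i_new = np.tanh(sum(al * (W @ h_j) for al, h_j in zip(alpha, neighbours)))  # Eq. 31
print(alpha.round(3), h_i_new.shape)
```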
V. GRAPH REINFORCEMENT LEARNING
Reinforcement learning (RL) is a mathematical model based on sequential decisions that allows an agent to learn via trial and error in an interactive setting through feedback on its actions. Due to the success and fast growth of reinforcement learning in interdisciplinary fields, scholars have recently been inspired to investigate reinforcement learning models for graph-structured data, i.e., graph reinforcement learning or GRL [100]. GRL is largely implemented based on the Bellman theory [101], where the environment is represented as a graph, nodes represent states, edges represent possible transitions between states, and rewards are associated with specific state-action pairs or nodes. The key components of GRL are as follows [100]:

• Environment (graph): The graph G represents the environment in which the agent operates. It is defined as G = (V, E), where V is the set of nodes representing states and E is the set of edges representing possible transitions between states.
TABLE IV
Comparative analysis of selected GAT-based approaches for graph summarization.
• State (s): In GRL, a state s corresponds to a specific node in the graph. Each node may have associated attributes or features that provide information about the state.

• Action (a): An action a corresponds to a decision or move that the agent can make when in a particular state (node). In graph-based environments, actions can be related to traversing edges between nodes or performing some operation on a node.

• Transition Model (T): The transition model defines the dynamics of the graph, specifying the probability of moving from one state (node) to another by taking a specific action (edge): T(s, a, s′) = P(s′|s, a), where s is the current state (node), a is the action (edge), and s′ is the next state (node).

• Reward Function (R): The reward function defines the immediate reward the agent receives after taking a particular action in a given state (node): R(s, a) is the expected immediate reward received when taking action a in state s.

• Policy (π): Similar to standard RL, the policy in GRL is a strategy that the agent uses to decide which action to take in each state (node): π(a|s) is the probability of taking action a in state s.

• Value Function (V) and Q-function (Q): In GRL, the value function V(s) and the Q-function Q(s, a) represent the expected cumulative reward the agent can obtain starting from a particular state (node) and following a policy π, or by taking action a in state s and then following policy π, respectively. The Q-learning algorithm can be formulated as:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\big[R_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)\big] \qquad (35)

where, at each timestep t, the agent in state s_t interacts with the environment using a behaviour policy based on the Q-table values: it takes an action a_t, receives a reward R_{t+1}, and transitions to a new state s_{t+1} based on the environment's feedback. This process is used to update the Q-table iteratively, continually incorporating information from the new state s_{t+1}, until termination.

The primary objective in GRL is to acquire a policy that maximizes the expected Q-function Q(s, a) over a sequence of actions; the target policy is defined as [102]:

\pi^* = \arg\max_\pi Q(s, a) = \arg\max_\pi \mathbb{E}_{\pi,T}\Big[\sum_{k=0}^{\infty} \gamma^k r_{t+k} \,\Big|\, s_t = s, a_t = a\Big] \qquad (36)
where \mathbb{E}_{\pi,T}[\cdot] denotes the expectation with respect to both the policy π and the distribution of transitions T (i.e., state transitions and rewards). The expression \sum_{k=0}^{\infty} \gamma^k r_{t+k} represents the sum of discounted rewards obtained in the future, starting from time step t (the current time step) and continuing k steps into the future. r_{t+k} is the reward obtained at time step t + k after taking action a_t at time step t and following policy π thereafter. The discount factor γ is a value between 0 and 1 that determines the importance of immediate rewards compared to future rewards. It discounts future rewards to make them less significant than immediate rewards. Smaller γ values make the agent more myopic, whereas larger γ values make the agent more far-sighted. The overall objective is to find the policy π that maximizes the expected sum of rewards (the Q-function) starting from state s and taking action a.

Achieving this goal involves employing various algorithms, like Q-learning, or utilizing a policy gradient method that updates Q-values or policy parameters based on observed rewards and transitions [86].
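To ground Equations 35 and 36, the hedged sketch below runs tabular Q-learning on a tiny graph environment in which states are nodes, actions are moves along outgoing edges, and one node carries a positive reward. The graph, rewards, and hyperparameters are toy assumptions, not a setup from the surveyed papers.

```python
# Toy tabular Q-learning on a graph environment (illustrative assumptions only):
# states are nodes, an action moves along an edge, and reaching node 3 pays +1.
import random

edges = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [3]}   # adjacency list (available actions)
Q = {(s, a): 0.0 for s, nbrs in edges.items() for a in nbrs}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(500):                                  # episodes
    s = 0
    for _ in range(10):                               # steps per episode
        nbrs = edges[s]
        if random.random() < epsilon:                 # epsilon-greedy behaviour policy
            a = random.choice(nbrs)
        else:
            a = max(nbrs, key=lambda n: Q[(s, n)])
        r = 1.0 if a == 3 else 0.0                    # reward on reaching node 3
        target = r + gamma * max(Q[(a, n)] for n in edges[a])
        Q[(s, a)] += alpha * (target - Q[(s, a)])     # Q-learning update (Equation 35)
        s = a
        if s == 3:
            break

print({k: round(v, 2) for k, v in Q.items()})         # learned state-action values
```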
GRL employs a diverse range of algorithms, and it frequently utilizes GNNs to efficiently process and learn from data structured as graphs. GNNs play a crucial role in updating node representations by considering their neighboring nodes, and they can be integrated into the RL framework, and into the graph summarization framework in particular, to effectively handle tasks that involve summarizing graph structures. For instance, Yan et al. [103] introduced a ConvGNN-based neural network specifically designed for graph sampling, enabling the automatic extraction of spatial features from the irregular graph topology of the substrate network. To optimize the learning agent, they adopt a popular parallel policy gradient training method, enhancing efficiency and robustness during training. Wu et al. [104] tackled the problem of graph signal sampling by formulating it as a reinforcement learning task in a discrete action space. They use a deep Q-network (DQN) to enable the agent to learn an effective sampling strategy. To make the training process more adaptable, they modify the steps and episodes. During each episode, the agent learns how to choose a node at each step and selects the best node at the end of the episode. They also redefine the actions and rewards to suit the sampling problem. In another work by Wu et al. [105], a reinforced sample selection approach for GNNs' transfer learning is proposed. The approach uses GRL to guide transfer learning and reduce the divergence between the source and target domains.

There is also a line of GRL research that seeks to use this paradigm to evaluate and improve the quality of graph summaries. For example, Amiri et al. [106] introduced a task-based GRL framework to automatically learn how to generate a summary of a given network. To provide an optimal solution for finding the best task-based summary, the authors made use of CNN layers in combination with a reinforcement learning technique. To improve the quality of the summary, the authors later proposed NetReAct [107], an interactive learning framework for graph summarization. The model uses human feedback in tandem with reinforcement learning to improve the summaries, while visualizing the document network.

In another study, Wickman et al. [108] recently presented a graph sparsification framework, SparRL, empowered by GRL, that can be used for any edge sparsification assignment with a specific target for reduction. The model takes an edge reduction ratio as its input, and a learning model decides how best to prune the edges. SparRL proceeds in a sequential manner, removing edges from the graph until the target number of edges has been pruned. In another work, Wu et al. [48] introduced GSGAN, a novel method for graph sparsification in community detection tasks. GSGAN excels at identifying crucial relationships not apparent in the original graph and enhances community detection effectiveness by introducing artificial edges. Employing a generative adversarial network (GAN) model, GSGAN generates random walks that effectively capture the network's underlying structure. What sets this approach apart is its utilization of reinforcement learning, which enables the method to optimize learning objectives by deriving rewards from a specially designed reward function. This reinforcement learning component guides the generator to create highly informative random walks, ultimately leading to improved performance in community detection tasks. Yan et al. [111] introduced a GRL approach to summarize geographic knowledge graphs. To obtain a more thorough understanding of the summarizing process, the model exploits components with spatial specificity and includes both the extrinsic and the intrinsic information in the graph. The authors also discuss the effectiveness of spatial-based models and compare the results of their model with models that include non-spatial entities.

Recently, many articles have discussed the potential of using GNN-based GRLs to summarize and analyze complex graph data in domains like neuroscience and computer vision [109], [110]. For example, Zhao et al. [109] suggested a deep reinforcement learning scheme guided by a GNN as a way to analyze brain networks. The model uses a reinforcement learning framework to learn a policy for selecting the most informative nodes in the network and combines that with a GNN to learn the node representations. Also, Goyal et al. [110] presented a GNN-based approach to image classification that relies on reinforcement learning. The model represents images as graphs and learns graph convolutional filters to extract features from the graph representation. They showed that their model outperforms several state-of-the-art methods on benchmark datasets with both image classification and reinforcement learning tasks.

In Table V, we summarize the key features of representative GRL-based approaches for graph summarization. Evaluation methods, performance metrics, training data, advantages, and limitations are compared among different models.
VI. PUBLISHED ALGORITHMS AND DATASETS

In the following section, we offer a comprehensive overview of the benchmark datasets, evaluation metrics, and open-source implementations associated with the approaches examined in Sections IV and V. By delving into these aspects, we aim to provide a thorough understanding of the landscape covered in those sections.
TABLE V
Comparative analysis of selected GRL-based approaches for graph summarization.
TABLE VI
Published datasets.

Category | Dataset | Publications | URL
Citation Networks | Cora | [59], [78], [99], [79], [65], [74], [60], [61], [76], [94], [83], [53], [84], [47] | https://relational.fit.cvut.cz/dataset/CORA
Citation Networks | Citeseer | [59], [78], [99], [79], [74], [61], [76], [94], [83], [53], [84], [47] | https://relational.fit.cvut.cz/dataset/CiteSeer
Citation Networks | PubMed | [59], [78], [99], [74], [61], [76], [94], [83], [53], [84], [47] | https://relational.fit.cvut.cz/dataset/PubMed_Diabetes
Citation Networks | DBLP | [81], [82], [47], [50] | https://dblp.uni-trier.de/xml/
Citation Networks | ACM | [81], [82], [50] | http://www.arnetminer.org/openacademicgraph
Social Networks | Reddit | [64], [65], [74], [60], [61], [66], [62], [83], [56] | https://github.com/redditarchive/reddit
Social Networks | IMDB | [81], [82] | https://datasets.imdbws.com/
Social Networks | Karate | [106] | http://networkdata.ics.uci.edu/data/karate/
Social Networks | Facebook | [106], [80], [108], [48] | http://snap.stanford.edu/data/ego-Facebook.html
Social Networks | DNC | [49] | https://github.com/alge24/DyGNN/tree/main/Dataset
Social Networks | UCI | [49] | https://github.com/alge24/DyGNN/tree/main/Dataset
Social Networks | Twitter | [93], [108] | http://snap.stanford.edu/data/ego-Twitter.html
Social Networks | Amazon | [66], [108] | http://snap.stanford.edu/data/amazon-meta.html
Social Networks | Yelp | [66], [62], [95], [56] | https://www.yelp.com/dataset
Social Networks | Epinions | [49] | http://snap.stanford.edu/data/soc-Epinions1.html
User-generated Networks | Taobao | [75] | https://tianchi.aliyun.com/dataset/649
User-generated Networks | MovieLens | [95], [96] | https://grouplens.org/datasets/movielens/
User-generated Networks | Last-FM | [95] | https://grouplens.org/datasets/hetrec-2011/
User-generated Networks | Eumail | [48] | https://snap.stanford.edu/data/email-EuAll.html
User-generated Networks | Enron | [80] | https://snap.stanford.edu/data/email-Enron.html
User-generated Networks | POL. BLOGS | [47] | https://networks.skewed.de/net/polblogs
Bio-informatic Networks, Image/Neuroimage | PPI | [64], [99], [61], [66], [62], [83], [56] | https://github.com/williamleif/GraphSAGE
Bio-informatic Networks, Image/Neuroimage | MUTAG | [45], [46], [94] | https://networkrepository.com/Mutag.php
Bio-informatic Networks, Image/Neuroimage | PTC | [45] | http://www.predictive-toxicology.org/ptc/
Bio-informatic Networks, Image/Neuroimage | ENZYMES | [45], [46] | https://github.com/snap-stanford/GraphRNN/tree/master/dataset/ENZYMES
Bio-informatic Networks, Image/Neuroimage | NCI | [45], [46] | https://cdas.cancer.gov/
Bio-informatic Networks, Image/Neuroimage | Flickr | [66], [62] | https://shannon.cs.illinois.edu/DenotationGraph/
Bio-informatic Networks, Image/Neuroimage | fMRI | [67], [109] | https://adni.loni.usc.edu/data-samples/access-data/
Bio-informatic Networks, Image/Neuroimage | ADNI | [63] | https://adni.loni.usc.edu/data-samples/access-data/
Bio-informatic Networks, Image/Neuroimage | ABIDE | [63] | https://fcon_1000.projects.nitrc.org/indi/abide/
Knowledge Graphs | - | [111], [59], [76], [103] | -
Synthetic Networks | - | [106], [107], [40], [39], [56] | -
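As a practical starting point, several of the citation-network benchmarks in Table VI are also packaged in common graph-learning libraries. The short sketch below is illustrative only: it assumes PyTorch Geometric is installed and loads Cora through the library's Planetoid wrapper (rather than the raw URL listed above), then prints a few basic statistics; the root path is a placeholder.

```python
from torch_geometric.datasets import Planetoid

# Planetoid bundles the Cora, Citeseer, and PubMed citation networks.
dataset = Planetoid(root="data/Planetoid", name="Cora")
data = dataset[0]  # a single graph with node features, labels, and split masks

print(f"nodes: {data.num_nodes}, edges: {data.num_edges}")
print(f"features per node: {dataset.num_features}, classes: {dataset.num_classes}")
print(f"training nodes: {int(data.train_mask.sum())}")
```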
RecGNN-based approaches are generally efficient. Notably, RecGNN models can also improve numerical stability during training if they incorporate convolutional filters. However, they may face challenges in long-range dependency modeling due to the vanishing gradient problem, a common issue in recurrent architectures.

ConvGNN-based approaches leverage a more spatial approach, effectively aggregating local neighborhood information. This method has been particularly effective in tasks where local structure is highly informative. Nonetheless, the convolutional approach may not fully capture the global context, which can be critical in certain summarization tasks. In addition, most existing ConvGNN models for graph summarization simply presume the input graphs are static. However, in the real world, dynamically evolving graphs and networks are more common; in a social network, for instance, the number of users, their connected friends, and their activities are constantly changing. Consequently, learning ConvGNNs on static graphs may not yield the best results, and more research on dynamic ConvGNN models is needed to improve the quality of summaries for large-scale dynamic graphs.

GAE-based approaches offer a powerful framework for unsupervised learning on graphs. By learning to encode and decode graph data, GAEs can generate compact representations that preserve essential topological information. However, the quality of the summarization is heavily dependent on the choice of the encoder and decoder, which can be a non-trivial design decision. In addition, most GAE-based approaches are unregularized and focus mainly on minimizing the reconstruction error while ignoring the data distribution of the latent space, which can lead to poor summaries when working with sparse and noisy real-world graph data. Although there are a few studies on GAE regularization [112], more research is needed in this regard.

GAT-based approaches introduce an attention mechanism that weights each node's contribution to the learned representation. This approach can adaptively highlight important features and relationships within the graph. While GATs provide a flexible mechanism that can potentially outperform other methods, they may also require more computational resources and can be prone to overfitting on smaller datasets. Given the recent advancements in this area, we expect to see more research in the future on using GATs to create condensed representations of both static and dynamic graphs.
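To make the attention mechanism referred to above concrete, the equations below sketch the single-head attention coefficients of the original GAT formulation [99]; here W is the shared weight matrix, a the attention vector, h_i the feature vector of node i, and N_i its neighborhood.

```latex
e_{ij} = \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\,[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j]\right),
\qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})},
\qquad
\mathbf{h}_i' = \sigma\Bigl(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\,\mathbf{W}\mathbf{h}_j\Bigr)
```

Nodes that matter more to the condensed representation receive larger attention weights α_ij, which is the adaptive weighting discussed above; multi-head variants concatenate or average several such attention channels.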
GRL-based approaches merge reinforcement learning with graph models to selectively summarize graphs by learning from reward feedback. This method is promising for decision-centric summarization tasks such as graph compression or key substructure identification, and designing customized deep GRL architectures for graph summarization stands out as a promising direction for future work. However, being relatively new, GRL still faces challenges in defining rewards and in efficiently exploring graph spaces.

Across all approaches, we observe a trade-off between the ability to capture different aspects of graph structure and computational efficiency. Furthermore, each method's performance can vary significantly depending on the characteristics of the dataset in use and the particular summarization task being addressed. Our analysis has revealed that the complexity

Fig. 4. Year-wise development of datasets in the reviewed papers.

TABLE VII
Evaluation metrics.

Evaluation Metric | Formula/Description
Accuracy | (TP + TN) / (TP + TN + FP + FN)
Precision | TP / (TP + FP)
Recall | TP / (TP + FN)
Average Precision | Σ_{i=1..n} (R_i − R_{i−1}) · P_i
F1-score | 2 × (Recall × Precision) / (Recall + Precision)
Micro F1-score | TP / (TP + FP + FN)
Specificity | TN / (TN + FP)
AUC-ROC | The area under the ROC curve.
ARI | Adjusted Rand Index; measures the similarity between two data clusterings.
NMI | Normalized Mutual Information; measures the similarity between two clusterings.
Hit@k | (Number of relevant items in the top k) / k
NDCG@k | Normalized Discounted Cumulative Gain at k.
Spearman's ρ | Measures the strength and direction of the monotonic relationship between two variables.
ρ_netgist | Expected ratio.
ρ_netreact | Quantifies the ease of identifying relevant documents.
Mean Rank | (1/n) Σ_{i=1..n} rank_i
Mean Reciprocal Rank | (1/N) Σ_{i=1..N} (1 / rank_i)

A. Dynamic Graphs

In the real world, the data that graphs represent can evolve over time, creating changes in a graph's topology: new edges appear, nodes disappear, and attributes change. These dynamics can cause fundamental changes to the entire graph. Summarizing dynamic graphs typically involves reducing the graph to a series of snapshots taken at various time increments; the model is then trained over these snapshots, yet a number of challenges can arise. To date, the approaches developed to tackle these problems have primarily focused on capturing temporal patterns and changes in graph topology. However, these methods often struggle to efficiently process large-scale dynamic graphs and to accurately capture the evolving nature of graph relationships. Future work in this area could focus on developing more scalable algorithms that can handle larger dynamic graphs without compromising processing speed or accuracy.

B. Task-based Summarization

Graph summarization becomes increasingly important as graph sizes grow, aiding understanding, sensemaking, and analysis. Different tasks, such as detecting communities or identifying influential nodes, require tailored summarization strategies, which necessitates developing specific approaches for each task to effectively identify the relevant patterns. Moreover, although task-based summarization is critical, this field of study has seen few successes [106], [107] and is still underexplored. Open research problems include how to perform task-based summarization on streaming and heterogeneous graphs and how to leverage human feedback in the learning process.

C. Evaluation Benchmarks

The optimal outcome of a graph summarization process is a "good" summary of the original graph. However, evaluating the "goodness" of a summary is an application-specific exercise that depends on the task at hand: sampling-based methods are evaluated on the quality of the sampling, aggregation-based methods on the quality of classification, and so on. Current studies commonly compare their method against one or more established methods to measure the quality of their results; metrics used in the literature include information loss, ease of visualization, and sparsity [2]. However, more and different evaluation metrics are required for cases where the validation process becomes complex and more elements are involved, such as visualization and multi-resolution summaries.
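As a concrete illustration of how several of the metrics in Table VII are computed in practice, the following self-contained Python sketch evaluates a toy ranking task; the inputs are hypothetical placeholders, not data from any surveyed benchmark.

```python
# Minimal, illustrative implementations of a few metrics from Table VII.

def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from raw true/false positive and negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

def hit_at_k(ranked: list, relevant: set, k: int) -> float:
    """Fraction of the top-k ranked items that are relevant (Hit@k)."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def mean_reciprocal_rank(ranked_lists: list, relevant_sets: list) -> float:
    """MRR over a batch of queries: average of 1 / rank of the first relevant item."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        reciprocal = 0.0
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                reciprocal = 1.0 / rank
                break
        total += reciprocal
    return total / len(ranked_lists)

if __name__ == "__main__":
    # Hypothetical toy example: two queries with ranked candidate nodes.
    ranked_lists = [["a", "b", "c", "d"], ["x", "y", "z", "w"]]
    relevant_sets = [{"b", "d"}, {"z"}]
    print(precision_recall_f1(tp=8, fp=2, fn=4))
    print(hit_at_k(ranked_lists[0], relevant_sets[0], k=3))
    print(mean_reciprocal_rank(ranked_lists, relevant_sets))
```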
TABLE VIII
Comparable models from published literature using node classification for evaluations. S/D: static/dynamic, NL: number of layers, AF: activation functions, DR: dropout rate, LR: learning rate, WD: weight decay.
Model | Framework | Dataset (S/D) | Results | Hyperparameters | Code
GraphSAINT [66] | PyTorch-Geometric | Cora (S) | 97.16±0.50 / 96.36±0.50 | NL, AF, DR, LR, WD, Optimizer, Sampling Technique, Sample Size | LINK
GraphSAINT [66] | PyTorch-Geometric | Citeseer (S) | 91.90±0.40 / 89.66±0.30 | NL, AF, DR, LR, WD, Optimizer, Sampling Technique, Sample Size | LINK
FastGCN [65] | PyTorch-Geometric | Cora (S) | 79.40±0.05 / 79.45±0.06 | NL, AF, DR, LR, WD, Optimizer, Layer-wise Importance Sampling | LINK
FastGCN [65] | PyTorch-Geometric | Citeseer (S) | 67.70±0.09 / 66.88±0.12 | NL, AF, DR, LR, WD, Optimizer, Layer-wise Importance Sampling | LINK
GAT [99] | PyTorch-Geometric | Cora (S) | 81.98±0.00 / 82.00±0.01 | NL, AF, DR, LR, WD, Optimizer, Attention Mechanism, nHeads, Atten. DR | LINK
GAT [99] | PyTorch-Geometric | Citeseer (S) | 67.70±0.02 / 67.44±0.02 | NL, AF, DR, LR, WD, Optimizer, Attention Mechanism, nHeads, Atten. DR | LINK
GAT v2 [39] | PyTorch-Geometric | Cora (S) | 80.90±0.01 / 81.00±0.02 | NL, AF, DR, LR, WD, Optimizer, Attention Mechanism, nHeads, Atten. DR | LINK
GAT v2 [39] | PyTorch-Geometric | Citeseer (S) | 67.01±0.03 / 66.30±0.04 | NL, AF, DR, LR, WD, Optimizer, Attention Mechanism, nHeads, Atten. DR | LINK
GATE [94] | Tensorflow | Cora (S) | 83.10±0.02 / 83.02±0.01 | NL, AF, LR, Optimizer, Lambda (λ), Weight Sharing | LINK
GATE [94] | Tensorflow | Citeseer (S) | 71.55±0.02 / 71.88±0.02 | NL, AF, LR, Optimizer, Lambda (λ), Weight Sharing | LINK
H-GCN [76] | Tensorflow | Cora (S) | 82.44±0.50 / 81.98±0.50 | NL, AF, LR, Optimizer, Channels, Coarsening Layers, Num. Channel | LINK
H-GCN [76] | Tensorflow | Citeseer (S) | 71.84±0.60 / 70.96±0.60 | NL, AF, LR, Optimizer, Channels, Coarsening Layers, Num. Channel | LINK
TABLE IX
Comparable models from published literature using link prediction for evaluations. S/D: static/dynamic, NL: number of layers, AF: activation functions, DR: dropout rate, LR: learning rate, WD: weight decay.
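For readers who want to reproduce node-classification numbers of the kind reported in Table VIII, the following sketch shows a minimal two-layer GCN pipeline on Cora using PyTorch Geometric. The hidden size, dropout, learning rate, and epoch count are illustrative placeholders, not the tuned settings of any model in the table.

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

# Load the benchmark graph (same dataset family as in Table VI).
dataset = Planetoid(root="data/Planetoid", name="Cora")
data = dataset[0]

class GCN(torch.nn.Module):
    def __init__(self, hidden: int = 16):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_features, hidden)
        self.conv2 = GCNConv(hidden, dataset.num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

# Train on the standard split masks provided with the dataset.
for epoch in range(200):
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

# Evaluate test accuracy, the metric reported in Table VIII-style comparisons.
model.eval()
pred = model(data.x, data.edge_index).argmax(dim=-1)
acc = (pred[data.test_mask] == data.y[data.test_mask]).float().mean().item()
print(f"test accuracy: {acc:.4f}")
```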
D. Generative Models

In the context of generative models, graph summarization can be used to generate new graphs that have properties similar to those of the original ones. Generative models, such as VAEs, Graph Transformers, Graph Adversarial Networks (GANs), and Graph Auto-Regressive Models, offer effective approaches for graph summarization. These models can learn patterns in graph data and generate new graph summaries by sampling from learned distributions or by sequentially generating nodes and edges. Researchers are continually exploring new techniques to push the boundaries of model architectures, scalability, controllability, and interpretability [47], [108], [48]. The field presents exciting opportunities for innovation and has the potential to transform various domains through more efficient and accurate graph summarization techniques.

VIII. CONCLUSION

New advancements in deep learning with multi-layer deep neural networks have made it possible to quickly and effectively produce a condensed representation of a large and complex graph. In this paper, we surveyed the technical trends and the most current research in graph summarization with GNNs. We provided an overview of different graph summarization techniques, and categorized and described the current GNN-based approaches to graph summarization. We also discussed a new line of research focusing on RL methods to evaluate and improve the quality of graph summaries. To advance research in this field, we also outlined several frequently used benchmarking tools, including datasets, open-source code, and techniques for generating summarized graphs. In addition, we identified four promising directions for future research based on our findings from the survey. We strongly believe that using GNNs for graph summarization is not just a passing trend; rather, it has a bright future in a wide range of applications across different domains.

As a potential area of focus for our future work, we endeavor to delve into the capabilities of GNN-based generative models, including VGAEs [78] and GANs [47], [48], to push the boundaries of graph summarization and generation. Additionally, we will explore the potential of GRL [108] to create new graphs based on their summarized representations. By addressing challenges and expanding the frontiers of graph synthesis, we envision empowering data analysts and researchers with powerful tools for efficient and insightful analysis of complex graph-structured data.

REFERENCES

[1] C. C. Aggarwal and H. Wang, Managing and mining graph data. Springer, 2010, vol. 40.
[2] Y. Liu, T. Safavi, A. Dighe, and D. Koutra, "Graph summarization methods and applications: A survey," ACM Computing Surveys (CSUR), vol. 51, no. 3, pp. 1–34, 2018.
[3] Š. Čebirić, F. Goasdoué, H. Kondylakis, D. Kotzinos, I. Manolescu, G. Troullinou, and M. Zneika, "Summarizing semantic graphs: a survey," The VLDB Journal, vol. 28, no. 3, pp. 295–327, 2019.
[4] D. Gibson, R. Kumar, and A. Tomkins, "Discovering large dense subgraphs in massive graphs," in Proceedings of the 31st International Conference on Very Large Data Bases, 2005, pp. 721–732.
[5] K. LeFevre and E. Terzi, "Grass: Graph structure summarization," in Proceedings of the 2010 SIAM International Conference on Data Mining. SIAM, 2010, pp. 454–465.
[6] P. Zhao, X. Li, D. Xin, and J. Han, "Graph cube: on warehousing and olap multidimensional networks," in SIGMOD International Conference on Management of Data, 2011, pp. 853–864.
[7] B. Kulis, S. Basu, I. Dhillon, and R. Mooney, "Semi-supervised graph clustering: a kernel approach," in Proceedings of the 22nd International Conference on Machine Learning, 2005, pp. 457–464.
[8] B. Karrer and M. E. Newman, "Stochastic blockmodels and community structure in networks," Physical Review E, vol. 83, no. 1, p. 016107, 2011.
[9] P. Hu and W. C. Lau, "A survey and taxonomy of graph sampling," arXiv preprint arXiv:1308.5865, 2013.
[10] C. Doerr and N. Blenn, "Metric convergence in social network sampling," in Proceedings of the 5th ACM Workshop on HotPlanet, 2013, pp. 45–50.
[11] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun, "Graph neural networks: A review of methods and applications," AI Open, vol. 1, pp. 57–81, 2020.
[12] J. Chen, Y. Saad, and Z. Zhang, "Graph coarsening: from scientific computing to machine learning," SeMA Journal, pp. 1–37, 2022.
[13] L.-C. Zhang, "Graph sampling: An introduction," The Survey Statistician, vol. 83, pp. 27–37, 2021.
[14] X. Liu, M. Yan, L. Deng, G. Li, X. Ye, and D. Fan, "Sampling methods for efficient training of graph convolutional networks: A survey," IEEE/CAA Journal of Automatica Sinica, vol. 9, no. 2, pp. 205–234, 2021.
[15] Ü. Çatalyürek, K. Devine, M. Faraj, L. Gottesbüren, T. Heuer, H. Meyerhenke, P. Sanders, S. Schlag, C. Schulz, D. Seemaier et al., "More recent advances in (hyper)graph partitioning," ACM Computing Surveys, vol. 55, no. 12, pp. 1–38, 2023.
[16] S. A. Bhavsar, V. H. Patil, and A. H. Patil, "Graph partitioning and visualization in graph mining: a survey," Multimedia Tools and Applications, vol. 81, no. 30, pp. 43315–43356, 2022.
[17] L. Yue, X. Jun, Z. Sihang, W. Siwei, G. Xifeng, Y. Xihong, L. Ke, T. Wenxuan, L. X. Wang et al., "A survey of deep graph clustering: Taxonomy, challenge, and application," arXiv preprint arXiv:2211.12875, 2022.
[18] C. Lee and D. J. Wilkinson, "A review of stochastic block models and extensions for graph clustering," Applied Network Science, vol. 4, no. 1, pp. 1–50, 2019.
[19] Y. Xie, B. Yu, S. Lv, C. Zhang, G. Wang, and M. Gong, "A survey on heterogeneous network representation learning," Pattern Recognition, vol. 116, p. 107936, 2021.
[20] S. M. Kazemi, R. Goel, K. Jain, I. Kobyzev, A. Sethi, P. Forsyth, and P. Poupart, "Representation learning for dynamic graphs: A survey," The Journal of Machine Learning Research, vol. 21, no. 1, pp. 2648–2720, 2020.
[21] M. Xu, "Understanding graph embedding methods and their applications," SIAM Review, vol. 63, no. 4, pp. 825–853, 2021.
[22] X. Wang, D. Bo, C. Shi, S. Fan, Y. Ye, and S. Y. Philip, "A survey on heterogeneous graph embedding: methods, techniques, applications and sources," IEEE Transactions on Big Data, 2022.
[23] W. Jin, L. Zhao, S. Zhang, Y. Liu, J. Tang, and N. Shah, "Graph condensation for graph neural networks," arXiv preprint arXiv:2110.07580, 2021.
[24] B. Mayer and B. Perozzi, "Scaling heterogeneous graph sampling and gnns with google cloud dataflow," https://cloud.google.com/blog/products/ai-machine-learning/scaling-heterogeneous-graph-sampling-gnns-google-cloud-dataflow, 2022, accessed: April 4, 2023.
[25] R. Interdonato, M. Magnani, D. Perna, A. Tagarelli, and D. Vega, "Multilayer network simplification: approaches, models and methods," Computer Science Review, vol. 36, p. 100246, 2020.
[26] P. S. Chodrow, N. Veldt, and A. R. Benson, "Generative hypergraph clustering: From blockmodels to modularity," Science Advances, vol. 7, no. 28, p. eabh1303, 2021.
[27] M. P. Boobalan, D. Lopez, and X. Z. Gao, "Graph clustering using k-neighbourhood attribute structural similarity," Applied Soft Computing, vol. 47, pp. 216–223, 2016.
[28] A. Grover and J. Leskovec, "node2vec: Scalable feature learning for networks," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 855–864.
[29] M. Kurant, M. Gjoka, C. T. Butts, and A. Markopoulou, "Walking on a graph with a magnifying glass: stratified sampling via weighted random walks," in Proceedings of the ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, 2011, pp. 281–292.
[30] A. D. Stivala, J. H. Koskinen, D. A. Rolls, P. Wang, and G. L. Robins, "Snowball sampling for estimating exponential random graph models for large networks," Social Networks, vol. 47, pp. 167–188, 2016.
[31] M. Hajiabadi, "Efficient graph summarization of large networks," Ph.D. dissertation, University of Victoria, 2022.
[32] K. A. Kumar and P. Efstathopoulos, "Utility-driven graph summarization," Proceedings of the VLDB Endowment, vol. 12, no. 4, pp. 335–347, 2018.
[33] S. Dumbrava, A. Bonifati, A. N. R. Diaz, and R. Vuillemot, "Approximate querying on property graphs," in Scalable Uncertainty Management: 13th International Conference, SUM 2019, Compiègne, France, December 16–18, 2019, Proceedings 13. Springer, 2019, pp. 250–265.
[34] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, "A comprehensive survey on graph neural networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 1, pp. 4–24, 2020.
[35] G. Dong, M. Tang, Z. Wang, J. Gao, S. Guo, L. Cai, R. Gutierrez, B. Campbel, L. E. Barnes, and M. Boukhechba, "Graph neural networks in iot: a survey," ACM Transactions on Sensor Networks, vol. 19, no. 2, pp. 1–50, 2023.
[36] H. Cai, V. W. Zheng, and K. C.-C. Chang, "A comprehensive survey of graph embedding: Problems, techniques, and applications," IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 9, pp. 1616–1637, 2018.
[37] F. Liu, S. Xue, J. Wu, C. Zhou, W. Hu, C. Paris, S. Nepal, J. Yang, and P. S. Yu, "Deep learning for community detection: progress, challenges and opportunities," arXiv preprint arXiv:2005.08225, 2020.
[38] L. Wu, P. Cui, J. Pei, L. Zhao, and L. Song, "Graph neural networks," in Graph Neural Networks: Foundations, Frontiers, and Applications. Springer, 2022, pp. 27–37.
[39] S. Brody, U. Alon, and E. Yahav, "How attentive are graph attention networks?" arXiv preprint arXiv:2105.14491, 2021.
[40] P. Goyal, S. R. Chhetri, and A. Canedo, "dyngraph2vec: Capturing network dynamics using dynamic graph representation learning," Knowledge-Based Systems, vol. 187, p. 104816, 2020.
[41] B. Huang and K. M. Carley, "Inductive graph representation learning with recurrent graph neural networks," CoRR, abs/1904.08035, 2019.
[42] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997. [Online]. Available: https://doi.org/10.1162/neco.1997.9.8.1735
[43] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using rnn encoder-decoder for statistical machine translation," arXiv preprint arXiv:1406.1078, 2014.
[44] X. Lai, P. Yang, K. Wang, Q. Yang, and D. Yu, "Mgrnn: Structure generation of molecules based on graph recurrent neural networks," Molecular Informatics, vol. 40, no. 10, p. 2100091, 2021.
[45] A. Taheri, K. Gimpel, and T. Berger-Wolf, "Learning graph representations with recurrent neural network autoencoders," KDD Deep Learning Day, 2018.
[46] Y. Jin and J. F. JaJa, "Learning graph-level representations with recurrent neural networks," arXiv preprint arXiv:1805.07683, 2018.
[47] A. Bojchevski, O. Shchur, D. Zügner, and S. Günnemann, "Netgan: Generating graphs via random walks," in International Conference on Machine Learning. PMLR, 2018, pp. 610–619.
[48] H.-Y. Wu and Y.-L. Chen, "Graph sparsification with generative adversarial network," in 2020 IEEE International Conference on Data Mining (ICDM). IEEE, 2020, pp. 1328–1333.
[49] Y. Ma, Z. Guo, Z. Ren, J. Tang, and D. Yin, "Streaming graph neural networks," in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 719–728.
[50] S. Khoshraftar, S. Mahdavi, A. An, Y. Hu, and J. Liu, "Dynamic graph embedding via lstm history tracking," in 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2019, pp. 119–127.
[51] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel, "Gated graph sequence neural networks," arXiv preprint arXiv:1511.05493, 2015.
[52] A. Taheri, K. Gimpel, and T. Berger-Wolf, "Learning to represent the evolution of dynamic graphs with recurrent models," in Companion Proceedings of The 2019 World Wide Web Conference, 2019, pp. 301–307.
[53] K. Ge, J.-Q. Zhao, and Y.-Y. Zhao, "Gr-gnn: Gated recursion-based graph neural network algorithm," Mathematics, vol. 10, no. 7, p. 1171, 2022.
[54] L. R. Medsker and L. Jain, "Recurrent neural networks," Design and Applications, vol. 5, pp. 64–67, 2001.
[55] X. Liang, X. Shen, J. Feng, L. Lin, and S. Yan, "Semantic object parsing with graph lstm," in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer, 2016, pp. 125–143.
[56] C.-Y. Zhang, Z.-L. Yao, H.-Y. Yao, F. Huang, and C. P. Chen, "Dynamic representation learning via recurrent graph neural networks," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 3, pp. 468–479, 2022.
[57] S. Li, K. W. Wong, C. C. Fung, and D. Zhu, "Improving question answering over knowledge graphs using graph summarization," in International Conference on Neural Information Processing. Springer, 2021, pp. 489–500.
[58] K. Zarzycki and M. Ławryńczuk, "Advanced predictive control for gru and lstm networks," Information Sciences, vol. 616, pp. 229–254, 2022.
[59] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," arXiv preprint arXiv:1609.02907, 2016.
[60] F. Wu, A. Souza, T. Zhang, C. Fifty, T. Yu, and K. Weinberger, "Simplifying graph convolutional networks," in International Conference on Machine Learning. PMLR, 2019, pp. 6861–6871.
[61] C. Deng, Z. Zhao, Y. Wang, Z. Zhang, and Z. Feng, "Graphzoom: A multi-level spectral approach for accurate and scalable graph embedding," arXiv preprint arXiv:1910.02370, 2019.
[62] E. Rossi, F. Frasca, B. Chamberlain, D. Eynard, M. Bronstein, and F. Monti, "Sign: Scalable inception graph neural networks," arXiv preprint arXiv:2004.11198, vol. 7, p. 15, 2020.
[63] H. Jiang, P. Cao, M. Xu, J. Yang, and O. Zaiane, "Hi-gcn: a hierarchical graph convolution network for graph embedding learning of brain network and brain disorders prediction," Computers in Biology and Medicine, vol. 127, p. 104096, 2020.
[64] W. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in Advances in Neural Information Processing Systems. MIT Press, 2017, pp. 1024–1034.
[65] J. Chen, T. Ma, and C. Xiao, "Fastgcn: fast learning with graph convolutional networks via importance sampling," arXiv preprint arXiv:1801.10247, 2018.
[66] H. Zeng, H. Zhou, A. Srivastava, R. Kannan, and V. Prasanna, "Graphsaint: Graph sampling based inductive learning method," arXiv preprint arXiv:1907.04931, 2019.
[67] Y. Yan, J. Zhu, M. Duda, E. Solarz, C. Sripada, and D. Koutra, "Groupinn: Grouping-based interpretable neural network for classification of limited, noisy brain data," in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 772–782.
[68] G. Wen, P. Cao, H. Bao, W. Yang, T. Zheng, and O. Zaiane, "Mvs-gcn: A prior brain structure learning-guided multi-view graph convolution network for autism spectrum disorder diagnosis," Computers in Biology and Medicine, vol. 142, p. 105239, 2022.
[69] S. Zhang, H. Tong, J. Xu, and R. Maciejewski, "Graph convolutional networks: a comprehensive review," Computational Social Networks, vol. 6, no. 1, pp. 1–23, 2019.
[70] R. Levie, F. Monti, X. Bresson, and M. M. Bronstein, "Cayleynets: Graph convolutional neural networks with complex rational spectral filters," IEEE Transactions on Signal Processing, vol. 67, no. 1, pp. 97–109, 2018.
[71] D. K. Hammond, P. Vandergheynst, and R. Gribonval, "Wavelets on graphs via spectral graph theory," Applied and Computational Harmonic Analysis, vol. 30, no. 2, pp. 129–150, 2011.
[72] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," Advances in Neural Information Processing Systems, vol. 29, 2016.
[73] A. Ajit, K. Acharya, and A. Samanta, "A review of convolutional neural networks," in International Conference on Emerging Trends in Information Technology and Engineering. IEEE, 2020, pp. 1–5.
[74] W. Huang, T. Zhang, Y. Rong, and J. Huang, "Adaptive sampling towards fast graph representation learning," Advances in Neural Information Processing Systems, vol. 31, 2018.
[75] Z. Li, X. Shen, Y. Jiao, X. Pan, P. Zou, X. Meng, C. Yao, and J. Bu, "Hierarchical bipartite graph neural networks: Towards large-scale e-commerce applications," in IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020, pp. 1677–1688.
[76] F. Hu, Y. Zhu, S. Wu, L. Wang, and T. Tan, "Hierarchical graph convolutional networks for semi-supervised node classification," arXiv preprint arXiv:1902.06667, 2019.
[77] L. Dang, Y. Nie, C. Long, Q. Zhang, and G. Li, "Msr-gcn: Multi-scale residual graph convolution networks for human motion prediction," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 11467–11476.
[78] T. N. Kipf and M. Welling, "Variational graph auto-encoders," arXiv preprint arXiv:1611.07308, 2016.
[79] C. Wang, S. Pan, G. Long, X. Zhu, and J. Jiang, "Mgae: Marginalized graph autoencoder for graph clustering," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 889–898.
[80] E. Hajiramezanali, A. Hasanzadeh, K. Narayanan, N. Duffield, M. Zhou, and X. Qian, "Variational graph recurrent neural networks," Advances in Neural Information Processing Systems, vol. 32, 2019.
[81] S. Fan, X. Wang, C. Shi, E. Lu, K. Lin, and B. Wang, "One2multi graph autoencoder for multi-view graph clustering," in Proceedings of The Web Conference 2020, 2020, pp. 3070–3076.
[82] E. Cai, J. Huang, B. Huang, S. Xu, and J. Zhu, "Grae: Graph recurrent autoencoder for multi-view graph clustering," in 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence, 2021, pp. 1–9.
[83] G. Salha, R. Hennequin, and M. Vazirgiannis, "Simple and effective graph autoencoders with one-hop linear models," in Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2020, Ghent, Belgium, September 14–18, 2020, Proceedings, Part I. Springer, 2021, pp. 319–334.
[84] N. Mrabah, M. Bouguessa, M. F. Touati, and R. Ksantini, "Rethinking graph auto-encoder models for attributed graph clustering," IEEE Transactions on Knowledge and Data Engineering, 2022.
[85] W. H. L. Pinaya, S. Vieira, R. Garcia-Dias, and A. Mechelli, "Autoencoders," in Machine Learning. Academic Press, 2020, pp. 193–208.
[86] Z. Zhang, P. Cui, and W. Zhu, "Deep learning on graphs: A survey," IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 1, pp. 249–270, 2020.
[87] Z. Hou, X. Liu, Y. Dong, C. Wang, J. Tang et al., "Graphmae: Self-supervised masked graph autoencoders," arXiv preprint arXiv:2205.10803, 2022.
[88] G. Salha, R. Hennequin, and M. Vazirgiannis, "Keep it simple: Graph autoencoders without graph convolutional networks," arXiv preprint arXiv:1910.00942, 2019.
[89] D. P. Kingma and M. Welling, "Auto-encoding variational bayes," arXiv preprint arXiv:1312.6114, 2013.
[90] D. P. Kingma, T. Salimans, and M. Welling, "Variational dropout and the local reparameterization trick," Advances in Neural Information Processing Systems, vol. 28, 2015.
[91] T. Kim, J. Oh, N. Y. Kim, S. Cho, and S.-Y. Yun, "Comparing kullback-leibler divergence and mean squared error loss in knowledge distillation," in 30th International Joint Conference on Artificial Intelligence (IJCAI-21). IJCAI, 2021, pp. 2628–2635.
[92] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, "Graph attention networks," arXiv preprint arXiv:1710.10903, 2017.
[93] Y. Xie, Y. Zhang, M. Gong, Z. Tang, and C. Han, "Mgat: Multi-view graph attention networks," Neural Networks, vol. 132, pp. 180–189, 2020.
[94] A. Salehi and H. Davulcu, "Graph attention auto-encoders," in 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2020, pp. 989–996.
[95] K. Tu, P. Cui, D. Wang, Z. Zhang, J. Zhou, Y. Qi, and W. Zhu, "Conditional graph attention networks for distilling and refining knowledge graphs in recommendation," in Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 1834–1843.
[96] L. Chen, J. Cao, Y. Wang, W. Liang, and G. Zhu, "Multi-view graph attention network for travel recommendation," Expert Systems with Applications, vol. 191, p. 116234, 2022.
[97] Z. Li, Y. Zhao, Y. Zhang, and Z. Zhang, "Multi-relational graph attention networks for knowledge graph completion," Knowledge-Based Systems, vol. 251, p. 109262, 2022.
[98] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473, 2014.
[99] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, Y. Bengio et al., "Graph attention networks," stat, vol. 1050, no. 20, pp. 10–48 550, 2017.
[100] N. Mingshuo, C. Dongming, and W. Dongqi, "Reinforcement learning on graph: A survey," arXiv preprint arXiv:2204.06127, 2022.
[101] R. Bellman and R. E. Kalaba, Selected papers on mathematical trends in control theory. Dover Publications, 1964.
[102] M. Nie, D. Chen, and D. Wang, "Reinforcement learning on graphs: A survey," IEEE Transactions on Emerging Topics in Computational Intelligence, 2023.
[103] Z. Yan, J. Ge, Y. Wu, L. Li, and T. Li, "Automatic virtual network embedding: A deep reinforcement learning approach with graph convolutional networks," IEEE Journal on Selected Areas in Communications, vol. 38, no. 6, pp. 1040–1057, 2020.
[104] M. Wu, Q. Zhang, Y. Gao, and N. Li, "Graph signal sampling with deep q-learning," in 2020 International Conference on Computer Information and Big Data Applications (CIBDA), 2020, pp. 450–453.
[105] B. Wu, X. Liang, X. Zheng, J. Wang, and X. Zhou, "Reinforced sample selection for graph neural networks transfer learning," in 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2022, pp. 1281–1288.
[106] S. E. Amiri, B. Adhikari, A. Bharadwaj, and B. A. Prakash, "Netgist: Learning to generate task-based network summaries," in IEEE International Conference on Data Mining (ICDM). IEEE, 2018, pp. 857–862.
[107] S. E. Amiri, B. Adhikari, J. Wenskovitch, A. Rodriguez, M. Dowling, C. North, and B. A. Prakash, "Netreact: Interactive learning for network summarization," arXiv preprint arXiv:2012.11821, 2020.
[108] R. Wickman, X. Zhang, and W. Li, "Sparrl: Graph sparsification via deep reinforcement learning," arXiv preprint arXiv:2112.01565, 2021.
[109] X. Zhao, J. Wu, H. Peng, A. Beheshti, J. J. Monaghan, D. McAlpine, H. Hernandez-Perez, M. Dras, Q. Dai, Y. Li et al., "Deep reinforcement learning guided graph neural networks for brain network analysis," Neural Networks, vol. 154, pp. 56–67, 2022.
[110] N. Goyal and D. Steiner, "Graph neural networks for image classification and reinforcement learning using graph representations," arXiv preprint arXiv:2203.03457, 2022.
[111] B. Yan, K. Janowicz, G. Mai, and R. Zhu, "A spatially explicit reinforcement learning model for geographic knowledge graph summarization," Transactions in GIS, vol. 23, no. 3, pp. 620–640, 2019.
[112] S. Pan, R. Hu, G. Long, J. Jiang, L. Yao, and C. Zhang, "Adversarially regularized graph autoencoder for graph embedding," arXiv preprint arXiv:1802.04407, 2018.

Nasrin Shabani received a Master of Research degree in Computer Science with First Class Honours from Macquarie University, Sydney, NSW, Australia. She is currently pursuing a Ph.D. in Computer Science at the same institution. Her research interests lie at the intersection of graph mining, graph summarization, and deep learning. Through her work, she aims to develop novel algorithms and techniques that can extract meaningful insights and patterns from complex graph data structures.

Jia Wu (Senior Member, IEEE) is currently the Research Director of the Centre for Applied Artificial Intelligence and the Director of Higher Degree Research in the School of Computing at Macquarie University, Sydney, Australia. Dr Wu received his Ph.D. degree in computer science from the University of Technology Sydney, Australia. His current research interests include data mining and machine learning. Since 2009, he has published 100+ refereed journal and conference papers, including TKDE, TKDD, KDD, ICDM, WWW, and NeurIPS.

Amin Beheshti holds B.S. (1st Hons.) and M.S. degrees (1st Hons.) in computer science and engineering, and a Ph.D. in computer science from UNSW Sydney, Australia. Amin is a Full Professor of data science at Macquarie University. He is currently the Director of the Centre for Applied Artificial Intelligence and the Head of the Data Science Research Laboratory, School of Computing, Macquarie University. He is a leading author of several authored books in data, social, and process analytics, co-authored with other high-profile researchers.

Quan Z. Sheng received his Ph.D. degree in computer science from the University of New South Wales, Sydney, NSW, Australia. He is currently a full Professor and Head of the School of Computing at Macquarie University, Sydney. His research interests include big data analytics, service-oriented computing, and the Internet of Things. Microsoft Academic ranked Prof. Michael Sheng as one of the Most Impactful Authors in Services Computing (ranked Top 5 All-Time) and in the Web of Things (ranked Top 20 All-Time).

Jin Foo is the Staff Data Scientist at Prospa, Australia's top online lender to small businesses, and was previously Data Science Lead at Woolworths Group, specialising in identity resolution, hyper-personalised offers, propensity modelling, sequential bin packing, and time-series analysis. Jin is currently a 2nd year Master of Research student at the School of Computing, Macquarie University, Sydney, Australia. His research focuses on anomaly detection with word embeddings, hashing algorithms, and graph networks.

Venus Haghighi is currently a Ph.D. student in computer science at the School of Computing, Macquarie University, Sydney, NSW, Australia. The focus of her research is to enhance classic GNN models and explore robust graph learning paradigms to detect and mitigate the camouflage behavior of malicious actors in both static and dynamic networks. Her research interests include graph-based anomaly detection, graph neural networks, graph-based fraud detection, and graph data mining.

Ambreen Hanif is a 2nd year Ph.D. student at the School of Computing, Macquarie University, NSW, Australia. After completing her Master's degree, she decided to continue her research as a Ph.D. candidate. Her research interests lie in the field of Explainable Artificial Intelligence (XAI) for Deep Neural Networks, Data Provenance, and Storytelling with XAI. Specifically, she aims to develop novel methods to enhance the interpretability and transparency of deep neural networks.

Maryam Shahabikargar received her Master of Research degree in computer science from Macquarie University, Sydney, Australia. After completing her Master's degree, she decided to continue her research as a Ph.D. candidate in computer science at Macquarie University. Her current research interests include not only NLP and graph embeddings but also link prediction and anomaly detection. She aims to develop a model that combines her research interests to tackle a specific problem in the field of finance.