A Comprehensive Survey On Graph Summarization With Graph Neural Networks

Nasrin Shabani, Jia Wu, Amin Beheshti, Quan Z. Sheng, Venus Haghighi, Ambreen Hanif, and Maryam Shahabikargar
Abstract—As large-scale graphs become more widespread, more and more computational challenges with extracting, processing, and interpreting large graph data are being exposed. It is therefore natural to search for ways to summarize these expansive graphs while preserving their key characteristics. In the past, most graph summarization techniques sought to capture the most important part of a graph statistically. However, today, the high dimensionality and complexity of modern graph data are making deep learning techniques more popular. Hence, this paper presents a comprehensive survey of progress in deep learning summarization techniques that rely on graph neural networks (GNNs). Our investigation includes a review of the current state-of-the-art approaches, including recurrent GNNs, convolutional GNNs, graph autoencoders, and graph attention networks. A new burgeoning line of research is also discussed where graph reinforcement learning is being used to evaluate and improve the quality of graph summaries. Additionally, the survey provides details of benchmark datasets, evaluation metrics, and open-source tools that are often employed in experimentation settings, along with a detailed comparison, discussion, and takeaways for the research community focused on graph summarization. Finally, the survey concludes with a number of open research challenges to motivate further study in this area.

Impact Statement—Graph summarization is a key task in managing large graphs, which are ubiquitous in modern applications. In this article, we summarize the latest developments in graph summarization methods, offer a more profound understanding of these methods, and list source codes and available resources. The study covers a broad range of techniques, including both conventional and deep learning-based approaches, with a particular emphasis on GNNs. We aim to help researchers develop a basic understanding of GNN-based methods for graph summarization, benefit from useful resources, and think about future directions.

Index Terms—Deep Learning, Graph Neural Networks, Graph Summarization

We acknowledge the Centre for Applied Artificial Intelligence at Macquarie University for funding this study. Nasrin Shabani, Jia Wu, Amin Beheshti, Quan Z. Sheng, Venus Haghighi, Ambreen Hanif, and Maryam Shahabikargar are with the School of Computing, Macquarie University, NSW 2109, Australia. E-mails: {nasrin.shabani@hdr, jia.wu, amin.beheshti, michael.sheng, eujin.foo@hdr, venus.haghighi@hdr, ambreen.hanif@hdr, maryam.shahabikargar@hdr}mq.edu.au. Corresponding authors: Nasrin Shabani and Amin Beheshti.

I. INTRODUCTION

LARGE graphs are becoming increasingly ubiquitous. With the increasing amount of data being generated, large graphs are becoming more prevalent in modelling a variety of domains, such as social networks, proteins, the World Wide Web, user actions, and beyond. However, as these graphs grow in size, understanding and analyzing them is becoming more challenging. Additionally, performing fast computations with large graphs and visualizing the knowledge they can yield is also becoming more difficult. Many claim that faster and more effective algorithms are needed to overcome these obstacles [1], [2]. However, a growing cohort of researchers believe that summarization might hold the answer to this unyielding problem. Summarization not only helps existing algorithms to parse the data faster, it can also compress the data, reduce storage requirements, and assist with graph visualization and sense-making [3].

Graph summarization is the process of finding a condensed representation of a graph while preserving its key properties [2]. A toy example of a typical graph summarization process is shown in Figure 1. The process includes removing the original graph's objects and replacing them with fewer objects of the same type to produce a condensed representation of the original graph.

Most traditional approaches to graph summarization involve using a conventional machine learning method or a graph-structured query, such as degree, adjacency, or eigenvector centrality, to find a condensed graphical representation of the graph [2]. A popular summarization technique is to group structures in the input graph by aggregating the densest subgraphs [4]. For example, the GraSS model [5] focuses on accurate query handling and incorporates formal semantics for answering queries on graph structure summaries based on a random walk model, while Graph Cube [6] is a data warehousing model that integrates both network structure summarization and attribute aggregation. This model also supports OLAP queries on large multidimensional networks.

Notably, clustering methods follow a similar approach to summarization, partitioning a graph into groups of nodes that can be further summarized. Most traditional graph clustering methods use conventional machine learning and statistical inference to measure the closeness of nodes based on their connectivity and structural similarities [7]. For instance, Karrer et al. [8] used a stochastic block model to detect clusters or communities in large sparse graphs. However, another method of graph summarization focuses more on node selection and identifying sparse graphs that can be used to derive smaller graphs [9]. As an example, Doerr et al. [10] introduced a sampling method based on traversing a graph that begins with a collection of starting points, e.g., nodes or edges, and then adds to the sample pool depending on recent information about the graph objects. However, despite the popularity of these approaches in the past, they are very computationally intensive. They also require a great deal of memory to store,
B. Contributions
II. DEFINITIONS AND BACKGROUND

This section provides an overview of the key definitions and background information on graph summarization techniques.

• Definition 1 (Graph): A graph G can be represented as a tuple (V, E), where V denotes the set of nodes or vertices {v_1, v_2, ..., v_n}, and E = {e_ij}_{i,j=1}^{n} represents the set of edges or links connecting node pairs. The graph is represented by an n × n dimensional adjacency matrix A = [a_ij], with a_ij being 1 if the edge e_ij is present in E and 0 otherwise. If a_ij is not equal to a_ji, the graph is directed; otherwise, it is undirected. When edges are associated with weights from the set W, the graph is called a weighted network; otherwise, it is an unweighted network. G is considered labeled if every edge e ∈ E has an associated label. Additionally, if each node v ∈ V has a unique label, the nodes are also labeled; otherwise, G is considered unlabelled.

• Definition 2 (Graph Summary): Given a graph G, a summary G(V_S, E_S) is a condensed representation of G that preserves its key properties. Graph summarization techniques involve either aggregation, selection, or transformation on a given graph and produce a graph summary as the output.

As outlined in Definition 2, graph summarization approaches fall into three main categories: aggregation, selection, and transformation. While selection approaches make graphs sparser by simply removing objects without replacing them, aggregation approaches replace those removed objects with similar objects, only with fewer of them. For example, a supernode might replace a group of nodes. Similar to selection and aggregation, the transformation approaches also involve removing objects from the graph, but this time the objects removed are transformed into a different type of object, such as an embedding vector [23].

Aggregation. Aggregation is one of the most extensively employed techniques of graph summarization. Aggregation methods can be divided into two main groups: those that involve node grouping and those that involve edge grouping. Node grouping methods group nodes into supernodes, whereas edge grouping methods reduce the number of edges in a graph by aggregating them into virtual nodes. Clustering and community detection are examples of a grouping-based approach. Although summarizing graphs is not explicitly the primary objective of these processes, the outputs can be modified into non-application-specific summaries [2].

Selection. There are two main groups of selection techniques: sampling and simplification. While sampling methods focus on picking subsets of nodes and edges from the input graph [24], simplification or sparsification methods involve removing less important edges or nodes. In this way, they tend to resemble solutions to dimensionality reduction problems [9].

Transformation. Graph projection and graph embedding are two categories of this method. Generally, graph projection refers to the summarization techniques that transform bipartite graphs with various inter-layer nodes and edges into simple (single-layer) summarized graphs. Conversely, graph embedding refers to the techniques that transform a graph into a lower dimensional representation while preserving the original graph's topology [25].
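To make the three summary types in Definition 2 concrete, the following is a minimal, illustrative NumPy sketch (a toy example of our own, not an algorithm from the surveyed literature): selection drops low-degree nodes, aggregation merges node groups into supernodes, and transformation embeds nodes into a low-dimensional space. The graph, grouping matrix, and thresholds are arbitrary assumptions.

```python
# Toy illustration of the three summarization operations from Definition 2.
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

# Selection (sparsification): keep only nodes whose degree exceeds a threshold.
degree = A.sum(axis=1)
keep = degree >= 2
A_selected = A[np.ix_(keep, keep)]

# Aggregation: merge node groups into supernodes; S[i, g] = 1 if node i is in group g.
S = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
A_super = (S.T @ A @ S > 0).astype(float)   # supergraph over the two supernodes

# Transformation (embedding): project nodes to low-dimensional vectors,
# here via the top-2 eigenvectors of the adjacency matrix (a spectral embedding).
eigvals, eigvecs = np.linalg.eigh(A)
Z = eigvecs[:, -2:]                          # each row is a 2-d node embedding

print(A_selected.shape, A_super.shape, Z.shape)
```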
III. GRAPH SUMMARIZATION: AN EVOLUTION

Graph summarization has been playing an important role in areas such as network analysis, data mining, machine learning, and visualization for some time. The evolution of graph summarization is illustrated in Figure 2, which shows how it has progressed from traditional computing methods to multi-layer GNNs. This section briefly overviews the three different traditional methods within this field and explains the advantages of GNN techniques over traditional ones.

A. Clustering-based Approaches

Graph clustering can be thought of as a graph summarization technique since it involves grouping together nodes in a graph that are similar or related and, in so doing, the complexity and size of the original graph are reduced. In simpler terms, graph clustering provides a way to compress or summarize a large and complex graph into a smaller set of clusters, each of which captures some aspect of the structure or function of the original graph [1]. Graph summarization techniques using clustering can be classified into three main categories: structural-based, attribute-based, and structural-attribute-based approaches. The latter, combining both structural and attribute information, is considered the most effective [26]. For example, Boobalan et al. [27] proposed a method called k-Neighborhood Attribute Structural Similarity (k-NASS) that incorporates both structural and attribute similarities of graph nodes. This method improves clustering accuracy for complex graphs with rich attributes. However, clustering large graphs with many attributes remains challenging due to high memory and computational requirements.

B. Statistical Inference

Statistical inference techniques for graph summarization simplify the complexity of the original graph while preserving its significant characteristics. These techniques fall into two groups: pattern mining and sampling. Pattern mining identifies representative patterns or subgraphs in the graph to create a condensed summary. On the other hand, sampling randomly selects a subset of nodes or edges from the graph and estimates the properties of the entire graph based on this subset. One example of a sampling technique is Node2vec [28], which generates random sequences of nodes within a graph, known as walks, to create a graph summary. Various sampling techniques, such as random sampling [28], stratified sampling [29], and snowball sampling [30], can be used for graph summarization. Each technique has its advantages and disadvantages, and the choice depends on the specific problem and data being addressed.

C. Goal-driven

Goal-driven techniques for graph summarization involve constructing a graph summary that is tailored to a specific application or task. They are a powerful tool for capturing specific features or relationships in a graph that are relevant to a specific application or task.
By optimizing the graph summary to a specific goal, it is possible to create a more effective and efficient summary that can be used to derive better insights and make better decisions [31]. Significant goal-driven techniques for graph summarization include utility-driven and query-driven techniques. Utility-driven techniques aim to summarize large graphs while preserving their essential properties and structure to maximize their usefulness for downstream tasks. Human reviewers evaluate the utility of the summary against specific tasks like node classification and link prediction [32]. Query-driven techniques summarize graphs by identifying relevant subgraphs or patterns using queries in a query language. The resulting subgraph that matches the query becomes a building block for the graph summary, supporting the target downstream task [33]. The choice of the goal-driven summarization technique depends on the specific goals of the analysis, as some techniques may preserve global properties, while others may capture local structures. It also depends on available computational resources and the complexity and size of the original graph.

D. Why GNNs for Graph Summarization?

In recent times, deep learning has gained significant prominence and is now considered one of the most effective forms of AI due to its high accuracy. Conventional deep learning methods have shown that they perform extremely well with Euclidean data (e.g., images, signals, and text), and now there are a growing number of applications in non-Euclidean domains (e.g., graphs and manifold structures). As a deep learning approach, GNNs are multi-layer neural networks that learn on graph structures to ultimately perform graph-related tasks like classification, clustering, pattern mining, and summarization [34].

As mentioned, traditional graph summarization approaches are mostly based on conventional machine learning or graph-structured queries, such as degree, adjacency, and eigenvector centrality, where the aim is to find a condensed graphical representation of the whole graph [6]. However, the pairwise similarity calculations involved in these approaches demand a considerably high level of computational power. The explicit learning capabilities of GNNs skirt this problem. Additionally, powerful models can be built from even low-dimensional representations of attributed graphs [36]. Unlike standard machine learning algorithms, with a GNN, there is no need to traverse all possible orders of the nodes to represent a graph. Instead, GNNs consider each node separately without taking the order of the input nodes into account. This avoids redundant computations.

The major advantage of GNN models for graph summarization over traditional methods is the ability to use low-dimensional vectors to represent the features of large graphs [37]. Additionally, the message-passing mechanism used by GNNs to communicate information from one node to the next has been the most successful learning framework for learning the patterns and neighbours of nodes and the sub-graphs in large graphs [11]. It is also easy to train a GNN in a semi- or unsupervised way to aggregate, select, or transform graphs into low dimensional representations [38].
Fig. 3. Four basic GNN architectures: a RecGNN, in which node v recurrently aggregates the hidden states of its neighbours u_1, u_2, u_3; a ConvGNN, which applies layer-wise convolutional updates; a GAE, which encodes the graph into a latent code Z and decodes it back; and a GAT, which weights neighbours h_1, ..., h_4 with attention coefficients a_11, ..., a_14. Each panel is annotated with its node update rule.
In this regard, recent successes in graph summarization with GNNs point to promising directions for new research. For example, the GNN models developed by Brody et al. [39] and Goyal et al. [40] represent dynamic graphs in low dimensions, providing a good foundation for popularizing GNNs into more complex dynamic graphs.

IV. GRAPH SUMMARIZATION WITH GNNS

This section provides an overview of recent research into graph summarization with GNNs. Each subsection covers one of four main categories of approach, these being: Recurrent Graph Neural Networks (RecGNNs), Convolutional Graph Neural Networks (ConvGNNs), Graph Autoencoders (GAEs), and Graph Attention Networks (GATs). The four different types of GNN models are shown in Figure 3. Within each subsection, we first provide a brief introduction about the architecture of the GNN model and then review the most notable approaches and the contributions each has made to the field. At the end of each subsection, we provide a comprehensive summary of the key features of representative GNN-based approaches. We present a comparative analysis in Tables I, II, III, and IV for RecGNN, ConvGNN, GAE, and GAT architectures, respectively. The tables include comparisons of evaluation methods, performance metrics, training data, advantages, and limitations across the different models.

A. RecGNN-based Approaches

RecGNNs are early implementations of GNNs, designed to acquire node representations through a generalized recurrent neural network (RNN) architecture. Within these frameworks, information is transmitted between nodes and their neighbours to reach a stable equilibrium [34], [41]. A node's hidden state is continuously updated from its node or edge information and its previous state:

h_v^t = f\big(\sum_{u \in N(v)} W h_u^{t-1}\big) \qquad (1)

where h_v^t is the hidden state of node v at time t, which represents the information learned by the RecGNN about node v at a specific time step in the dynamic graph sequence. N(v) denotes the set of neighboring nodes of node v in the graph, providing the context and connectivity information for node v within the graph structure. h_u^{t-1} is the hidden state of neighboring node u at time t − 1, which contributes to the update of node v's hidden state at time t, reflecting the influence of neighboring nodes on v's representation. W is the weight matrix used for aggregating the hidden states of neighboring nodes.

Here, f(·) is usually a simple element-wise activation like ReLU, tanh, or sigmoid. This simple activation is typically used to introduce non-linearity and capture complex patterns in the node representations. However, it can be replaced by recurrent update functions, which use gated mechanisms like LSTM (Long Short-Term Memory) [42] or GRU (Gated Recurrent Unit) [43] cells. In this case, each node's hidden state update would be computed using an LSTM or GRU cell, which is more complex and sophisticated than a simple element-wise activation function. These cells determine how the hidden state of node v at time t is updated based on its current and previous hidden states and the information from its neighboring nodes, enabling the model to capture temporal dependencies in dynamic graph-structured data [44].

RecGNN-based approaches for graph summarization mostly focus on graph sampling and embedding by generating sequences from graphs and embedding those sequences into a continuous vector space at lower dimensions. In the following, we will first briefly introduce LSTM and GRU architectures and then delve into the graph summarization approaches that are built upon their respective structures.

1) LSTM-based Approaches: LSTMs are a class of RNNs that use a sequence of gates to concurrently model long- and short-term dependencies in sequential data [54]. The modified architecture of an LSTM to handle large graph-structured data is known as a GraphLSTM [55]. Typically, the input to the model consists of a sequence of either graph nodes or edges, which are processed in order using LSTM units. At each time step, the model updates its internal state based on the input, as shown in Equation 2.

h_v^t = \mathrm{LSTM}\big(\sum_{u \in N(v)} W h_u^{t-1}\big) \qquad (2)

The cell is composed of multiple gates, and its operation can be described as follows [55]:

f_v^t = \sigma\big(W_f [h_v^{t-1}, x_v^t] + b_f\big) \qquad (3)

i_v^t = \sigma\big(W_i [h_v^{t-1}, x_v^t] + b_i\big) \qquad (4)
TABLE I
Comparative analysis of selected RecGNN-based approaches for graph summarization.
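As a hedged illustration of the gated update in Equations 2–4 (the remaining gate equations are not reproduced above), the sketch below sums neighbour hidden states through the adjacency matrix and feeds them, together with the node inputs, through a standard PyTorch LSTM cell. It is a simplified assumption of how a GraphLSTM-style step can be wired, not the implementation used in any of the surveyed papers.

```python
# Illustrative GraphLSTM-style step: neighbour hidden states act as the recurrent
# state of a standard LSTM cell, whose internal gates correspond to Equations 3-4.
import torch
import torch.nn as nn

class GraphLSTMStep(nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.cell = nn.LSTMCell(in_dim, hidden_dim)

    def forward(self, A, X, h, c):
        """A: (n, n) adjacency; X: (n, in_dim) node inputs x_v^t;
        h, c: (n, hidden_dim) previous hidden and cell states."""
        msg = A @ h                            # sum of neighbours' hidden states
        h_new, c_new = self.cell(X, (msg, c))  # gates see x_v^t and the aggregate
        return h_new, c_new

n, d_in, d_h = 6, 4, 16
A = torch.bernoulli(torch.full((n, n), 0.3))   # random toy adjacency
step = GraphLSTMStep(d_in, d_h)
h = c = torch.zeros(n, d_h)
for _ in range(3):                             # unroll a short node/edge sequence
    h, c = step(A, torch.randn(n, d_in), h, c)
print(h.shape)                                 # torch.Size([6, 16])
```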
Jin et al. [46] also developed an approach to learning representations of graphs based on a graph LSTM. Here, graph representations of diverse sizes are encoded into low-dimensional vectors. Li et al. [57] proposed a graph summarization technique that uses a graph LSTM and a ConvGNN to improve question answering with knowledge graphs. In this approach, the questions, entities, and relations are represented as vectors with very few dimensions, but the key properties of the relations are well preserved.

Several studies have also focused on evolving node patterns in dynamic graphs. For instance, Zhang et al. [56] introduced an LSTM-based approach, a one-stage model called DynGNN. The model embeds an RNN into a GNN model to produce a representation in compact form. Khoshraftar et al. [50] presented a dynamic graph embedding method via LSTM to convert a large graph into a low-dimensional representation. The model captures temporal changes with LSTM using temporal walks and then transfers the learned parameters into node2vec [28] to incorporate the local structure of each graph. Similarly, Ma et al. [49] introduced a dynamic RecGNN model that relies on a graph LSTM to model the dynamic information in an evolving graph while reducing the graph's dimensionality and learning manifold structures. Node information is continuously updated by: recording the time intervals between edges; recording the sequence of edges; and coherently transmitting information between nodes. Another work by Goyal et al. [40] also presents a method for learning temporal transitions in dynamic graphs. This framework is based on a deep architecture that mainly consists of dense and recurrent layers. Model size and the number of weights to be trained can be a problem during training, but the authors overcome this issue with a uniform sampling of nodes.

2) GRU-based Approaches: GRUs are a variant of graph LSTMs that include a gated RNN structure and have fewer training parameters than a standard graph LSTM. The key distinction between a GRU and an LSTM is the number of gates in each model. GRU units are less complex with only two gates, "reset" and "update" [58].

r_v^t = \sigma\big(W_r [h_v^{t-1}, x_v^t] + b_r\big) \qquad (11)

z_v^t = \sigma\big(W_z [h_v^{t-1}, x_v^t] + b_z\big) \qquad (12)

C_v^t = \tanh\big(W_C [r_v^t \times h_v^{t-1}, x_v^t] + b_C\big) \qquad (13)

h_v^t = (1 - z_v^t) \times h_v^{t-1} + z_v^t \times C_v^t \qquad (14)

In these equations, r_v^t is the reset gate and z_v^t is the update gate. h_v^{t-1} is the output of the model at the previous time step. Similar to the LSTM, C_v^t computes the new candidate value, and h_v^t is the updated hidden state for node v at time t, computed using the update gate and the new candidate value. b_x and W_x are the biases and weights for the respective gates.

Again, by repeating this process over several time steps, the model learns the dependencies that exist between the nodes in the graph, allowing it to construct a final hidden state that summarizes all the graph's information. The adaptability of GRU cells to capture temporal dependencies within graph-structured data allows for effective information aggregation and context modeling, making GRU-based methods a promising choice for summarizing complex graph-structured information.
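The following is a minimal, hedged NumPy sketch of the gated update in Equations 11–14 for a single node; the weight shapes and the way the previous state h_v^{t-1} enters are illustrative assumptions rather than a specific published design.

```python
# Toy GRU-style node update following Equations 11-14 (illustrative only).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_node_update(h_prev, x_t, params):
    """h_prev: (d_h,) previous hidden state; x_t: (d_x,) current node input;
    params: weight matrices W_* of shape (d_h, d_h + d_x) and biases b_* of shape (d_h,)."""
    concat = np.concatenate([h_prev, x_t])
    r = sigmoid(params["W_r"] @ concat + params["b_r"])                               # reset gate  (11)
    z = sigmoid(params["W_z"] @ concat + params["b_z"])                               # update gate (12)
    c = np.tanh(params["W_C"] @ np.concatenate([r * h_prev, x_t]) + params["b_C"])    # candidate   (13)
    return (1.0 - z) * h_prev + z * c                                                 # new state   (14)

d_h, d_x = 8, 4
rng = np.random.default_rng(0)
params = {k: rng.normal(scale=0.1, size=(d_h, d_h + d_x)) for k in ("W_r", "W_z", "W_C")}
params.update({b: np.zeros(d_h) for b in ("b_r", "b_z", "b_C")})
h = gru_node_update(np.zeros(d_h), rng.normal(size=d_x), params)
print(h.shape)  # (8,)
```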
For instance, Taheri et al. [52] proposed the DyGrAE model, which is able to learn the structure and temporal dynamics of a dynamic graph while condensing its dimensions. A GRU model captures the graph's topology, while an LSTM model learns the graph's dynamics. Ge et al. [53] developed a gated recursive algorithm that not only solves some node aggregation problems but also extracts deeply dependent features between nodes. The resulting model, called GR-GNN, is based on a GRU, which performs the aggregation and structuring. Li et al.'s GRU model [51] encodes an input graph into a fixed-size vector representation, which is then fed into a sequence decoder to generate the summary as the output. The model effectively captures the structural information and dependencies among the nodes and edges in the input graph, which is crucial for producing a coherent and informative graph summary.

B. ConvGNN-based Approaches

The general idea of ConvGNN-based approaches is to generalize CNNs to graph-structured data [34]. The primary distinction between a ConvGNN and a RecGNN is the way information is propagated. While ConvGNNs apply various weights at each timestep, RecGNNs apply the same weight matrices in an iterative manner until an equilibrium is reached [69].

In other words, ConvGNN models are a form of neural network architecture that supports graph structures and aggregates node information from the neighbourhood of each node in a convolutional manner. ConvGNN models have demonstrated a strong expressive capacity for learning graph representations, resulting in superior performance with graph summarization [69].

ConvGNN-based approaches fall into two categories: spectral-based and spatial-based methods [34].

1) Spectral-based Approaches: Spectral-based methods describe graph convolutions based on spectral graph theory and graph signal filtering. In spectral graph theory, the multiplication of the graph with a filter (the convolution) is defined in a Fourier domain [70].

Although the computation contains well-defined translational properties, it is relatively expensive, and the filters are not generally localized. Since the level of complexity grows with the scale of the graphs, one solution is to only check a limited number of neighbours using Chebyshev's theory [71]. The Chebyshev polynomials T_k(x) are defined recursively as:

T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x) \qquad (15)

where T_0(x) = 1 and T_1(x) = x. Here, x represents the variable of the Chebyshev polynomial, and k is a non-negative integer giving the degree (order) of the polynomial; the value of T_k(x) depends on x and k through the recurrence relation in Equation 15.
TABLE II
Comparative analysis of selected ConvGNN-based approaches for graph summarization.
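As a small illustration of how this recurrence yields localized spectral filters (in the spirit of the Chebyshev-based approximation discussed next), the hedged NumPy sketch below builds T_k terms from a scaled graph Laplacian and combines them into a filtered signal. The graph, feature sizes, and filter coefficients are arbitrary toy choices, not values from the surveyed works.

```python
# Hedged sketch: Chebyshev polynomial filtering on a toy graph.
# T_k is built with T_k(x) = 2 x T_{k-1}(x) - T_{k-2}(x) applied to a scaled
# Laplacian, and the output is Z = sum_k theta_k * T_k(L_scaled) @ X.
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt      # normalized graph Laplacian

lam_max = np.linalg.eigvalsh(L).max()
L_scaled = 2.0 * L / lam_max - np.eye(4)         # rescale the spectrum to [-1, 1]

X = np.random.randn(4, 3)                        # toy node features
theta = np.array([0.5, 0.3, 0.2])                # toy filter coefficients, K = 3

T_prev, T_curr = np.eye(4), L_scaled             # T_0 and T_1
Z = theta[0] * (T_prev @ X) + theta[1] * (T_curr @ X)
for k in range(2, len(theta)):
    T_next = 2 * L_scaled @ T_curr - T_prev      # Chebyshev recurrence (Equation 15)
    Z += theta[k] * (T_next @ X)
    T_prev, T_curr = T_curr, T_next
print(Z.shape)                                   # (4, 3)
```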
This theory has led to several studies that explore the idea of applying approximation to the spectral graph convolution. For example, Defferrard et al. [72] generalized CNNs to graphs by designing localised convolutional filters on graphs. Their main idea was to leverage the spectral domain of graphs and use Chebyshev polynomial approximation to efficiently compute localized filters as follows:

Z = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L}) X \qquad (16)

where L represents the graph Laplacian, X is the node feature matrix, θ_k are the learnable filter coefficients of the graph convolutional operator, \tilde{L} is a scaled Laplacian defined as \tilde{L} = 2L/\lambda_{max} − I, and K is the order of the Chebyshev polynomial approximation. They also introduced a graph summarization procedure that groups similar vertices together and a graph pooling procedure that focuses on producing a higher filter resolution. This work has been used as the basis of several studies that draw on Chebyshev polynomials to speed up convolution computations.

As a variant, Kipf et al. [59] introduced several simplifications to the original framework to improve the model's classification performance and scalability to large networks. These simplifications formed the basis of the GCN (Graph Convolutional Network) model, which has since become a popular choice for various graph-related tasks. Given an undirected graph with an adjacency matrix A, they computed the normalized graph Laplacian \tilde{L} as follows:

\tilde{L} = I - D^{-1/2} A D^{-1/2} \qquad (17)

where I is the identity matrix, and D is the diagonal degree matrix, with D_{ii} representing the sum of the weights of the edges connected to node i.

In this work, instead of directly computing graph convolutions using high-order Chebyshev polynomials, as done in the previous work by Defferrard et al. [72], Kipf et al. proposed using a simple first-order approximation of graph filters. They defined the graph convolution operation as [59]:

H^{(l+1)} = \sigma\big(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\big) \qquad (18)

where H^{(l)} represents the hidden node features at layer l, W^{(l)} is the weight matrix for layer l, \tilde{A} = A + I is the adjacency matrix with self-loops added, and \tilde{D} is the diagonal degree matrix of \tilde{A}. Here, the normalized term \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} is used to aggregate information from neighboring nodes, and the propagation is applied layer by layer in this normalized form.
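A minimal, hedged NumPy sketch of the propagation rule in Equation 18 follows; it uses random, untrained weights on a toy graph and is only meant to make the aggregation step concrete, not to reproduce the reference GCN implementation.

```python
# Toy GCN layer implementing Equation 18: H' = sigma(D~^{-1/2} A~ D~^{-1/2} H W).
import numpy as np

def gcn_layer(A, H, W):
    A_tilde = A + np.eye(A.shape[0])                 # add self-loops
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt        # normalized aggregation
    return np.maximum(A_hat @ H @ W, 0.0)            # ReLU activation

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = np.random.randn(4, 5)                            # input node features
W1, W2 = np.random.randn(5, 8), np.random.randn(8, 2)
H1 = gcn_layer(A, H, W1)                             # first layer
H2 = gcn_layer(A, H1, W2)                            # second layer: (4, 2) embeddings
print(H2.shape)
```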
a novel approach called GroupINN, which enhances ConvGNN through summarization, leading to faster training, data denoising, and improved interpretability. They employed an end-to-end neural network model with a specialized node grouping layer, which effectively summarizes the graph by reducing its dimensionality. Hu et al. [76] took structural similarity into consideration, aggregating nodes that share a similar structure into hypernodes. The summarized graph is then refined to restore each node's representation. To this end, a deep hierarchical ConvGNN (H-GCN) architecture with several coarsening operation layers followed by several refinement operation layers performs semi-supervised node classification. The refinement layers are designed to restore the structure of the original graph for graph classification tasks.

The most recent developments in ConvGNNs demonstrate the exciting potential graph summarization holds for a range of applications in healthcare and human motion analysis [68], [77]. Wen et al. [68], for example, presented a promising approach to diagnosing autism spectrum disorder by parsing brain structure information through a multi-view graph convolution network. Dang et al. [77] introduced a new type of graph convolution network, called a multi-scale residual graph convolution network, that shows superior performance in predicting human motion compared to other state-of-the-art models.

C. GAE-based Approaches

An autoencoder is a neural network that consists of an encoder and a decoder. Generally, the encoder transforms the input data into a condensed representation, while the decoder reconstructs the actual input data from the encoder's output [85]. Graph autoencoders, or GAEs, are a type of GNN that can be applied over graph structures, allowing the model to learn a compact and informative representation of a graph. Lately, GAEs have garnered increasing interest for their ability to summarize graphs due to their significant potential for dimensionality reduction [36].

The structure of the encoder and decoder in a GAE can vary depending on the specific implementation and the complexity of the graph data. Generally, both the encoder and decoder are neural network architectures that are designed to process graph data efficiently [86]. The architecture of the encoder may include multiple layers of graph convolutions or aggregations, followed by non-linear activation functions. The output of the encoder is a compact and informative representation of the graph in the latent space. On the other hand, the decoder takes the latent representation obtained from the encoder and reconstructs the original graph structure from it. The decoder's architecture should mirror the encoder's architecture in reverse. It transforms the latent representation back into a graph structure [11].

The goal of a GAE is to learn an encoder and decoder that reduce the reconstruction error between the original graph and the decoded graph, while also encouraging the latent representation to capture meaningful information about the graph's structure [87]:

Z = f_E(A, X), \qquad G' = f_D(A, Z) \qquad (23)

where G′ represents the reconstructed graph, which can consist of either reconstructed features, graph structure, or both. A is the adjacency matrix, and X is the input node feature matrix. f_E serves as the graph encoder, responsible for transforming the graph and node features into a condensed representation. Conversely, f_D acts as the graph decoder, responsible for reconstructing the original graph or its components from the latent representation.

GAEs can be trained using various loss functions, such as mean squared error (MSE) or binary cross-entropy (BCE). They can also be extended to incorporate additional constraints or regularization techniques to improve the quality of the learned graph representation [88]. For example, for graph reconstruction, the goal is to minimize the difference between the original adjacency matrix A and the reconstructed adjacency matrix Â. The MSE loss is calculated as follows [78]:

L_{MSE} = \frac{1}{N \times N} \sum_{i,j} (A_{ij} - \hat{A}_{ij})^2 \qquad (24)

where N is the number of nodes in the graph, and A_{ij} and \hat{A}_{ij} are the elements of the original and reconstructed adjacency matrices, respectively.

The majority of GAE-based approaches for graph summarization use combined architectures that include ConvGNNs or RecGNNs [78], [89], [80]. For example, Kipf et al. [78] proposed a variational graph autoencoder (VGAE) for undirected graphs based on their previous work on spectral convolutions [59]. VGAE incorporates a two-layer ConvGNN model based on the variational autoencoder in [89]. The main concept of VGAE is to represent the input graph data not as a single point in the latent space but as a probability distribution. This distribution captures the uncertainty and variability in the graph's latent representation. Instead of directly obtaining a fixed latent representation from the encoder, VGAE samples a random point from the learned distribution. The encoder in VGAE typically consists of two or more graph convolutional layers that process the input graph data and produce latent node representations. Each graph convolutional layer can be defined as follows [78]:

Z^{(l+1)} = \sigma\big(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} Z^{(l)} W^{(l)}\big) \qquad (25)

where Z^{(l)} represents the latent node representations at layer l of the encoder, \tilde{A} is the adjacency matrix of the graph with added self-loops, \tilde{D} is the diagonal degree matrix of \tilde{A}, W^{(l)} is the weight matrix for the l-th layer, and σ(·) is the activation function (e.g., ReLU) applied element-wise.

The VGAE introduces stochasticity to GAEs by sampling the latent representation Z from a Gaussian distribution in the latent space. The mean µ and log-variance log σ² of the distribution are obtained from the output of the last graph convolutional layer:

\mu = Z^{(L)} \cdot W^{(\mu)} \qquad (26)

\log \sigma^2 = Z^{(L)} \cdot W^{(\sigma)} \qquad (27)

Here, L represents the last layer of the encoder. W^{(\mu)} and W^{(\sigma)} are learnable weight matrices for obtaining the mean and log-variance, respectively.
TABLE III
Comparative analysis of GAE-based approaches for graph summarization.
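The hedged NumPy sketch below wires these pieces together for a toy graph: a two-layer GCN-style encoder (Equation 25), linear heads for µ and log σ² (Equations 26–27), followed by the sampling and inner-product decoding steps described next in the text. It is a simplified, untrained illustration of the VGAE idea, not the authors' reference code; the training loop and loss are omitted.

```python
# Simplified VGAE-style forward pass (illustrative assumptions, untrained weights).
import numpy as np

rng = np.random.default_rng(0)

def normalize(A):
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    return d_inv_sqrt @ A_tilde @ d_inv_sqrt

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = rng.normal(size=(3, 4))                       # node features
A_hat = normalize(A)

W0, W1 = rng.normal(size=(4, 8)), rng.normal(size=(8, 8))
W_mu, W_sigma = rng.normal(size=(8, 2)), rng.normal(size=(8, 2))

Z0 = np.maximum(A_hat @ X @ W0, 0.0)              # encoder layer (Equation 25)
Z1 = np.maximum(A_hat @ Z0 @ W1, 0.0)             # second encoder layer
mu = Z1 @ W_mu                                    # mean head (Equation 26)
log_sigma2 = Z1 @ W_sigma                         # log-variance head (Equation 27)

eps = rng.normal(size=mu.shape)                   # reparameterization trick
Z = mu + eps * np.exp(0.5 * log_sigma2)

A_rec = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))          # inner-product decoder with sigmoid
print(A_rec.shape)                                # (3, 3) reconstructed adjacency
```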
To sample from the Gaussian distribution, the reparameterization trick [90] is used. A random noise vector ϵ is sampled from a standard Gaussian distribution (ϵ ∼ N(0, 1)). The sampled latent representation Z is then computed as:

Z = \mu + \epsilon \cdot \exp\big(\tfrac{1}{2} \log \sigma^2\big) \qquad (28)

Finally, the decoder maps the sampled latent representation Z back into the graph space. In VGAE, the reconstruction is typically performed using an inner product between the latent node representations to predict the adjacency matrix Â:

\hat{A} = \sigma(Z \cdot Z^T) \qquad (29)

Here, σ(·) is the sigmoid activation function, which ensures that the predicted adjacency matrix Â is within the range [0, 1].

The loss function in VGAE consists of two terms: a reconstruction loss and a Kullback–Leibler (KL) divergence loss [91]. The reconstruction loss measures the difference between the predicted adjacency matrix Â and the actual adjacency matrix A. The KL divergence loss penalizes the deviation of the learned latent distribution from the standard Gaussian distribution. The overall loss function combines these two terms as follows:

L = \mathbb{E}_{q(Z|X,A)}[\log p(A|Z)] - KL[\,q(Z|X,A)\,\|\,p(Z)\,] \qquad (30)

As an extension to VGAE, Hajiramezanali et al. [80] constructed a variational graph RNN by integrating a RecGNN and a GAE to model the dynamics of the node attributes and the topological dependencies. The aim of the model is to learn an interpretable latent graph representation as well as to model sparse dynamic graphs.

There are also several aggregation-based approaches built on GAEs. These are generally designed to formulate the challenges with graph clustering tasks as a summarization problem [82], [79], [81], [84]. For example, Cai et al. [82] suggested a graph recurrent autoencoder model for use in clustering attributed multi-view graphs. The fundamental concept behind the approach is to consider both the characteristics that all views have in common and those that make each graph view special. To this end, the framework includes two separate models: the Global Graph Autoencoder (GGAE) and the Partial Graph Autoencoder (PGAE). The purpose of the GGAE is to learn the characteristics shared by all views, while the PGAE captures the distinct features. The cells are grouped into clusters using a soft K-means clustering algorithm after the output is obtained. Fan et al. [81] introduced the One2Multi Graph Autoencoder (OMGAE) for multi-view graph clustering. OMGAE leverages a shared encoder to learn a common representation from multiple views of a graph and uses multiple decoders to reconstruct each view separately. Additionally, OMGAE introduces a new attention mechanism that assigns different weights to each view during the clustering process based on their importance. The model is trained to minimize a joint loss function that considers both the reconstruction error and the clustering performance. Mrabah et al. [84] devised a new graph autoencoder model for attributed graph clustering called GAE-KL. The model uses a new formulation of the objective function, which includes a KL-divergence term, to learn a disentangled representation of the graph structure and the node attributes. The disentangled representation is then used to cluster the nodes based on their similarity in terms of both structure and attributes. The authors also introduced a new evaluation metric called cluster-based classification accuracy (CCA) to measure clustering performance.

Recently, Salha et al. [83] proposed a graph autoencoder architecture that uses one-hop linear models to encode and decode graph information. The approach simplifies the model while still achieving high performance with graph summarization tasks, such as node clustering and graph classification.
Uniquely, this paper presents a direction for designing graph autoencoder models that balances performance with simplicity.

D. GAT-based Approaches

The idea of an attention mechanism was first proposed by Bahdanau and his colleagues in 2014 [98]. The goal was to allow for modelling long-term dependencies in sequential data and to improve the performance of autoencoders. Essentially, attention allows the decoder to concentrate on the most relevant parts of the input sequence, with the most relevant vectors receiving the highest weights. Graph attention networks, or GATs [92], are based on the same idea. They use attention-based neighborhood aggregation, assigning different weights to the nodes in a neighborhood. This type of model is one of the most popular GNN models for node aggregation, largely because it reduces storage complexity along with the number of nodes and edges. The key formulation for a GAT is:

h_i = \sigma\big(\sum_{j \in N(i)} \alpha_{ij} W h_j\big) \qquad (31)

where h_i is the hidden feature vector of node u_i, N(i) is the set of neighbouring nodes of u_i, h_j is the hidden state of neighbouring node u_j, W is a weight matrix, and α_{ij} is the attention coefficient that measures the importance of node u_j to node u_i. The attention coefficients are computed as:

\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in N(i)} \exp(e_{ik})} \qquad (32)

where e_{ij} is a scalar energy value computed as:

e_{ij} = \mathrm{LeakyReLU}\big(a^T [W h_i \,\|\, W h_j]\big) \qquad (33)

where a is a learnable parameter vector, and || denotes concatenation. The LeakyReLU function introduces non-linearity into the model and helps prevent vanishing gradients. The softmax function normalizes the energy values across all neighboring nodes so that the attention coefficients sum to one.

By computing attention coefficients for neighboring nodes, GATs are able to selectively focus on the most important parts of the graph for each node. This allows the model to adaptively adjust to different graph structures and tasks. The attention mechanism also means GATs can incorporate node and edge features into the model, making them well-suited to summarization tasks, such as node classification with complex graphs [99].

Today, GATs are considered to be one of the most advanced models for learning with large-scale graphs. However, recently Brody et al. [39] argued that GATs do not actually compute dynamic attention; rather, they only compute a restricted form of static attention. To support their claim, they introduced GATv2, a new version of this type of attention network, which is capable of expressing problems that require computing dynamic attention. Focusing on the importance of dynamic weights, these authors argue that the problem of only supporting static attention can be fixed by changing the sequence of internal processes in the GAT equations, as shown in Equation 34.

e_{ij} = a^T \mathrm{LeakyReLU}\big(W \cdot [h_i \,\|\, h_j]\big) \qquad (34)

As another variation of GAT, Xie et al. [93] proposed a novel multi-view graph attention network named MGAT, to support low-dimensional representation learning based on an attention mechanism in a multi-view manner. The authors focus on a view-based attention approach that not only aggregates view-based node representations but also integrates various types of relationships into multiple views.

Tu et al. [95] explored the benefits of using graph summarization and refining bipartite user-item graphs for recommendation tasks. They applied a conditional attention mechanism to task-based sub-graphs to determine user preferences, which emphasizes the potential of summarizing and enhancing knowledge graphs to support recommender systems. Salehi et al. [94] defined a model based on an autoencoder architecture with a graph attention mechanism that learns low-dimensional representations of graphs. The model compresses the information in the input graph into a fixed-size latent vector, which serves as a summary of the entire graph. Through the use of attention, the model is able to discern and prioritize critical nodes and edges within the graph, making it more effective at capturing the graph's structural and semantic properties.

More recent works on GATs conducted by Chen et al. [96] and Li et al. [97] demonstrate the potential of graph attention networks for summarizing and analyzing complex graph data in various domains. Chen et al. proposed a multi-view graph attention network for travel recommendations. The model takes several different types of user behaviors into account, such as making hotel reservations, booking flights, and leaving restaurant reviews, and, in the process, learns an attention mechanism to weigh the importance of different views for a recommendation. Li et al. developed a multi-relational graph attention network for knowledge graph completion. The model integrates an attention mechanism and edge-type embeddings to capture the complex semantic relations between entities in a knowledge graph.
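Returning to the single-head formulation in Equations 31–33, the following hedged NumPy sketch computes attention coefficients and the attended update for one node of a toy graph; the shapes, initialisation, and activation are illustrative assumptions only, not a surveyed implementation.

```python
# Toy single-head GAT update for one node (Equations 31-33); weights are random
# and untrained, purely to illustrate how the attention coefficients are formed.
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

rng = np.random.default_rng(1)
d_in, d_out = 4, 6
W = rng.normal(size=(d_out, d_in))      # shared linear transform
a = rng.normal(size=2 * d_out)          # attention parameter vector

h_i = rng.normal(size=d_in)             # features of node i
neighbours = [rng.normal(size=d_in) for _ in range(3)]  # features of N(i)

Wh_i = W @ h_i
e = np.array([leaky_relu(a @ np.concatenate([Wh_i, W @ h_j]))   # Equation 33
              for h_j in neighbours])
alpha = np.exp(e) / np.exp(e).sum()                             # Equation 32 (softmax)
h_i_new = np.tanh(sum(al * (W @ h_j) for al, h_j in zip(alpha, neighbours)))  # Eq. 31
print(alpha.round(3), h_i_new.shape)
```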
V. GRAPH REINFORCEMENT LEARNING
Reinforcement learning (RL) is a mathematical model based on sequential decisions that allows an agent to learn via trial and error in an interactive setting through feedback on its actions. Due to the success and fast growth of reinforcement learning in interdisciplinary fields, scholars have recently been inspired to investigate reinforcement learning models for graph-structured data, i.e., graph reinforcement learning or GRL [100]. GRL is largely implemented based on the Bellman theory [101], where the environment is represented as a graph, nodes represent states, edges represent possible transitions between states, and rewards are associated with specific state-action pairs or nodes. The key components of GRL are as follows [100]:

• Environment (graph): The graph G represents the environment in which the agent operates. It is defined as G = (V, E), where V is the set of nodes representing states and E is the set of edges representing possible transitions between states.
TABLE IV
Comparative analysis of selected GAT-based approaches for graph summarization.
• State (s): In GRL, a state s corresponds to a specific node in the graph. Each node may have associated attributes or features that provide information about the state.

• Action (a): An action a corresponds to a decision or move that the agent can make when in a particular state (node). In graph-based environments, actions can be related to traversing edges between nodes or performing some operation on a node.

• Transition Model (T): The transition model defines the dynamics of the graph, specifying the probability of moving from one state (node) to another by taking a specific action (edge): T(s, a, s′) = P(s′|s, a), where s is the current state (node), a is the action (edge), and s′ is the next state (node).

• Reward Function (R): The reward function defines the immediate reward the agent receives after taking a particular action in a given state (node): R(s, a) is the expected immediate reward received when taking action a in state s.

• Policy (π): Similar to standard RL, the policy in GRL is a strategy that the agent uses to decide which action to take in each state (node): π(a|s) is the probability of taking action a in state s.

• Value Function (V) and Q-function (Q): In GRL, the value function V(s) and the Q-function Q(s, a) represent the expected cumulative reward the agent can obtain starting from a particular state (node) and following a policy π, or by taking action a in state s and then following policy π, respectively. The Q-learning algorithm can be formulated as:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\big[R_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)\big] \qquad (35)

where, at each timestep t, the agent in state s_t interacts with the environment using a behaviour policy based on the Q-table values: it takes an action a_t, receives a reward R_{t+1}, and transitions to a new state s_{t+1} based on the environment's feedback. This process is used to update the Q-table iteratively, continually incorporating information from the new state s_{t+1}, until termination.

The primary objective in GRL is to acquire a policy that maximizes the expected Q-function Q(s, a) over a sequence of actions; the target policy is defined as [102]:

\pi^* = \arg\max_\pi Q(s, a) = \arg\max_\pi \mathbb{E}_{\pi,T}\Big[\sum_{k=0}^{\infty} \gamma^k r_{t+k} \,\Big|\, s_t = s, a_t = a\Big] \qquad (36)
where \mathbb{E}_{\pi,T}[\cdot] denotes the expectation with respect to both the policy π and the distribution of transitions T (i.e., state transitions and rewards). The expression \sum_{k=0}^{\infty} \gamma^k r_{t+k} represents the sum of discounted rewards obtained in the future, starting from time step t (the current time step) and continuing k steps into the future. r_{t+k} is the reward obtained at time step t + k after taking action a_t at time step t and following policy π thereafter. The discount factor γ is a value between 0 and 1 that determines the importance of immediate rewards compared to future rewards. It discounts future rewards to make them less significant than immediate rewards. Smaller γ values make the agent more myopic, whereas larger γ values make the agent more far-sighted. The overall objective is to find the policy π that maximizes the expected sum of rewards (the Q-function) starting from state s and taking action a.

Achieving this goal involves employing various algorithms, like Q-learning, or utilizing a policy gradient method that updates Q-values or policy parameters based on observed rewards and transitions [86].
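To ground Equations 35 and 36, the hedged sketch below runs tabular Q-learning on a tiny graph environment in which states are nodes, actions are moves along outgoing edges, and one node carries a positive reward. The graph, rewards, and hyperparameters are toy assumptions, not a setup from the surveyed papers.

```python
# Toy tabular Q-learning on a graph environment (illustrative assumptions only):
# states are nodes, an action moves along an edge, and reaching node 3 pays +1.
import random

edges = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [3]}   # adjacency list (available actions)
Q = {(s, a): 0.0 for s, nbrs in edges.items() for a in nbrs}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(500):                                  # episodes
    s = 0
    for _ in range(10):                               # steps per episode
        nbrs = edges[s]
        if random.random() < epsilon:                 # epsilon-greedy behaviour policy
            a = random.choice(nbrs)
        else:
            a = max(nbrs, key=lambda n: Q[(s, n)])
        r = 1.0 if a == 3 else 0.0                    # reward on reaching node 3
        target = r + gamma * max(Q[(a, n)] for n in edges[a])
        Q[(s, a)] += alpha * (target - Q[(s, a)])     # Q-learning update (Equation 35)
        s = a
        if s == 3:
            break

print({k: round(v, 2) for k, v in Q.items()})         # learned state-action values
```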
GRL employs a diverse range of algorithms, and it frequently utilizes GNNs to efficiently process and learn from data structured as graphs. GNNs play a crucial role in updating node representations by considering their neighboring nodes, and they can be integrated into the RL framework, and into the graph summarization framework in particular, to effectively handle tasks that involve summarizing graph structures. For instance, Yan et al. [103] introduced a ConvGNN-based neural network specifically designed for graph sampling, enabling the automatic extraction of spatial features from the irregular graph topology of the substrate network. To optimize the learning agent, they adopt a popular parallel policy gradient training method, enhancing efficiency and robustness during training. Wu et al. [104] tackled the problem of graph signal sampling by formulating it as a reinforcement learning task in a discrete action space. They use a deep Q-network (DQN) to enable the agent to learn an effective sampling strategy. To make the training process more adaptable, they modify the steps and episodes. During each episode, the agent learns how to choose a node at each step and selects the best node at the end of the episode. They also redefine the actions and rewards to suit the sampling problem. In another work by Wu et al. [105], a reinforced sample selection approach for GNNs' transfer learning is proposed. The approach uses GRL to guide transfer learning and reduce the divergence between the source and target domains.

There is also a line of GRL research that seeks to use this paradigm to evaluate and improve the quality of graph summaries. For example, Amiri et al. [106] introduced a task-based GRL framework to automatically learn how to generate a summary of a given network. To provide an optimal solution for finding the best task-based summary, the authors made use of CNN layers in combination with a reinforcement learning technique. To improve the quality of the summary, the authors later proposed NetReAct [107], an interactive learning framework for graph summarization. The model uses human feedback in tandem with reinforcement learning to improve the summaries, while visualizing the document network.

In another study, Wickman et al. [108] recently presented a graph sparsification framework, SparRL, empowered by GRL, that can be used for any edge sparsification assignment with a specific target for reduction. The model takes an edge reduction ratio as its input, and a learning model decides how best to prune the edges. SparRL proceeds in a sequential manner, removing edges from the graph until the target number of edges has been pruned. In another work, Wu et al. [48] introduced GSGAN, a novel method for graph sparsification in community detection tasks. GSGAN excels at identifying crucial relationships not apparent in the original graph and enhances community detection effectiveness by introducing artificial edges. Employing a generative adversarial network (GAN) model, GSGAN generates random walks that effectively capture the network's underlying structure. What sets this approach apart is its utilization of reinforcement learning, which enables the method to optimize learning objectives by deriving rewards from a specially designed reward function. This reinforcement learning component guides the generator to create highly informative random walks, ultimately leading to improved performance in community detection tasks. Yan et al. [111] introduced a GRL approach to summarize geographic knowledge graphs. To obtain a more thorough understanding of the summarizing process, the model exploits components with spatial specificity and includes both the extrinsic and the intrinsic information in the graph. The authors also discuss the effectiveness of spatial-based models and compare the results of their model with models that include non-spatial entities.

Recently, many articles have discussed the potential of using GNN-based GRLs to summarize and analyze complex graph data in domains like neuroscience and computer vision [109], [110]. For example, Zhao et al. [109] suggested a deep reinforcement learning scheme guided by a GNN as a way to analyze brain networks. The model uses a reinforcement learning framework to learn a policy for selecting the most informative nodes in the network and combines that with a GNN to learn the node representations. Also, Goyal et al. [110] presented a GNN-based approach to image classification that relies on reinforcement learning. The model represents images as graphs and learns graph convolutional filters to extract features from the graph representation. They showed that their model outperforms several state-of-the-art methods on benchmark datasets with both image classification and reinforcement learning tasks.

In Table V, we summarize the key features of representative GRL-based approaches for graph summarization. Evaluation methods, performance metrics, training data, advantages, and limitations are compared among different models.
VI. PUBLISHED ALGORITHMS AND DATASETS

In the following section, we offer a comprehensive overview of the benchmark datasets, evaluation metrics, and open-source implementations associated with the approaches examined in Sections IV and V. By delving into these aspects, we aim to provide a thorough understanding of the landscape covered in those sections.
TABLE V
Comparative analysis of selected GRL-based approaches for graph summarization.
TABLE VI
Published datasets.

Category | Dataset | Publications | URL
Citation Networks | Cora | [59], [78], [99], [79], [65], [74], [60], [61], [76], [94], [83], [53], [84], [47] | https://relational.fit.cvut.cz/dataset/CORA
Citation Networks | Citeseer | [59], [78], [99], [79], [74], [61], [76], [94], [83], [53], [84], [47] | https://relational.fit.cvut.cz/dataset/CiteSeer
Citation Networks | PubMed | [59], [78], [99], [74], [61], [76], [94], [83], [53], [84], [47] | https://relational.fit.cvut.cz/dataset/PubMed_Diabetes
Citation Networks | DBLP | [81], [82], [47], [50] | https://dblp.uni-trier.de/xml/
Citation Networks | ACM | [81], [82], [50] | http://www.arnetminer.org/openacademicgraph
Social Networks | Reddit | [64], [65], [74], [60], [61], [66], [62], [83], [56] | https://github.com/redditarchive/reddit
Social Networks | IMDB | [81], [82] | https://datasets.imdbws.com/
Social Networks | Karate | [106] | http://networkdata.ics.uci.edu/data/karate/
Social Networks | Facebook | [106], [80], [108], [48] | http://snap.stanford.edu/data/ego-Facebook.html
Social Networks | DNC | [49] | https://github.com/alge24/DyGNN/tree/main/Dataset
Social Networks | UCI | [49] | https://github.com/alge24/DyGNN/tree/main/Dataset
Social Networks | Twitter | [93], [108] | http://snap.stanford.edu/data/ego-Twitter.html
Social Networks | Amazon | [66], [108] | http://snap.stanford.edu/data/amazon-meta.html
Social Networks | Yelp | [66], [62], [95], [56] | https://www.yelp.com/dataset
Social Networks | Epinions | [49] | http://snap.stanford.edu/data/soc-Epinions1.html
User-generated Networks | Taobao | [75] | https://tianchi.aliyun.com/dataset/649
User-generated Networks | MovieLens | [95], [96] | https://grouplens.org/datasets/movielens/
User-generated Networks | Last-FM | [95] | https://grouplens.org/datasets/hetrec-2011/
User-generated Networks | Eumail | [48] | https://snap.stanford.edu/data/email-EuAll.html
User-generated Networks | Enron | [80] | https://snap.stanford.edu/data/email-Enron.html
User-generated Networks | POL. BLOGS | [47] | https://networks.skewed.de/net/polblogs
Bio-informatic Networks, Image/Neuroimage | PPI | [64], [99], [61], [66], [62], [83], [56] | https://github.com/williamleif/GraphSAGE
Bio-informatic Networks, Image/Neuroimage | MUTAG | [45], [46], [94] | https://networkrepository.com/Mutag.php
Bio-informatic Networks, Image/Neuroimage | PTC | [45] | http://www.predictive-toxicology.org/ptc/
Bio-informatic Networks, Image/Neuroimage | ENZYMES | [45], [46] | https://github.com/snap-stanford/GraphRNN/tree/master/dataset/ENZYMES
Bio-informatic Networks, Image/Neuroimage | NCI | [45], [46] | https://cdas.cancer.gov/
Bio-informatic Networks, Image/Neuroimage | Flickr | [66], [62] | https://shannon.cs.illinois.edu/DenotationGraph/
Bio-informatic Networks, Image/Neuroimage | fMRI | [67], [109] | https://adni.loni.usc.edu/data-samples/access-data/
Bio-informatic Networks, Image/Neuroimage | ADNI | [63] | https://adni.loni.usc.edu/data-samples/access-data/
Bio-informatic Networks, Image/Neuroimage | ABIDE | [63] | https://fcon_1000.projects.nitrc.org/indi/abide/
Knowledge Graphs | - | [111], [59], [76], [103] | -
Synthetic Networks | - | [106], [107], [40], [39], [56] | -
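As a practical starting point, several of the citation-network benchmarks in Table VI are also packaged in common graph-learning libraries. The short sketch below is illustrative only: it assumes PyTorch Geometric is installed and loads Cora through the library's Planetoid wrapper (rather than the raw URL listed above), then prints a few basic statistics; the root path is a placeholder.

```python
from torch_geometric.datasets import Planetoid

# Planetoid bundles the Cora, Citeseer, and PubMed citation networks.
dataset = Planetoid(root="data/Planetoid", name="Cora")
data = dataset[0]  # a single graph with node features, labels, and split masks

print(f"nodes: {data.num_nodes}, edges: {data.num_edges}")
print(f"features per node: {dataset.num_features}, classes: {dataset.num_classes}")
print(f"training nodes: {int(data.train_mask.sum())}")
```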
RecGNN-based approaches are generally efficient. Notably, RecGNN models can also improve numerical stability during training if they incorporate convolutional filters. However, they may face challenges in long-range dependency modeling due to the vanishing gradient problem, a common issue in recurrent architectures.

ConvGNN-based approaches leverage a more spatial approach, effectively aggregating local neighborhood information. This method has been particularly effective in tasks where local structure is highly informative. Nonetheless, the convolutional approach may not fully capture the global context, which can be critical in certain summarization tasks. In addition, most existing ConvGNN models for graph summarization simply presume the input graphs are static. However, in the real world, dynamically evolving graphs and networks are more common; in a social network, for instance, the number of users, their connected friends, and their activities are constantly changing. Consequently, learning ConvGNNs on static graphs may not yield the best results, and more research on dynamic ConvGNN models is needed to improve the quality of summaries for large-scale dynamic graphs.

GAE-based approaches offer a powerful framework for unsupervised learning on graphs. By learning to encode and decode graph data, GAEs can generate compact representations that preserve essential topological information. However, the quality of the summarization is heavily dependent on the choice of the encoder and decoder, which can be a non-trivial design decision. In addition, most GAE-based approaches are unregularized and focus mainly on minimizing the reconstruction error while ignoring the data distribution of the latent space, which can lead to poor summaries when working with sparse and noisy real-world graph data. Although there are a few studies on GAE regularization [112], more research is needed in this regard.

GAT-based approaches introduce an attention mechanism that weights each node's contribution to the learned representation. This approach can adaptively highlight important features and relationships within the graph. While GATs provide a flexible mechanism that can potentially outperform other methods, they may also require more computational resources and can be prone to overfitting on smaller datasets. Given the recent advancements in this area, we expect to see more research in the future on using GATs to create condensed representations of both static and dynamic graphs.
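To make the attention mechanism referred to above concrete, the equations below sketch the single-head attention coefficients of the original GAT formulation [99]; here W is the shared weight matrix, a the attention vector, h_i the feature vector of node i, and N_i its neighborhood.

```latex
e_{ij} = \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\,[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j]\right),
\qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})},
\qquad
\mathbf{h}_i' = \sigma\Bigl(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\,\mathbf{W}\mathbf{h}_j\Bigr)
```

Nodes that matter more to the condensed representation receive larger attention weights α_ij, which is the adaptive weighting discussed above; multi-head variants concatenate or average several such attention channels.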
GRL-based approaches merge reinforcement learning with graph models to selectively summarize graphs by learning from reward feedback. This method is promising for decision-centric summarization tasks such as graph compression or key substructure identification, and designing customized deep GRL architectures for graph summarization stands out as a promising direction for future work. However, being relatively new, GRL still faces challenges in defining rewards and in efficiently exploring graph spaces.

Across all approaches, we observe a trade-off between the ability to capture different aspects of graph structure and computational efficiency. Furthermore, each method's performance can vary significantly depending on the characteristics of the dataset in use and the particular summarization task being addressed. Our analysis has revealed that the complexity

Fig. 4. Year-wise development of datasets in the reviewed papers.

TABLE VII
Evaluation metrics.

Evaluation Metric | Formula/Description
Accuracy | (TP + TN) / (TP + TN + FP + FN)
Precision | TP / (TP + FP)
Recall | TP / (TP + FN)
Average Precision | Σ_{i=1..n} (R_i − R_{i−1}) · P_i
F1-score | 2 × (Recall × Precision) / (Recall + Precision)
Micro F1-score | TP / (TP + FP + FN)
Specificity | TN / (TN + FP)
AUC-ROC | The area under the ROC curve.
ARI | Adjusted Rand Index; measures the similarity between two data clusterings.
NMI | Normalized Mutual Information; measures the similarity between two clusterings.
Hit@k | (Number of relevant items in the top k) / k
NDCG@k | Normalized Discounted Cumulative Gain at k.
Spearman's ρ | Measures the strength and direction of the monotonic relationship between two variables.
ρ_netgist | Expected ratio.
ρ_netreact | Quantifies the ease of identifying relevant documents.
Mean Rank | (1/n) Σ_{i=1..n} rank_i
Mean Reciprocal Rank | (1/N) Σ_{i=1..N} (1 / rank_i)

A. Dynamic Graphs

In the real world, the data that graphs represent can evolve over time, creating changes in a graph's topology: new edges appear, nodes disappear, and attributes change. These dynamics can cause fundamental changes to the entire graph. Summarizing dynamic graphs typically involves reducing the graph to a series of snapshots taken at various time increments; the model is then trained over these snapshots, yet a number of challenges can arise. To date, the approaches developed to tackle these problems have primarily focused on capturing temporal patterns and changes in graph topology. However, these methods often struggle to efficiently process large-scale dynamic graphs and to accurately capture the evolving nature of graph relationships. Future work in this area could focus on developing more scalable algorithms that can handle larger dynamic graphs without compromising processing speed or accuracy.

B. Task-based Summarization

Graph summarization becomes increasingly important as graph sizes grow, aiding understanding, sensemaking, and analysis. Different tasks, such as detecting communities or identifying influential nodes, require tailored summarization strategies, which necessitates developing specific approaches for each task to effectively identify the relevant patterns. Moreover, although task-based summarization is critical, this field of study has seen few successes [106], [107] and is still underexplored. Open research problems include how to perform task-based summarization on streaming and heterogeneous graphs and how to leverage human feedback in the learning process.

C. Evaluation Benchmarks

The optimal outcome of a graph summarization process is a "good" summary of the original graph. However, evaluating the "goodness" of a summary is an application-specific exercise that depends on the task at hand: sampling-based methods are evaluated on the quality of the sampling, aggregation-based methods on the quality of classification, and so on. Current studies commonly compare their method against one or more established methods to measure the quality of their results; metrics used in the literature include information loss, ease of visualization, and sparsity [2]. However, more and different evaluation metrics are required for cases where the validation process becomes complex and more elements are involved, such as visualization and multi-resolution summaries.
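As a concrete illustration of how several of the metrics in Table VII are computed in practice, the following self-contained Python sketch evaluates a toy ranking task; the inputs are hypothetical placeholders, not data from any surveyed benchmark.

```python
# Minimal, illustrative implementations of a few metrics from Table VII.

def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from raw true/false positive and negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

def hit_at_k(ranked: list, relevant: set, k: int) -> float:
    """Fraction of the top-k ranked items that are relevant (Hit@k)."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def mean_reciprocal_rank(ranked_lists: list, relevant_sets: list) -> float:
    """MRR over a batch of queries: average of 1 / rank of the first relevant item."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        reciprocal = 0.0
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                reciprocal = 1.0 / rank
                break
        total += reciprocal
    return total / len(ranked_lists)

if __name__ == "__main__":
    # Hypothetical toy example: two queries with ranked candidate nodes.
    ranked_lists = [["a", "b", "c", "d"], ["x", "y", "z", "w"]]
    relevant_sets = [{"b", "d"}, {"z"}]
    print(precision_recall_f1(tp=8, fp=2, fn=4))
    print(hit_at_k(ranked_lists[0], relevant_sets[0], k=3))
    print(mean_reciprocal_rank(ranked_lists, relevant_sets))
```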
TABLE VIII
Comparable models from published literature using node classification for evaluations. S/D: static/dynamic, NL: number of layers, AF: activation functions, DR: dropout rate, LR: learning rate, WD: weight decay.
Model | Framework | Dataset (S/D) | Results | Hyperparameters | Code
GraphSAINT [66] | PyTorch-Geometric | Cora (S) | 97.16±0.50 / 96.36±0.50 | NL, AF, DR, LR, WD, Optimizer, Sampling Technique, Sample Size | LINK
GraphSAINT [66] | PyTorch-Geometric | Citeseer (S) | 91.90±0.40 / 89.66±0.30 | NL, AF, DR, LR, WD, Optimizer, Sampling Technique, Sample Size | LINK
FastGCN [65] | PyTorch-Geometric | Cora (S) | 79.40±0.05 / 79.45±0.06 | NL, AF, DR, LR, WD, Optimizer, Layer-wise Importance Sampling | LINK
FastGCN [65] | PyTorch-Geometric | Citeseer (S) | 67.70±0.09 / 66.88±0.12 | NL, AF, DR, LR, WD, Optimizer, Layer-wise Importance Sampling | LINK
GAT [99] | PyTorch-Geometric | Cora (S) | 81.98±0.00 / 82.00±0.01 | NL, AF, DR, LR, WD, Optimizer, Attention Mechanism, nHeads, Atten. DR | LINK
GAT [99] | PyTorch-Geometric | Citeseer (S) | 67.70±0.02 / 67.44±0.02 | NL, AF, DR, LR, WD, Optimizer, Attention Mechanism, nHeads, Atten. DR | LINK
GAT v2 [39] | PyTorch-Geometric | Cora (S) | 80.90±0.01 / 81.00±0.02 | NL, AF, DR, LR, WD, Optimizer, Attention Mechanism, nHeads, Atten. DR | LINK
GAT v2 [39] | PyTorch-Geometric | Citeseer (S) | 67.01±0.03 / 66.30±0.04 | NL, AF, DR, LR, WD, Optimizer, Attention Mechanism, nHeads, Atten. DR | LINK
GATE [94] | Tensorflow | Cora (S) | 83.10±0.02 / 83.02±0.01 | NL, AF, LR, Optimizer, Lambda (λ), Weight Sharing | LINK
GATE [94] | Tensorflow | Citeseer (S) | 71.55±0.02 / 71.88±0.02 | NL, AF, LR, Optimizer, Lambda (λ), Weight Sharing | LINK
H-GCN [76] | Tensorflow | Cora (S) | 82.44±0.50 / 81.98±0.50 | NL, AF, LR, Optimizer, Channels, Coarsening Layers, Num. Channel | LINK
H-GCN [76] | Tensorflow | Citeseer (S) | 71.84±0.60 / 70.96±0.60 | NL, AF, LR, Optimizer, Channels, Coarsening Layers, Num. Channel | LINK
TABLE IX
Comparable models from published literature using link prediction for evaluations. S/D: static/dynamic, NL: number of layers, AF: activation functions, DR: dropout rate, LR: learning rate, WD: weight decay.
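For readers who want to reproduce node-classification numbers of the kind reported in Table VIII, the following sketch shows a minimal two-layer GCN pipeline on Cora using PyTorch Geometric. The hidden size, dropout, learning rate, and epoch count are illustrative placeholders, not the tuned settings of any model in the table.

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

# Load the benchmark graph (same dataset family as in Table VI).
dataset = Planetoid(root="data/Planetoid", name="Cora")
data = dataset[0]

class GCN(torch.nn.Module):
    def __init__(self, hidden: int = 16):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_features, hidden)
        self.conv2 = GCNConv(hidden, dataset.num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

# Train on the standard split masks provided with the dataset.
for epoch in range(200):
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

# Evaluate test accuracy, the metric reported in Table VIII-style comparisons.
model.eval()
pred = model(data.x, data.edge_index).argmax(dim=-1)
acc = (pred[data.test_mask] == data.y[data.test_mask]).float().mean().item()
print(f"test accuracy: {acc:.4f}")
```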
D. Generative Models

In the context of generative models, graph summarization can be used to generate new graphs that have properties similar to those of the original ones. Generative models, such as VAEs, Graph Transformers, Graph Adversarial Networks (GANs), and Graph Auto-Regressive Models, offer effective approaches for graph summarization. These models can learn patterns in graph data and generate new graph summaries by sampling from learned distributions or by sequentially generating nodes and edges. Researchers are continually exploring new techniques to push the boundaries of model architectures, scalability, controllability, and interpretability [47], [108], [48]. The field presents exciting opportunities for innovation and has the potential to transform various domains through more efficient and accurate graph summarization techniques.

VIII. CONCLUSION

New advancements in deep learning with multi-layer deep neural networks have made it possible to quickly and effectively produce a condensed representation of a large and complex graph. In this paper, we surveyed the technical trends and the most current research in graph summarization with GNNs. We provided an overview of different graph summarization techniques, and categorized and described the current GNN-based approaches to graph summarization. We also discussed a new line of research focusing on RL methods to evaluate and improve the quality of graph summaries. To advance research in this field, we also outlined several frequently used benchmarking tools, including datasets, open-source code, and techniques for generating summarized graphs. In addition, we identified four promising directions for future research based on our findings from the survey. We strongly believe that using GNNs for graph summarization is not just a passing trend; rather, it has a bright future in a wide range of applications across different domains.

As a potential area of focus for our future work, we endeavor to delve into the capabilities of GNN-based generative models, including VGAEs [78] and GANs [47], [48], to push the boundaries of graph summarization and generation. Additionally, we will explore the potential of GRL [108] to create new graphs based on their summarized representations. By addressing challenges and expanding the frontiers of graph synthesis, we envision empowering data analysts and researchers with powerful tools for efficient and insightful analysis of complex graph-structured data.

REFERENCES

[1] C. C. Aggarwal and H. Wang, Managing and mining graph data. Springer, 2010, vol. 40.
[2] Y. Liu, T. Safavi, A. Dighe, and D. Koutra, "Graph summarization methods and applications: A survey," ACM Computing Surveys (CSUR), vol. 51, no. 3, pp. 1–34, 2018.
[3] Š. Čebirić, F. Goasdoué, H. Kondylakis, D. Kotzinos, I. Manolescu, G. Troullinou, and M. Zneika, "Summarizing semantic graphs: a survey," The VLDB Journal, vol. 28, no. 3, pp. 295–327, 2019.
[4] D. Gibson, R. Kumar, and A. Tomkins, "Discovering large dense subgraphs in massive graphs," in Proceedings of the 31st International Conference on Very Large Data Bases, 2005, pp. 721–732.
[5] K. LeFevre and E. Terzi, "Grass: Graph structure summarization," in Proceedings of the 2010 SIAM International Conference on Data Mining. SIAM, 2010, pp. 454–465.
[6] P. Zhao, X. Li, D. Xin, and J. Han, "Graph cube: on warehousing and olap multidimensional networks," in SIGMOD International Conference on Management of Data, 2011, pp. 853–864.
[7] B. Kulis, S. Basu, I. Dhillon, and R. Mooney, "Semi-supervised graph clustering: a kernel approach," in Proceedings of the 22nd International Conference on Machine Learning, 2005, pp. 457–464.
[8] B. Karrer and M. E. Newman, "Stochastic blockmodels and community structure in networks," Physical Review E, vol. 83, no. 1, p. 016107, 2011.
[9] P. Hu and W. C. Lau, "A survey and taxonomy of graph sampling," arXiv preprint arXiv:1308.5865, 2013.
[10] C. Doerr and N. Blenn, "Metric convergence in social network sampling," in Proceedings of the 5th ACM Workshop on HotPlanet, 2013, pp. 45–50.
[11] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun, "Graph neural networks: A review of methods and applications," AI Open, vol. 1, pp. 57–81, 2020.
[12] J. Chen, Y. Saad, and Z. Zhang, "Graph coarsening: from scientific computing to machine learning," SeMA Journal, pp. 1–37, 2022.
[13] L.-C. Zhang, "Graph sampling: An introduction," The Survey Statistician, vol. 83, pp. 27–37, 2021.
[14] X. Liu, M. Yan, L. Deng, G. Li, X. Ye, and D. Fan, "Sampling methods for efficient training of graph convolutional networks: A survey," IEEE/CAA Journal of Automatica Sinica, vol. 9, no. 2, pp. 205–234, 2021.
[15] Ü. Çatalyürek, K. Devine, M. Faraj, L. Gottesbüren, T. Heuer, H. Meyerhenke, P. Sanders, S. Schlag, C. Schulz, D. Seemaier et al., "More recent advances in (hyper)graph partitioning," ACM Computing Surveys, vol. 55, no. 12, pp. 1–38, 2023.
[16] S. A. Bhavsar, V. H. Patil, and A. H. Patil, "Graph partitioning and visualization in graph mining: a survey," Multimedia Tools and Applications, vol. 81, no. 30, pp. 43315–43356, 2022.
[17] L. Yue, X. Jun, Z. Sihang, W. Siwei, G. Xifeng, Y. Xihong, L. Ke, T. Wenxuan, L. X. Wang et al., "A survey of deep graph clustering: Taxonomy, challenge, and application," arXiv preprint arXiv:2211.12875, 2022.
[18] C. Lee and D. J. Wilkinson, "A review of stochastic block models and extensions for graph clustering," Applied Network Science, vol. 4, no. 1, pp. 1–50, 2019.
[19] Y. Xie, B. Yu, S. Lv, C. Zhang, G. Wang, and M. Gong, "A survey on heterogeneous network representation learning," Pattern Recognition, vol. 116, p. 107936, 2021.
[20] S. M. Kazemi, R. Goel, K. Jain, I. Kobyzev, A. Sethi, P. Forsyth, and P. Poupart, "Representation learning for dynamic graphs: A survey," The Journal of Machine Learning Research, vol. 21, no. 1, pp. 2648–2720, 2020.
[21] M. Xu, "Understanding graph embedding methods and their applications," SIAM Review, vol. 63, no. 4, pp. 825–853, 2021.
[22] X. Wang, D. Bo, C. Shi, S. Fan, Y. Ye, and S. Y. Philip, "A survey on heterogeneous graph embedding: methods, techniques, applications and sources," IEEE Transactions on Big Data, 2022.
[23] W. Jin, L. Zhao, S. Zhang, Y. Liu, J. Tang, and N. Shah, "Graph condensation for graph neural networks," arXiv preprint arXiv:2110.07580, 2021.
[24] B. Mayer and B. Perozzi, "Scaling heterogeneous graph sampling and gnns with google cloud dataflow," https://cloud.google.com/blog/products/ai-machine-learning/scaling-heterogeneous-graph-sampling-gnns-google-cloud-dataflow, 2022, accessed: April 4, 2023.
[25] R. Interdonato, M. Magnani, D. Perna, A. Tagarelli, and D. Vega, "Multilayer network simplification: approaches, models and methods," Computer Science Review, vol. 36, p. 100246, 2020.
[26] P. S. Chodrow, N. Veldt, and A. R. Benson, "Generative hypergraph clustering: From blockmodels to modularity," Science Advances, vol. 7, no. 28, p. eabh1303, 2021.
[27] M. P. Boobalan, D. Lopez, and X. Z. Gao, "Graph clustering using k-neighbourhood attribute structural similarity," Applied Soft Computing, vol. 47, pp. 216–223, 2016.
[28] A. Grover and J. Leskovec, "node2vec: Scalable feature learning for networks," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 855–864.
[29] M. Kurant, M. Gjoka, C. T. Butts, and A. Markopoulou, "Walking on a graph with a magnifying glass: stratified sampling via weighted random walks," in Proceedings of the ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, 2011, pp. 281–292.
[30] A. D. Stivala, J. H. Koskinen, D. A. Rolls, P. Wang, and G. L. Robins, "Snowball sampling for estimating exponential random graph models for large networks," Social Networks, vol. 47, pp. 167–188, 2016.
[31] M. Hajiabadi, "Efficient graph summarization of large networks," Ph.D. dissertation, University of Victoria, 2022.
[32] K. A. Kumar and P. Efstathopoulos, "Utility-driven graph summarization," Proceedings of the VLDB Endowment, vol. 12, no. 4, pp. 335–347, 2018.
[33] S. Dumbrava, A. Bonifati, A. N. R. Diaz, and R. Vuillemot, "Approximate querying on property graphs," in Scalable Uncertainty Management: 13th International Conference, SUM 2019, Compiègne, France, December 16–18, 2019, Proceedings 13. Springer, 2019, pp. 250–265.
[34] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, "A comprehensive survey on graph neural networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 1, pp. 4–24, 2020.
[35] G. Dong, M. Tang, Z. Wang, J. Gao, S. Guo, L. Cai, R. Gutierrez, B. Campbel, L. E. Barnes, and M. Boukhechba, "Graph neural networks in iot: a survey," ACM Transactions on Sensor Networks, vol. 19, no. 2, pp. 1–50, 2023.
[36] H. Cai, V. W. Zheng, and K. C.-C. Chang, "A comprehensive survey of graph embedding: Problems, techniques, and applications," IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 9, pp. 1616–1637, 2018.
[37] F. Liu, S. Xue, J. Wu, C. Zhou, W. Hu, C. Paris, S. Nepal, J. Yang, and P. S. Yu, "Deep learning for community detection: progress, challenges and opportunities," arXiv preprint arXiv:2005.08225, 2020.
[38] L. Wu, P. Cui, J. Pei, L. Zhao, and L. Song, "Graph neural networks," in Graph Neural Networks: Foundations, Frontiers, and Applications. Springer, 2022, pp. 27–37.
[39] S. Brody, U. Alon, and E. Yahav, "How attentive are graph attention networks?" arXiv preprint arXiv:2105.14491, 2021.
[40] P. Goyal, S. R. Chhetri, and A. Canedo, "dyngraph2vec: Capturing network dynamics using dynamic graph representation learning," Knowledge-Based Systems, vol. 187, p. 104816, 2020.
[41] B. Huang and K. M. Carley, "Inductive graph representation learning with recurrent graph neural networks," CoRR, abs/1904.08035, 2019.
[42] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997. [Online]. Available: https://doi.org/10.1162/neco.1997.9.8.1735
[43] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using rnn encoder-decoder for statistical machine translation," arXiv preprint arXiv:1406.1078, 2014.
[44] X. Lai, P. Yang, K. Wang, Q. Yang, and D. Yu, "Mgrnn: Structure generation of molecules based on graph recurrent neural networks," Molecular Informatics, vol. 40, no. 10, p. 2100091, 2021.
[45] A. Taheri, K. Gimpel, and T. Berger-Wolf, "Learning graph representations with recurrent neural network autoencoders," KDD Deep Learning Day, 2018.
[46] Y. Jin and J. F. JaJa, "Learning graph-level representations with recurrent neural networks," arXiv preprint arXiv:1805.07683, 2018.
[47] A. Bojchevski, O. Shchur, D. Zügner, and S. Günnemann, "Netgan: Generating graphs via random walks," in International Conference on Machine Learning. PMLR, 2018, pp. 610–619.
[48] H.-Y. Wu and Y.-L. Chen, "Graph sparsification with generative adversarial network," in 2020 IEEE International Conference on Data Mining (ICDM). IEEE, 2020, pp. 1328–1333.
[49] Y. Ma, Z. Guo, Z. Ren, J. Tang, and D. Yin, "Streaming graph neural networks," in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 719–728.
[50] S. Khoshraftar, S. Mahdavi, A. An, Y. Hu, and J. Liu, "Dynamic graph embedding via lstm history tracking," in 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2019, pp. 119–127.
[51] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel, "Gated graph sequence neural networks," arXiv preprint arXiv:1511.05493, 2015.
[52] A. Taheri, K. Gimpel, and T. Berger-Wolf, "Learning to represent the evolution of dynamic graphs with recurrent models," in Companion Proceedings of The 2019 World Wide Web Conference, 2019, pp. 301–307.
[53] K. Ge, J.-Q. Zhao, and Y.-Y. Zhao, "Gr-gnn: Gated recursion-based graph neural network algorithm," Mathematics, vol. 10, no. 7, p. 1171, 2022.
[54] L. R. Medsker and L. Jain, "Recurrent neural networks," Design and Applications, vol. 5, pp. 64–67, 2001.
[55] X. Liang, X. Shen, J. Feng, L. Lin, and S. Yan, "Semantic object parsing with graph lstm," in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer, 2016, pp. 125–143.
[56] C.-Y. Zhang, Z.-L. Yao, H.-Y. Yao, F. Huang, and C. P. Chen, "Dynamic representation learning via recurrent graph neural networks," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 3, pp. 468–479, 2022.
[57] S. Li, K. W. Wong, C. C. Fung, and D. Zhu, "Improving question answering over knowledge graphs using graph summarization," in International Conference on Neural Information Processing. Springer, 2021, pp. 489–500.
[58] K. Zarzycki and M. Ławryńczuk, "Advanced predictive control for gru and lstm networks," Information Sciences, vol. 616, pp. 229–254, 2022.
[59] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," arXiv preprint arXiv:1609.02907, 2016.
[60] F. Wu, A. Souza, T. Zhang, C. Fifty, T. Yu, and K. Weinberger, "Simplifying graph convolutional networks," in International Conference on Machine Learning. PMLR, 2019, pp. 6861–6871.
[61] C. Deng, Z. Zhao, Y. Wang, Z. Zhang, and Z. Feng, "Graphzoom: A multi-level spectral approach for accurate and scalable graph embedding," arXiv preprint arXiv:1910.02370, 2019.
[62] E. Rossi, F. Frasca, B. Chamberlain, D. Eynard, M. Bronstein, and F. Monti, "Sign: Scalable inception graph neural networks," arXiv preprint arXiv:2004.11198, vol. 7, p. 15, 2020.
[63] H. Jiang, P. Cao, M. Xu, J. Yang, and O. Zaiane, "Hi-gcn: a hierarchical graph convolution network for graph embedding learning of brain network and brain disorders prediction," Computers in Biology and Medicine, vol. 127, p. 104096, 2020.
[64] W. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in Advances in Neural Information Processing Systems. MIT Press, 2017, pp. 1024–1034.
[65] J. Chen, T. Ma, and C. Xiao, "Fastgcn: fast learning with graph convolutional networks via importance sampling," arXiv preprint arXiv:1801.10247, 2018.
[66] H. Zeng, H. Zhou, A. Srivastava, R. Kannan, and V. Prasanna, "Graphsaint: Graph sampling based inductive learning method," arXiv preprint arXiv:1907.04931, 2019.
[67] Y. Yan, J. Zhu, M. Duda, E. Solarz, C. Sripada, and D. Koutra, "Groupinn: Grouping-based interpretable neural network for classification of limited, noisy brain data," in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 772–782.
[68] G. Wen, P. Cao, H. Bao, W. Yang, T. Zheng, and O. Zaiane, "Mvs-gcn: A prior brain structure learning-guided multi-view graph convolution network for autism spectrum disorder diagnosis," Computers in Biology and Medicine, vol. 142, p. 105239, 2022.
[69] S. Zhang, H. Tong, J. Xu, and R. Maciejewski, "Graph convolutional networks: a comprehensive review," Computational Social Networks, vol. 6, no. 1, pp. 1–23, 2019.
[70] R. Levie, F. Monti, X. Bresson, and M. M. Bronstein, "Cayleynets: Graph convolutional neural networks with complex rational spectral filters," IEEE Transactions on Signal Processing, vol. 67, no. 1, pp. 97–109, 2018.
[71] D. K. Hammond, P. Vandergheynst, and R. Gribonval, "Wavelets on graphs via spectral graph theory," Applied and Computational Harmonic Analysis, vol. 30, no. 2, pp. 129–150, 2011.
[72] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," Advances in Neural Information Processing Systems, vol. 29, 2016.
[73] A. Ajit, K. Acharya, and A. Samanta, "A review of convolutional neural networks," in International Conference on Emerging Trends in Information Technology and Engineering. IEEE, 2020, pp. 1–5.
[74] W. Huang, T. Zhang, Y. Rong, and J. Huang, "Adaptive sampling towards fast graph representation learning," Advances in Neural Information Processing Systems, vol. 31, 2018.
[75] Z. Li, X. Shen, Y. Jiao, X. Pan, P. Zou, X. Meng, C. Yao, and J. Bu, "Hierarchical bipartite graph neural networks: Towards large-scale e-commerce applications," in IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020, pp. 1677–1688.
[76] F. Hu, Y. Zhu, S. Wu, L. Wang, and T. Tan, "Hierarchical graph convolutional networks for semi-supervised node classification," arXiv preprint arXiv:1902.06667, 2019.
[77] L. Dang, Y. Nie, C. Long, Q. Zhang, and G. Li, "Msr-gcn: Multi-scale residual graph convolution networks for human motion prediction," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 11467–11476.
[78] T. N. Kipf and M. Welling, "Variational graph auto-encoders," arXiv preprint arXiv:1611.07308, 2016.
[79] C. Wang, S. Pan, G. Long, X. Zhu, and J. Jiang, "Mgae: Marginalized graph autoencoder for graph clustering," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 889–898.
[80] E. Hajiramezanali, A. Hasanzadeh, K. Narayanan, N. Duffield, M. Zhou, and X. Qian, "Variational graph recurrent neural networks," Advances in Neural Information Processing Systems, vol. 32, 2019.
[81] S. Fan, X. Wang, C. Shi, E. Lu, K. Lin, and B. Wang, "One2multi graph autoencoder for multi-view graph clustering," in Proceedings of The Web Conference 2020, 2020, pp. 3070–3076.
[82] E. Cai, J. Huang, B. Huang, S. Xu, and J. Zhu, "Grae: Graph recurrent autoencoder for multi-view graph clustering," in 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence, 2021, pp. 1–9.
[83] G. Salha, R. Hennequin, and M. Vazirgiannis, "Simple and effective graph autoencoders with one-hop linear models," in Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2020, Ghent, Belgium, September 14–18, 2020, Proceedings, Part I. Springer, 2021, pp. 319–334.
[84] N. Mrabah, M. Bouguessa, M. F. Touati, and R. Ksantini, "Rethinking graph auto-encoder models for attributed graph clustering," IEEE Transactions on Knowledge and Data Engineering, 2022.
[85] W. H. L. Pinaya, S. Vieira, R. Garcia-Dias, and A. Mechelli, "Autoencoders," in Machine Learning. Academic Press, 2020, pp. 193–208.
[86] Z. Zhang, P. Cui, and W. Zhu, "Deep learning on graphs: A survey," IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 1, pp. 249–270, 2020.
[87] Z. Hou, X. Liu, Y. Dong, C. Wang, J. Tang et al., "Graphmae: Self-supervised masked graph autoencoders," arXiv preprint arXiv:2205.10803, 2022.
[88] G. Salha, R. Hennequin, and M. Vazirgiannis, "Keep it simple: Graph autoencoders without graph convolutional networks," arXiv preprint arXiv:1910.00942, 2019.
[89] D. P. Kingma and M. Welling, "Auto-encoding variational bayes," arXiv preprint arXiv:1312.6114, 2013.
[90] D. P. Kingma, T. Salimans, and M. Welling, "Variational dropout and the local reparameterization trick," Advances in Neural Information Processing Systems, vol. 28, 2015.
[91] T. Kim, J. Oh, N. Y. Kim, S. Cho, and S.-Y. Yun, "Comparing kullback-leibler divergence and mean squared error loss in knowledge distillation," in 30th International Joint Conference on Artificial Intelligence (IJCAI-21). IJCAI, 2021, pp. 2628–2635.
[92] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, "Graph attention networks," arXiv preprint arXiv:1710.10903, 2017.
[93] Y. Xie, Y. Zhang, M. Gong, Z. Tang, and C. Han, "Mgat: Multi-view graph attention networks," Neural Networks, vol. 132, pp. 180–189, 2020.
[94] A. Salehi and H. Davulcu, "Graph attention auto-encoders," in 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2020, pp. 989–996.
[95] K. Tu, P. Cui, D. Wang, Z. Zhang, J. Zhou, Y. Qi, and W. Zhu, "Conditional graph attention networks for distilling and refining knowledge graphs in recommendation," in Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 1834–1843.
[96] L. Chen, J. Cao, Y. Wang, W. Liang, and G. Zhu, "Multi-view graph attention network for travel recommendation," Expert Systems with Applications, vol. 191, p. 116234, 2022.
[97] Z. Li, Y. Zhao, Y. Zhang, and Z. Zhang, "Multi-relational graph attention networks for knowledge graph completion," Knowledge-Based Systems, vol. 251, p. 109262, 2022.
[98] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473, 2014.
[99] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, Y. Bengio et al., "Graph attention networks," stat, vol. 1050, no. 20, pp. 10–48 550, 2017.
[100] N. Mingshuo, C. Dongming, and W. Dongqi, "Reinforcement learning on graph: A survey," arXiv preprint arXiv:2204.06127, 2022.
[101] R. Bellman and R. E. Kalaba, Selected papers on mathematical trends in control theory. Dover Publications, 1964.
[102] M. Nie, D. Chen, and D. Wang, "Reinforcement learning on graphs: A survey," IEEE Transactions on Emerging Topics in Computational Intelligence, 2023.
[103] Z. Yan, J. Ge, Y. Wu, L. Li, and T. Li, "Automatic virtual network embedding: A deep reinforcement learning approach with graph convolutional networks," IEEE Journal on Selected Areas in Communications, vol. 38, no. 6, pp. 1040–1057, 2020.
[104] M. Wu, Q. Zhang, Y. Gao, and N. Li, "Graph signal sampling with deep q-learning," in 2020 International Conference on Computer Information and Big Data Applications (CIBDA), 2020, pp. 450–453.
[105] B. Wu, X. Liang, X. Zheng, J. Wang, and X. Zhou, "Reinforced sample selection for graph neural networks transfer learning," in 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2022, pp. 1281–1288.
[106] S. E. Amiri, B. Adhikari, A. Bharadwaj, and B. A. Prakash, "Netgist: Learning to generate task-based network summaries," in IEEE International Conference on Data Mining (ICDM). IEEE, 2018, pp. 857–862.
[107] S. E. Amiri, B. Adhikari, J. Wenskovitch, A. Rodriguez, M. Dowling, C. North, and B. A. Prakash, "Netreact: Interactive learning for network summarization," arXiv preprint arXiv:2012.11821, 2020.
[108] R. Wickman, X. Zhang, and W. Li, "Sparrl: Graph sparsification via deep reinforcement learning," arXiv preprint arXiv:2112.01565, 2021.
[109] X. Zhao, J. Wu, H. Peng, A. Beheshti, J. J. Monaghan, D. McAlpine, H. Hernandez-Perez, M. Dras, Q. Dai, Y. Li et al., "Deep reinforcement learning guided graph neural networks for brain network analysis," Neural Networks, vol. 154, pp. 56–67, 2022.
[110] N. Goyal and D. Steiner, "Graph neural networks for image classification and reinforcement learning using graph representations," arXiv preprint arXiv:2203.03457, 2022.
[111] B. Yan, K. Janowicz, G. Mai, and R. Zhu, "A spatially explicit reinforcement learning model for geographic knowledge graph summarization," Transactions in GIS, vol. 23, no. 3, pp. 620–640, 2019.
[112] S. Pan, R. Hu, G. Long, J. Jiang, L. Yao, and C. Zhang, "Adversarially regularized graph autoencoder for graph embedding," arXiv preprint arXiv:1802.04407, 2018.

Nasrin Shabani received a Master of Research degree in Computer Science with First Class Honours from Macquarie University, Sydney, NSW, Australia. She is currently pursuing a Ph.D. in Computer Science at the same institution. Her research interests lie at the intersection of graph mining, graph summarization, and deep learning. Through her work, she aims to develop novel algorithms and techniques that can extract meaningful insights and patterns from complex graph data structures.

Jia Wu (Senior Member, IEEE) is currently the Research Director of the Centre for Applied Artificial Intelligence and the Director of Higher Degree Research in the School of Computing at Macquarie University, Sydney, Australia. Dr Wu received his Ph.D. degree in computer science from the University of Technology Sydney, Australia. His current research interests include data mining and machine learning. Since 2009, he has published 100+ refereed journal and conference papers, including TKDE, TKDD, KDD, ICDM, WWW, and NeurIPS.

Amin Beheshti holds B.S. (1st Hons.) and M.S. degrees (1st Hons.) in computer science and engineering, and a Ph.D. in computer science from UNSW Sydney, Australia. Amin is a Full Professor of data science at Macquarie University. He is currently the Director of the Centre for Applied Artificial Intelligence and the Head of the Data Science Research Laboratory, School of Computing, Macquarie University. He is a leading author of several authored books in data, social, and process analytics, co-authored with other high-profile researchers.

Quan Z. Sheng received his Ph.D. degree in computer science from the University of New South Wales, Sydney, NSW, Australia. He is currently a full Professor and Head of the School of Computing at Macquarie University, Sydney. His research interests include big data analytics, service-oriented computing, and the Internet of Things. Microsoft Academic ranked Prof. Michael Sheng as one of the Most Impactful Authors in Services Computing (ranked Top 5 All-Time) and in the Web of Things (ranked Top 20 All-Time).

Jin Foo is the Staff Data Scientist at Prospa, Australia's top online lender to small businesses, and was previously Data Science Lead at Woolworths Group, specialising in identity resolution, hyper-personalised offers, propensity modelling, sequential bin packing, and time-series analysis. Jin is currently a 2nd year Master of Research student at the School of Computing, Macquarie University, Sydney, Australia. His research focuses on anomaly detection with word embeddings, hashing algorithms, and graph networks.

Venus Haghighi is currently a Ph.D. student in computer science at the School of Computing, Macquarie University, Sydney, NSW, Australia. The focus of her research is to enhance classic GNN models and explore robust graph learning paradigms to detect and mitigate the camouflage behavior of malicious actors in both static and dynamic networks. Her research interests include graph-based anomaly detection, graph neural networks, graph-based fraud detection, and graph data mining.

Ambreen Hanif is a 2nd year Ph.D. student at the School of Computing, Macquarie University, NSW, Australia. After completing her Master's degree, she decided to continue her research as a Ph.D. candidate. Her research interests lie in the field of Explainable Artificial Intelligence (XAI) for Deep Neural Networks, Data Provenance, and Storytelling with XAI. Specifically, she aims to develop novel methods to enhance the interpretability and transparency of deep neural networks.

Maryam Shahabikargar received her Master of Research degree in computer science from Macquarie University, Sydney, Australia. After completing her Master's degree, she decided to continue her research as a Ph.D. candidate in computer science at Macquarie University. Her current research interests include not only NLP and graph embeddings but also link prediction and anomaly detection. She aims to develop a model that combines her research interests to tackle a specific problem in the field of finance.