
Data Analytics on Graphs – Part I: Graphs and Spectra on Graphs

Ljubiša Stanković^a, Danilo Mandic^b, Miloš Daković^a, Miloš Brajović^a, Bruno Scalzo^b, Anthony G. Constantinides^b
^a University of Montenegro, Podgorica, Montenegro
^b Imperial College London, London, United Kingdom

Abstract
The area of Data Analytics on graphs promises a paradigm shift, as we approach information processing of new classes of
data which are typically acquired on irregular but structured domains (social networks, various ad-hoc sensor networks).
Yet, despite the long history of Graph Theory, current approaches mostly focus on the optimization of graphs themselves,
rather than on directly inferring learning strategies, such as detection, estimation, statistical and probabilistic inference,
clustering and separation from signals and data acquired on graphs. To fill this void, we first revisit graph topologies
from a Data Analytics point of view, to establish a taxonomy of graph networks through a linear algebraic formalism
of graph topology (vertices, connections, directivity). This serves as a basis for spectral analysis of graphs, whereby
the eigenvalues and eigenvectors of graph Laplacian and adjacency matrices are shown to convey physical meaning
related to both graph topology and higher-order graph properties, such as cuts, walks, paths, and neighborhoods.
Through a number of carefully chosen examples, we demonstrate that the isomorphic nature of graphs enables both
the basic properties of data observed on graphs and their descriptors (features) to be preserved throughout the data
analytics process, even in the case of reordering of graph vertices, where classical approaches fail. Next, to illustrate
the richness and flexibility of estimation strategies performed on graph signals, spectral analysis of graphs is introduced
through eigenanalysis of mathematical descriptors of graphs and in a generic way. Finally, benefiting from enhanced
degrees of freedom associated with graph representations, a framework for vertex clustering and graph segmentation is
established based on graph spectral representation (eigenanalysis) which demonstrates the power of graphs in various
data association tasks, from image clustering and segmentation through to low-dimensional manifold representation. The
supporting examples demonstrate the promise of Graph Data Analytics in modeling structural and functional/semantic
inferences. At the same time, Part I serves as a basis for Part II and Part III which deal with theory, methods and
applications of processing Data on Graphs and Graph Topology Learning from data.

Contents

1 Introduction
2 Graph Definitions and Properties
  2.1 Basic Definitions
  2.2 Some Frequently Used Graph Topologies
  2.3 Properties of Graphs and Associated Matrices
3 Spectral Decomposition of Graph Matrices
  3.1 Eigenvalue Decomposition of the Adjacency Matrix
    3.1.1 Properties of the characteristic and minimal polynomial
  3.2 Spectral Graph Theory
    3.2.1 The DFT basis functions as a special case of eigenvectors of the adjacency matrix
    3.2.2 Decomposition of graph product adjacency matrices
    3.2.3 Decomposition of matrix powers and polynomials
  3.3 Eigenvalue Decomposition of the graph Laplacian
    3.3.1 Properties of Laplacian eigenvalue decomposition
    3.3.2 Fourier analysis as a special case of the Laplacian spectrum
4 Vertex Clustering and Mapping
  4.1 Clustering based on graph topology
    4.1.1 Minimum graph cut
    4.1.2 Maximum-flow minimum-cut approach
    4.1.3 Normalized (ratio) minimum cut
    4.1.4 Volume normalized minimum cut
    4.1.5 Other forms of the normalized cut
  4.2 Spectral methods for graph clustering
    4.2.1 Smoothness of Eigenvectors on Graphs
    4.2.2 Spectral Space and Spectral Similarity of Vertices
    4.2.3 Indicator vector
    4.2.4 Bounds on the minimum cut
    4.2.5 Indicator vector for normalized graph Laplacian
  4.3 Spectral clustering implementation
    4.3.1 Clustering based on only one (Fiedler) eigenvector
    4.3.2 "Closeness" of the segmented and original graphs
    4.3.3 Clustering based on more than one eigenvector
  4.4 Vertex Dimensionality Reduction Using the Laplacian Eigenmaps
    4.4.1 Euclidean distances in the space of spectral vectors
    4.4.2 Examples of graph analysis in the spectral space
  4.5 Pseudo-inverse of Graph Laplacian-Based Mappings
    4.5.1 Commute time mapping
    4.5.2 Diffusion (Random Walk) Mapping
  4.6 Summary of Embedding Mappings
5 Graph Sampling Strategies
6 Conclusion
7 Appendix: Power Method for Eigenanalysis
8 Appendix: Algorithm for Graph Laplacian Eigenmaps

Email addresses: [email protected] (Ljubiša Stanković), [email protected] (Danilo Mandic), [email protected] (Miloš Daković), [email protected] (Miloš Brajović), [email protected] (Bruno Scalzo), [email protected] (Anthony G. Constantinides)

Preprint submitted to Elsevier, September 9, 2022

1. Introduction

Graph signal processing is a multidisciplinary research area, whose roots can be traced back to the 1970s [1, 2, 3], but which has witnessed a rapid resurgence. The recent developments, in response to the requirements posed by radically new classes of data sources, typically embark upon the classical results on "static" graph topology optimization, to treat graphs as irregular data domains, which makes it possible to address completely new paradigms of "information processing on graphs" and "signal processing on graphs". This has already resulted in advanced and physically meaningful solutions in manifold applications [4, 5, 6, 7, 8]. For example, while the emerging areas of Graph Machine Learning (GML) and Graph Signal Processing (GSP) do comprise the classic methods of optimization of graphs themselves [9, 10, 11, 12, 13, 14, 15], significant progress has been made towards redefining basic data analysis paradigms (spectral estimation, probabilistic inference, filtering, dimensionality reduction, clustering, statistical learning), to make them amenable for direct estimation of signals on graphs [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]. Indeed, this is a necessity in numerous practical scenarios where the signal domain is not designated by equidistant instants in time or a regular grid in a space or a transform domain. Examples include modern Data Analytics for, e.g., social network modeling or in smart grids – data domains which are typically irregular and, in some cases, not even related to the notions of time or space, where ideally the data sensing domain should also reflect domain-specific properties of the considered system/network; for example, in social or web related networks, the sensing points and their connectivity may be related to specific individuals or topics, and their links, whereby the processing on irregular domains requires the consideration of data properties other than time or space relationships. In addition, even for the data sensed in well defined time and space domains, the new contextual and semantic-related relations between the sensing points, introduced through graphs, promise to equip problem definition with physical relevance, and consequently provide new insights into analysis and enhanced data processing results.

In applications which admit the definition of the data domain as a graph (social networks, power grids, vehicular networks, brain connectivity), the role of classic temporal/spatial sampling points is assumed by graph vertices – the nodes – where the data values are observed, while the edges between vertices designate the existence and nature of vertex connections (directionality, strength). In this way, graphs are perfectly well equipped to exploit the fundamental relations among both the measured data and the underlying graph topology; this inherent ability to incorporate physically relevant data properties has made GSP and GML key technologies in the emerging field of Big Data Analytics (BDA). Indeed, in applications defined on irregular data domains, Graph Data Analytics (GDA) has been proven to offer a quantum step forward from the classical time (or space) series analyses [27, 28, 29, 30, 31, 32, 33, 34, 35], including the following aspects:

• Graph-based data processing approaches can be applied not only to technological, biological, and social networks, but they can also lead to both improvements of the existing and even to the creation of radically new methods in classical signal processing and machine learning [36, 37, 38, 39, 40, 41, 42, 43].

• The involvement of graphs makes it possible for the classical sensing domains of time and space (which may be represented as a linear or circular graph) to be structured in a more advanced way, e.g., by considering the connectivity of sensing points from a signal similarity or sensor association point of view.

The first step in graph data analytics is to decide on the properties of the graph as a new signal/information domain. However, while the data sensing points (graph vertices) may be well defined by the application itself, that is not the case with their connectivity (graph edges), where:
• In the case of the various computer, social, road, transportation and electrical networks, the vertex connectivity is often naturally defined, resulting in an exact underlying graph topology.

• In many other cases, the data domain definition in a graph form becomes part of the problem definition itself, as is the case with, e.g., graphs for sensor networks, in finance or smart cities. In such cases, a vertex connectivity scheme needs to be determined based on the properties of the sensing positions or from the acquired data, as e.g. in the estimation of the temperature field in meteorology [44].

This additional aspect of the definition of an appropriate graph structure is of crucial importance for a meaningful and efficient application of the GML and GSP approaches.

With that in mind, this monograph was written in response to the urgent need of multidisciplinary data analytics communities for a seamless and rigorous transition from classical data analytics to the corresponding paradigms which operate directly on irregular graph domains. To this end, we start our approach from a review of basic definitions of graphs and their properties, followed by a physical intuition and step-by-step introduction of graph spectral analysis (eigen-analysis). Particular emphasis is on eigendecomposition of graph matrices, which serves as a basis for mathematical formalisms in graph signal and information processing. As an example of the ability of GML and GSP to generalize standard methodologies for graphs, we elaborate upon a step-by-step introduction of the Graph Discrete Fourier Transform (GDFT), and show that it simplifies into the standard Discrete Fourier Transform (DFT) for directed circular graphs; this also exemplifies the generic nature of graph approaches. Finally, spectral vertex analysis and spectral graph segmentation are elucidated as the basis for the understanding of relations among distinct but physically meaningful regions in graphs; this is demonstrated on examples of regional infrastructure modeling, brain connectivity, clustering, and dimensionality reduction.

2. Graph Definitions and Properties

Graph theory has been established for almost three centuries as a branch of mathematics, and has become a staple methodology in science and engineering areas including chemistry, operational research, electrical and civil engineering, social networks, and computer sciences. The beginning of graph theory applications in electrical engineering can be traced back to the mid-XIX century with the introduction of Kirchhoff's laws. Fast forward two centuries or so, the analytics of data acquired on graphs has become a rapidly developing research paradigm in Signal Processing and Machine Learning [4, 5, 6, 7].

2.1. Basic Definitions

Definition: A graph G = {V, B} is defined as a set of vertices, V, which are connected by a set of edges, B ⊂ V × V, where the symbol × denotes a direct product operator.

Examples of graph topologies with N = 8 vertices, with

V = {0, 1, 2, 3, 4, 5, 6, 7},

are presented in Fig. 1, along with the corresponding edges. The vertices are usually depicted as points (circles) and the edges as lines that connect the vertices. More formally, a line between the vertices m and n indicates the existence of an edge between vertices m and n, that is, (m, n) ∈ B, so that, for example, the graph from Fig. 1(b) can be described as

V = {0, 1, 2, 3, 4, 5, 6, 7}
B ⊂ {0, 1, 2, 3, 4, 5, 6, 7} × {0, 1, 2, 3, 4, 5, 6, 7}
B = {(0,1), (1,2), (2,0), (2,3), (2,4), (2,7), (3,0), (4,1), (4,2), (4,5), (5,7), (6,3), (6,7), (7,2), (7,6)}.

Figure 1: Basic graph structures. (a) Undirected graph and (b) Directed graph.

Regarding the directionality of vertex connections, a graph can be undirected or directed, as illustrated respectively in Fig. 1(a) and Fig. 1(b).

Definition: A graph is undirected if the edge connecting a vertex m to a vertex n also connects the vertex n to the vertex m, for all m and n.

In other words, for an undirected graph, if (n, m) ∈ B then also (m, n) ∈ B, as is the case, for example, with the edges (1, 2) and (2, 1) in Fig. 1(a). For directed graphs, in general, this property does not hold, as shown in Fig. 1(b). Observe, for example, that the edge (2, 1) does not exist, although the edge (1, 2) connects vertices 1 and 2. Therefore, undirected graphs can be considered as a special case of directed graphs.
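As a simple illustration (a minimal Python sketch under the edge set given above; the variable names are chosen here for convenience), the directed graph in Fig. 1(b) can be stored directly as a vertex set and an edge set, and tested for undirectedness by checking whether every edge (m, n) is accompanied by its reverse (n, m):

    # The directed graph of Fig. 1(b) as a vertex set and an edge set.
    V = set(range(8))
    B = {(0, 1), (1, 2), (2, 0), (2, 3), (2, 4), (2, 7), (3, 0),
         (4, 1), (4, 2), (4, 5), (5, 7), (6, 3), (6, 7), (7, 2), (7, 6)}

    # A graph is undirected if every edge (m, n) is matched by (n, m).
    is_undirected = all((n, m) in B for (m, n) in B)
    print(is_undirected)  # False here: e.g. (1, 2) is in B but (2, 1) is not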
For a given set of vertices and edges, a graph can be formally represented by its adjacency matrix, A, which describes the vertex connectivity; for N vertices, A is an N × N matrix.

Definition: The elements Amn of the adjacency matrix A assume values Amn ∈ {0, 1}. The value Amn = 0 is assigned if the vertices m and n are not connected with an edge, and Amn = 1 if these vertices are connected, that is

Amn = 1, if (m, n) ∈ B,
Amn = 0, if (m, n) ∉ B.

Therefore, the respective adjacency matrices, Aun and Adir, for the undirected and directed graphs from Fig. 1(a) and (b) are given by

Aun =
          0  1  2  3  4  5  6  7
     0  [ 0  1  1  1  0  0  0  0 ]
     1  [ 1  0  1  0  1  0  0  0 ]
     2  [ 1  1  0  1  1  0  0  0 ]
     3  [ 1  0  1  0  0  0  1  0 ]
     4  [ 0  1  1  0  0  1  0  1 ]
     5  [ 0  0  0  0  1  0  0  1 ]
     6  [ 0  0  0  1  0  0  0  1 ]
     7  [ 0  0  0  0  1  1  1  0 ]    (1)

and

Adir =
          0  1  2  3  4  5  6  7
     0  [ 0  1  0  0  0  0  0  0 ]
     1  [ 0  0  1  0  0  0  0  0 ]
     2  [ 1  0  0  1  1  0  0  1 ]
     3  [ 1  0  0  0  0  0  0  0 ]
     4  [ 0  1  1  0  0  1  0  0 ]
     5  [ 0  0  0  0  0  0  0  1 ]
     6  [ 0  0  0  1  0  0  0  1 ]
     7  [ 0  0  1  0  0  0  1  0 ]    (2)

Adjacency matrices not only fully reflect the structure arising from the topology of data acquisition, but they also admit the usual feature analysis through linear algebra, and can be sparse, or exhibit some other interesting and useful matrix properties.

Remark 1: The adjacency matrix of an undirected graph is symmetric, that is,

A = A^T.

Since a graph is fully determined by its adjacency matrix, defined over a given set of vertices, any change in vertex ordering will cause the corresponding changes in the adjacency matrix.

Remark 2: Observe that a vertex indexing scheme does not change the graph itself (graphs are isomorphic domains), so that the relation between the adjacency matrices of the original and renumbered graphs, A1 and A2 respectively, is straightforwardly defined using an appropriate permutation matrix, P, in the form

A2 = P A1 P^T.    (3)

Recall that each row and each column of a permutation matrix has exactly one nonzero element, equal to unity.

In general, the edges can also convey information about the relative importance of their connection, through a weighted graph.

Figure 2: Example of a weighted graph.

Remark 3: The set of weights, W, corresponds morphologically to the set of edges, B, so that a weighted graph represents a generic extension of an unweighted graph. It is commonly assumed that edge weights are nonnegative real numbers; therefore, if weight 0 is associated with a nonexisting edge, then the graph can be described by a weight matrix, W, similar to the description by the adjacency matrix A.

Definition: A nonzero element in the weight matrix W, Wmn ∈ W, designates both an edge between the vertices m and n and the corresponding weight. The value Wmn = 0 indicates no edge connecting the vertices m and n. The elements of a weight matrix are nonnegative real numbers.

Fig. 2 shows an example of a weighted undirected graph, with the corresponding weight matrix given by

W =
         0     1     2     3     4     5     6     7
    0  [ 0     0.23  0.74  0.24  0     0     0     0    ]
    1  [ 0.23  0     0.35  0     0.23  0     0     0    ]
    2  [ 0.74  0.35  0     0.26  0.24  0     0     0    ]
    3  [ 0.24  0     0.26  0     0     0     0.32  0    ]
    4  [ 0     0.23  0.24  0     0     0.51  0     0.14 ]
    5  [ 0     0     0     0     0.51  0     0     0.15 ]
    6  [ 0     0     0     0.32  0     0     0     0.32 ]
    7  [ 0     0     0     0     0.14  0.15  0.32  0    ]    (4)

In this sense, the adjacency matrix, A, can be considered as a special case of the weight matrix, W, whereby all nonzero weights are equal to unity. It then follows that the weight matrix of undirected graphs is also symmetric,

W = W^T,    (5)

while, in general, for directed graphs this property does not hold.
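For a quick numerical illustration of Remarks 1 and 2 (a minimal numpy sketch, built from the edge list of Fig. 1(a); the helper names are chosen here), the adjacency matrix can be assembled from the edge list, its symmetry checked, and the effect of vertex renumbering reproduced through a permutation matrix as in (3):

    import numpy as np

    N = 8
    # Edge list of the undirected graph in Fig. 1(a), cf. the matrix in (1).
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 4), (2, 3),
             (2, 4), (3, 6), (4, 5), (4, 7), (5, 7), (6, 7)]
    A = np.zeros((N, N), dtype=int)
    for m, n in edges:
        A[m, n] = A[n, m] = 1                 # undirected: set both directions

    print(np.array_equal(A, A.T))             # True (Remark 1: A = A^T)

    # Remark 2 and eq. (3): vertex renumbering via a permutation matrix P.
    perm = np.random.permutation(N)           # an arbitrary reordering of the vertices
    P = np.eye(N, dtype=int)[perm]            # one unit entry in each row and column
    A2 = P @ A @ P.T                          # adjacency matrix of the relabelled graph
    print(np.array_equal(A2, A[np.ix_(perm, perm)]))   # True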
Definition: A degree matrix, D, for an undirected graph is a diagonal matrix with elements, Dmm, which are equal to the sum of weights of all edges connected to the vertex m, that is, the sum of elements in its m-th row,

Dmm = Σ_{n=0}^{N−1} Wmn.

Remark 4: For an unweighted and undirected graph, the value of the element Dmm is equal to the number of edges connected to the m-th vertex.

Vertex degree centrality. The degree centrality of a vertex is defined as the number of vertices connected to the considered vertex with a single edge, and in this way it models the importance of a given vertex. For undirected and unweighted graphs, the vertex degree centrality of a vertex is equal to the element, Dmm, of the degree matrix.

Example 1: For the undirected weighted graph from Fig. 2, the degree matrix is given by

D = diag(1.21, 0.81, 1.59, 0.82, 1.12, 0.66, 0.64, 0.61).    (6)

Another important descriptor of graph connectivity is the graph Laplacian matrix, L, which combines the weight matrix and the degree matrix.

Definition: The graph Laplacian matrix is defined as

L = D − W,    (7)

where W is the weight matrix and D the diagonal degree matrix with elements Dmm = Σn Wmn. The elements of a Laplacian matrix are therefore nonnegative real numbers at the diagonal positions, and nonpositive real numbers at the off-diagonal positions.

For an undirected graph, the Laplacian matrix is symmetric, that is, L = L^T. For example, the graph Laplacian for the weighted graph from Fig. 2 is given by

L =
    [  1.21  -0.23  -0.74  -0.24   0      0      0      0    ]
    [ -0.23   0.81  -0.35   0     -0.23   0      0      0    ]
    [ -0.74  -0.35   1.59  -0.26  -0.24   0      0      0    ]
    [ -0.24   0     -0.26   0.82   0      0     -0.32   0    ]
    [  0     -0.23  -0.24   0      1.12  -0.51   0     -0.14 ]
    [  0      0      0      0     -0.51   0.66   0     -0.15 ]
    [  0      0      0     -0.32   0      0      0.64  -0.32 ]
    [  0      0      0      0     -0.14  -0.15  -0.32   0.61 ]    (8)

For practical reasons, it is often advantageous to use the normalized Laplacian, defined as

L_N = D^{-1/2}(D − W)D^{-1/2} = I − D^{-1/2} W D^{-1/2}.    (9)

Remark 5: For undirected graphs, the normalized Laplacian matrix is symmetric, and has all diagonal values equal to 1, with its trace equal to the number of vertices N. Other interesting properties, obtained through Laplacian normalization, shall be described later in the various application contexts.

One more form of the graph Laplacian is the so-called random-walk Laplacian, defined as

L_RW = D^{-1} L = I − D^{-1} W.    (10)

The random-walk graph Laplacian is rarely used, since it has lost the symmetry property of the original graph Laplacian for undirected graphs, L_RW ≠ L_RW^T.

Vertex-weighted graphs. Most of the applications of graph theory are based on edge-weighted graphs, where edge-weighting is designated by the weight matrix, W. Note that weighting can also be introduced into graphs based on vertex-weighted approaches (although rather rarely), whereby a weight is assigned to each vertex of a graph. To this end, we can use a diagonal matrix, V, to define the vertex weights vi, i = 0, 1, . . . , N − 1, with one possible (the Chung/Langlands, [45]) version of the vertex-weighted graph Laplacian, given by

L_V = V^{1/2} L V^{1/2}.    (11)

Observe that for V = D^{-1}, the vertex-weighted graph Laplacian in (11) reduces to the standard edge-weighted normalized graph Laplacian in (9).
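The Laplacian forms introduced above can also be reproduced numerically; the sketch below (illustrative only, using the weight matrix of Fig. 2 from (4)) computes the degree matrix, the Laplacian (7), the normalized Laplacian (9) and the random-walk Laplacian (10), and checks two of the stated properties:

    import numpy as np

    # Weight matrix W of the graph in Fig. 2, eq. (4).
    W = np.array([
        [0,    0.23, 0.74, 0.24, 0,    0,    0,    0   ],
        [0.23, 0,    0.35, 0,    0.23, 0,    0,    0   ],
        [0.74, 0.35, 0,    0.26, 0.24, 0,    0,    0   ],
        [0.24, 0,    0.26, 0,    0,    0,    0.32, 0   ],
        [0,    0.23, 0.24, 0,    0,    0.51, 0,    0.14],
        [0,    0,    0,    0,    0.51, 0,    0,    0.15],
        [0,    0,    0,    0.32, 0,    0,    0,    0.32],
        [0,    0,    0,    0,    0.14, 0.15, 0.32, 0   ]])

    d = W.sum(axis=1)                                   # vertex degrees, cf. (6)
    D = np.diag(d)
    L = D - W                                           # graph Laplacian, eq. (7)
    D_isqrt = np.diag(1.0 / np.sqrt(d))
    L_N = np.eye(8) - D_isqrt @ W @ D_isqrt             # normalized Laplacian, eq. (9)
    L_RW = np.eye(8) - np.diag(1.0 / d) @ W             # random-walk Laplacian, eq. (10)

    print(np.allclose(L.sum(axis=1), 0))                # each row of L sums to zero
    print(np.isclose(np.trace(L_N), 8))                 # trace of L_N equals N (Remark 5)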
2.2. Some Frequently Used Graph Topologies

When dealing with graphs, it is useful to introduce a taxonomy of graph topologies, as follows.

1. Complete graph. A graph is complete if there exists an edge between every pair of its vertices. Therefore, the adjacency matrix of a complete graph has elements Amn = 1 for all m ≠ n, and Amm = 0, that is, no self-connections are present. Fig. 3(a) gives an example of a complete graph.

2. Bipartite graph. A graph for which the vertices, V, can be partitioned into two disjoint subsets, E and H, whereby V = E ∪ H and E ∩ H = ∅, such that there are no edges between the vertices within the same subset E or H, is referred to as a bipartite graph. Fig. 3(b) gives an example of a bipartite undirected graph with E = {0, 1, 2} and H = {3, 4, 5, 6}, whereby all edges designate only connections between the sets E and H.
Observe also that the graph in Fig. 3(b) is a complete bipartite graph, since all possible edges between the sets E and H are present.

For convenience of mathematical formalism, if vertex ordering is performed in such a way that all vertices belonging to E are indexed before the vertices belonging to H, then the resulting adjacency matrix can be written in the block form

A = [ 0     AEH
      AHE   0   ],    (12)

where the submatrices AEH and AHE define the respective connections between the vertices belonging to the disjoint sets E and H. Observe that for an undirected bipartite graph, AEH = AHE^T. Bipartite graphs are also referred to as Kuratowski graphs, denoted by K_{NE,NH}, where NE and NH are the respective numbers of vertices in the sets E and H. It is important to mention that a complete bipartite graph with three vertices in each of the sets, H and E, is referred to as the first Kuratowski graph, denoted by K3,3, which may be used to define conditions for a graph to be planar (more detail is given in the sequel).

Multipartite graph. A generalization of the concept of a bipartite graph is a multipartite (M-partite) graph, for which the vertices are partitioned into M subsets, whereby each edge connects only vertices that belong to one of the M different subsets.

Figure 3: Typical graph topologies. (a) Complete graph with 8 vertices. (b) Complete bipartite graph. (c) Regular graph whereby each vertex is connected to 4 vertices. (d) Star graph. (e) Circular graph. (f) Path graph. (g) Directed circular graph. (h) Directed path graph.

3. Regular graph. An unweighted graph is said to be regular (or J-regular) if all its vertices exhibit the same degree of connectivity, J, defined as the number of edges connected to each vertex. An example of a regular graph with J = 4 is given in Fig. 3(c). From (7) and (9), the Laplacian and the normalized Laplacian of a J-regular graph are

L = J I − A   and   L_N = I − (1/J) A.    (13)

4. Planar graph. A graph that can be drawn on a two-dimensional plane without the crossing of any of its edges is called planar. For example, if the edges (0, 2), (2, 4), (4, 6), and (6, 0) in the regular graph from Fig. 3(c) are plotted as arches outside the circle defined by the vertices, all instances of edge crossing will be avoided and such a graph presentation will be planar.

5. Star graph. This type of graph has one central vertex that is connected to all other vertices, with no other edges present. An example of a star graph is given in Fig. 3(d). Observe that a star graph can be
considered as a special case of a complete bipartite graph, with only one vertex in the first set, E. The vertex degree centrality for the central vertex of a star graph with N vertices is therefore N − 1.

6. Circular (ring) graph. A graph is said to be circular if the degree of its every vertex is J = 2. This graph is also a regular graph with J = 2. An example of a circular graph with 8 vertices is given in Fig. 3(e).

7. Path graph. A series of connected vertices defines a path graph, whereby the first and the last vertex are of connectivity degree J = 1, while all other vertices are of the connectivity degree J = 2. An example of a path graph with 5 vertices is presented in Fig. 3(f).

8. Directed circular graph. A directed graph is said to be circular if each vertex is related to only one predecessor vertex and only one successor vertex. An example of a circular directed graph with 8 vertices is given in Fig. 3(g), with the adjacency matrix

A =
         0  1  2  3  4  5  6  7
    0  [ 0  0  0  0  0  0  0  1 ]
    1  [ 1  0  0  0  0  0  0  0 ]
    2  [ 0  1  0  0  0  0  0  0 ]
    3  [ 0  0  1  0  0  0  0  0 ]
    4  [ 0  0  0  1  0  0  0  0 ]
    5  [ 0  0  0  0  1  0  0  0 ]
    6  [ 0  0  0  0  0  1  0  0 ]
    7  [ 0  0  0  0  0  0  1  0 ]    (14)

Remark 6: The adjacency matrix of any directed or undirected circular graph is a circulant matrix.

9. Directed path graph. A directed path graph consists of a series of vertices connected in only one direction, whereby the first and the last vertex do not have a respective predecessor or successor. An example of a directed path graph with 5 vertices is presented in Fig. 3(h).

Remark 7: Path and circular graphs (directed and undirected) are of particular interest in Data Analytics, since their domain properties correspond to classical time or space domains. Therefore, any graph signal processing or machine learning paradigm which is developed for path and circular graphs is equivalent to its corresponding standard time and/or spatial domain paradigm.

2.3. Properties of Graphs and Associated Matrices

The notions from graph analysis that are most relevant to the processing of data on graphs are:

M1: Symmetry: For an undirected graph, the matrices A, W, and L are all symmetric.

M2: A walk between a vertex m and a vertex n is a connected sequence of edges and vertices that begins at the vertex m and ends at the vertex n. Edges and vertices can be included in a walk more than once; there can also be more than one walk between vertices m and n.

The length of a walk is equal to the number of included edges in unweighted graphs. The number of walks of length K, between a vertex m and a vertex n, is equal to the value of the mn-th element of the matrix A^K, which can be proved through mathematical induction, as follows [46]:

(i) The elements, Amn, of the adjacency matrix A, by definition, indicate the existence of a walk of length K = 1 (an edge, in this case) between the vertices m and n in a graph;

(ii) Assume that the elements of matrix A^{K−1} are equal to the number of walks of length K − 1, between two arbitrary vertices m and n;

(iii) The number of walks of length K between two vertices, m and n, is then equal to the number of all walks of length K − 1, between the vertex m and an intermediate vertex s, s ∈ V, which itself is indicated by the element at the position ms of the matrix A^{K−1}, according to (ii), for all s for which there is an edge from vertex s to the destination vertex n. If an edge between the intermediate vertex s and the final vertex n exists, then Asn = 1. This means that the number of walks of length K between the vertices m and n is obtained as the inner product of the m-th row of A^{K−1} with the n-th column of A, to yield the element mn of matrix A^{K−1}A = A^K.

Example 2: Consider the vertex 0 and the vertex 4 in the graph from Fig. 4, and only the walks of length K = 2. The adjacency matrix for this graph is given in (1). There are two such walks (0 → 1 → 4 and 0 → 2 → 4), so that the element at position (0, 4), in the first row and the fifth column of matrix A², is equal to 2, as shown in the matrix A² below,

A² =
         0  1  2  3  4  5  6  7
    0  [ 3  1  2  1  2  0  1  0 ]
    1  [ 1  3  2  2  1  1  0  1 ]
    2  [ 2  2  4  1  1  1  1  1 ]
    3  [ 1  2  1  3  1  0  0  1 ]
    4  [ 2  1  1  1  4  1  1  1 ]
    5  [ 0  1  1  0  1  2  1  1 ]
    6  [ 1  0  1  0  1  1  2  0 ]
    7  [ 0  1  1  1  1  1  0  3 ]    (15)
thus indicating K = 2 walks between these vertices.

Figure 4: Walks of length K = 2 from vertex 0 to vertex 4 (thick blue and brown lines).

M3: The number of walks of length not higher than K, between the vertices m and n, is given by the mn-th element of the matrix

B_K = A + A² + · · · + A^K,    (16)

that is, by a value in its m-th row and n-th column. In other words, the total number of walks is equal to the sum of all walks, which are individually modeled by A^k, k = 1, 2, . . . , K, as stated in property M2.

M4: The K-neighborhood of a vertex is defined as a set of vertices that are reachable from this vertex in walks whose length is up to K. For a vertex m, based on the property M2, the K-neighborhood is designated by the positions and the numbers of non-zero elements in the m-th row of matrix B_K in (16). The K-neighborhoods of vertex 0 for K = 1 and K = 2 are illustrated in Fig. 5 (and are also checked numerically in the short sketch given after property M9 below).

Figure 5: The K-neighborhoods of vertex 0 for the graph from Fig. 4, where: (a) K = 1 and (b) K = 2. The neighboring vertices are shaded.

M5: A path is a special kind of walk whereby each vertex can be included only once, whereby the number of edges included in a path is referred to as the path cardinality or path length, while the path weight is defined as the sum of weights along these edges.

An Euler path is a graph path that uses every edge of a graph exactly once. An Euler path for an unweighted graph does exist if and only if at most two of its vertices are of an odd degree. An Euler path which starts and ends at the same vertex is referred to as an Euler circuit, and it exists if and only if the degree of every vertex is even.

A Hamiltonian path is a graph path between two vertices of a graph that visits each vertex in a graph exactly once, while a cycle that uses every vertex in a graph exactly once is called a Hamiltonian cycle.

M6: The distance, rmn, between two vertices m and n in an unweighted graph is equal to the minimum path length between these vertices. For example, for the graph in Fig. 4, the distance between vertex 1 and vertex 5 is r15 = 2.

M7: The diameter, d, of a graph is equal to the largest distance (number of edges) between all pairs of its vertices, that is, d = max_{m,n∈V} rmn. For example, the diameter of a complete graph is d = 1, while the diameter of the graph in Fig. 4 is d = 3, with one of the longest paths being 6 → 3 → 2 → 1.

M8: Vertex closeness centrality. The farness (remoteness) of a vertex is equal to the sum of its distances to all other vertices, fn = Σ_{m≠n} rnm. The vertex closeness is then defined as the inverse of the farness, cn = 1/fn, and can be interpreted as a measure of how long it will take for data to sequentially shift from the considered vertex to all other vertices. For example, the vertex farness and closeness for the vertices n = 2 and n = 5 in Fig. 1(a) are respectively f2 = 10, f5 = 14, and c2 = 0.1, c5 = 0.071.

M9: Vertex or edge betweenness. Vertex/edge betweenness of a vertex n or edge (m, n) is equal to the number of times that this vertex/edge acts as a bridge along the shortest paths between any other two vertices.
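Properties M2-M4 lend themselves to a direct numerical check (a minimal numpy sketch for the graph of Fig. 1(a)/Fig. 4; the variable names are chosen here), reproducing the two length-2 walks of Example 2 and the 2-neighborhood of vertex 0:

    import numpy as np

    # Adjacency matrix of the graph in Fig. 1(a) and Fig. 4, eq. (1).
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 4), (2, 3),
             (2, 4), (3, 6), (4, 5), (4, 7), (5, 7), (6, 7)]
    A = np.zeros((8, 8), dtype=int)
    for m, n in edges:
        A[m, n] = A[n, m] = 1

    A2 = A @ A                       # property M2: (A^2)[m, n] counts walks of length 2
    print(A2[0, 4])                  # 2, the walks 0 -> 1 -> 4 and 0 -> 2 -> 4 (Example 2)

    K = 2
    B_K = sum(np.linalg.matrix_power(A, k) for k in range(1, K + 1))   # eq. (16)
    # Property M4: the K-neighborhood of vertex 0 is read from the non-zero
    # entries of the 0-th row of B_K (vertex 0 itself also appears, via closed walks).
    print(np.nonzero(B_K[0])[0])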
Figure 6: Concept of the spanning tree for graphs. (a) A spanning tree for the unweighted graph from Fig. 1(a). (b) A spanning tree for the weighted graph from Fig. 2, designated by thick blue edges. The graph edges in thin blue lines are not included in this spanning tree.

M10: Spanning Tree and Minimum Spanning Tree. The spanning tree of a graph is a subgraph that is tree-shaped and connects all its vertices together. A tree does not have cycles and cannot be disconnected. The cost of the spanning tree represents the sum of the weights of all edges in the tree. The minimum spanning tree is a spanning tree for which the cost is minimum among all possible spanning trees in a graph. Spanning trees are typically used in graph clustering analysis.

In the classical literature on graph theory, it is commonly assumed that the values of edge weights in weighted graphs are proportional to the standard vertex distance, rmn. However, this is not the case in data analytics on graphs, where the edge weights are typically defined as a function of vertex distance, for example, through a Gaussian kernel, Wmn ~ exp(−r²mn), or some other data similarity metric. The cost function to minimize for the Minimum Spanning Tree (MST) can then be defined as a log-sum of distances, rmn = −2 ln Wmn. A spanning tree for the graph from Fig. 2 is shown in Fig. 6. The cost for this spanning tree, calculated as a sum of all distances (log-weights), rmn, is 15.67.

M11: An undirected graph is called connected if there exists a walk between each pair of its vertices.

M12: If the graph is not connected, then it consists of two or more disjoint but locally connected subgraphs (graph components). Back to mathematical formalism, such disjoint graphs impose a block-diagonal form on the adjacency matrix, A, and the Laplacian, L. For M disjoint components (subgraphs) of a graph, these matrices take the form

A = [ A1  0   ...  0
      0   A2  ...  0
      ...
      0   0   ...  AM ]    (17)

L = [ L1  0   ...  0
      0   L2  ...  0
      ...
      0   0   ...  LM ].    (18)

Note that this block-diagonal form is obtained only if the vertex numbering follows the subgraph structure.

Figure 7: A disconnected graph which consists of two sub-graphs.

Example 3: Consider a graph derived from Fig. 1(a) by removing some edges, as shown in Fig. 7. The adjacency matrix for this graph is given by

A =
         0  1  2  3  4  5  6  7
    0  [ 0  1  1  1  0  0  0  0 ]
    1  [ 1  0  1  0  0  0  0  0 ]
    2  [ 1  1  0  1  0  0  0  0 ]
    3  [ 1  0  1  0  0  0  0  0 ]
    4  [ 0  0  0  0  0  1  0  1 ]
    5  [ 0  0  0  0  1  0  0  1 ]
    6  [ 0  0  0  0  0  0  0  1 ]
    7  [ 0  0  0  0  1  1  1  0 ]    (19)

with the corresponding Laplacian in the form

L =
    [  3 -1 -1 -1  0  0  0  0 ]
    [ -1  2 -1  0  0  0  0  0 ]
    [ -1 -1  3 -1  0  0  0  0 ]
    [ -1  0 -1  2  0  0  0  0 ]
    [  0  0  0  0  2 -1  0 -1 ]
    [  0  0  0  0 -1  2  0 -1 ]
    [  0  0  0  0  0  0  1 -1 ]
    [  0  0  0  0 -1 -1 -1  3 ]    (20)

Observe that, as elaborated above, these matrices are in a block-diagonal form with the two constituent blocks clearly separated. Therefore, for an isolated vertex in a
graph, the corresponding row and column of the matrices A and L will be zero-valued.

M13: For two graphs defined on the same set of vertices, with the corresponding adjacency matrices A1 and A2, the summation operator produces a new graph, for which the adjacency matrix is given by

A = A1 + A2.

To maintain the binary values in the resultant adjacency matrix, Amn ∈ {0, 1}, a logical (Boolean) summation rule, e.g., 1 + 1 = 1, may be used for matrix addition. In this monograph, the arithmetic summation rule is assumed in data analytics algorithms, as for example, in equation (16) in property M3.

M14: The Kronecker (tensor) product of two disjoint graphs G1 = (V1, B1) and G2 = (V2, B2) yields a new graph G = (V, B), where V = V1 × V2 is a direct product of the sets V1 and V2, and ((n1, m1), (n2, m2)) ∈ B only if (n1, n2) ∈ B1 and (m1, m2) ∈ B2.

The adjacency matrix A of the resulting graph G is then equal to the Kronecker product of the individual adjacency matrices A1 and A2, that is

A = A1 ⊗ A2.

An illustration of the Kronecker product for two simple graphs is given in Fig. 8.

Figure 8: Kronecker (tensor) product of two graphs.

M15: The Cartesian product (graph product) of two disjoint graphs G1 = (V1, B1) and G2 = (V2, B2) gives a new graph G = G1 □ G2 = (V, B), where V = V1 × V2 is a direct product of the sets V1 and V2, and ((m1, n1), (m2, n2)) ∈ B only if

m1 = m2 and (n1, n2) ∈ B2, or
n1 = n2 and (m1, m2) ∈ B1.

The adjacency matrix of a Cartesian product of two graphs is then given by the Kronecker sum

A = A1 ⊗ I_N2 + I_N1 ⊗ A2 = A1 ⊕ A2,

where A1 and A2 are the respective adjacency matrices of graphs G1, G2, while N1 and N2 are the corresponding numbers of vertices in G1 and G2, with I_N1 and I_N2 being the identity matrices of orders N1 and N2. The Cartesian product of two simple graphs is illustrated in Fig. 9. Notice that a Cartesian product of two graphs that reside in a two-dimensional space can be considered as a three-dimensional structure of vertices and edges (cf. tensors [47]); a short numerical sketch of both graph products is given below, after Fig. 9.

Figure 9: Cartesian product of two graphs.
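Properties M14 and M15 can be illustrated with two small example graphs (chosen arbitrarily here as a triangle and a single edge; this is a hedged numpy sketch rather than the authors' own code). The eigenvalue relations checked in the last two lines are derived in Section 3.2.2.

    import numpy as np

    # Two small adjacency matrices: a triangle (3 vertices) and a single edge (2 vertices).
    A1 = np.array([[0, 1, 1],
                   [1, 0, 1],
                   [1, 1, 0]])
    A2 = np.array([[0, 1],
                   [1, 0]])
    I1, I2 = np.eye(3), np.eye(2)

    A_kron = np.kron(A1, A2)                    # property M14: Kronecker (tensor) product
    A_cart = np.kron(A1, I2) + np.kron(I1, A2)  # property M15: Cartesian product (Kronecker sum)

    lam1, lam2 = np.linalg.eigvalsh(A1), np.linalg.eigvalsh(A2)
    # Spectra combine multiplicatively (Kronecker) and additively (Cartesian), cf. Section 3.2.2.
    print(np.allclose(np.sort(np.linalg.eigvalsh(A_kron)),
                      np.sort(np.outer(lam1, lam2).ravel())))
    print(np.allclose(np.sort(np.linalg.eigvalsh(A_cart)),
                      np.sort(np.add.outer(lam1, lam2).ravel())))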

3. Spectral Decomposition of Graph Matrices

As a prerequisite for the optimization and data analytics on graphs, we next introduce several intrinsic connections between standard linear algebraic tools and graph topology [11, 12, 13, 14, 27, 30, 31, 32].

3.1. Eigenvalue Decomposition of the Adjacency Matrix

Like any other general matrix, graph description matrices can be analyzed using eigenvalue decomposition. In this sense, a column vector u is an eigenvector of the adjacency matrix A if

Au = λu,    (21)

where the constant λ, which corresponds to the eigenvector u, is called the eigenvalue.

The above relation can be equally written as (A − λI)u = 0, and a nontrivial solution for u does exist if

det|A − λI| = 0.

In other words, the problem turns into that of finding zeros of det|A − λI| as roots of a polynomial in λ, called the
characteristic polynomial of matrix A, which is given by

P(λ) = det|A − λI| = λ^N + c1 λ^{N−1} + c2 λ^{N−2} + · · · + cN.    (22)

Remark 8: The order of the characteristic polynomial of graphs has the physical meaning of the number of vertices, N, within a graph, while the eigenvalues represent the roots of the characteristic polynomial, that is, P(λ) = 0.

In general, for a graph with N vertices, its adjacency matrix has N eigenvalues, λ0, λ1, . . . , λN−1. Some eigenvalues may also be repeated, which indicates that zeros of algebraic multiplicity higher than one exist in the characteristic polynomial. The total number of roots of a characteristic polynomial, including their multiplicities, must be equal to its degree, N, whereby

• The algebraic multiplicity of an eigenvalue, λk, is equal to its multiplicity when considered as a root of the characteristic polynomial;

• The geometric multiplicity of an eigenvalue, λk, represents the number of linearly independent eigenvectors that can be associated with this eigenvalue.

The geometric multiplicity of an eigenvalue is always equal to or lower than its algebraic multiplicity.

Denote the distinct eigenvalues in (22) by µ1, µ2, . . . , µNm, and their corresponding algebraic multiplicities by p1, p2, . . . , pNm, where p1 + p2 + · · · + pNm = N is equal to the order of the considered matrix/polynomial and Nm ≤ N is the number of distinct eigenvalues. The characteristic polynomial can now be rewritten in the form

P(λ) = (λ − µ1)^p1 (λ − µ2)^p2 · · · (λ − µNm)^pNm.

Definition: The minimal polynomial of the considered adjacency matrix, A, is obtained from its characteristic polynomial by reducing the algebraic multiplicities of all eigenvalues to unity, and has the form

Pmin(λ) = (λ − µ1)(λ − µ2) · · · (λ − µNm).

3.1.1. Properties of the characteristic and minimal polynomial

P1: The order of the characteristic polynomial is equal to the number of vertices in the considered graph.

P2: For λ = 0, P(0) = det(A) = (−λ0)(−λ1) · · · (−λN−1).

P3: The sum of all the eigenvalues is equal to the sum of the diagonal elements of the adjacency matrix, A, that is, its trace, tr{A}. For the characteristic polynomial of the adjacency matrix, P(λ), this means that the value of c1 in (22) is c1 = −tr{A} = 0.

P4: The coefficient c2 in P(λ) in (22) is equal to the number of edges multiplied by −1.

This property, together with P3, follows from the Faddeev–LeVerrier algorithm to calculate the coefficients of the characteristic polynomial of a square matrix, A, as c1 = −tr{A}, c2 = −(1/2)(tr{A²} − tr²{A}), and so on. Since tr{A} = 0 and the diagonal elements of A² are equal to the number of edges connected to each vertex (vertex degree), the total number of edges is equal to tr{A²}/2 = −c2.

P5: The degree of the minimal polynomial, Nm, is strictly larger than the graph diameter, d.

Example 4: Consider a connected graph with N vertices and only two distinct eigenvalues, λ0 and λ1. The order of the minimal polynomial is then Nm = 2, while the diameter of this graph is d = 1, which indicates a complete graph.

Example 5: For the graph from Fig. 1(a), the characteristic polynomial of its adjacency matrix, A, defined in (1), is given by

P(λ) = λ^8 − 12λ^6 − 8λ^5 + 36λ^4 + 36λ^3 − 22λ^2 − 32λ − 8,

with the eigenvalues

λ ∈ {−2, −1.741, −1.285, −0.677, −0.411, 1.114, 1.809, 3.190}.

With all the eigenvalues different, the minimal polynomial is equal to the characteristic polynomial, Pmin(λ) = P(λ).

Example 6: The adjacency matrix for the disconnected graph from Fig. 7 is given in (19), and its characteristic polynomial has the form

P(λ) = λ^8 − 9λ^6 − 6λ^5 + 21λ^4 + 26λ^3 + 3λ^2 − 4λ,

with the eigenvalues

λ ∈ {−1.5616, −1.4812, −1, −1, 0, 0.3111, 2.1701, 2.5616}.

Observe that the eigenvalue λ = −1 is of multiplicity higher than 1 (multiplicity of 2), so that the corresponding minimal polynomial becomes

Pmin(λ) = λ^7 − λ^6 − 8λ^5 + 2λ^4 + 19λ^3 + 7λ^2 − 4λ.

Although this graph is disconnected, the largest eigenvalue of its adjacency matrix, λmax = 2.5616, is of multiplicity 1. The relation between the graph connectivity and the multiplicity of eigenvalues will be discussed later.
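Properties P3-P4 and the spectrum in Example 5 can be verified numerically; the following sketch (illustrative only, for the graph of Fig. 1(a)) computes the adjacency spectrum and the characteristic polynomial coefficients:

    import numpy as np

    # Adjacency matrix of the graph in Fig. 1(a), eq. (1).
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 4), (2, 3),
             (2, 4), (3, 6), (4, 5), (4, 7), (5, 7), (6, 7)]
    A = np.zeros((8, 8), dtype=float)
    for m, n in edges:
        A[m, n] = A[n, m] = 1

    lam = np.linalg.eigvalsh(A)             # graph adjacency spectrum (Example 5)
    print(np.round(np.sort(lam), 3))        # -2, -1.741, ..., 1.809, 3.19

    c = np.poly(A)                          # [1, c1, c2, ...] of det(lambda*I - A); for N = 8
                                            # these match the coefficients of P(lambda) in (22)
    print(round(c[1], 6))                   # c1 = 0 (property P3, zero trace)
    print(round(c[2]), -len(edges))         # c2 = -12, minus the number of edges (property P4)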
3.2. Spectral Graph Theory

If all the eigenvalues of A are distinct (of algebraic multiplicity 1), then the N equations in the eigenvalue problem in (21), that is, Auk = λk uk, k = 0, 1, . . . , N − 1, can be written in a compact form as one matrix equation with respect to the adjacency matrix, as

AU = UΛ

or

A = UΛU^{-1},    (23)
where Λ = diag(λ0, λ1, . . . , λN−1) is the diagonal matrix with the eigenvalues on its diagonal and U is a matrix composed of the eigenvectors, uk, as its columns. Since the eigenvectors, u, are obtained by solving a homogeneous system of equations, defined by (21) and in the form (A − λI)u = 0, one element of the eigenvector u can be arbitrarily chosen. The common choice is to enforce unit energy, ||uk||² = 1, for every k = 0, 1, . . . , N − 1.

Remark 9: For an undirected graph, the adjacency matrix A is symmetric, that is, A = A^T. Any symmetric matrix (i) has real-valued eigenvalues; (ii) is diagonalizable; and (iii) has orthogonal eigenvectors, and hence

U^{-1} = U^T.

Remark 10: For directed graphs, in general, A ≠ A^T.

Recall that a square matrix is diagonalizable if all its eigenvalues are distinct (this condition is sufficient, but not necessary) or if the algebraic multiplicity of each eigenvalue is equal to its geometric multiplicity.

For some directed graphs, the eigenvalues of their adjacency matrix may be of algebraic multiplicity higher than one, and the matrix A may not be diagonalizable. In such cases, the algebraic multiplicity of the considered eigenvalue is higher than its geometric multiplicity and the Jordan normal form may be used in the decomposition.

Definition: The set of the eigenvalues of an adjacency matrix is called the graph adjacency spectrum.

Remark 11: The spectral theory of graphs studies properties of graphs through the eigenvalues and eigenvectors of their associated adjacency and graph Laplacian matrices.

Example 7: For the graph presented in Fig. 1(a), the graph adjacency spectrum is given by λ ∈ {−2, −1.741, −1.285, −0.677, −0.411, 1.114, 1.809, 3.190}, and is shown in Fig. 10 (top).

Example 8: The vertices of the graph presented in Fig. 1(a) are randomly reordered, as shown in Fig. 11. Observe that the graph adjacency spectrum, given in the same figure, retains the same values, with vertex indices of the eigenvectors reordered in the same way as the graph vertices, while the eigenvalues (spectra) retain the same order as in the original graph in Fig. 10. By a simple inspection we see that, for example, the eigenvector elements at the vertex index position n = 0 in Fig. 10 are now at the vertex index position n = 3 in all eigenvectors in Fig. 11.

Remark 12: A unique feature of graphs is that vertex reindexing does not alter the eigenvalues of the adjacency matrix, while the corresponding eigenvectors of the reindexed adjacency matrix contain the same elements as the original eigenvectors, but reordered according to the vertex renumbering. This follows from the properties of the permutation matrix, as in equation (3).

Figure 10: Eigenvalues, λk, for spectral indices (eigenvalue number) k = 0, 1, . . . , N − 1, and elements of the corresponding eigenvectors, uk(n), as a function of the vertex index n = 0, 1, . . . , N − 1, for the adjacency matrix, A, of the undirected graph presented in Fig. 1(a). The distinct eigenvectors are shown both on the vertex index axis, n, (left) and on the graph itself (right).
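Remark 12 and Example 8 can be confirmed numerically; the sketch below (illustrative only, applying the reordering of Fig. 11 in the convention new index i <- old index perm[i]) shows that the spectrum is unchanged while the eigenvector entries are merely permuted (up to sign):

    import numpy as np

    # Adjacency matrix of Fig. 1(a), eq. (1), and the reordering used in Fig. 11.
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 4), (2, 3),
             (2, 4), (3, 6), (4, 5), (4, 7), (5, 7), (6, 7)]
    A = np.zeros((8, 8), dtype=float)
    for m, n in edges:
        A[m, n] = A[n, m] = 1

    perm = [3, 2, 4, 5, 1, 0, 6, 7]
    A_re = A[np.ix_(perm, perm)]             # reindexed adjacency matrix, cf. eq. (3)

    lam, U = np.linalg.eigh(A)
    lam_re, U_re = np.linalg.eigh(A_re)

    print(np.allclose(lam, lam_re))                         # eigenvalues unchanged (Remark 12)
    print(np.allclose(np.abs(U_re), np.abs(U[perm, :])))    # eigenvector entries only reordered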
Figure 11: Eigenvalues, λk, for spectral indices (eigenvalue number) k = 0, 1, . . . , N − 1, and elements of the corresponding eigenvectors, uk(n), as a function of the vertex index n = 0, 1, . . . , N − 1, for the adjacency matrix, A, of the undirected graph presented in Fig. 1(a) with index reordering according to the scheme [0, 1, 2, 3, 4, 5, 6, 7] → [3, 2, 4, 5, 1, 0, 6, 7]. The distinct eigenvectors are shown both on the vertex index axis, n, (left) and on the graph itself (right). Compare with the results for the original vertex ordering in Fig. 10.

3.2.1. The DFT basis functions as a special case of eigenvectors of the adjacency matrix

For continuity with standard spectral analysis, we shall first consider directed circular graphs, as this graph topology encodes the standard time and space domains.

Eigenvalue decomposition for the directed circular graph in Fig. 3(g), assuming N vertices, follows from the definition Auk = λk uk, and the form of the adjacency matrix in (14). Then, the elements of vector Auk are uk(n − 1), as effectively matrix A here represents a shift operator, while the elements of vector λk uk are λk uk(n), to give

uk(n − 1) = λk uk(n),    (24)

where uk(n) are the elements of the eigenvector uk for given vertex indices n = 0, 1, . . . , N − 1, and k is the index of an eigenvector, k = 0, 1, . . . , N − 1. This is a first-order linear difference equation, whose general form for a discrete signal x(n) is x(n) = ax(n − 1), for which the solution is

uk(n) = (1/√N) e^{j2πnk/N}   and   λk = e^{-j2πk/N},    (25)

with k = 0, 1, . . . , N − 1. It is straightforward to verify that this solution satisfies the difference equation (24). Since the considered graph is circular, the eigenvectors also exhibit circular behavior, that is, uk(n) = uk(n + N). For convenience, a unit energy condition is used to find the constants within the general solution of this first-order linear difference equation. Observe that the eigenvectors in (25) correspond exactly to the standard harmonic basis functions in the DFT.

Remark 13: Classic DFT analysis may be obtained as a special case of the graph spectral analysis in (25), when considering directed circular graphs. Observe that for circular graphs, the adjacency matrix plays the role of a shift operator, as seen in (24), with the elements of Auk equal to uk(n − 1). This property will be used to define the shift operator on a graph in the following sections.

3.2.2. Decomposition of graph product adjacency matrices

We have already seen in Fig. 8 and Fig. 9 that complex graphs, for example those with a three-dimensional vertex space, may be obtained as a Kronecker (tensor) product or a Cartesian (graph) product of two disjoint graphs G1 and G2. Their respective adjacency matrices, A1 and A2, are correspondingly combined into the adjacency matrices of the Kronecker graph product, A⊗ = A1 ⊗ A2, and the Cartesian graph product, A⊕ = A1 ⊕ A2, as described in properties M14 and M15.

Graph Kronecker product. For the eigendecomposition of the Kronecker product of matrices A1 and A2, the following holds

A⊗ = A1 ⊗ A2 = (U1 Λ1 U1^H) ⊗ (U2 Λ2 U2^H) = (U1 ⊗ U2)(Λ1 ⊗ Λ2)(U1 ⊗ U2)^H,
or in other words, the eigenvectors of the adjacency matrix of the Kronecker product of graphs are obtained by a Kronecker product of the eigenvectors of the adjacency matrices of the individual graphs, as u_{k+lN1} = uk^(A1) ⊗ ul^(A2), k = 0, 1, 2, . . . , N1 − 1, l = 0, 1, 2, . . . , N2 − 1.

Remark 14: The eigenvectors of the individual graph adjacency matrices, uk^(A1) and uk^(A2), are of much lower dimensionality than those of the adjacency matrix of the resulting graph Kronecker product. This property can be used to reduce computational complexity when analyzing data observed on this kind of graph. Recall that the eigenvalues of the resulting graph adjacency matrix are equal to the product of the eigenvalues of the adjacency matrices of the constituent graphs, G1 and G2, that is,

λ_{k+lN1} = λk^(A1) λl^(A2).

Graph Cartesian product. The eigendecomposition of the adjacency matrix of the Cartesian product of graphs, whose respective adjacency matrices are A1 and A2, is of the form

A⊕ = A1 ⊕ A2 = (U1 ⊗ U2)(Λ1 ⊕ Λ2)(U1 ⊗ U2)^H,

with u_{k+lN1} = uk^(A1) ⊗ ul^(A2) and λ_{k+lN1} = λk^(A1) + λl^(A2), k = 0, 1, 2, . . . , N1 − 1, l = 0, 1, 2, . . . , N2 − 1.

Remark 15: The Kronecker product and the Cartesian product of graphs share the same eigenvectors of their adjacency matrices, while their spectra (eigenvalues) are different.

Example 9: The basis functions of the classic two-dimensional (image) 2D-DFT follow from the spectral analysis of a Cartesian graph product which is obtained as a product of the circular directed graph from Fig. 3 with itself. Since, from (25), the eigenvector elements of each graph are uk(n) = e^{j2πnk/N}/√N, the elements of the resulting basis functions are given by

u_{k+lN}(m + nN) = (1/N) e^{j2πmk/N} e^{j2πnl/N},

for k = 0, 1, . . . , N − 1, l = 0, 1, . . . , N − 1, m = 0, 1, . . . , N − 1, and n = 0, 1, . . . , N − 1. Fig. 12 illustrates the Cartesian product of two circular undirected graphs with N1 = N2 = 8.

Remark 16: Cartesian products of graphs may be used for multidimensional extensions of vertex spaces and graph data domains, whereby the resulting eigenvectors (basis functions) can be efficiently calculated using the eigenvectors of the original graphs, which are of lower dimensionality.

Figure 12: Graph Cartesian product of two planar circular unweighted graphs, with N = 8 vertices, produces a three-dimensional torus topology.

3.2.3. Decomposition of matrix powers and polynomials

From the eigendecomposition of the adjacency matrix A in (23), the eigenvalue decomposition of the squared adjacency matrix, AA = A², is given by

A² = UΛU^{-1} UΛU^{-1} = UΛ²U^{-1},

under the assumption that U^{-1} exists. For an arbitrary natural number, m, the above result generalizes straightforwardly to

A^m = UΛ^m U^{-1}.    (26)

Further, for any matrix function, f(A), that can be written in a polynomial form, given by

f(A) = h0 A^0 + h1 A^1 + h2 A^2 + · · · + h_{N−1} A^{N−1},

its eigenvalue decomposition is, in general, given by

f(A) = U f(Λ) U^{-1}.

This is self-evident from the properties of the eigendecomposition of matrix powers, defined in (26), and the linearity of the matrix multiplication operator, U(h0 A^0 + h1 A^1 + h2 A^2 + · · · + h_{N−1} A^{N−1})U^{-1}.

3.3. Eigenvalue Decomposition of the graph Laplacian

Spectral analysis for graphs can also be performed based on the graph Laplacian, L, defined in (7). For convenience, we here adopt the same notation for the eigenvalues and eigenvectors of the graph Laplacian as we did for the adjacency matrix A, although the respective eigenvalues and eigenvectors are not directly related. The Laplacian of an undirected graph can therefore be written as

L = UΛU^T   or   LU = UΛ,

where Λ = diag(λ0, λ1, . . . , λN−1) is a diagonal matrix of Laplacian eigenvalues and U is the orthonormal matrix of its eigenvectors (in columns), with U^{-1} = U^T. Note that the Laplacian of an undirected graph is always diagonalizable, since its matrix L is real and symmetric.

Then, every eigenvector, uk, k = 0, 1, . . . , N − 1, of a graph Laplacian, L, satisfies

L uk = λk uk.    (27)
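Before turning to the formal properties, the Laplacian eigenvalue problem (27) can be illustrated numerically (a minimal sketch using the weight matrix of Fig. 2; the checks anticipate Example 10 and properties L1-L2 below):

    import numpy as np

    # Weight matrix of the graph in Fig. 2, eq. (4).
    W = np.array([
        [0,    0.23, 0.74, 0.24, 0,    0,    0,    0   ],
        [0.23, 0,    0.35, 0,    0.23, 0,    0,    0   ],
        [0.74, 0.35, 0,    0.26, 0.24, 0,    0,    0   ],
        [0.24, 0,    0.26, 0,    0,    0,    0.32, 0   ],
        [0,    0.23, 0.24, 0,    0,    0.51, 0,    0.14],
        [0,    0,    0,    0,    0.51, 0,    0,    0.15],
        [0,    0,    0,    0.32, 0,    0,    0,    0.32],
        [0,    0,    0,    0,    0.14, 0.15, 0.32, 0   ]])

    L = np.diag(W.sum(axis=1)) - W              # graph Laplacian, eq. (7)
    lam, U = np.linalg.eigh(L)                  # eq. (27): L u_k = lambda_k u_k

    print(np.round(lam, 2))                     # approx. 0, 0.29, 0.34, ..., 2.21 (Example 10)
    print(np.allclose(U[:, 0]**2, 1.0 / 8))     # u_0 is the constant vector 1/sqrt(N) (property L1)
    print(int(np.sum(np.isclose(lam, 0, atol=1e-9))))   # one zero eigenvalue: one connected component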
Definition: The set of the eigenvalues, λk , k = 0, 1, . . . , N −
1, of the graph Laplacian is referred to as the graph spec-
trum or graph Laplacian spectrum (cf. graph adjacency
spectrum based on A).
Example 10: The Laplacian spectrum of the undirected graph from Fig. 2 is given by

λ ∈ {0, 0.29, 0.34, 0.79, 1.03, 1.31, 1.49, 2.21},

and shown in Fig. 13, along with the corresponding eigenvectors. The Laplacian spectrum of the disconnected graph from Fig. 14 is given by

λ ∈ {0, 0, 0.22, 0.53, 0.86, 1.07, 1.16, 2.03},

and is illustrated in Fig. 15. The disconnected nature of this graph is indicated by the zero-valued eigenvalue of algebraic multiplicity 2, that is, λ0 = λ1 = 0.

Figure 13: Eigenvalues, λk, for spectral indices (eigenvalue number) k = 0, 1, . . . , N − 1, and elements of the corresponding eigenvectors, uk(n), as a function of the vertex index n = 0, 1, . . . , N − 1, for the Laplacian matrix, L, of the undirected graph presented in Fig. 2. The distinct eigenvectors are shown both on the vertex index axis, n, (left) and on the graph itself (right). A comparison with the eigenvectors of the adjacency matrix in Fig. 10 shows that for the adjacency matrix the smoothest eigenvector corresponds to the largest eigenvalue, while for the graph Laplacian the smoothest eigenvector corresponds to the smallest eigenvalue, λ0.

Remark 17: Observe that when graph-component (subgraph) based vertex indexing is employed, then even though the respective graph spectra for the connected graph in Fig. 13 and the disconnected graph in Fig. 15 are similar, for a given spectral index the eigenvectors of a disconnected graph take nonzero values on only one of the individual disconnected graph components.

3.3.1. Properties of Laplacian eigenvalue decomposition

L1: The graph Laplacian matrix is defined in (7) in such a way that the sum of the elements in each of its rows (columns) is zero. As a consequence, the inner product of every row of L with any constant vector, u, is zero-valued, that is, Lu = 0 = 0 · u, for any constant vector u. This means that at least one eigenvalue of the Laplacian is zero, λ0 = 0, and its corresponding constant unit energy eigenvector is given by u0 = [1, 1, . . . , 1]^T/√N = 1/√N.

L2: The multiplicity of the eigenvalue λ0 = 0 of the graph Laplacian is equal to the number of connected components (connected subgraphs) in the corresponding graph.

This property follows from the fact that the Laplacian matrix of disconnected graphs can be written in a block diagonal form, as in (18). The set of eigenvectors of a block-diagonal matrix is obtained by grouping together the sets of eigenvectors of the individual block submatrices. Since each subgraph of a disconnected graph behaves as an independent graph, for each subgraph λ0 = 0 is an eigenvalue of the corresponding block Laplacian submatrix, according to property L1. Therefore, the multiplicity of the eigenvalue λ0 = 0 corresponds to the number of disjoint components (subgraphs) within a graph.

This property does not hold for the adjacency matrix, since there are no common eigenvalues in the adjacency matrices of the blocks (subgraphs) of arbitrary graphs, unlike the eigenvalue λ0 = 0, which is shared by the graph Laplacian matrices of all graphs. In this sense, the graph Laplacian matrix carries more physical meaning than the corresponding adjacency matrix.
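Property L2 is easy to verify numerically; a minimal NumPy sketch with assumed, hypothetical weights is given below (the graph and its weights are not those of Fig. 14).

    import numpy as np

    # Two disconnected components (illustrative weights): the weight matrix is
    # block-diagonal, so the Laplacian is block-diagonal as well, as in (18).
    W = np.zeros((6, 6))
    W[0, 1] = W[1, 0] = 0.4          # component {0, 1, 2}
    W[1, 2] = W[2, 1] = 0.7
    W[0, 2] = W[2, 0] = 0.2
    W[3, 4] = W[4, 3] = 0.5          # component {3, 4, 5}
    W[4, 5] = W[5, 4] = 0.9

    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)

    # Property L2: the multiplicity of the zero eigenvalue equals the number of
    # connected components (here, 2).
    print(np.sum(np.abs(lam) < 1e-10))   # -> 2
    # The zero-eigenvalue eigenvectors span a subspace in which a basis of
    # component-wise constant vectors exists.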
Figure 14: A disconnected weighted graph which consists of two sub-graphs.

Remark 18: If λ0 = λ1 = 0, then the graph is not connected. If λ2 > 0, then there are exactly two individually connected but globally disconnected components in this graph. If λ1 ≠ 0, then this eigenvalue may be used to describe the so-called algebraic connectivity of a graph, whereby very small values of λ1 indicate that the graph is weakly connected. This can be used as an indicator of the possibility of graph segmentation, as elaborated in Section 4.2.3.

L3: As with any other matrix, the sum of the eigenvalues of the Laplacian matrix is equal to its trace. For the normalized Laplacian, the sum of its eigenvalues is equal to the number of vertices, N, if there are no isolated vertices.

L4: The coefficient, cN, in the characteristic polynomial of the graph Laplacian matrix,

P(λ) = det|L − λI| = λ^N + c1 λ^{N−1} + · · · + cN−1 λ + cN,

is equal to 0, since λ = 0 is an eigenvalue of the Laplacian matrix.

For unweighted graphs, the coefficient c1 is equal to the number of edges multiplied by −2. This is straightforward to show following the relations from property P4, which state that c1 = −tr{L}. For unweighted graphs, the diagonal elements of the Laplacian are equal to the corresponding vertex degrees (numbers of edges). Therefore, the number of edges in an unweighted graph is equal to −c1/2.

Figure 15: Eigenvalues, λk, for spectral indices (eigenvalue number) k = 0, 1, . . . , N − 1, and elements of the corresponding eigenvectors, uk(n), as a function of the vertex index n = 0, 1, . . . , N − 1, for the Laplacian matrix, L, of the undirected graph presented in Fig. 14. The distinct eigenvectors are shown both on the vertex index axis, n, (left) and on the graph itself (right). This graph is characterized by the zero eigenvalue of algebraic multiplicity 2, that is, λ0 = λ1 = 0. Observe that for every spectral index, k, the corresponding eigenvectors take nonzero values on only one of the disconnected graph components.

Example 11: The characteristic polynomial of the Laplacian for the graph from Fig. 1(a) is given by

P(λ) = λ^8 − 24λ^7 + 238λ^6 − 1256λ^5 + 3777λ^4 − 6400λ^3 + 5584λ^2 − 1920λ,
with the eigenvalues λ ∈ {0, 5.5616, 5, 4, 4, 3, 1, 1}. Observe that the eigenvalues λ = 1 and λ = 4 are of multiplicity higher than one. The minimal polynomial therefore becomes Pmin(λ) = λ^6 − 19λ^5 + 139λ^4 − 485λ^3 + 796λ^2 − 480λ.

For the disconnected graph in Fig. 7, the characteristic polynomial of the Laplacian is given by

P(λ) = λ^8 − 18λ^7 + 131λ^6 − 490λ^5 + 984λ^4 − 992λ^3 + 384λ^2,

with the eigenvalues λ ∈ {0, 0, 1, 2, 3, 4, 4, 4}. The eigenvalue λ = 0 is of algebraic multiplicity 2 and the eigenvalue λ = 4 of algebraic multiplicity 3, so that the minimal polynomial takes the form

Pmin(λ) = λ^5 − 10λ^4 + 35λ^3 − 50λ^2 + 24λ.

Since the eigenvalue λ = 0 is of algebraic multiplicity 2, property L2 indicates that this graph is disconnected, with two disjoint sub-graphs as its constituent components.

L5: Graphs with identical spectra are called isospectral or cospectral graphs. However, isospectral graphs are not necessarily isomorphic, and the construction of isospectral graphs that are not isomorphic is an important topic in graph theory.

Remark 19: A complete graph is uniquely determined by its Laplacian spectrum [48]. The Laplacian spectrum of a complete unweighted graph with N vertices is λk ∈ {0, N, N, . . . , N}. Therefore, two complete isospectral graphs are also isomorphic.

L6: For a J-regular graph, as in Fig. 3(c), the eigenvectors of the graph Laplacian and the adjacency matrices are identical, with the following relation for the eigenvalues,

λk^(L) = J − λk^(A),

where the superscript L designates the Laplacian and the superscript A the corresponding adjacency matrix. This follows directly from U^T L U = U^T (J I − A) U.

L7: The eigenvalues of the normalized graph Laplacian, LN = I − D^{−1/2} A D^{−1/2}, are nonnegative and upper-bounded by

0 ≤ λ ≤ 2.

The equality for the upper bound holds if and only if the graph is a bipartite graph, as in Fig. 3(b). This will be proven within the next property.

L8: The eigenvalues and eigenvectors of the normalized Laplacian of a bipartite graph, with the disjoint sets of vertices E and H, satisfy the relation, referred to as the graph spectrum folding, given by

λk = 2 − λN−k    (28)

uk = [uE; uH] and uN−k = [uE; −uH],    (29)

where uk designates the k-th eigenvector of a bipartite graph, uE is its part indexed on the first set of vertices, E, while uH is the part of the eigenvector uk indexed on the second set of vertices, H.

In order to prove this property, we shall write the adjacency and the normalized Laplacian matrices of an undirected bipartite graph in their block forms

A = [0, AEH; AEH^T, 0] and LN = [I, LEH; LEH^T, I].

The eigenvalue relation, LN uk = λk uk, can now be evaluated as

LN uk = [uE + LEH uH; LEH^T uE + uH] = λk [uE; uH].

From there, we have uE + LEH uH = λk uE and LEH^T uE + uH = λk uH, resulting in LEH uH = (λk − 1)uE and LEH^T uE = (λk − 1)uH, to finally yield

LN [uE; −uH] = (2 − λk) [uE; −uH].

This completes the proof.

Since for the graph Laplacian λ0 = 0 always holds (see property L1), from λk = 2 − λN−k in (28) it then follows that the largest eigenvalue is λN = 2, which also proves property L7 for a bipartite graph.

3.3.2. Fourier analysis as a special case of the Laplacian spectrum

Consider the undirected circular graph from Fig. 3(e). Then, from property L1, the eigendecomposition relation for the Laplacian of this graph, Lu = λu, admits a simple form

−u(n − 1) + 2u(n) − u(n + 1) = λu(n).    (30)

This is straightforward to show by inspecting the Laplacian for the undirected circular graph from Fig. 3(e), with N = 8 vertices, for which the eigenvalue analysis is based on

Lu = [  2 −1  0  0  0  0  0 −1
       −1  2 −1  0  0  0  0  0
        0 −1  2 −1  0  0  0  0
        0  0 −1  2 −1  0  0  0
        0  0  0 −1  2 −1  0  0
        0  0  0  0 −1  2 −1  0
        0  0  0  0  0 −1  2 −1
       −1  0  0  0  0  0 −1  2 ] [u(0), u(1), u(2), u(3), u(4), u(5), u(6), u(7)]^T.    (31)
This directly gives the term −u(n − 1) + 2u(n) − u(n + 1), while a simple inspection of the values u(0) and u(N) illustrates the circular nature of the eigenvectors; see also Remark 6. The solution to the second order difference equation in (30) is uk(n) = cos(2πkn/N + φk), with λk = 2(1 − cos(2πk/N)). Obviously, for every eigenvalue, λk (except for λ0 and for the last eigenvalue, λN−1, for an even N), we can choose to have two orthogonal eigenvectors with, for example, φk = 0 and φk = π/2. This means that most of the eigenvalues are of algebraic multiplicity 2, i.e., λ1 = λ2, λ3 = λ4, and so on. This eigenvalue multiplicity of two can be formally expressed as

λk = 2 − 2 cos(π(k + 1)/N), for odd k = 1, 3, 5, . . . ,
λk = 2 − 2 cos(πk/N), for even k = 2, 4, 6, . . . .

For an odd N, λN−2 = λN−1, whereas for an even N we have λN−1 = 2, which is of algebraic multiplicity 1.

The corresponding eigenvectors, u0, u1, . . . , uN−1, then have the form

uk(n) = sin(π(k + 1)n/N), for odd k, k < N − 1,
uk(n) = cos(πkn/N), for even k,
uk(n) = cos(πn), for odd k, k = N − 1,    (32)

where k = 0, 1, . . . , N − 1 and n = 0, 1, . . . , N − 1.

Recall that an arbitrary linear combination of the eigenvectors u2k−1 and u2k, 1 ≤ k < N/2, is also an eigenvector, since the corresponding eigenvalues are equal (in this case their algebraic and geometric multiplicities are both equal to 2). With this in mind, we can rewrite the full set of eigenvectors in an alternative compact form, given by

uk(n) = 1, for k = 0,
uk(n) = exp(jπ(k + 1)n/N), for odd k, k < N − 1,
uk(n) = exp(−jπkn/N), for even k, k > 0,
uk(n) = cos(πn), for odd k, k = N − 1,

where j^2 = −1. It is now clearly evident that, as desired, this set of eigenvectors is orthonormal, and that the individual eigenvectors, uk, correspond to the harmonic basis functions within the standard temporal/spatial DFT.

4. Vertex Clustering and Mapping

Definition: Vertex clustering is a type of graph learning which aims to group together vertices from the set V into multiple disjoint subsets, Vi, called clusters. Vertices which are clustered into a subset of vertices, Vi, are expected to exhibit a larger degree of within-cluster mutual similarity (in some sense) than with the vertices in other subsets, Vj, j ≠ i.

While the clustering of graph vertices refers to the process of identifying and arranging the vertices of a graph into nonoverlapping vertex subsets, with the data in each subset expected to exhibit relative similarity in some sense, the segmentation of a graph refers to its partitioning into nonoverlapping graph segments (components).

The notion of vertex similarity metrics and their use to accordingly cluster the vertices into sets, Vi, of “related” vertices in graphs has been a focus of significant research effort in machine learning and pattern recognition; this has resulted in a number of established vertex similarity measures and corresponding methods for graph clustering [49]. These can be considered within two main categories: (i) clustering based on graph topology and (ii) spectral (eigenvector-based) methods for graph clustering.

Notice that in traditional clustering, a vertex is assigned to one cluster only. The type of clustering where a vertex may belong to more than one cluster is referred to as fuzzy clustering [49, 50], an approach that is not yet widely accepted in the context of graphs.

4.1. Clustering based on graph topology

Among the many existing methods, the most popular ones are based on:

• Finding the minimum set of edges whose removal would disconnect a graph in some “optimal” or “least disturbance” way (minimum cut based clustering).

• Designing clusters within a graph based on the disconnection of vertices or edges which belong to the highest numbers of shortest paths in the graph (vertex betweenness and edge betweenness based clustering).

• The minimum spanning tree of a graph, which has been a basis for a number of widely used clustering methods [51, 52].

• Analysis of highly connected subgraphs (HCS) [53], which has also been used for graph clustering.

• Finally, graph data analysis may be used for machine learned graph clustering, like, for example, the k-means based clustering methods [54, 55].

4.1.1. Minimum graph cut

We shall first briefly review the notion of graph cuts, as spectral methods for graph clustering may be introduced and interpreted based on the analysis and approximation of the (graph topology-based) minimum cut clustering.

Definition: Consider an undirected graph which is defined by a set of vertices, V, and the corresponding set of edge weights, W. Assume next that the vertices are grouped into k = 2 disjoint subsets of vertices, E ⊂ V and H ⊂ V, with E ∪ H = V and E ∩ H = ∅. A cut of this graph, for the given subsets of vertices, E and H, is equal to the sum of all weights that correspond to the edges which connect the vertices between the subsets, E and H, that is,

Cut(E, H) = Σ_{m∈E, n∈H} Wmn.
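The cut value for a given bipartition follows directly from this definition; a minimal NumPy sketch (with an assumed symmetric weight matrix W and an example partition, both hypothetical) is given below.

    import numpy as np

    def cut_value(W, E, H):
        """Sum of the weights of all edges that connect the subsets E and H."""
        return sum(W[m, n] for m in E for n in H)

    # Hypothetical 4-vertex weighted graph and a partition of its vertices.
    W = np.array([[0.0, 0.6, 0.1, 0.0],
                  [0.6, 0.0, 0.0, 0.2],
                  [0.1, 0.0, 0.0, 0.8],
                  [0.0, 0.2, 0.8, 0.0]])
    E, H = [0, 1], [2, 3]
    print(cut_value(W, E, H))   # 0.1 + 0.0 + 0.0 + 0.2 = 0.3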
Figure 16: A cut for the weighted graph from Fig. 2, with the disjoint subsets of vertices defined by E = {0, 1, 2, 3} and H = {4, 5, 6, 7}. The edges between the sets E and H are designated by thin red lines. The cut, Cut(E, H), is equal to the sum of the weights that connect sets E and H, and has the value Cut(E, H) = 0.32 + 0.24 + 0.23 = 0.79.

Remark 20: For clarity, we shall focus on the case with k = 2 disjoint subsets of vertices. However, the analysis can be straightforwardly generalized to k ≥ 2 disjoint subsets of vertices and the corresponding minimum k-cuts.

Example 12: Consider the graph in Fig. 2, and the sets of vertices E = {0, 1, 2, 3} and H = {4, 5, 6, 7}, shown in Fig. 16. Its cut into the two components (sub-graphs), E and H, involves the weights of all edges which exist between these two sets, that is, Cut(E, H) = 0.32 + 0.24 + 0.23 = 0.79. Such edges are shown by thin red lines in Fig. 16.

Definition: A cut which exhibits the minimum value of the sum of weights between the disjoint subsets E and H, considering all possible divisions of the set of vertices, V, is referred to as the minimum cut. Finding the minimum cut of a graph in this way is a combinatorial problem.

Remark 21: The number of all possible combinations to split an even number of vertices, N, into two disjoint subsets is given by

C = \binom{N}{1} + \binom{N}{2} + · · · + \binom{N}{N/2−1} + \frac{1}{2}\binom{N}{N/2}.

To depict the computational burden associated with this “brute force” graph cut approach, even for a relatively small graph with N = 50 vertices, the number of combinations to split the vertices into two subsets is C = 5.6 · 10^14.

Example 13: The minimum cut for the graph from Fig. 16 is

Cut(E, H) = 0.32 + 0.14 + 0.15 = 0.61

for E = {0, 1, 2, 3, 4, 5} and H = {6, 7}. This can be confirmed by considering all \binom{8}{1} + \binom{8}{2} + \binom{8}{3} + \frac{1}{2}\binom{8}{4} = 127 possible cuts, that is, all combinations of the subsets E and H for this small size graph, or by using, for example, the Stoer–Wagner algorithm [56].

4.1.2. Maximum-flow minimum-cut approach

This approach to the minimum cut problem employs the framework of flow networks.

Definition: A flow network is a directed graph with an arbitrary number of vertices, N ≥ 3, but which involves two given vertices (nodes) called the source vertex, s, and the sink vertex, t, whereby the capacity of edges (arcs) is defined by their weights. The flow (of information, water, traffic, ...) through an edge cannot exceed its capacity (the value of the edge weight). For any vertex in the graph, the sum of all input flows is equal to the sum of all its output flows (except for the source and sink vertices).

Problem formulation. The maximum-flow minimum-cut solution to graph partitioning aims to find the maximum value of flow that can be passed through the graph (network flow) from the source vertex, s, to the sink vertex, t. The solution is based on the max-flow min-cut theorem, which states that the maximum flow through a graph from a given source vertex, s, to a given sink vertex, t, is equal to the minimum cut, that is, the minimum sum of those edge weights (capacities) which, if removed, would disconnect the source, s, from the sink, t (minimum cut capacity) [51, 57]. The physical interpretation of this theorem is obvious, since the maximum flow is naturally defined by the graph flow bottleneck between the source and sink vertices. The capacity of the bottleneck (maximum possible flow) will then be equal to the minimum capacity (weight values) of the edges which, if removed, would disconnect the graph into two parts, one containing vertex s and the other containing vertex t. Therefore, the problem of maximum flow is equivalent to the minimum cut (capacity) problem, under the assumption that the considered vertices, s and t, must belong to different disjoint subsets of vertices, E and H. This kind of cut, with predefined vertices s and t, is called the (s, t) cut.

Remark 22: In general, if the source and sink vertices are not given, the maximum flow algorithm should be repeated for all combinations of the source and sink vertices in order to find the minimum cut of a graph.

The most widely used approach to solve the minimum-cut maximum-flow problem is the Ford–Fulkerson method [51, 57].

Example 14: Consider the weighted graph from Fig. 2, with the assumed source and sink vertices, s = 0 and t = 6, as shown in Fig. 17(a). The Ford–Fulkerson method is based on the analysis of paths and the corresponding flows between the source and sink vertex. One such possible path between s and t, 0 → 1 → 4 → 5 → 7 → 6, is designated by the thick line in Fig. 17(a). Recall that the maximum flow, for a path connecting the vertices s = 0 and t = 6, is restricted by the minimum capacity (equal to the minimum weight) along the considered path. For the considered path 0 → 1 → 4 → 5 → 7 → 6, the maximum flow from s = 0 to t = 6 is therefore equal to

max-flow_{0→1→4→5→7→6} = min{0.23, 0.23, 0.51, 0.15, 0.32} = 0.15,
since the minimum weight along this path is that connecting vertices 5 and 7, W57 = 0.15. The value of this maximum flow is then subtracted from each capacity (weight) in the considered path, with the new residual edge capacities (weights) designated in red in the residual graph in Fig. 17(a). The same procedure is repeated for the remaining possible paths 0 → 3 → 6, 0 → 2 → 4 → 7 → 6, and 0 → 2 → 3 → 6, with appropriate corrections to the capacities (edge weights) after consideration of each path. The final residual form of the graph, after zero-capacity edges are obtained in such a way that no new path with nonzero flow from s to t can be defined, is given in Fig. 17(b). For example, if we consider the path 0 → 1 → 2 → 3 → 6 (or any other path) in the residual graph, then its maximum flow would be 0, since the residual weight of the edge 3 → 6 is equal to 0. The minimum cut has now been obtained as that which separates the sink vertex, t = 6, and its neighborhood from the source vertex, s = 0, through the remaining zero-capacity (zero-weight) edges. This cut is shown in Fig. 17(b), and separates the vertices H = {6, 7} from the rest of the vertices by cutting the edges connecting vertices 3 → 6, 4 → 7, and 5 → 7. The original total weights of these edges are Cut(E, H) = 0.32 + 0.14 + 0.15 = 0.61.

We have so far considered an undirected graph, but since the Ford–Fulkerson algorithm is typically applied to directed graphs, notice that an undirected graph can be considered as a directed graph with every edge being split into a pair of edges having the same weight (capacity), but with opposite directions. After an edge is used in one direction (for example, edge 5 − 7 in Fig. 17(a)) with a flow equal to its maximum capacity of 0.15 in the considered direction, the other flow direction (sister edge) becomes 0.30, as shown in Fig. 17(c). The edge with the opposite direction could be used (up to the algebraic sum of flows in both directions being equal to the total edge capacity) to form another path (if possible) from the source to the sink vertex. More specifically, the capacity of an edge (from the pair) in the assumed direction is reduced by the value of the considered flow, while the capacity of the opposite-direction edge (from the same pair) is increased by the same flow, and can be used to send the flow in the reverse direction if needed. All residual capacities for the path from Fig. 17(a) are given in Fig. 17(c). For clarity, the edge weights which had not been changed by this flow are not shown in Fig. 17(c).

Figure 17: Principle of the maximum flow minimum cut method. (a) The weighted graph from Fig. 2, with the assumed source vertex s = 0 and sink vertex t = 6, and a path between these two vertices for which the maximum flow is equal to the minimum capacity (weight) along this path, W57 = 0.15. This maximum flow value, W57 = 0.15, is then subtracted from all the original edge capacities (weights) to yield the new residual edge capacities (weights), which are shown in red. (b) The final edge capacities (weights) after the maximum flows are subtracted for all paths 0 → 3 → 6, 0 → 2 → 4 → 7 → 6, and 0 → 2 → 3 → 6, between vertices s = 0 and t = 6, with the resulting minimum cut now crossing only the zero-capacity (zero-weight) edges, with its value equal to the sum of their initial capacities (weights), shown in Panel (a) in black. (c) A directed form of the undirected graph from (a), with the same path and the residual capacities (weights) given for both directions.

4.1.3. Normalized (ratio) minimum cut

A number of optimization approaches may be employed to enforce some desired properties on graph clusters. One such approach is the normalized minimum cut, which is commonly used in graph theory, and is introduced by penalizing the value of Cut(E, H) by an additional term (cost), to enforce the subsets E and H to be simultaneously as large as possible. An obvious form of the normalized cut (ratio cut) is given by [58]

CutN(E, H) = (1/NE + 1/NH) Σ_{m∈E, n∈H} Wmn,    (33)

where NE and NH are the respective numbers of vertices in the sets E and H. Since NE + NH = N, the term 1/NE + 1/NH reaches its minimum for NE = NH = N/2.
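For concreteness, a minimal NumPy sketch of the ratio cut in (33) is given below; the weight matrix and the partition are again hypothetical.

    import numpy as np

    def ratio_cut(W, E, H):
        """Normalized (ratio) cut of a partition (E, H), as in (33)."""
        cut = sum(W[m, n] for m in E for n in H)
        return (1.0 / len(E) + 1.0 / len(H)) * cut

    # Hypothetical symmetric weight matrix and a candidate partition.
    W = np.array([[0.0, 0.9, 0.1, 0.0],
                  [0.9, 0.0, 0.0, 0.1],
                  [0.1, 0.0, 0.0, 0.8],
                  [0.0, 0.1, 0.8, 0.0]])
    print(ratio_cut(W, [0, 1], [2, 3]))   # (1/2 + 1/2) * (0.1 + 0.1) = 0.2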
Figure 18: A clustering scheme based on the minimum normalized cut of the vertices in the graph from Fig. 2 into two vertex clusters, E = {0, 1, 2, 3} and H = {4, 5, 6, 7}. This cut corresponds to the arbitrarily chosen cut presented in Fig. 16.

Example 15: Consider again Example 12 and the graph from Fig. 16. For the sets of vertices, E = {0, 1, 2, 3} and H = {4, 5, 6, 7}, the normalized cut is calculated as CutN(E, H) = (1/4 + 1/4)0.79 = 0.395. This cut also represents the minimum normalized cut for this graph; this can be confirmed by checking all possible cut combinations of E and H in this (small) graph. Fig. 18 illustrates the clustering of vertices according to the minimum normalized cut. Notice, however, that in general the minimum cut and the minimum normalized cut do not produce the same vertex clustering into E and H.

Graph separability. Relevant to this section, the minimum cut value admits a physical interpretation as a measure of graph separability. Ideal separability is possible if the minimum cut is equal to zero, meaning that there are no edges between the subsets E and H. In Example 15, the minimum cut value was CutN(E, H) = 0.395, which is not close to 0, and indicates that the segmentation of this graph into two subgraphs would not yield a close approximation of the original graph.

4.1.4. Volume normalized minimum cut

A more general form of the normalized cut may also involve vertex weights when designing the size of the subsets E and H. By defining, respectively, the volumes of these sets as VE = Σ_{n∈E} Dnn and VH = Σ_{n∈H} Dnn, and using these volumes instead of the numbers of vertices NE and NH in the definition of the normalized cut in (33), we arrive at [59]

CutV(E, H) = (1/VE + 1/VH) Σ_{m∈E, n∈H} Wmn,    (34)

where Dnn = Σ_{m∈V} Wmn is the degree of a vertex n. The vertices with a higher degree, Dnn, are considered as structurally more important than the vertices with lower degrees.

The above discussion shows that finding the normalized minimum cut is also a combinatorial problem, for which an approximative spectral-based solution will be discussed later in this section.

4.1.5. Other forms of the normalized cut

In addition to the two presented forms of the normalized cut, based on the number of vertices and on the volume, other frequently used forms are:

1. The sparsity of a cut, which is defined as

ρ(E) = (1/(NE NV−E)) Σ_{m∈E, n∈V−E} Wmn,    (35)

where V − E is the set difference of V and E. The sparsity of a cut, ρ(E), is related to the normalized cut as N ρ(E) = CutN(E, H), since H = V − E and NE + NV−E = N. The sparsity of a graph is then defined as the minimum sparsity of a cut. It then follows that the cut which exhibits the minimum sparsity and the minimum normalized cut in (33) produce the same set E.

2. The edge expansion of a subset, E ⊂ V, which is defined by

α(E) = (1/NE) Σ_{m∈E, n∈V−E} Wmn,    (36)

with NE ≤ N/2. Observe the close relation of the edge expansion to the normalized cut in (33).

3. The Cheeger ratio of a subset, E ⊂ V, which is defined as

φ(E) = (1/min{VE, VV−E}) Σ_{m∈E, n∈V−E} Wmn.    (37)

The minimum value of φ(E) is called the Cheeger constant or conductance of a graph [60]. This form is closely related to the volume normalized cut in (34).

4.2. Spectral methods for graph clustering

This class of methods is a modern alternative to the classical direct graph topology analysis, whereby vertex clustering is based on the eigenvectors of the graph Laplacian. Practical spectral methods for graph clustering typically employ several of the smoothest eigenvectors of the graph Laplacian.

Simplified algorithms for vertex clustering may even employ only one eigenvector, namely the second (Fiedler, [61]) eigenvector of the graph Laplacian, u1, to yield a quasi-optimal clustering or partitioning scheme on a graph. These are proven to be efficient in a range of applications, including data processing on graphs, machine learning, and computer vision [62]. Despite their simplicity, such algorithms are typically quite accurate, and a number of studies show that graph clustering and cuts based on the second eigenvector, u1, give a good approximation to the optimal cut [63, 64]. Using more than one smooth eigenvector in graph clustering and partitioning will increase the number of degrees of freedom, to consequently yield more physically meaningful clustering, when required for practical applications in data analytics.
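To make the combinatorial nature of the minimum normalized cut concrete before turning to its spectral approximation, the following brute-force sketch (NumPy, hypothetical weights) enumerates all bipartitions of a small vertex set and returns the one that minimizes (33); this is only feasible for very small graphs.

    import numpy as np
    from itertools import combinations

    def ratio_cut(W, E, H):
        return (1 / len(E) + 1 / len(H)) * sum(W[m, n] for m in E for n in H)

    def min_normalized_cut(W):
        """Exhaustive search over all bipartitions; only feasible for tiny N."""
        N = W.shape[0]
        vertices = set(range(N))
        best = (np.inf, None, None)
        for size in range(1, N // 2 + 1):
            for E in combinations(range(N), size):
                H = tuple(vertices - set(E))
                value = ratio_cut(W, E, H)
                if value < best[0]:
                    best = (value, E, H)
        return best

    # Hypothetical 5-vertex weighted graph with two weakly connected groups.
    W = np.zeros((5, 5))
    for (m, n, w) in [(0, 1, 0.9), (1, 2, 0.8), (0, 2, 0.7),
                      (3, 4, 0.9), (2, 3, 0.1)]:
        W[m, n] = W[n, m] = w
    print(min_normalized_cut(W))   # expected: cut between {0, 1, 2} and {3, 4}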
For an enhanced insight, we shall next review the smoothness index, before introducing the notions of graph spectral vectors and their distance, followed by the notions of similarity and clustering of vertices.

4.2.1. Smoothness of Eigenvectors on Graphs

Definition: The smoothness of an eigenvector, uk, is introduced through its quadratic Laplacian form, uk^T L uk, with the smoothness index equal to the corresponding eigenvalue, λk, that is,

uk^T (L uk) = uk^T (λk uk) = λk.    (38)

To demonstrate the physical intuition behind the use of the quadratic form, uk^T L uk, as a smoothness metric of uk, consider

uk^T L uk = uk^T (D − W) uk.

Then, the n-th element of the vector Luk is given by

Σ_{m=0}^{N−1} Wnm uk(n) − Σ_{m=0}^{N−1} Wnm uk(m),

since Dnn = Σ_{m=0}^{N−1} Wnm. Therefore,

uk^T L uk = Σ_{m=0}^{N−1} uk(m) Σ_{n=0}^{N−1} Wmn (uk(m) − uk(n))
          = Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} Wmn (uk^2(m) − uk(m)uk(n)).    (39)

Owing to the symmetry of the weight matrix, W (as shown in (5)), we can use Wnm = Wmn to replace the full summation of uk^2(m) over m and n with a half of the summations of both uk^2(m) and uk^2(n), over all m and n. The same applies to the term uk(m)uk(n). With that, we can write

uk^T L uk = (1/2) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} Wmn (uk^2(m) − uk(m)uk(n)) + (1/2) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} Wmn (uk^2(n) − uk(n)uk(m))
          = (1/2) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} Wmn (uk(n) − uk(m))^2 ≥ 0.    (40)

Obviously, a small uk^T L uk = λk implies that all terms Wnm (uk(n) − uk(m))^2 ≤ 2λk are also small, thus indicating close values of uk(m) and uk(n) for vertices m and n with significant connections, Wmn. The eigenvectors corresponding to a small λk are therefore slow-varying and smooth on a graph.

Example 16: Exemplars of eigenvectors with a small, a moderate, and a large smoothness index, λk, are given on the three graphs in Fig. 19. In order to illustrate the interpretation of the smoothness index in classical time-domain data processing, the time-domain form of the eigenvectors/basis functions in the real-valued Fourier analysis (32) is also shown in Fig. 19 (middle). In this case, the basis functions can be considered as the eigenvectors of a directed circular graph, where the vertices assume the role of time instants.

Observe that in all three graphs the smooth eigenvectors, u0 and u1, have similar elements on the neighboring vertices (in the case of a path graph – time instants), and thus may be considered as smooth data on the corresponding graph domains. Such similarity does not hold for the fast-varying eigenvectors, u5 (left of Fig. 19) and u30 (middle and right of Fig. 19), which exhibit a much higher smoothness index.

Remark 23: The eigenvector of the graph Laplacian which corresponds to λ0 = 0 is constant (maximally smooth for any vertex ordering) and is therefore not appropriate as a template for vertex ordering. The next smoothest eigenvector is u1, which corresponds to the eigenvalue λ1.

Ordering of vertices for the smoothest Fiedler vector. It is natural to order the vertices within a graph in such a way that the presentation of the sequence of elements of the smoothest eigenvector, u1, as a function of the vertex index, n, is also maximally smooth. This can be achieved by sorting (rank ordering) the elements of the Fiedler vector, u1, in a nondecreasing order. Recall from Remark 12 that the isomorphic nature of graphs means that the reindexing of vertices does not change any graph property. The new order of graph vertices in the sorted u1 then corresponds to the smoothest sequence of elements of this vector along the vertex index line.

A unique feature of graphs, which renders them indispensable in modern data analytics on irregular domains, is that the ordering of the vertices of a graph can be arbitrary, an important difference from classical data analytics where the ordering is inherently sequential and fixed [44]. Therefore, in general, any change in data ordering (indexing) would cause significant changes in the results of classical methods, while when it comes to graphs, owing to their topological invariance (as shown in Fig. 10 and Fig. 11 in the previous section), reordering of vertices would automatically imply the corresponding reordering of indices within each eigenvector, with no implication on the analysis results. However, the presentation of data sensed at the graph vertices, along a line of vertex indices, as in Fig. 10 (left), a common case for practical reasons, would benefit from an appropriate vertex ordering. Notice that vertex ordering in a graph is just a one-dimensional simplification of an important paradigm in graph analysis, known as graph clustering [33, 36, 37, 38, 39, 40, 41].
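The smoothness index in (38) and the edge-difference form in (40) are straightforward to check numerically; the following NumPy sketch (hypothetical weights) does so for every Laplacian eigenvector.

    import numpy as np

    # Hypothetical small undirected weighted graph.
    W = np.array([[0.0, 0.5, 0.2, 0.0],
                  [0.5, 0.0, 0.4, 0.0],
                  [0.2, 0.4, 0.0, 0.6],
                  [0.0, 0.0, 0.6, 0.0]])
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)

    for k in range(len(lam)):
        u = U[:, k]
        quad = u @ L @ u                                        # u_k^T L u_k, eq. (38)
        edge = 0.5 * np.sum(W * (u[:, None] - u[None, :])**2)   # eq. (40)
        print(k, np.allclose(quad, lam[k]), np.allclose(quad, edge))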
Figure 19: Illustration of the concept of smoothness of the graph Laplacian eigenvectors for three different graphs: The graph from Fig. 2
(left), a path graph corresponding to classic temporal data analysis (middle), and an example of a more complex graph with N = 64 vertices
(right). (a) Constant eigenvector, u0 (n), shown on the three considered graphs. This is the smoothest possible eigenvector for which the
smoothness index is λ0 = 0. (b) Slow-varying Fiedler eigenvector (the smoothest eigenvector whose elements are not constant), u1 (n), for
the three graphs considered. (c) Fast-varying eigenvectors, for k = 5 (left), and k = 30 (middle and right). Graph vertices are denoted by
black circles, and the values of elements of the eigenvectors, uk (n), by red lines, for n = 0, 1, . . . , N − 1. The smoothness index, λk , is also
given for each case.

4.2.2. Spectral Space and Spectral Similarity of Vertices

For a graph with N vertices, the orthogonal eigenvectors of its graph Laplacian form the basis of an N-dimensional space, called the spectral space. In this way, the elements, uk(n), of every eigenvector uk, k = 0, 1, 2, . . . , N − 1, are assigned to the corresponding vertices, n = 0, 1, 2, . . . , N − 1, as shown in Fig. 20(a). This, in turn, means that a set of elements, u0(n), u1(n), u2(n), . . . , uN−1(n), is assigned to every vertex n, as shown in Fig. 20(b). For every vertex, n, we can then group these elements into an N-dimensional spectral vector

qn = [u0(n), u1(n), . . . , uN−1(n)],

which is associated with the vertex n. Since the elements of the first eigenvector, u0, are constant, they do not convey any spectral difference to the graph vertices. Therefore, the elements of u0 are commonly omitted from the spectral vector for vertex n, to yield

qn = [u1(n), . . . , uN−1(n)],    (41)

as illustrated in Fig. 20(b).

Vertex dimensionality in the spectral space. Now that we have associated a unique spectral vector, qn in (41), to every vertex n = 0, 1, . . . , N − 1, it is important to note that this (N − 1)-dimensional representation of every vertex in a graph (whereby the orthogonal graph Laplacian eigenvectors, u1, u2, . . . , uN−1, serve as a basis of that representation) does not affect the graph itself; it just means that the additional degrees of freedom introduced through spectral vectors facilitate more sophisticated and efficient graph analysis. For example, we may now talk about vertex similarity in the spectral space, or about the spectrum based graph cut, segmentation, and vertex clustering.

An analogy with classical signal processing would be to assign a vector of harmonic basis function values at a time instant (vertex) n, to “describe” this instant, that is, to assign the n-th column of the Discrete Fourier transform matrix to the instant n. This intuition is illustrated in Fig. 20(a) and 20(b).

The spectral vectors shall next be used to define the spectral similarity of vertices.

Definition: Two vertices, m and n, are called spectrally similar if their distance in the spectral space is within a small predefined threshold. The spectral similarity between vertices m and n is typically measured through the Euclidean norm of their spectral space distance, given by

dmn = ||qm − qn||2.
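A minimal NumPy sketch of the spectral vectors in (41) and of the spectral distance dmn, under hypothetical weights, could read:

    import numpy as np

    # Hypothetical small undirected weighted graph.
    W = np.array([[0.0, 0.7, 0.1, 0.0],
                  [0.7, 0.0, 0.6, 0.0],
                  [0.1, 0.6, 0.0, 0.9],
                  [0.0, 0.0, 0.9, 0.0]])
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)

    # Spectral vector of vertex n: the n-th row of U with the constant u_0
    # removed, as in (41).
    Q = U[:, 1:]

    def spectral_distance(m, n):
        return np.linalg.norm(Q[m] - Q[n])

    print(spectral_distance(0, 1), spectral_distance(0, 3))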
Spectral Manifold. Once a graph is characterized by the original (N − 1)-dimensional spectral vectors, the so obtained vertex positions in the spectral vertex representation may reside near some well defined surface (commonly a hyperplane), called a spectral manifold, which is of a reduced dimensionality M < (N − 1). The aim of spectral vertex mapping is then to map each spectral vertex representation from the original N-dimensional spectral vector space to a new spectral manifold which lies in a reduced M-dimensional spectral space, to a position closest to its original (N − 1)-dimensional spectral position. This principle is related to the Principal Component Analysis (PCA) method, and this relation will be discussed later in this section. An analogy with classical Discrete Fourier Transform analysis would imply a restriction of the spectral analysis from the space of N harmonics to the reduced space of the M slowest-varying harmonics (excluding the constant one).

These spectral dimensionality reduction considerations suggest restricting the definition of spectral similarity to only a few lower-order (smooth) eigenvectors in the spectral space of reduced dimensionality. For example, if the spectral similarity is restricted to the two smoothest eigenvectors, u1 and u2 (omitting the constant u0), then the spectral vector for a vertex n would become

qn = [u1(n), u2(n)],

as illustrated in Fig. 20(c) and Fig. 21(a). If for two vertices, m and n, the values of u1(m) are close to u1(n) and the values of u2(m) are close to u2(n), then these two vertices are said to be spectrally similar, that is, they exhibit a small spectral distance, dmn = ||qm − qn||2.

Finally, the simplest spectral description uses only one (smoothest nonconstant) eigenvector to describe the spectral content of a vertex, so that the spectral vector reduces to a spectral scalar,

qn = [qn] = [u1(n)],

whereby the so reduced spectral space is a one-dimensional line.

Example 17: The two-dimensional and three-dimensional spectral vectors, qn = [u1(n), u2(n)] and qn = [u1(n), u2(n), u3(n)], of the graph from Fig. 2 are shown in Fig. 21, for n = 2 and n = 6.

Spectral embedding: The mapping from the reduced dimensionality spectral space back onto the original vertices is referred to as spectral embedding.

We can proceed in two ways with the reduced spectral vertex space representation: (i) to assign the reduced dimension spectral vectors to the original vertex positions, for example, in the form of vertex coloring, as a basis for subsequent vertex clustering (Section 4.2.3), or (ii) to achieve new vertex positioning in the reduced dimensionality space of eigenvectors (reduced spectral space), using eigenmaps (Section 4.4). Both yield similar information and can be considered as two sides of the same coin [65].

For visualization purposes, we will use the colors of the RGB system to represent the spectral vector values in a reduced (one-, two-, or three-) dimensional spectral space. Vertices at the original graph positions will be colored according to the spectral vector values.

4.2.3. Indicator vector

Remark 21 shows that the combinatorial approach to the minimum cut problem is computationally infeasible, as even for a graph with only 50 vertices we have 5.6 · 10^14 such potential cuts.

To break this Curse of Dimensionality, it would be very convenient to relate the problem of the minimization of the normalized cut in (33) and (34) to that of the eigenanalysis of the graph Laplacian. To this end, we shall introduce the notion of an indicator vector, x, on a graph, for which the elements take subgraph-wise constant values within each disjoint subset (cluster) of vertices, with these constants taking different values for different clusters of vertices (subset-wise constant vector). While this does not immediately reduce the computational burden (the same number of combinations remains as in the brute force method), the elements of x now uniquely reflect the assumed cut of the graph into the disjoint subsets E, H ⊂ V. Establishing a further link with only the smoothest eigenvector of the graph Laplacian will convert the original, computationally intractable, combinatorial minimum cut problem into a manageable algebraic eigenvalue problem, for which the computation complexity is of the O(N^3) order. By casting the problem into the linear algebra framework, the complexity of calculation can be additionally reduced through efficient eigenanalysis methods, such as the Power Method, which sequentially computes the desired number of largest eigenvalues and the corresponding eigenvectors, at an affordable O(N^2) computations per iteration, as shown in the Appendix.

However, unlike the indicator vector, x, the smoothest eigenvector (corresponding to the smallest nonzero eigenvalue) of the graph Laplacian is not subset-wise constant, and so such a solution would be approximate, but computationally feasible.

Remark 24: The concept of the indicator vector can be introduced through the analysis of an ideal minimum cut of a graph, given by

Cut(E, H) = Σ_{m∈E, n∈H} Wmn = 0,

that is, when considering an already disjoint graph for which Cut(E, H) = 0 indicates that there exist no edges between the subsets E and H, that is, Wmn = 0 for m ∈ E and n ∈ H. Obviously, this ideal case can be solved without resorting to the combinatorial approach, since this graph is already in the form of two disconnected subgraphs, defined by the sets of vertices E and H. For such a disconnected graph, the second eigenvalue of the graph Laplacian is λ1 = 0, as established by the graph Laplacian property L2. When λ1 = 0, then

2 u1^T L u1 = Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} Wmn (u1(n) − u1(m))^2 = 2λ1 = 0,

which follows from (38) and (40).
Figure 20: Illustration of spectral vectors for the graph from Fig. 2, with N = 8 vertices. For an intuitive analogy with the classical Discrete Fourier Transform, notice that the complex harmonic basis functions within the DFT would play the role of eigenvectors in the graph spectral representation, uk, k = 0, 1, . . . , 7. Then, the spectral vectors, qn, n = 0, 1, . . . , 7, would be analogous to the basis functions of the inverse Discrete Fourier transform (excluding the first constant element).
Since all the terms in the last sum are nonnegative, this implies that they must be zero-valued, that is, the eigenvector u1 is subset-wise constant, with u1(n) = u1(m) = c1 for m, n ∈ E and u1(n) = u1(m) = c2 for m, n ∈ H. Since the eigenvector u1 is orthogonal to the constant eigenvector u0, then Σ_{n=0}^{N−1} u1(n) = 0. A possible solution for u1(n) that satisfies the subset-wise constant form and has zero mean is u1(n) = c1 = 1/NE for n ∈ E and u1(n) = c2 = −1/NH for n ∈ H. We can conclude that the problem of finding an ideal minimum cut can indeed be solved by introducing an indicator vector x = u1, such that x(n) = 1/NE for n ∈ E and x(n) = −1/NH for n ∈ H. The membership of a vertex, n, to either the subset E or H of the ideal minimum cut is therefore uniquely defined by the sign of the indicator vector x = u1. This form of x is not normalized to unit energy, as its scaling by any constant would not influence the solution for vertex clustering into the subsets E or H.
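The subset-wise constant structure of the ideal indicator vector for a disconnected graph can be checked directly; a small NumPy sketch under hypothetical weights is given below.

    import numpy as np

    # Two disconnected components, so the ideal minimum cut is zero.
    W = np.zeros((6, 6))
    for (m, n, w) in [(0, 1, 0.8), (1, 2, 0.5), (0, 2, 0.3),   # E = {0, 1, 2}
                      (3, 4, 0.9), (4, 5, 0.7)]:               # H = {3, 4, 5}
        W[m, n] = W[n, m] = w

    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)
    print(lam[:3])        # lambda_0 = lambda_1 = 0 (numerically), lambda_2 > 0

    # Any vector that is constant on each component and zero-mean lies in the
    # null space of L and can serve as the indicator vector x.
    x = np.where(np.arange(6) < 3, 1 / 3, -1 / 3)
    print(np.allclose(L @ x, 0))     # True: x is a zero-eigenvalue eigenvector
    print(np.sign(x))                # the sign pattern identifies the two subsets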
Figure 21: Illustration of the spectral vectors, qn = [u1(n), u2(n)] and qn = [u1(n), u2(n), u3(n)], for the Laplacian matrix of the graph in Fig. 2. (a) Two-dimensional spectral vectors, q2 = [u1(2), u2(2)] and q6 = [u1(6), u2(6)]. (b) Three-dimensional spectral vectors, q2 = [u1(2), u2(2), u3(2)] and q6 = [u1(6), u2(6), u3(6)]. For clarity, the spectral vectors are shown on both the vertex index axis and directly on the graph.
For a general graph, and following the above reasoning, we here consider two specific subset-wise constant forms of the indicator vector, x, which are based on:

(i) The number of vertices in the disjoint subgraphs,

x(n) = 1/NE, for n ∈ E, and x(n) = −1/NH, for n ∈ H,    (42)

where NE is the number of vertices in E, and NH is the number of vertices in H, and

(ii) The volumes of the disjoint subgraphs,

x(n) = 1/VE, for n ∈ E, and x(n) = −1/VH, for n ∈ H,    (43)

where the volumes of the sets, VE and VH, are defined as the sums of all vertex degrees, Dnn, in the corresponding subsets, VE = Σ_{n∈E} Dnn and VH = Σ_{n∈H} Dnn.

Before proceeding further with the analysis of these two forms of the indicator vector (in the next two remarks), it is important to note that if we can find the vector x which minimizes the normalized cut, CutN(E, H) in (33), then the elements of the vector x (their signs, sign(x(n)) = 1 for n ∈ E and sign(x(n)) = −1 for n ∈ H) may be used to decide whether to associate a vertex, n, with either the set E or H of the minimum normalized cut.

Remark 25: The normalized cut, CutN(E, H), defined in (33), for the indicator vector x with the elements x(n) = 1/NE for n ∈ E and x(n) = −1/NH for n ∈ H, is equal to the Rayleigh quotient of the matrix L and the vector x, that is,

CutN(E, H) = (x^T L x)/(x^T x).    (44)
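Relation (44) is easy to verify numerically for any small graph and partition; a NumPy sketch with hypothetical weights follows.

    import numpy as np

    W = np.array([[0.0, 0.9, 0.2, 0.0],
                  [0.9, 0.0, 0.1, 0.1],
                  [0.2, 0.1, 0.0, 0.8],
                  [0.0, 0.1, 0.8, 0.0]])
    L = np.diag(W.sum(axis=1)) - W

    E, H = [0, 1], [2, 3]
    # Indicator vector as in (42): 1/N_E on E and -1/N_H on H.
    x = np.zeros(4)
    x[E], x[H] = 1 / len(E), -1 / len(H)

    cut_n = (1 / len(E) + 1 / len(H)) * sum(W[m, n] for m in E for n in H)  # eq. (33)
    rayleigh = (x @ L @ x) / (x @ x)                                        # eq. (44)
    print(np.allclose(cut_n, rayleigh))   # True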
To prove this relation, we shall rewrite (40) as

x^T L x = (1/2) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} Wmn (x(n) − x(m))^2.    (45)

For all vertices m and n within the same subgraph, that is, such that m ∈ E and n ∈ E, the elements of the vector x are the same and equal to x(m) = x(n) = 1/NE. In turn, this means that the terms (x(n) − x(m))^2 in (45) are zero-valued. The same holds for any two vertices belonging to the set H, that is, for m ∈ H and n ∈ H. Therefore, only the terms corresponding to the edges which define the cut, when m ∈ E and n ∈ H, and vice versa, remain in the sum, and they are constant and equal to (x(n) − x(m))^2 = (1/NE − (−1/NH))^2, to yield

x^T L x = (1/NE + 1/NH)^2 Σ_{m∈E, n∈H} Wmn = (1/NE + 1/NH) CutN(E, H),    (46)

where the normalized cut, CutN(E, H), is defined in (33). Finally, the energy of the indicator vector, x^T x = e_x^2, is given by

x^T x = ||x||_2^2 = e_x^2 = NE/NE^2 + NH/NH^2 = 1/NE + 1/NH,    (47)

which, together with (46), proves (44).

The same analysis holds if the indicator vector is normalized to unit energy, whereby x(n) = 1/(NE e_x) for n ∈ E and x(n) = −1/(NH e_x) for n ∈ H, with e_x defined in (47) as e_x = ||x||_2.

We can therefore conclude that the indicator vector, x, which solves the problem of the minimization of the normalized cut, is also a solution to (44). This minimization problem, for the unit energy form of the indicator vector, can also be written as

min{x^T L x} subject to x^T x = 1.    (48)

In general, this is again a combinatorial problem, since all possible combinations of the subsets of vertices, E and H, together with the corresponding indicator vectors, x, should be considered.

For a moment we shall put aside the very specific (subset-wise constant) form of the indicator vector and consider the general minimization problem in (48). This problem can be solved using the method of Lagrange multipliers, with the corresponding cost function

L(x) = x^T L x − λ(x^T x − 1).

From ∂L(x)/∂x^T = 0, it follows that Lx = λx, which is precisely the eigenvalue/eigenvector relation for the graph Laplacian, L, the solution of which is λ = λk and x = uk, for k = 0, 1, . . . , N − 1. In other words, upon replacing x by uk in the term min{x^T L x} above, we obtain min_k{uk^T L uk} = min_k{λk}. After neglecting the trivial solution λ0 = 0, since it produces a constant eigenvector u0, we next arrive at min_k{λk} = λ1 and x = u1. Note that this solution yields a general form of the vector x that minimizes (44). However, such a form does not necessarily correspond to a subset-wise constant indicator vector, x.

4.2.4. Bounds on the minimum cut

In general, the subset-wise constant indicator vector, x, may be written as a linear combination of the graph Laplacian eigenvectors, uk, k = 1, 2, . . . , N − 1, to give

x = α1 u1 + α2 u2 + · · · + αN−1 uN−1.    (49)

This kind of vector expansion onto the set of eigenvectors shall be addressed in more detail in Part 2 of this monograph. Note that the constant eigenvector u0 is omitted, since the indicator vector is zero-mean by definition (orthogonal to a constant vector). The calculation of the coefficients αi would require the indicator vector (that is, the sets E and H) to be known, leading again to the combinatorial problem of vertex set partitioning. It is interesting to note that the quadratic form of the indicator vector, x, given by (49) is equal to x^T L x = α1^2 λ1 + α2^2 λ2 + · · · + αN−1^2 λN−1, and that it assumes the minimum value for α1 = 1, α2 = · · · = αN−1 = 0, that is, when x = u1, which corresponds to imposing the normalized energy condition, x^T x = α1^2 + α2^2 + · · · + αN−1^2 = 1. In other words, we now arrive at a physically meaningful bound,

λ1 ≤ x^T L x = CutN(E, H).

Observe that this inequality corresponds to the lower Cheeger bound for the minimum normalized cut in (33).

Remark 26: If the space of approximative solutions for the indicator vector, x, is relaxed to allow for vectors that are not subset-wise constant (while omitting the constant eigenvector of the graph Laplacian, u0), the approximative solution becomes x = u1 (as previously shown and illustrated in Example 18). The above analysis indicates that this solution is quasi-optimal; however, despite its simplicity, the graph cut based on only the second graph Laplacian eigenvector, u1, typically produces a good approximation to the optimal (minimum normalized) cut.

It has been shown that the value of the true normalized minimum cut in (33), when the indicator vector x is subset-wise constant, is bounded on both sides (upper and lower) with constants which are proportional to the smallest nonzero eigenvalue, u1^T L u1 = λ1, of the graph Laplacian. The simplest form of these bounds (Cheeger’s bounds), for the cut defined by (37), has the form [66, 67]

λ1/2 ≤ φ(V) = min_{E⊂V}{φ(E)} ≤ √(2λ1).    (50)
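The expansion (49) and the resulting lower bound λ1 ≤ x^T L x are easy to check numerically; a NumPy sketch with hypothetical weights follows.

    import numpy as np

    W = np.array([[0.0, 0.9, 0.2, 0.0],
                  [0.9, 0.0, 0.1, 0.1],
                  [0.2, 0.1, 0.0, 0.8],
                  [0.0, 0.1, 0.8, 0.0]])
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)

    # Unit-energy, subset-wise constant indicator vector for E = {0,1}, H = {2,3}.
    x = np.array([1.0, 1.0, -1.0, -1.0]) / 2.0

    alpha = U.T @ x                       # expansion coefficients, as in (49)
    print(np.allclose(alpha[0], 0))       # a zero-mean x has no u_0 component
    print(np.allclose(x @ L @ x, np.sum(alpha**2 * lam)))  # x^T L x = sum alpha_k^2 lambda_k
    print(lam[1] <= x @ L @ x + 1e-12)    # lower bound: lambda_1 <= x^T L x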
This shows that the eigenvalue λ1 is also a good measure of graph separability and, consequently, of the quality of spectral clustering in the sense of a minimum normalized cut. The value of the minimum normalized cut of a graph (also referred to as Cheeger’s constant, conductivity, or the isoperimetric number of a graph) may also be considered as a numerical measure of the presence of a “bottleneck” in a graph.

4.2.5. Indicator vector for normalized graph Laplacian

We shall now address the cut based on the normalized graph Laplacian, in light of the above analysis.

Remark 27: The volume normalized cut, CutV(E, H), defined in (34), is equal to

CutV(E, H) = (x^T L x)/(x^T D x),    (51)

where the corresponding, subset-wise constant, indicator vector has the values x(n) = 1/VE for n ∈ E and x(n) = −1/VH for n ∈ H, while the volumes of the sets, VE and VH, are defined in (34).

The proof is identical to that given in Remark 25. For the normalized indicator vector, we have x^T D x = 1, so that the minimization problem in (51) reduces to

min{x^T L x} subject to x^T D x = 1.    (52)

If the solution space is restricted to the space of generalized eigenvectors of the graph Laplacian, defined by

L uk = λk D uk,

then the solution to (52) becomes

x = u1,

where u1 is the generalized eigenvector of the graph Laplacian that corresponds to the lowest nonzero eigenvalue.

The eigenvectors of the normalized Laplacian, LN = D^{−1/2} L D^{−1/2}, may also be used in optimal cut approximations, since the minimization problem in (51) can be rewritten using the normalized Laplacian through a change of variable,

x = D^{−1/2} y,

which allows us to arrive at the following form [63]

min{y^T D^{−1/2} L D^{−1/2} y} = min{y^T LN y}, subject to y^T y = 1.    (53)

If the space of solutions to this minimization problem is relaxed to the eigenvectors, vk, of the normalized graph Laplacian, LN, then y = v1. For more detail on the various forms of the eigenvalues and eigenvectors of the graph Laplacian, we refer to Table 1.

It is obvious now from (52) and (53) that the relation of the form x = D^{−1/2} y also holds for the corresponding eigenvectors of the normalized graph Laplacian, vk, and the generalized eigenvectors of the Laplacian, uk, that is,

uk = D^{−1/2} vk.

It is important to note that, in general, the clustering results based on the three forms of eigenvectors,

(i) the smoothest graph Laplacian eigenvector,

(ii) the smoothest generalized eigenvector of the Laplacian, and

(iii) the smoothest eigenvector of the normalized Laplacian,

are different. While the method (i) favors clustering into subsets with (almost) equal numbers of vertices, the methods (ii) and (iii) favor subsets with (almost) equal volumes (defined as sums of the vertex degrees in the subsets). Also note that the methods (i) and (ii) approximate the indicator vector in different eigenvector subspaces. All three methods will produce the same clustering result for unweighted regular graphs, for which the volumes of the subsets are proportional to the number of their corresponding vertices, while the eigenvectors of all three Laplacian forms are the same in regular graphs, as shown in (13).

Generalized eigenvectors of the graph Laplacian and eigenvectors of the normalized Laplacian. Recall that the matrix D^{−1/2} is of a diagonal form, with positive elements. Then, the solution to (52), which is equal to the generalized eigenvector of the graph Laplacian, and the solution to (53), which is equal to the eigenvector of the normalized Laplacian, are related as sign(y) = sign(x), or sign(v1) = sign(u1). This indicates that if the sign of the corresponding eigenvector is used for the minimum cut approximation (clustering), both results are the same.

4.3. Spectral clustering implementation

Spectral clustering is most conveniently implemented using only low-dimensional spectral vectors, with the simplest case being when only a one-dimensional spectral vector is used as the indicator vector. More degrees of freedom can be achieved by clustering schemes which use two or three Laplacian eigenvectors, as discussed next.
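The equivalence of the sign patterns of the generalized Laplacian eigenvector and the normalized-Laplacian eigenvector can be illustrated numerically; the sketch below assumes SciPy for the generalized eigenvalue problem and uses hypothetical weights.

    import numpy as np
    from scipy.linalg import eigh

    W = np.array([[0.0, 0.9, 0.2, 0.0],
                  [0.9, 0.0, 0.1, 0.1],
                  [0.2, 0.1, 0.0, 0.8],
                  [0.0, 0.1, 0.8, 0.0]])
    D = np.diag(W.sum(axis=1))
    L = D - W
    D_inv_sqrt = np.diag(1 / np.sqrt(np.diag(D)))
    L_norm = D_inv_sqrt @ L @ D_inv_sqrt

    lam_g, U_g = eigh(L, D)            # generalized problem L u = lambda D u
    lam_n, V = eigh(L_norm)            # normalized Laplacian L_N v = lambda v

    u1, v1 = U_g[:, 1], V[:, 1]
    # Eigenvectors are defined up to a sign flip; align them before comparing.
    if np.dot(u1, D_inv_sqrt @ v1) < 0:
        v1 = -v1
    print(np.allclose(np.sign(u1), np.sign(v1)))   # same sign pattern
    print(np.allclose(u1 / np.linalg.norm(u1),
                      (D_inv_sqrt @ v1) / np.linalg.norm(D_inv_sqrt @ v1)))  # u1 prop. to D^{-1/2} v1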
4.3.1. Clustering based on only one (Fiedler) eigenvector

From the analysis in the previous section, we can conclude that only the smoothest eigenvector, u1, can produce a good (quasi-optimal) approximation to the problem of minimum normalized cut graph clustering into two subsets of vertices, E and H. Within the concept of spectral vectors, presented in Section 4.2.2, this indicates that the simplest form of the spectral vector, qn = u1(n), based on just one (the smoothest) Fiedler eigenvector, u1, can be used for efficient spectral vertex clustering. Since the spectral vector qn = u1(n) is used as an approximative solution to the indicator vector within the minimum normalized cut definition, its values may be normalized. One such normalization,

yn = qn/||qn||2,    (54)

yields a two-level form of the spectral vector,

yn = [u1(n)/||u1(n)||2] = [sign(u1(n))],

and represents a step before clustering, as proposed in [63]. This is justified based on the original form of the indicator vector, whose sign indicates the vertex association to either subset E or H. For an illustrative representation of the normalized spectral vector, we may use a simple two-level colormap and assign one of two colors to each vertex. Such a simple algorithm for clustering is given in Algorithm 1 (for an algorithm with more options for clustering and representation, see the Appendix (Algorithm 3) and Remarks 30 and 33).

Algorithm 1. Clustering using the graph Laplacian.
Input:
• Graph vertices V = {0, 1, . . . , N − 1}
• Graph Laplacian L
1: [U, Λ] ← eig(L)
2: yn ← U(2, n)
3: E ← {n | yn > 0}, H ← {n | yn ≤ 0}
Output:
• Vertex clusters E and H

Example 18: Consider the graph from Fig. 2 and its Laplacian eigenvector, u1, from Fig. 13. The elements of this single eigenvector, u1, are used to encode the vertex colormap, as shown in Fig. 23(a). Here, the minimum element of u1 was used to select the red color (vertex 7), while the white color at vertex 0 was designated by the maximum value of this eigenvector. Despite its simplicity, this scheme immediately allows us to threshold u1 and identify two possible graph clusters, {0, 1, 2, 3} and {4, 5, 6, 7}, as illustrated in Fig. 23(b). The same result would be obtained if the sign of u1 was used to color the vertices, and this would correspond to the minimum normalized cut clustering in Fig. 18.

The true indicator vector, x, for the minimum normalized cut of this graph is presented in Fig. 22(a). This vector is obtained by checking all the 127 possible cut combinations of E and H in this small graph, together with the corresponding x(n). The signs of the elements of this vector indicate the way for optimal clustering into the subsets E = {0, 1, 2, 3} and H = {4, 5, 6, 7}, while the minimum cut value is CutN(E, H) = x^T L x = 0.395. Fig. 22(b) shows an approximation of the indicator vector within the space of the graph Laplacian eigenvector, u1. The quadratic form of the eigenvector, u1, is equal to u1^T L u1 = λ1 = 0.286. As shown in (49), note that the true indicator vector, x, can be decomposed onto the set of all graph Laplacian eigenvectors, uk, and written as their linear combination.

The generalized Laplacian eigenvector, u1 = [0.37, 0.24, 0.32, 0.13, −0.31, −0.56, −0.34, −0.58], which is an approximation of the indicator vector for the minimum volume normalized cut in (34), is presented in Fig. 22(c). In this case, the generalized eigenvector indicates the same clustering subsets, E = {0, 1, 2, 3} and H = {4, 5, 6, 7}. The eigenvector of the normalized Laplacian, v1, is shown in Fig. 22(d).

Figure 22: Principle of the minimum normalized cut based clustering and its spectral (graph Laplacian eigenvector) based approximation; all vectors are plotted against the vertex index n. (a) The ideal indicator vector for a minimum normalized cut, CutN(E, H), normalized to unit energy. (b) The graph Laplacian eigenvector, u1. (c) The generalized eigenvector of the Laplacian, u1. (d) The eigenvector of the normalized Laplacian, v1. The eigenvectors in (c) and (d) are related as u1 = D^{−1/2} v1. In this case, the signs of the indicator vector and the eigenvectors, sign(x), sign(u1), and sign(v1), are the same in all four vectors. The signs of these vectors may then all be used to define the minimum normalized cut based clustering into E and H, that is, the association of a vertex, n, with either the subset E or the subset H.
0 1 0 1 large and indicates that the segmentation in Example 19 is not
2 2 “close”.
3 3 As an illustration, consider three hypothetical but practi-
4 4 cally relevant scenarios: (i) λ2 = 0 and λ3 = 1, (ii) λ2 = 0 and
λ3 = ε, (iii) λ2 = 1 and λ3 = 1 + ε, where ε is small positive
6 5 6 5 number and close to 0. According to Remark 18, the graph
7 7
(a) (b) in case (i) consists of exactly two disconnected components,
Figure 23: Vertex coloring for the graph from Fig. 2, with
and the subsequent clustering and segmentation is appropri-
its spectrum shown in Fig. 13. (a) The eigenvector, u1 , ate, with δr = 1. For the case (ii), the graph consists of more
of the Laplacian matrix of this graph, given in (8), is nor- than two almost disconnected components and the clustering in
malized and is used to define the red color intensity levels two sets can be performed in various ways, with δr = 1/ε. Fi-
within the colormap for every vertex. For this example, u1 = nally, in the last scenario the relative gap is very small, δr = ε,
[0.42, 0.38, 0.35, 0.15, −0.088, −0.34, −0.35, −0.54]T . The largest thus indicating that the behavior of the segmented graph is not
element of this eigenvector is u1 (0) = 0.42 at vertex 0, which indi-
“close”to the original graph, that is, L̂ is not “close”to L, and
cates that this vertex should be colored by the lowest red intensity
(white), while the smallest element is u1 (7) = −0.54, so that vertex thus any segmentation into two disconnected subgraphs would
7 is colored with the strongest red color intensity. (b) Simplified produce inadequate results.
two-level coloring based on the sign of the elements of eigenvector
u1 . Remark 28: The thresholding of elements of the Fiedler
vector, u1 , of the normalized graph Laplacian, LN = D−1/2 LD−1/2 ,
performed in order to cluster the graph is referred to as
graph, whose weight matrix, Ŵ, is given by the Shi – Malik algorithm [59, 68]. Note that similar re-
sults would have been obtained if clustering was based on
0

0 0.23 0.74 0.24 0 0 0 0
 the thresholding of elements of the smoothest eigenvector
corresponding to the second largest eigenvalue of the nor-
1 0.23 0 0.35 0 0 0 0 0 
malized weight matrix, WN = D−1/2 WD−1/2 (Perona

2 0.74 0.35 0 0.26 0 0 0 0 
– Freeman algorithm [69, 68]). This becomes clear after
 
3  0.24 0 0.26 0 0 0 0 0 
.
Ŵ = 
4 0 0 0 0 0 0.51 0 0.14 
 recalling that the relation between the normalized weight
5 0 0 0 0 0.51 0 0 0.15 
 and graph Laplacian matrices is given by
6 0 0 0 0 0 0 0 0.32 
7 0 0 0 0 0.14 0.15 0.32 0 LN = D−1/2 LD−1/2 = I − D−1/2 WD−1/2 ,
0 1 2 3 4 5 6 7
LN = I − WN . (57)
(55)
The eigenvalues of these two matrices are therefore related
4.3.2. “Closeness”of the segmented and original graphs (L ) (W )
as λk N = 1 − λk N , while they share the same eigenvec-
The issue of how “close” the behavior of the weight tors.
matrix of the segmented graph, Ŵ, in (55) (and the cor-
responding L̂) is to the original W and L, in (4) and (8), 4.3.3. Clustering based on more than one eigenvector
is usually considered within matrix perturbation theory.
More complex clustering schemes can be achieved when
It can be shown that a good measure of the “closeness”
using more than one Laplacian eigenvector. In turn, ver-
is the so-called eigenvalue gap, δ = λ2 −λ1 , [63], that is the
tices with similar values of several slow-varying eigenvec-
difference between the eigenvalue λ1 associated with the
tors, uk , would exhibit high spectral similarity.
eigenvector u1 , which is used for segmentation, and the
The concept of using more than one eigenvec-
next eigenvalue, λ2 , in the graph spectrum of the normal-
tor in vertex clustering and possible subsequent graph
ized graph Laplacian (for additional explanation see Ex-
segmentation was first introduced by Scott and Longuet-
ample 23). For the obvious reason of analyzing the eigen-
Higgins [70]. They used k eigenvectors of the weight ma-
value gap at an appropriate scale, we suggest to consider
trix W to form a new N × k matrix V, for which a fur-
the relative eigenvalue gap
ther row normalization was performed. Vertex clustering
λ2 − λ1 λ1 is then performed based on the elements of the matrix
δr = =1− . (56) VVT .
λ2 λ2
For the normalized weight matrix, WN , the Scott and
The relative eigenvalue gap value range is within the inter- Longuet-Higgins algorithm reduces to the corresponding
val 0 ≤ δr ≤ 1, since the eigenvalues are nonnegative real- analysis with k eigenvectors of the normalized graph Lapla-
valued numbers sorted into a nondecreasing order. The cian, LN . Since WN and LN are related by (57), they thus
value of this gap may be considered as large if it is close have the same eigenvectors.
to the maximum eigengap value, δr = 1. Example 21: Consider two independent normalized cuts of
Example 20: The Laplacian eigenvalues for the graph in Fig. a graph, where the first cut splits the graph into the sets of
23 are λ ∈ {0, 0.29, 0.34, 0.79, 1.03, 1.31, 1.49, 2.21}, with the vertices E1 and H1 , and the second cut further splits all vertices
relative eigenvalue gap, δr = (λ2 − λ1 )/λ2 = 0.15, which is not

30
into the sets E2 and H2 , and define this two-level cut as colormaps were used for each eigenvector. The smallest eigen-
values were λ0 = 0, λ1 = 0.0286, λ2 = 0.0358, λ3 = 0.0899,
CutN 2(E1 , H1 , E2 , H2 ) = CutN (E1 , H1 ) + CutN (E2 , H2 ) λ4 = 0.104, and λ5 = 0.167, so that the largest relative gap
(58) was obtained when u1 , and u2 were used for clustering, with
the corresponding eigenvalue gap of δr = 1 − λ2 /λ3 = 0.6.
where both CutN (Ei , Hi ), i = 1, 2, are defined by (33).
If we now introduce two indicator vectors, x1 and x2 , for Remark 30: k-means algorithm. The above clustering
the two respective cuts, then, from (44) we may write schemes are based on the quantized levels of spectral vec-
xT1 Lx1 xT Lx2 tors. These can be refined using the k-means algorithm,
CutN 2(E1 , H1 , E2 , H2 ) = T
+ 2T . (59) that is, through postprocessing in the form of unsupervised
x1 x1 x2 x2
learning and in the following way,
As mentioned earlier, finding the indicator vectors, x1 and x2 , (i) After an initial vertex clustering is performed by group-
which minimize (59) is a combinatorial problem. However, if ing the vertices into Vi , i = 1, 2, . . . , k nonoverlapping ver-
the space of solutions for the indicator vectors is now relaxed
tex subsets, a new spectral vector centroid, ci , is calculated
from the subset-wise constant form to the space spanned by
as
the eigenvectors of the graph Laplacian, then the approxima-
tive minimum value of the two cuts, CutN 2(E1 , H1 , E2 , H2 ), is ci = meann∈Vi {qn },
obtained for x1 = u1 and x2 = u2 , since u1 and u2 are maxi- for each cluster of vertices Vi ;
mally smooth but not constant (for the proof see (63)-(64) and (ii) Every vertex, n, is then reassigned to its nearest (most
for the illustration see Example 22).
similar) spectral domain centroid, i, where the spectral
For the case of two independent cuts, for convenience, we
distance (spectral similarity) is calculated as kqn − ci k2 .
may form the indicator N × 2 matrix Y = [x1 , x2 ], so that the
corresponding matrix of the solution (within the graph Lapla- This two-step algorithm is iterated until no vertex changes
cian eigenvectors space) to the two normalized cuts minimiza- clusters. Finally, all vertices in one cluster are colored
tion problem, has the form based on the corresponding common spectral vector ci (or
visually, a color representing ci ).
Q = [u1 , u2 ]. Clustering refinement using the k-means algorithm is
The rows of this matrix, qn = [u1 (n), u2 (n)], are the spectral illustrated later in Example 29.
vectors which are assigned to each vertex, n. Example 23: Graphs represent quite a general mathematical
The same reasoning can be followed for the cases of three formalism, and we will here provide only one possible physi-
or more independent cuts, to obtain an N × M indicator ma- cal interpretation of graph clustering. Assume that each ver-
trix Y = [x1 , x2 , . . . , xM ] with the corresponding eigenvector tex represents one out of the set of N images, which exhibit
approximation, Q, the rows of which are the spectral vectors both common elements and individual differences. If the edge
qn = [u1 (n), u2 (n), . . . , uM (n)]. weights are calculated so as to represent mutual similarities
between these images, then spectral vertex analysis can be in-
Remark 29: Graph clustering in the spectral domain may
terpreted as follows. If the set is complete and with very high
be performed by assigning the spectral vector,
similarity among all vertices, then Wmn = 1, and λ0 = 0, λ1 =
N, λ2 = N, . . . , λN −1 = N , as shown in Remark 19. The rel-
qn = [u1 (n), . . . , uM (n)]
ative eigenvalue gap is then δr = (λ2 − λ1 )/λ2 = 0 and the
in (41), to each vertex, n, and subsequently by grouping segmentation is not possible.
Assume now that the considered set of images consists of
the vertices with similar spectral vectors into the corre-
two connected subsets with the respective numbers of N1 and
sponding clusters [63, 65]. N2 ≥ N1 of very similar photos within each subset. In this case,
Low dimensional spectral vectors (up to M = 3) can be the graph consists of two complete components (sub-graphs).
represented by color coordinates of, for example, standard According to Remarks 18 and 19, the graph Laplacian eigenval-
RGB coloring system. To this end, it is common to use ues are now λ0 = 0, λ1 = 0, λ2 = N1 , . . . , λN1 = N1 , λN1 +1 =
different vertex colors, which represent different spectral N2 , . . . , λN −1 = N2 . Then, this graph may be well segmented
vectors, for the visualization of spectral domain cluster- into two components (sub-graphs) since the relative eigenvalue
ing. gap is now large, δr = (λ2 − λ1 )/λ2 = 1. Therefore, this case
can be used for collaborative data processing within each of
Example 22: Fig. 24 illustrates several spectral vector cluster- these subsets. The analysis can be continued and further re-
ing schemes for the graph in Fig. 19 (right), based on the three fined for cases with more than one eigenvector and more than
smoothest eigenvectors u1 , u2 , and u3 . Clustering based on two subsets of vertices. Note that segmentation represents a
the eigenvector u1 , with qn = [u1 (n)], is shown in Fig. 24(b), “hard-thresholding” operation of cutting the connections be-
clustering using the eigenvector u2 only, with qn = [u2 (n)], is tween vertices in different subsets, while clustering refers to
shown in Fig. 24(d), while Fig. 24(e) illustrates the clustering just a grouping of vertices, which exhibit some similarity, into
based on the eigenvectors u3 , when qn = [u3 (n)]. Clustering subsets, while keeping their mutual connections.
based on the combination of the two smoothest eigenvectors u1 ,
and u2 , with spectral vectors qn = [u1 (n), u2 (n)], is shown in Example 24: For enhanced intuition, we next consider a real-
Fig. 24(g), while Fig. 24(h) illustrates clustering based on the world dataset with 8 images, shown in Fig. 25. The connectiv-
three smoothest eigenvectors, u1 , u2 , and u3 , whereby the spec- ity weights were calculated using the structural similarity index
tral vector qn = [u1 (n), u2 (n), u3 (n)]. In all cases, two-level

31
Figure 24: Spectral vertex clustering schemes for the graph from Fig. 19. (a) The eigenvector, u1 , of the Laplacian matrix (plotted in red
lines on vertices designated by black dots) is first normalized and is then used to designate (b) a two-level blue colormap intensity (through
its signs) for every vertex (blue-white circles). (c) The eigenvector, u2 , of the Laplacian matrix is normalized and is then used to provide
(d) a two-level green colormap intensity for every vertex. (e) The eigenvector, u3 , of the Laplacian matrix is normalized and used as (f) a
two-level red colormap intensity for every vertex. (g) Clustering based on the combination of the eigenvectors u1 and u2 . (h) Clustering
based on the combination of the eigenvectors u1 , u2 , and u3 . Observe an increase in degrees of freedom with the number of eigenvectors
used; this is reflected in the number of detected clusters, starting from two clusters in (b) and (d), via four clusters in (g), to 8 clusters in (h).

32
2

0.32 1

0.33
0.49

0.
0

37
3
0.29

0
0.3
9

0.30
0.2

1 7 4
3
0.
0.31

0.29

1 5
0.3
0.48

6
0.3
0.64

0.40 Figure 26: Graph topology for the real-world images from Fig. 25.

Figure 25: A graph representation of a set of the real-world images


which exhibit an almost constant background but different head ori-
entation, which moves gradually from the left profile (bottom left)
to the right profile (top right). The images serve as vertices, while
the edges and the corresponding weight matrix are defined through
the squared structural similarity index (SSIM) between images, with (a) (b)
Wmn = SSIM2T (m, n), and hard thresholded at 0.28 to account for
the contribution of the background to the similarity index, that is, Figure 27: Graph clustering structure for the images from Fig.
SSIMT (m, n) = hard(SSIM(m, n), 0.53). 25. (a) Vertices are clustered (colored) using the row-normalized
spectral Fiedler eigenvector to give the spectral vector u1 , qn =
[u1 (n)]/||[u1 (n)]||2 . (b) Clustering scheme whereby spectral val-
(SSIM), with an appropriate threshold [71]. The so obtained ues of vertices are calculated using the two smoothest eigenvectors,
weight matrix, W, is given by qn = [u1 (n), u2 (n)], which are then employed to designate the col-
ormap for the vertices. Recall that the so obtained similar vertex
colors indicate spectral similarity of the images from Fig. 25.
 
0 0 0.49 0.33 0.29 0.31 0 0 0
1 0.49 0 0.32 0 0.30 0 0 0.29
2

0.33 0.33 0 0.37 0.30 0 0 0
 Laplacian for this example are λk ∈ {0, 0.32, 0.94, 1.22, 1.27,
3

0.29 0 0.37 0 0, 31 0 0 0
 1.31, 1.39, 1.55}. The largest relative eigenvalue gap is there-
W= , .
fore between the eigenvalues λ1 = 0.42 and λ2 = 1.12, and

4 0.31 0.30 0.30 0.31 0 0.31 0.30 0.29
indicates that the best clustering will be obtained in a one-
 
5  0 0 0 0 0.31 0 0.40 0.48
dimensional spectral space (with clusters shown in Fig. 27(a)).
 
6  0 0 0 0 0.30 0.40 0 0.64
7 0 0.29 0 0 0.29 0.48 0.64 0 However, the value of such cut would be large, Cut({0, 1, 2, 3, 4}, {5, 6, 7})
0 1 2 3 4 5 6 7 1.19, while the value of the normalized cut,
(60)
CutN ({0, 1, 2, 3, 4}, {5, 6, 7}) ∼ λ1 = 0.42,
The standard graph form for this real-world scenario in Fig.
25 is shown in Fig. 26, together with the corresponding im- indicates that the connections between these two clusters are
age/vertex indexing. Notice the almost constant background in too significant for a segmented graph to produce a “close” ap-
all 8 images (the photos were taken in the wild by a “hand-held proximation of the original graph with only two components
device”), and that the only differences between the images are (disconnected subgraphs). Given the gradual change in head
in that the model gradually moved her head position from the orientation, this again conforms with physical intuition, and the
left profile (bottom left) to the right profile (top right). There- subsequent clustering based on two smoothest eigenvectors, u1
fore, the two frontal face positions, at vertices n = 4 and n = 0, and u2 , yields three meaningful clusters of vertices correspond-
exhibit higher vertex degrees than the other head orientations, ing to the “left head orientation” (red), “frontal head orien-
which exemplifies physical meaningfulness of graph represen- tation ” (two shades of pink), and “right head orientation”
tations. The normalized spectral vectors for this graph, qn = (yellow).
[u1 (n)]/||[u1 (n)]||2 and qn = [u1 (n), u2 (n)]/||[u2 (n)]|2 were ob-
tained as the generalized eigenvectors of the graph Laplacian, Example 25: Minnesota roadmap graph. Three eigen-
and were used to define the coloring scheme for the graph clus- vectors of the graph Laplacian matrix, u2 , u3 , and u4 , were
tering in Fig. 27. Recall that similar vertex colors indicate used as the coloring templates to represent the spectral sim-
spectral similarity of the images assigned to the corresponding ilarity and clustering in the benchmark Minnesota roadmap
vertices. graph, shown in Fig. 28. The eigenvectors u0 and u1 were
The eigenvalues of the generalized eigenvectors of the graph omitted, since their corresponding eigenvalues are λ0 = λ1 = 0

33
Figure 28: Vertex coloring in the benchmark Minnesota road-
map graph using the three smoothest Laplacian eigenvectors
{u2 ,u3 ,u4 }, as coordinates in the standard RGB coloring system
(a three-dimensional spectral space with the spectral vector qn =
[u2 (n), u3 (n), u4 (n)] for every vertex, n). The vertices with similar
colors are therefore also considered spectrally similar. Observe three
different clusters, characterized by the shades of predominantly red,
green, and blue color, that correspond to intensities defined by the
eigenvectors u2 (n), u3 (n), and u4 (n).

(due to an isolated vertex in the graph data which behaves as


a graph component, see Remark 18). The full (nonquantized)
colormap scale was used to color the vertices (that is, represent
three-dimensional spectral vectors). As elaborated above, re-
gions where the vertices visually assume similar colors are also
spectrally similar, and with similar behavior of the correspond-
ing slow-varying eigenvectors.
Example 26: Brain connectivity graph. Fig. 29 shows
the benchmark Brain Atlas connectivity graph [72, 73], for
which the data is given in two matrices: “Coactivation ma-
trix”, Ŵ, and “Coordinate matrix”. The “Coordinate ma-
trix”contains the vertex coordinates in a three-dimensional Eu-
clidean space, whereby the coordinate of a vertex n is defined Figure 29: Brain atlas (top) and its graph (bottom), with ver-
by the n-th row of the “Coordinate matrix”, that is, [xn , yn , zn ]. tex coloring based on the three smoothest generalized eigenvec-
In our analysis, the graph weight matrix, W, was empiri- tors, u1 , u2 , and u3 , of graph Laplacian. The spectral vector,
cally formed by: qn = [u1 (n), u2 (n), u3 (n)] is employed as the coordinates in the RGB
(i) Thresholding the “Coactivation matrix”, Ŵ, to preserve coloring scheme [72, 73].
only the strongest connections within this brain atlas, for ex-
ample, those greater than 0.1 max{Ŵmn }, as recommended in
[73]; were next used to define the spectral vectors
(ii) Only the edges between the vertices m and n, whose
qn = [u1 (n), u2 (n), u3 (n)]
Euclidean distance satisfies dmn ≤ 20 are kept in the graph
representation. for each vertex, n = 0, 1, . . . , N − 1. The elements of this spec-
The elements, Wmn , of the brain graph weight matrix, W, tral vector, qn , were then used to designate the corresponding
are therefore obtained from the corresponding elements, Ŵmn , RGB coordinates for the coloring of the vertices of the brain
of the “Coactivation matrix” as graph, as shown in Fig. 29.
(
Ŵmn , if Ŵmn > 0.1 max{Ŵmn } and dmn ≤ 20
Wmn = 4.4. Vertex Dimensionality Reduction Using the Laplacian
0, elsewhere.
(61) Eigenmaps
The brain connectivity graph with the so defined weight We have seen that graph clustering can be used for
matrix, W, is shown in Fig. 29 (bottom), collaborative processing on the set of data which is rep-
The three smoothest generalized eigenvectors, u1 , u2 and resented by the vertices within a cluster. In general, any
u3 , of the corresponding graph Laplacian matrix, L = W − D, form of the presentation of a graph and its correspond-
ing vertices, that employs the eigenvectors of the graph
34
Laplacian may be considered as a Laplacian eigenmap. M = 3 we can now clearly divide students into the three affinity
The idea which underpins eigenmap-based approaches pre- groups (designated by the read, blue, and black). Although the
sented here is to employ spectral vectors, qn , to define the obtained groups (clusters) are logically ordered even in the one-
new positions of the original vertices in such a “transform- dimensional case in 30(g), observe that we cannot use M = 1
domain” space so that spectrally similar vertices appear for precise grouping since there is no enough gap between the
groups. However, even in this case, if we re-cast the vertices on
spatially closer than in the original vertex space.
a circle instead on a line (by connecting two ends of a line), and
Remark 31: The Laplacian eigenmaps may be employed draw the connecting edges (the same edges as in Figs. 30(d),
for vertex dimensionality reduction, while at the same time (e) and (f)) we can see the benefit of a graph representation
preserving the local properties and natural connections even after such a radical dimensionality reduction.
within the original graph [65]. The dimensionality reduction principle can also be demon-
strated based on Example 24, whereby each vertex is a 640×480
Consider a vertex n, n = 0, 1, . . . , N − 1, which re-
RBG color image which can be represented as a vector in the
sides in an L-dimensional space RL , at the position de- L = 640 × 480 × 3 = 921600 dimensional space. Indeed, using
fined by the L-dimensional vector rn . A spectral vector spectral vectors with M = 2, this graph can be presented in a
for vertex n is then defined in a new lower-dimensional two-dimensional space as in Fig. 25.
(M -dimensional) space, with M < N , by keeping the M
Within the Laplacian eigenmaps method, we may use
smoothest eigenvectors of graph Laplacian, u0 , u1 , . . . ,
any of the three forms of graph Laplacian eigenvectors in-
uM −1 . Upon omitting the constant eigenvector, u0 , this
troduced in Section 4.2.3. The relations among these three
gives the new basis designated by the spectral vector
presentations are explained in Section 4.2.3 and Table 1.
qn = [u1 (n), . . . , uM −1 (n)]. (62) A unified algorithm for all three variants of the Laplacian
eigenmaps, and the corresponding clustering methods, is
Since M < L, this provides the desired dimensionality given in Algorithm 3 in the Appendix.
reduction of the vertex space. The concepts of spectral Remark 32: The Laplacian eigenmaps are optimal in the
vector-based vertex dimensionality reduction, and physi- sense that they minimize an objective function which pe-
cal meaning associated with the spectral vector space rep- nalizes for the distance between the neighboring vertices in
resentation are illustrated in the enxt example. the spectral space. This ensures that if the vertices at the
Example 27: Vertex dimensionality reduction. Consider positions rm and rn in the original high-dimensional L-
a set of N = 70 students and their marks in 40 lecture courses. dimensional space are “close” in the sense of some data
Every student can be considered as a vertex located in the association metric, then they will also be close in the
original L = 40 dimensional space at the position rn , where Euclidean sense in the reduced M -dimensional spectral
rn (k) is a mark for the n-th student at k-th course. Assume space, where their positions are defined by the correspond-
that the marks are within the set {2, 3, 4, 5} and that some ing spectral vectors, qm and qn .
students have affinity to certain subsets of courses (for example,
social sciences, natural sciences and skills). This set-up can be
represented in a tabular (70 × 40) compact matrix form as in 4.4.1. Euclidean distances in the space of spectral vectors.
Fig. 30(a), where the columns contain the marks for every We shall prove the “distance preserving” property of
student (the marks are color coded). the above spectral mapping in an inductive way. Assume
The average marks per student and per course are shown that a graph is connected, i.e., λ1 6= 0. The derivation is
in Fig. 30(b) and 30(c). Observe the limitations of this repre- based on the quadratic form in (40)
sentation, as for example, the average marks cannot be used to
determine student affinities to the subsets of their courses. N −1 N −1 2
1 X X
We can now create a graph representation by connecting uTk Luk = uk (m) − uk (n) Wmn
with edges students with similar marks. In our example, the 2 m=0 n=0
edge weights were determined through a distance in the 40-
dimensional feature (marks) space, as which states that uTk Luk is equal to the weighted sum of
( 2 squared Euclidean distances between the elements of the
e−krm −rn k2 /70 , for krm − rn k2 ≥ 7 m-th and n-th eigenvector at vertices m and n, for all m
Wmn =
0, otherwise. and n. Recall that uTk Luk is also equal to λk , by definition
(see the elaboration after (38)).
With the so obtained connectivity, this graph is presented in
Fig. 30(d), whereby the vertices (students) are randomly po-
Single-dimensional case. To reduce the original L-
sitioned in a plane and connected with edges. We shall now dimensional vertex space to a single-dimensional path graph
calculate the normalized Laplacian eigenvectors and remap the with vertex coordinates qn = uk (n), the minimum sum of
vertices according to the three-dimensional, two-dimensional the weighted squared distances between the vertices m and
and one-dimensional spectral vectors, qn , defined by (62) that
is, for M = 3, M = 2, and M = 1. In this way, the vertex
dimensionality is reduced from the original L = 40 to a much
lower M  L. The corresponding graph representations are
respectively shown in Figs. 30(e), (f), and (g). For M = 2 and

35
5 5
0.2
10

0 54
15 4
course index

20 35
-0.2 51 11 43
60 42
4
25 3 1
0.2
30
0.1
35 2 0
0.1 0.15
-0.1 0 0.05
-0.1 -0.05
40 -0.2 -0.15
10 20 30 40 50 60 70
student (vertex)

5 0.3

0.25
4
0.2

3 0.15
1
0.1 35
2
0 10 20 30 40 50 60 70 0.05
54
0
11
-0.05 42 51
5
60 4
-0.1
43
4
-0.15
-0.2 -0.15 -0.1 -0.05 0 0.05 0.1 0.15
3

2
0 5 10 15 20 25 30 35 40

60 42 51 11 1 54 35 4 43

1
51 42
60 11
4

35 42

1 60
43

54
54
51 11
43
35 4

Figure 30: Illustration of spectral dimensionality reduction through an example of exam marks for a cohort of students. (a) Each of the 70
columns (students) represents a 40-dimensional vector with student marks. Therefore the dimensionality of the original representation space
is L = 40. (b) Average mark per student. (c) Average mark per course. (d) Two-dimensional graph representation of the matrix in a), where
the individual students are represented by randomly positioned vertices in the plane. To perform vertex (student) dimensionality reduction
we can use spectral vectors to reduce their original L = 40 dimensional representation space to (e) M = 3, (f) M = 2, and (g) M = 1
dimensional spectral representation spaces. (h) Vertices from path graph g) positioned on a circle (by connecting the ends of the line) which
allows us to also show the edges.

36
n, that is 0.23
0 1
0.
74
1
N −1 N −1
0.
24 0.35
X X 2
||q(m) − q(n)||22 Wmn

0.23
2 m=0 n=0 0.26
3 0.2
4
N −1 N −1  2
1 X X 4
= uk (m) − uk (n) Wmn = λk

0.32
2 m=0 n=0 4
0.1

0.5
1
will be obtained with the new positions of vertices, desig- 6 0.32 0.15 5
7 (a)
nated by qn = [u1 (n)], and for k = 1, since mink,λk 6=0 {λk } =
λ1 is the smallest nonzero eigenvalue.
Two-dimensional case. If we desire to reduce the orig-
inal L-dimensional vertex representation space to a two-
dimensional spectral space, designated by qn = [uk (n), ul (n)] 7 65 4 3 2 1 0
and defined through any two eigenvectors of the graph (b)
Laplacian, uk and ul , then the minimum sum of the weighted
squared distances between all vertices, m and n, given by Figure 31: Principle of vertex dimensionality reduction based on
the spectral vectors. (a) The weighted graph from Fig. 2 with
N −1 N −1 its vertices in a two-dimensional space. (b) The graph from
1 X X
||qm − qn ||22 Wmn (a) with its vertices located along a line (one-dimensional ver-
2 m=0 n=0 tex space), whereby the positions on the line are defined by
the one-dimensional spectral vector, qn = [u1 (n)], with u1 =
N −1 N −1
1 X X 2 [0.42, 0.38, 0.35, 0.15, −0.088, −0.34, −0.35, −0.54]T . Observe that
= uk (m) − uk (n) Wmn + this dimensionality reduction method may be used for clustering,
2 m=0 n=0 based on the vertex position on the line.
N −1 N −1 2
1 X X
ul (m) − ul (n) Wmn
2 m=0 n=0 31(b) illustrates the same graph but represented in a reduced
single-dimensional vertex space (a line). The vertex positions
= uTk Luk + uTl Lul = λk + λl (63) on the line are defined by the spectral vector, qn = [u1 (n)], with
u1 = [0.42, 0.38, 0.35, 0.15, −0.088, −0.34, −0.35, −0.54]T .
will be obtained with the new spectral positions, qn =
[uk (n), ul (n)], such that qn = [u1 (n), u2 (n)], since Remark 33: After the vertices are reordered according to
the Fiedler eigenvector, u1 , Example 28 indicates the pos-
min {λk + λl } = λ1 + λ2 (64) sibility of clustering refinement through a recalculation of
k,l,k6=l,kl6=0
normalized cuts. For the set of vertices V = {0, 1, 2, . . . , N −
for nonzero k and l, and keeping in mind that λ1 ≤ λ2 ≤ 1}, Fig. 31(b) illustrates their ordering along a line, with
λ3 ≤ · · · ≤ λN −1 . The same reasoning holds for new three- the new order {v1 , v2 , . . . , vN } = {7, 6, 5, 4, 3, 2, 1, 0}. In-
and higher-dimensional spectral representation spaces for stead of using the sign of u1 to cluster the vertices, we can
the vertices, which yields (62) as the optimal vertex posi- recalculate the normalized cuts, CutN (Ep , Hp ), with this
tions in the reduced M -dimensional vertex space. sequential vertex order, where Ep = {v1 , v2 , . . . , vp } and
The same relations hold for both the generalized eigen- Hp = {vp+1 , vp+2 , . . . , vN }, for p = 1, 2, . . . , N − 1. The
vectors of the Laplacian, defined by Luk = λk Duk , and estimation of the minimum normalized cut then becomes
the eigenvectors of the normalized Laplacian, defined by
D−1/2 LD−1/2 vk = λk vk . The only difference is in their (Ep , Hp ) = arg min{CutN (Ep , Hp )}.
p
respective normalization conditions, uTk Duk and vkT vk .
The relation between the eigenvectors of the normalized This method is computationally efficient since only (N −1)
graph Laplacian, vk , and the generalized eigenvectors of cuts, CutN (Ep , Hp ), need to be calculated. In addition,
the graph Laplacian, uk , in the form uk = D−1/2 vk , fol- the cuts CutN (Ep , Hp ) can be calculated recursively, us-
lows from their definitions (see Remark 27). Since the ing the previous CutN (Ep−1 , Hp−1 ) and the connectivity
elements u1 (n) and u2 (n) are obtained by multiplying the parameters (degree, Dpp , and weights, Wpm ) of vertex p.
elements v1 (n) and v2 (n) by the same value, 1/Dnn , that Any normalized cut form presented in Section 4.1 can also
is, [u1 (n), u2 (n)] = [v1 (n), v2 (n)]/Dnn , their normalized be used instead of CutN (Ep , Hp ). When the Cheeger ra-
forms of uk and vk are identical, tio, defined in (37), is used in this minimization, then an
qn [u1 (n), u2 (n)] [v1 (n), v2 (n)] upper bound on the normalized cut can be obtained as [67]
= = .
||qn ||2 ||[u1 (n), u2 (n)]||2 ||[v1 (n), v2 (n)]||2 p p
min{φ(Ep )} ≤ 2λ1 ≤ 2 φ(V), (65)
p
4.4.2. Examples of graph analysis in the spectral space
Example 28: The graph from Fig. 2, where the vertices reside where φ(V) denotes the combinatorial (true) minimum cut,
in a two-dimensional plane, is shown in Fig. 31(a), while Fig. with bounds given in (50).
37
Example 29: We shall now revisit the graph in Fig. 24 and ex-
amine the clustering schemes based on (i) standard Laplacian
eigenvectors (Fig. 32), (ii) generalized eigenvectors of graph
Laplacian (Fig. 33), and (iii) eigenvectors of the normalized
Laplacian (Fig. 34). Fig. 32(b) illustrates Laplacian eigenmaps
based dimensionality reduction for the graph from Fig. 24(g),
with the two eigenvectors, u1 and u2 , serving as new vertex co-
ordinates, and using the same vertex coloring scheme as in Fig.
24(g). While both the original and the new vertex space are
two-dimensional, we can clearly see that in the new vertex space (a) (b)
the vertices belonging to the same clusters are also spatially
closer, which is both physically meaningful and exemplifies the
practical value of the eigenmaps. Fig. 32(c) is similar to Fig.
32(b) but is presented using the normalized spectral space co-
ordinates, qn = [u1 (n), u2 (n)]/||[u1 (n), u2 (n)]||2 . In Fig. 32(d)
the clusters are refined using the k-means algorithm, as per Re-
mark 30. The same representations are repeated and shown in
Fig. 33(a)-(d) for the representation based on the generalized
eigenvectors of the graph Laplacian, obtained as a solution to
Luk = λk Duk . Finally, in Fig. 34(a)-(d), the Laplacian eigen-
maps and clustering are produced based on the eigenvectors (c) (d)
of the normalized graph Laplacian, LN = D−1/2 LD−1/2 . As
expected, the eigenmaps obtained using the generalized Lapla- Figure 32: Principle of Laplacian eigenmaps and clustering based
cian eigenvectors, in Fig. 34(b), and the eigenvectors of the on the eigenvectors of the graph Laplacian, L. (a) The original
normalized Laplacian, in Fig. 33(b), are different; however, graph from Fig. 24, with the spectral vector qn = [u1 (n), u2 (n)],
they reduce to the same eigenmaps after spectral vector nor- defined by the graph Laplacian eigenvectors {u1 ,u2 }, which is used
to cluster (color) the vertices. (b) Two-dimensional vertex posi-
malization, as shown Fig. 34(c) and Fig. 33(c). After the tions obtained through Laplacian eigenmaps, with the spectral vec-
k-means based clustering refinement was applied, in all three tor qn = [u1 (n), u2 (n)] serving as the vertex coordinates (the 2D
cases two vertices switched their initial color (cluster), as shown Laplacian eigenmap). While both the original and this new vertex
in Fig. 32(d), Fig. 33(d), and Fig. 34(d). space are two-dimensional, the new eigenmaps-based space is advan-
Observe that the eigenmaps obtained with the normalized tageous in that it emphasizes vertex spectral similarity in a spatial
forms of the generalized eigenvectors of the Laplacian and the way (physical closeness of spectrally similar vertices). (c) The graph
from (b) but produced using normalized spectral space coordinates
eigenvectors of the normalized Laplacian are the same, and
qn = [u1 (n), u2 (n)]/||[u1 (n), u2 (n)]||2 , as in (54). (d) The graph
in this case their clustering performances are similar to those from (c) with clusters refined using the k-means algorithm, as per
based on the eigenmaps produced with the eigenvectors of the Remark 30. The centroids of clusters are designated by squares of the
original Laplacian. same color. The complexity of graph presentation is also significantly
reduced through eigenmaps, with most of the edges between strongly
Remark 34: In general, an independent quantization of connected vertices being very short and located along a circle.
two smoothest eigenvectors of the graph Laplacian, u1 and
u2 , will produce four clusters. However, that will not be
the case if we analyze the graph with an almost ideal eigen- form [sign(u1 ), sign(u2 )]. The elements of these indicator
value gap (unit value) between λ2 and λ3 . In other words, vectors, [sign(u1 (n)), sign(u2 (n))], have therefore a subset-
when the gap δr = 1−λ2 /λ3 tends to 1, that is, λ2 → 0 and wise constant vector form, assuming exactly three different
λ1 < λ2 → 0, then this case corresponds to a graph with vector values that correspond to individual disjoint sets E,
exactly three disjoint subgraph components, with vertices H, and K.
belonging to the disjoint sets E, H, and K. Without loss This procedure can be generalized up to every individ-
of generality, assume NE > NH > NK . The minimum ual vertex becoming a cluster (no clustering). To charac-
normalized cut, CutN (E, H ∪ K) is then obtained with the terize N independent disjoint sets we will need (N − 1)
first indicator vector x1 (n) = c11 for n ∈ E and x1 (n) = c12 spectral vectors, if the constant eigenvector, u0 , is omit-
for n ∈ H ∪ K. The second indicator vector will produce ted.
the next minimum normalized cut, CutN (E ∪ K, H) with Example 30: The two-dimensional Laplacian eigenmap for
x2 (n) = c21 for n ∈ E ∪ K and x2 (n) = c22 for n ∈ H. the benchmark Minnesota roadmap graph (with M = 2) is
Following the same analysis as in the case of one indi- given in Fig. 35. In this new space, the spectral vectors
cator vector and the cut of graph into two disjoint sub- qn = [u2 (n), u3 (n)], are used as the coordinates of the new
sets of vertices, we can immediately conclude that the two vertex positions. Here, two vertices with similar slow-varying
smoothest eigenvectors, u1 and u2 , which correspond to eigenvectors are located close to one another in the new coor-
λ2 → 0 and λ1 → 0, can be used to form an indicator ma- dinate system defined by u2 and u3 . This illustrates that the
eigenmaps can be considered as a basis for “scale-wise ”graph
trix Y = [x1 , x2 ], so that the corresponding matrix of the
representation.
solution (within the graph Laplacian eigenvector space) to
the minimization problem of two normalized cuts, has the Example 31: The Laplacian eigenmaps of the Brain Atlas

38
0.05

0.04

0.03

0.02

(a) (b)
0.01

-0.01

-0.02

(c) (d) -0.03


-0.04 -0.03 -0.02 -0.01 0 0.01 0.02 0.03

Figure 33: Principle of Laplacian eigenmaps and clustering based on


the generalized eigenvectors of the graph Laplacian, obtained as a Figure 35: Laplacian eigenmaps for the Minnesota road-map graph,
solution to Luk = λk Duk . Vertex coloring was produced using the produced based on the new two-dimensional spectral vertex posi-
same procedure as in Fig. 32. tions defined by the Laplacian eigenvectors {u2 ,u3 } as the vertex
coordinates (the 2D Laplacian eigenmap).

graph from Fig. 29, whose original vertex locations reside in


an L = 3 dimensional space, is presented in a new reduced
M = 2 dimensional space which is defined based on the two
smoothest eigenvectors, u1 and u2 . This example of vertex
dimensionality reduction, with new vertex locations but with
the original edges kept, is shown in Fig. 36.
The generalized eigenvectors of the graph Laplacian, uk ,
for k = 1, 2, 3, 4, 5, 6, are shown in Fig. 37(a) using the stan-
(a) (b) dard colormap in both the original three-dimensional and the
reduced two-dimensional space, as shown in Fig. 37(b).
Example 32: Vertices of a three-dimensional Swiss roll graph
are shown in Fig. 39(a). The vertex locations in this original
L = 3 dimensional space are calculated as xn = αn cos(αn )/(4π),
yn = βn , and zn = αn sin(αn )/(4π), n = 0, 1, 2, . . . , N − 1, with
αn randomly taking values between π and 4π, and βn from
−1.5 to 1.5. The edge weights are calculated using Wmn =
exp(−d2mn /(2κ2 )), where dmn is the square Euclidean distance
between the vertices m and n, and Wmn = 0 if dmn ≥ 0.15 with
κ = 0.1. The resulting three-dimensional Swiss roll graph is
(c) (d) shown in Fig. 39(b), while Fig. 39(c) shows the same graph but
with vertices colored (clustered) using the normalized graph
Figure 34: Principle of Laplacian eigenmaps and clustering based
Laplacian eigenvectors, u1 (n) and u2 (n), as a colormap. The
on the eigenvectors of the normalized graph Laplacian, LN =
D−1/2 LD−1/2 . Vertex coloring was performed using the same same vectors are then used in Fig. 39(d) as the new coordi-
procedure as in Fig. 32. The eigenvectors of the normalized nates in the reduced two-dimensional Laplacian eigenmap ver-
graph Laplacian, vk , are related to the generalized eigenvectors tex space (M = 2) for the Swiss roll graph.
of the graph Laplacian, uk , through uk = D−1/2 vk , as stated
in Remark 27. This means that the signs of these two eigenvec-
tors are the same, sign(uk ) = sign(vk ). Since in order to ob- 4.5. Pseudo-inverse of Graph Laplacian-Based Mappings
tain u1 (n) and u2 (n), the elements v1 (n) and v2 (n) are multiplied The graph Laplacian is a singular matrix (since λ0 = 0)
by the same value, 1/Dnn , then [u1 (n), u2 (n)]/||[u1 (n), u2 (n)]||2 =
[v1 (n), v2 (n)]/||[v1 (n), v2 (n)]||2 , thus yielding the same graph forms for which an inverse does not exist. To deal with this issue,
in (c) and (d) in both this figure and in Fig. 33. the pseudo-inverse of the graph Laplacian, L+ , is defined

39
0.08
the original graph Laplacian, u0 , u1 , . . . , uN −1 . The
eigenmaps for which the spectral coordinates are scaled
0.06 based on the eigenvalues of the pseudo-inverse of graph
Laplacian can be interpreted within the Principal Compo-
0.04 nent Analysis (PCA) framework in the following way.
Notice that the M -dimensional eigenmaps based on the
0.02 pseudo-inverse of the Laplacian are the same as those for
the original graph Laplacian, since they share the same
0 eigenvectors. If the spectral vectors qn = [u1 (n), u2 (n),
. . . , uM (n)] are scaled with the square roots of the eigen-
-0.02 values of the Laplacian pseudo-inverse, we obtain

u1 (n) u2 (n) uM (n)


-0.04 qn = [ √ , √ , . . . , √ ]
λ1 λ2 λM
-0.06
The elements of this spectral vector are now equal to the
first M elements (omitting 0 · u0 (n)) of the full-dimension
-0.08 spectral vector
-0.1 -0.05 0 0.05 0.1
(a) qn = [u1 (n), u2 (n), . . . , uN −1 (n)]Λ̄−1/2 , (67)
1

0.8 where Λ̄ is a diagonal matrix with elements λ1 , λ2 , . . . , λN −1 .

0.6
4.5.1. Commute time mapping
0.4 Physical meaning of the new vector positions in the
spectral space, defined by (67), is related to the notion of
0.2
commute time, which is a property of a diffusion process
0 on a graph [74, 75]. The commute time, CT (m, n) be-
tween vertices m and n is defined as the expected time for
-0.2 the random walk to reach vertex n starting from vertex
m, and then to return. The commute time is therefore
-0.4
proportional to the Euclidean distance between these two
-0.6 vertices, with the vertex positions in the new spectral space
defined by qn in (67), that is
-0.8
N
X −1
-1 CT 2 (m, n) = VV ||qm − qn ||22 = VV (qi (m) − qi (n))2 ,
-1 -0.5 0 0.5 1
i=1
(b)
PN −1
where VV is the volume of the whole graph, VV = n=0 Dnn .
Figure 36: Brain atlas representation based on normalized spec-
tral vectors. (a) A two-dimensional Laplacian eigenmap based on To put this into perspective, in a graph representation
the generalized Laplacian eigenvectors. The original L = 3 dimen- of a resistive electric circuit/network, for which the edge
sional graph from Fig. 29 is reduced to a two-dimensional rep- weights are equal to the conductances (inverse resistances,
resentation based on the two smoothest eigenvectors, u1 (n) and
see Part 3), the commute time, CT (m, n), is defined as
u2 (n), which both serve as spectral coordinates and define color
templates in the colormap, as in Fig. 29. (b) Eigenmaps from (a) the equivalent resistance between the electric circuit nodes
but in the space of normalized spectral space coordinates, qn = (vertices) m and n [76].
[u2 (n), u3 (n)]/||[u2 (n), u3 (n)]||2 , with the complexity of graph rep- The covariance matrix of the scaled spectral vectors in
resentation now significantly reduced. Observe that most edges only
exists between strongly connected vertices located along the circle.
(67) is given by
N −1
1 X T 1
as a matrix that satisfies the property S= q qn = Λ̄−1 .
N n=0 n N
 
0 01×(N −1)
LL+ = . (66) In other words, the principal directions in the reduced di-
0(N −1)×1 I(N −1)×(N −1) mensionality space of M eigenvectors, u1 , u2 , . . ., uM , cor-
The eigenvalues of the graph Laplacian pseudo-inverse are respond to the maximum variance of the graph embedding,
therefore the inverses of the original eigenvalues, {0, 1/λ1 , since 1/λ1 > 1/λ2 >, · · · , > 1/λM . This, in turn, directly
. . . , 1/λN −1 }, while it shares the same eigenvectors with corresponds to principal component analysis (PCA).

40
Figure 37: Generalized eigenvectors, uk , k = 1, 2, 3, 4, 5, 6, of the graph Laplacian of the Brain Atlas graph, shown using vertex coloring in
the original three-dimensional vertex space. Each panel visualizes a different uk , k = 1, 2, 3, 4, 5, 6.

0.08 0.08 0.08


k=1 k=2 k=3
0.06 0.06 0.06

0.04 0.04 0.04

0.02 0.02 0.02

0 0 0

-0.02 -0.02 -0.02

-0.04 -0.04 -0.04

-0.06 -0.06 -0.06

-0.08 -0.08 -0.08


-0.1 -0.05 0 0.05 0.1 -0.1 -0.05 0 0.05 0.1 -0.1 -0.05 0 0.05 0.1

0.08 0.08 0.08


k=4 k=5 k=6
0.06 0.06 0.06

0.04 0.04 0.04

0.02 0.02 0.02

0 0 0

-0.02 -0.02 -0.02

-0.04 -0.04 -0.04

-0.06 -0.06 -0.06

-0.08 -0.08 -0.08


-0.1 -0.05 0 0.05 0.1 -0.1 -0.05 0 0.05 0.1 -0.1 -0.05 0 0.05 0.1

Figure 38: Laplacian eigenmaps of the Brain Atlas graph in the reduced two-dimensional space defined by the two smoothest generalized
eigenvectors of the graph Laplacian, u1 and u2 . The panels each visualize a different generalized eigenvector, uk , k = 1, 2, 3, 4, 5, 6.

41
1 1

0.5 0.5

0 0

-0.5 -0.5

-1 -1

1 1
0 1 (a) 0 1 (b)
-1 -1 -0.5 0 0.5 -1 -1 -0.5 0 0.5

0.15
1

0.1
0.5
0.05
0
0

-0.5 -0.05

-1 -0.1

1 -0.15
0 1 (e) -0.1 -0.05 0 0.05
-1 -1 -0.5 0 0.5

0.15
1

0.1
0.5
0.05
0
0

-0.5 -0.05

-1 -0.1

1 -0.15
0 1 (c) -0.1 -0.05 0 0.05
-1 -1 -0.5 0 0.5

Figure 39: Laplacian eigenmaps based dimensionality reduction for the Swiss roll graph. (a) Vertex locations for the Swiss roll graph in
the original L = 3 dimensional space with N = 500 points (vertices). (b) The the Swiss roll graph with edges whose weights are calculated
based on the Euclidean distances between vertices. (c) The Swiss roll graph with vertices colored using the normalized graph Laplacian
eigenvectors, u1 (n) and u2 (n), as a colormap. (d) The same vectors are used as the new coordinates (spectral vectors) in a reduced two-
dimensional Laplacian eigenmap vertex space (M = 2). The vertices with high similarity (similar values of the smoothest eigenvectors) are
located close to one another, thus visually indicating the expected similarity of data observed at these vertices. (e) Clustering of the Swiss
roll graph, in the original L = 3 dimensional space, using two the smoothest eigenvectors, u1 (n) and u2 (n). (f) Clustering of the Swiss roll
graph using the two smoothest eigenvectors, u1 (n) and u2 (n), presented in the M = 2 Eigenmap space, where for every vertex its spatial
position (quadrant of the coordinate system) indicates the cluster where it belongs.

42
Remark 35: Two-dimensional case comparison. The When considering all vertices together, such probabilities
two-dimensional spectral space of the standard graph Lapla- can be written in a matrix form, within the weight of a
cian eigenvectors is defined by u1 and u2 , while the spec- random walk matrix, defined as in (10), by
tral vector in this space is given by
P = D−1 W. (71)
qn = [u1 (n), u2 (n)]. (68)
Diffusion distance. The Diffusion distance between the
In the case of commute time mapping, the two-dimensional
vertices m and n, denoted by Df (m, l), is equal to the
spectral domain of the vertices becomes
distance between the vector (N -dimensional ordered set)
u1 (n) u2 (n) of probabilities for a random walk to move from a vertex
qn = [ √ , √ ], (69)
λ1 λ2 m to all other vertices (as in (70)), given by
that is, the commute time mapping is related √ to the graph pm = [Pm0 , Pm1 , . . . , Pm(N −1) ]
Laplacian mapping through axis scaling by 1/ λk .
We can conclude that when λ1 ≈ λ2 , the two mappings and the corresponding vector of probabilities for a random
in (68) and (69) are almost the same, when normalized. walk to move from a vertex n to all other vertices, given
However, when λ1  λ2 , the relative eigenvalue gap by
between the one dimensional and two-dimensional spec- pn = [Pn0 , Pn1 , . . . , Pn(N −1) ],
tral space is large, since δr = 1 − λ1 /λ2 is close to 1. This
means that the segmentation into two disjoint subgraphs that is
will be “close” to the original graph, while at the same
Df2 (m, n) = ||(pm − pn )D−1/2 ||22 VV
time this also indicates that the eigenvector u2 does not
N −1 
contribute to a new “closer” segmentation (in the sense X 2 1
of Section 4.3.2), since its gap δr = 1 − λ2 /λ3 is not = Pmi − Pni VV
i=0
Dii
small. Therefore, the influence of u2 should be reduced, as
compared to the standard spectral vector of graph Lapla- PN −1
where VV = n=0 Dnn is constant for a given graph,
cian where both u1 and u2 employ unit weights to give
which is equal to the sum of degrees (volume) of all graph
qn = [u1 (n), u2 (n)]. Such downscaling of the influence
vertices in V.
of the almost irrelevant eigenvector, u2 , when λ1  λ2 ,
is equivalent to the commute time Example 33: For the graph from Fig. 2, with its weight ma-
q mapping, since qn = trix, W, and the degree matrix, D, given respectively in (4)
[ u√1 (n)
λ1
, u√2 (n)
λ2
] = √1λ [u1 (n), u2 (n) λλ12 ] ∼ [u1 (n), 0]. and (6), the random walk weight matrix in (71) is of the form
1
For example, for the graph from Example 29, shown in
Fig. 32(a), the commute time mapping will produce the  
p0 0 0.19 0.61 0.20 0 0 0 0
same vertex presentation as in Fig. 32(b), which is ob-
p1  0.28 0 0.43 0 0.28 0 0 0 
tained with the eigenvectors of the graph Laplacian, when  
 0.47
p2  0.22 0 0.16 0.15 0 0 0 
the vertical axis, u2 , is scaled by 
 0.29
p3  0 0.32 0 0 0 0.39 0 
P= 
p4  0 0.21 0.21 0 0 0.46 0 0.12
r r
λ1 0.0286

 
= = 0.8932. p5  0 0 0 0 0.77 0 0 0.23 
λ2 0.0358 
p6  0 0 0 0.50 0 0 0 0.50


This eigenmap will also be very close to the eigenmap in p7 0 0 0 0 0.23 0.25 0.52 0
Fig. 32(b), produced based on the graph Laplacian eigen- 0 1 2 3 4 5 6 7
vectors and the spectral vector qn = [u1 (n), u2 (n)]. (72)
with VV = 7.46.
Therefore, the diffusion distance between, for example, the
4.5.2. Diffusion (Random Walk) Mapping
vertices m = 1 and n = 3, for the t = 1 step, is
Finally, we shall now relate the commute time mapping

to the diffusion mapping. Df (1, 3) = ||(p1 − p3 )D−1/2 ||2 VV = 1.54,
Definition: Diffusion on a graph deals with the problem
of propagation along the edges of a graph, whereby at the while the diffusion distance between the vertices m = 6 and
n = 3 is Df (6, 3) = 2.85. From this simple example, we can
initial step, t = 0, the random walk starts at a vertex n.
see that the diffusion distance is larger for vertices m = 6 and
At the next step t = 1, the walker moves from its current n = 3 than for the neighboring vertices m = 1 and n = 3. This
vertex n to one of its neighbors l, chosen uniformly at result is in a perfect accordance with the clustering scheme
random from the neighbors of n. The probability of going (expected similarity) in Fig. 23(b), where the vertices m = 1
from vertex n to vertex l is equal to the ratio of the weight and n = 3 are grouped into the same cluster, while the vertices
Wnl and the sum of all possible edge weights from the m = 6 and n = 3 belong to different clusters.
vertex n, that is The probability vectors, pn , are called the diffusion
Wnl 1 clouds (in this case for step t = 1), since they resemble a
Pnl =P = Wnl . (70) cloud around a vertex n. The diffusion distance can then
l W nl Dnn

43
be considered as a distance between the diffusion clouds and is equal to the generalized Laplacian spectral space
(sets of data) around a vertex m and a vertex n. If the mapping, whereby the axis vectors qn = [u1 (n), u2 (n), . . . ,
vertices are well connected (approaching a complete graph uN −1 (n)] are multiplied by the corresponding eigenvalues,
structure) then this distance is small, while for vertices (1 − λk )t .
with long paths between them, this distance is large. It can be shown that the diffusion distance between
The diffusion analysis can be easily generalized to any vertices in the new diffusion map space is equal to their
value of the diffusion step, t, whereby after t steps, the Euclidean distance [77], that is
matrix of probabilities in (71) becomes
(t)
p
P_t = (D^{-1} W)^t.

The elements of this matrix, denoted by P_{mn}^{(t)}, are equal to the probabilities that a random walker moves from a vertex m to a vertex n in t steps. The t-step diffusion distance between the vertices m and n is accordingly defined as

D_f^{(t)}(m, n) = ||(p_m^{(t)} - p_n^{(t)}) D^{-1/2}||_2 √V_V,

where

p_m^{(t)} = [P_{m0}^{(t)}, P_{m1}^{(t)}, . . . , P_{m(N-1)}^{(t)}]

and

p_n^{(t)} = [P_{n0}^{(t)}, P_{n1}^{(t)}, . . . , P_{n(N-1)}^{(t)}].

It can be shown that the diffusion distance is equal to the Euclidean distance between the considered vertices when they are presented in a new space of their generalized Laplacian eigenvectors, which are then scaled by their corresponding eigenvalues; this new space is referred to as the diffusion maps (cf. eigenmaps).

The eigenanalysis relation for the random walk weight matrix for the state t = 1 now becomes

(D^{-1} W) u_k = λ_k^{(P)} u_k.

Since the weight matrix can be written as W = D - L, this yields D^{-1}(D - L) u_k = λ_k^{(P)} u_k, or

(I - D^{-1} L) u_k = λ_k^{(P)} u_k,

to finally produce the generalized graph Laplacian equation,

L u_k = λ_k D u_k,

with λ_k^{(P)} = (1 - λ_k). This relation indicates that a one-step diffusion mapping is directly obtained from the corresponding generalized graph Laplacian mapping.

After t steps, the random walk matrix (of probabilities) becomes

P_t = (D^{-1} W)^t,

for which the eigenvalues are (λ_k^{(P)})^t = (1 - λ_k)^t, while the (right) eigenvectors remain the same as for the graph Laplacian, see (26).

The spectral space for vertices, for a t-step diffusion process (diffusion mapping), is then defined based on the spectral vector

q_n = [u_1(n), u_2(n), . . . , u_{N-1}(n)] (I - Λ̄)^t,

so that the diffusion distance between the vertices m and n becomes the Euclidean distance in this space,

D_f(m, n) = √V_V ||q_m - q_n||_2.    (73)

Example 34: For the graph from Fig. 2, whose weight matrix, W, and degree matrix, D, are defined in (4) and (6), the diffusion distance between the vertices m = 1 and n = 3 can be calculated using (73) as

D_f^{(1)}(1, 3) = √V_V ||q_1 - q_3||_2 = 1.54,

where the spectral vectors, q_1 = [u_1(1)(1 - λ_1)^1, . . . , u_N(1)(1 - λ_N)^1] and q_3 = [u_1(3)(1 - λ_1)^1, . . . , u_N(3)(1 - λ_N)^1], are obtained using the generalized graph Laplacian eigenvectors, u_k, and the corresponding eigenvalues, λ_k, from L u_k = λ_k D u_k. This is the same diffusion distance value, D_f(1, 3), as in Example 33.

Dimensionality reduced diffusion maps. The dimensionality of the vertex representation space can be reduced in diffusion maps by keeping only the eigenvectors that correspond to the M most significant eigenvalues, (1 - λ_k)^t, k = 1, 2, . . . , M, in the same way as for the Laplacian eigenmaps. For example, the two-dimensional spectral domain of the vertices in the diffusion mapping is defined as

q_n = [u_1(n)(1 - λ_1)^t, u_2(n)(1 - λ_2)^t].

While the analysis and intuition for the diffusion mapping are similar to those for the commute time mapping, presented in Remark 35, diffusion maps have an additional degree of freedom, the step t.

Example 35: For the graph in Fig. 25, which corresponds to a set of real-world images, the commute time two-dimensional spectral vectors in (69), normalized with respect to their first coordinate through a multiplication of the coordinates by √λ_1, assume the form

q_n = [u_1(n), √(λ_1/λ_2) u_2(n)] = [u_1(n), 0.62 u_2(n)].

The corresponding vertex colors designate diffusion-based clustering, as shown in Fig. 40(a). Fig. 40(b) shows the vertices of this graph, colored with the two-dimensional diffusion map spectral vectors, which are normalized by (1 - λ_1), to yield

q_n = [u_1(n), ((1 - λ_2)/(1 - λ_1)) u_2(n)] = [u_1(n), 0.09 u_2(n)].

Figure 40: Graph structure for the images from Fig. 25, with vertex color embedding which corresponds to the two-dimensional normalized spectral vectors in (a) the commute time representation, q_n = [u_1(n), 0.62 u_2(n)], and (b) the spectral eigenvectors of the diffusion process, q_n = [u_1(n), 0.09 u_2(n)], with t = 1. For the commute time presentation in (a), the graph Laplacian eigenvectors, u_1 and u_2, are used, while for the diffusion process presentation in (b), the generalized Laplacian eigenvectors, u_1 and u_2, are used.
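For a computational view of the above relations, the following Python sketch (our illustration; the small 4-vertex weight matrix is a hypothetical stand-in for the matrix W in (4), and numpy/scipy are assumed to be available) forms the t-step diffusion map from the generalized eigenpairs of L and D, and checks that the random-walk form of the diffusion distance coincides with the Euclidean form in (73).

import numpy as np
from scipy.linalg import eigh

# Hypothetical weight matrix of a small connected graph (illustration only)
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 1.],
              [0., 1., 0., 1.],
              [0., 1., 1., 0.]])
D = np.diag(W.sum(axis=1))             # degree matrix
L = D - W                              # graph Laplacian
V_V = W.sum()                          # graph volume (sum of all vertex degrees)
t = 1                                  # diffusion step

# Generalized eigenpairs, L u_k = lambda_k D u_k; scipy returns D-orthonormal u_k
lam, U = eigh(L, D)

# t-step diffusion map, q_n = [u_1(n)(1-lambda_1)^t, ..., u_{N-1}(n)(1-lambda_{N-1})^t]
Q = U[:, 1:] * (1.0 - lam[1:]) ** t    # one row per vertex; trivial pair omitted

def diffusion_distance_spectral(m, n):
    # Euclidean form of the diffusion distance, as in (73)
    return np.sqrt(V_V) * np.linalg.norm(Q[m] - Q[n])

def diffusion_distance_random_walk(m, n):
    # Random-walk form, from the rows of P^t compared in the D^(-1/2)-weighted norm
    P_t = np.linalg.matrix_power(np.diag(1.0 / np.diag(D)) @ W, t)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
    return np.sqrt(V_V) * np.linalg.norm((P_t[m] - P_t[n]) @ d_inv_sqrt)

print(diffusion_distance_spectral(0, 2), diffusion_distance_random_walk(0, 2))

The two printed values agree because eigh(L, D) normalizes the generalized eigenvectors so that u_k^T D u_k = 1, which is precisely the scaling under which the spectral and random-walk forms of the diffusion distance coincide.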
Finally, the sum over all steps, t = 0, 1, 2, . . . , of the diffusion space yields

q_n = [u_1(n), u_2(n), . . . , u_{N-1}(n)] Λ̄^{-1},

since the sum of a geometric progression is equal to

∑_{t=0}^{∞} (I - Λ̄)^t = Λ̄^{-1}.

This mapping also corresponds to the cumulative diffusion distance, given by

D_c(n, l) = ∑_{t=0}^{∞} D_f^{(t)}(n, l).

The diffusion eigenmaps can therefore be obtained by appropriate axis scaling of the standard eigenmaps, produced by the generalized eigenvectors of the graph Laplacian.

Remark 36: The commute time and the diffusion process mappings are related in the same way as the mappings based on the graph Laplacian eigenvectors and the generalized eigenvectors of the graph Laplacian.

4.6. Summary of Embedding Mappings

A summary of the considered embedding mappings is given in Table 1. Notice that various normalization schemes may be used to obtain the axis vectors, y_n, from the spectral vectors, q_n (see Algorithm 3).

These examples of dimensionality reduction reveal close connections with spectral clustering algorithms developed in standard machine learning and computer vision; in this sense, the notions of dimensionality reduction and clustering can be considered as two sides of the same coin [65]. In addition to the reduction of dimensionality for visualization purposes, the resulting spectral vertex space of lower dimensionality may be used to mitigate the complexity and accuracy issues experienced with classification algorithms, or in other words, to bypass the curse of dimensionality.

Mapping | Eigenanalysis relation | Reduced-dimensionality spectral vector
Graph Laplacian mapping | L u_k = λ_k u_k | q_n = [u_1(n), u_2(n), . . . , u_M(n)]
Generalized eigenvectors of Laplacian mapping | L u_k = λ_k D u_k | q_n = [u_1(n), u_2(n), . . . , u_M(n)]
Normalized Laplacian mapping | (D^{-1/2} L D^{-1/2}) u_k = λ_k u_k | q_n = [u_1(n), u_2(n), . . . , u_M(n)]
Commute time mapping | L u_k = λ_k u_k | q_n = [u_1(n)/√λ_1, u_2(n)/√λ_2, . . . , u_M(n)/√λ_M]
Diffusion (random walk) mapping | L u_k = λ_k D u_k | q_n = [u_1(n)(1 - λ_1)^t, . . . , u_M(n)(1 - λ_M)^t]
Cumulative diffusion mapping | L u_k = λ_k D u_k | q_n = [u_1(n)/λ_1, u_2(n)/λ_2, . . . , u_M(n)/λ_M]

Table 1: Summary of graph embedding mappings: the Graph Laplacian mapping, the Generalized eigenvectors of the Laplacian mapping, the Normalized Laplacian mapping, the Commute time mapping, the Diffusion mapping, and the Cumulative diffusion mapping.
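To make the entries of Table 1 concrete, the short sketch below (our own illustration; the function name, the choice M = 2, and the 5-vertex weight matrix are assumptions for demonstration purposes) computes the reduced-dimensionality spectral vectors for four of the listed mappings from a single eigenanalysis step.

import numpy as np
from scipy.linalg import eigh

def embedding(W, M=2, mapping="laplacian", t=1):
    # Reduced-dimensionality spectral vectors q_n (rows), following Table 1
    D = np.diag(W.sum(axis=1))
    L = D - W
    if mapping in ("laplacian", "commute"):
        lam, U = eigh(L)                   # L u_k = lambda_k u_k
    else:                                  # "diffusion", "cumulative"
        lam, U = eigh(L, D)                # L u_k = lambda_k D u_k
    lam, U = lam[1:M + 1], U[:, 1:M + 1]   # skip the trivial (lambda_0 = 0) pair
    if mapping == "laplacian":
        return U
    if mapping == "commute":
        return U / np.sqrt(lam)            # u_k(n) / sqrt(lambda_k)
    if mapping == "diffusion":
        return U * (1 - lam) ** t          # u_k(n) (1 - lambda_k)^t
    if mapping == "cumulative":
        return U / lam                     # u_k(n) / lambda_k
    raise ValueError(mapping)

# Hypothetical 5-vertex connected graph, used only to exercise the mappings
W = np.array([[0., 1., 1., 0., 0.],
              [1., 0., 1., 0., 0.],
              [1., 1., 0., 1., 0.],
              [0., 0., 1., 0., 1.],
              [0., 0., 0., 1., 0.]])
for name in ("laplacian", "commute", "diffusion", "cumulative"):
    print(name, embedding(W, M=2, mapping=name).round(3))

The normalized Laplacian mapping from Table 1 follows the same pattern, with eigh applied to D^{-1/2} L D^{-1/2} instead of L.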
5. Graph Sampling Strategies

In the case of extremely large graphs, subsampling and down-scaling of graphs is a prerequisite for their analysis [78]. For a given large (in general directed) graph, G, with N vertices, its resampling aims to produce a much simpler graph which retains most of the properties of the original graph, but is both less complex and more physically and computationally meaningful. The similarity between the original large graph, G, and the down-scaled graph, S, with M vertices, where M ≪ N, is defined with respect to the set of parameters of interest, like, for example, the connectivity or distribution on a graph. Such criteria may also be related to the spectral behavior of graphs.

Several methods exist for graph down-scaling, of which some are listed below; a short illustrative sketch follows the list.

• The simplest method for graph down-sampling is the random vertex or random node (RN) selection method, whereby a random subset of vertices is used for the analysis and representation of large graphs and data observed on such large graphs. Even though the vertices are here selected with equal probabilities, this method produces good results in practical applications.

• Different from the RN method, where the vertices are selected with a uniform probability, the random degree vertex/node (RDN) selection method is based on a probability of vertex selection that is proportional to the vertex degree. In other words, vertices with more connections, thus having larger D_n = ∑_m W_nm, are selected with higher probability. This makes the RDN approach biased with respect to highly connected vertices.

• The PageRank method is similar to the RDN, and is based on the vertex rank. The PageRank is defined by the importance of the vertices connected to the considered vertex n. Then, the probability that a vertex n will be used in a down-scaled graph is proportional to the PageRank of this vertex. This method is also known as the random PageRank vertex (RPN) selection, and is biased with respect to the highly connected vertices (with a high PageRank).

• A method based on a random selection of edges that will remain in the simplified graph is called the random edge (RE) method. This method may lead to graphs that are not well connected, and which exhibit large diameters.

• The RE method may be combined with random vertex selection to yield a combined RNE method, whereby the initial random vertex selection is followed by a random selection of one of the edges that is connected to the selected vertex.

• In addition to these methods, more sophisticated methods based on random vertex selection and random walk (RW) analysis may be defined. For example, we can randomly select a small subset of vertices and form several random walks starting from each selected vertex. The Random Walk (RW), Random Jump (RJ) and Forest Fire graph down-scaling strategies are all defined in this way.
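The simplest of these strategies admit a very direct implementation. The sketch below (ours; the function names, the random 20-vertex weight matrix, and the target sizes are assumptions rather than prescribed choices) illustrates the RN, RDN, and RE selections and the extraction of the subgraph induced by the selected vertices.

import numpy as np

rng = np.random.default_rng(0)

def rn_sample(W, M):
    # Random node (RN): M vertices drawn uniformly, without replacement
    return rng.choice(W.shape[0], size=M, replace=False)

def rdn_sample(W, M):
    # Random degree node (RDN): selection probability proportional to vertex degree
    d = W.sum(axis=1)
    return rng.choice(W.shape[0], size=M, replace=False, p=d / d.sum())

def re_sample(W, n_edges):
    # Random edge (RE): keep a random subset of edges (undirected graph assumed)
    iu, ju = np.triu_indices_from(W, k=1)
    existing = np.flatnonzero(W[iu, ju] > 0)
    keep = rng.choice(existing, size=min(n_edges, existing.size), replace=False)
    Ws = np.zeros_like(W)
    Ws[iu[keep], ju[keep]] = W[iu[keep], ju[keep]]
    return Ws + Ws.T

def induced_subgraph(W, vertices):
    # Weight matrix of the subgraph induced by the selected vertices
    return W[np.ix_(vertices, vertices)]

# Hypothetical random weight matrix of a 20-vertex undirected graph
W = rng.random((20, 20))
W = np.triu(W, 1) * (np.triu(W, 1) > 0.7)
W = W + W.T

print(induced_subgraph(W, rn_sample(W, M=8)).shape)
print(induced_subgraph(W, rdn_sample(W, M=8)).shape)
print(re_sample(W, n_edges=15).sum() > 0)

PageRank-based (RPN) selection follows the same pattern, with the degree vector replaced by the PageRank vector, while the RW, RJ, and Forest Fire strategies additionally require simulating walks on W.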
6. Conclusion

Although, within the graph data analytics paradigm, graphs have been present in various forms for centuries, the advantages of the graph framework for data analytics on graphs, as opposed to the optimization of the graphs themselves, have been recognized only recently. In order to provide a comprehensive and Data Analytics friendly introduction to graph data analytics, an overview of graphs from this specific practitioner-friendly signal processing point of view is a prerequisite.

In this part of our article, we have introduced graphs as irregular signal domains, together with their properties that are relevant for data analytics applications which rest upon the estimation of signals on graphs. This has been achieved in a systematic and example-rich way, and by highlighting links with classic matrix analysis and linear algebra. Spectral analysis of graphs has been elaborated upon in detail, as this is the main underpinning methodology for efficient data analysis, the ultimate goal in Data Science. Both the adjacency matrix and the Laplacian matrix have been used in this context, along with their spectral decompositions. Finally, we have highlighted important aspects of graph segmentation and Laplacian eigenmaps, and have emphasized their role as the foundation for advances in Data Analytics and unsupervised learning on graphs.

Part 2 of this monograph will address theory and methods of processing data on graphs, while Part 3 is devoted to unsupervised graph topology learning from the observed data.

7. Appendix: Power Method for Eigenanalysis

The computational complexity of the eigenvalue and eigenvector calculation for a symmetric matrix is of the order of O(N^3), which is computationally prohibitive for very large graphs, especially when only a few of the smoothest eigenvectors are needed, as in spectral graph clustering. To mitigate this computational bottleneck, an efficient iterative approach, called the Power Method, may be employed.

Consider the normalized weight matrix,

W_N = D^{-1/2} W D^{-1/2},

and assume that the eigenvalues of W_N are ordered as |λ_0| > |λ_1| > · · · > |λ_{M-1}|, with the corresponding eigenvectors, u_1, u_2, . . . , u_{M-1}. Consider also an arbitrary linear combination of the eigenvectors, u_n, through the coefficients α_n,

x = α_1 u_1 + α_2 u_2 + · · · + α_{M-1} u_{M-1}.

A further multiplication of the vector x by the normalized weight matrix, W_N, results in

W_N x = α_1 W_N u_1 + α_2 W_N u_2 + · · · + α_{M-1} W_N u_{M-1}
      = α_1 λ_1 u_1 + α_2 λ_2 u_2 + · · · + α_{M-1} λ_{M-1} u_{M-1}.

A repetition of this multiplication k times yields

W_N^k x = α_1 λ_1^k u_1 + α_2 λ_2^k u_2 + · · · + α_{M-1} λ_{M-1}^k u_{M-1}
        = λ_1^k (α_1 u_1 + α_2 (λ_2^k / λ_1^k) u_2 + · · · + α_{M-1} (λ_{M-1}^k / λ_1^k) u_{M-1})
        ≈ α_1 λ_1^k u_1.

In other words, we have just calculated the first eigenvector of W_N, given by

u_1 = W_N^k x / ||W_N^k x||_2,

which is achieved through only matrix products of W_N and x [67, 79]. The convergence of this procedure depends on the eigenvalue ratio λ_2/λ_1, and requires that α_1 is not close to 0. Note that W_N is a highly sparse matrix, which significantly reduces the calculation complexity.

After the eigenvector u_1 is obtained, the corresponding eigenvalue can be calculated as its smoothing index, λ_1 = u_1^T W_N u_1.

After calculating u_1 and λ_1, we can remove their contribution from the normalized weight matrix, W_N, through deflation, as W_N ← W_N - λ_1 u_1 u_1^T, and then continue to calculate the next largest eigenvalue and its eigenvector, λ_2 and u_2. This procedure can be repeated iteratively until the desired number of eigenvectors is found.

The relation of the normalized weight matrix, W_N, with the normalized graph Laplacian, L_N, is given by

L_N = I - W_N,

while the relation between the eigenvalues and eigenvectors of L_N and W_N follows from W_N = U^T Λ U, to yield

L_N = I - U^T Λ U = U^T (I - Λ) U.

The eigenvalues of L_N and W_N are therefore related as λ_n^{(L)} = 1 - λ_n, while the normalized graph Laplacian and the normalized weight matrix share the same corresponding eigenvectors, u_n. This means that λ_1 = 1 corresponds to λ_0^{(L)} = 0, and that the second largest eigenvalue of W_N produces the Fiedler vector of the normalized Laplacian. Note that the eigenvector associated with the second largest (signed) eigenvalue of W_N is not necessarily u_2, since the eigenvalues of W_N can be negative.

Example 36: The weight matrix W from (4) is normalized by the degree matrix from (6) to arrive at W_N = D^{-1/2} W D^{-1/2}. The power algorithm is then used to calculate the four largest eigenvalues and the corresponding eigenvectors of W_N in 200 iterations, to give λ_n ∈ {1.0000, -0.7241, -0.6795, 0.6679}. These are very close to the four exact largest eigenvalues of W_N, λ_n ∈ {1.0000, -0.7241, -0.6796, 0.6677}. Note that the Fiedler vector of the normalized graph Laplacian is associated with λ_4 = 0.6679, as this corresponds to the second largest eigenvalue of W_N when the eigenvalue signs are accounted for. Even when calculated using the approximative power method, the Fiedler vector is close to its exact value, as shown in Fig. 22(d), with the maximum relative error of its elements being 0.016.

Notice that it is possible to calculate the Fiedler vector of a graph Laplacian even without using the weight matrix. Consider a graph whose Laplacian eigenvalues are λ_0 = 0 < λ_1 < λ_2 < · · · < λ_{N-1}. The eigenvalues of the pseudo-inverse of the graph Laplacian, L^+ = pinv(L), are then λ_0 = 0, 1/λ_1, 1/λ_2, . . . , 1/λ_{N-1}, of which 1/λ_1 is the largest. Since the pseudo-inverse of the graph Laplacian, L^+, and the graph Laplacian, L, have the same eigenvectors, we may apply the power method to L^+, and the eigenvector corresponding to its largest eigenvalue is the Fiedler vector.

Algorithm 2. Power Method for eigenanalysis.

Input:
• Normalized weight matrix W_N
• Number of iterations, It
• Number of the desired largest eigenvectors, M

1: for m = 1 to M do
2:   u_m ∈ {-1, 1}^N, drawn randomly (uniformly)
3:   for i = 1 to It do
4:     u_m ← W_N u_m / ||W_N u_m||_2
5:     λ_m ← u_m^H W_N u_m
6:   end do
7:   W_N ← W_N - λ_m u_m u_m^H
8: end do

Output:
• Largest M eigenvalues, |λ_0| > |λ_1| > · · · > |λ_{M-1}|, and the corresponding eigenvectors, u_1, . . . , u_{M-1}
• Fiedler vector of the normalized graph Laplacian, given by the eigenvector u_{n_1} of the second largest (signed) eigenvalue, λ_{n_1}, with λ_0 = 1 > λ_{n_1} > · · · > λ_{n_{M-1}}
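A direct numerical transcription of this procedure and of Algorithm 2 is sketched below (our illustration in Python, not a reference implementation; the function name and the small test matrix are assumptions). It uses the random ±1 initialization, the Rayleigh-quotient eigenvalue estimate, and the deflation step described above.

import numpy as np

def power_method(W, M=4, n_iter=200, seed=0):
    # Largest-magnitude eigenpairs of W_N = D^(-1/2) W D^(-1/2), obtained by
    # power iterations with deflation (cf. Algorithm 2)
    rng = np.random.default_rng(seed)
    d = W.sum(axis=1)
    WN = W / np.sqrt(np.outer(d, d))
    lams, vecs = [], []
    for _ in range(M):
        u = rng.choice([-1.0, 1.0], size=W.shape[0])   # random +/-1 start
        for _ in range(n_iter):
            u = WN @ u
            u /= np.linalg.norm(u)
        lam = u @ WN @ u                 # Rayleigh quotient (smoothing index)
        WN = WN - lam * np.outer(u, u)   # deflation
        lams.append(lam)
        vecs.append(u)
    return np.array(lams), np.column_stack(vecs)

# Hypothetical 4-vertex weight matrix, for illustration only
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
lams, U = power_method(W, M=3)
print(lams)

With the eigenvalue signs accounted for, the Fiedler vector of the normalized Laplacian is the returned eigenvector whose eigenvalue is the second largest signed value (it may appear after the negative eigenvalues in the magnitude-ordered output). As noted above, the same vector can alternatively be reached by running the iterations on pinv(L), since L and its pseudo-inverse share eigenvectors.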
8. Appendix: Algorithm for Graph Laplacian Eigenmaps

The algorithm for the Laplacian eigenmap and spectral clustering based on the eigenvectors of the graph Laplacian, the generalized eigenvectors of the graph Laplacian, and the eigenvectors of the normalized Laplacian, is given in the pseudo-code form in Algorithm 3.

Algorithm 3. Graph Laplacian Based Eigenmaps.

Input:
• Vertex set V = {0, 1, . . . , N - 1} and the vertex positions, given as the rows of X
• Weight matrix W, with elements W_mn
• Laplacian eigenmap dimensionality, M
• Position, mapping, normalization, and coloring indicators P, Map, S, C

1: D ← diag(D_nn = ∑_{m=0}^{N-1} W_mn, n = 0, 1, . . . , N - 1)
2: L ← D - W
3: [U, Λ] ← eig(L)
4: u_k(n) ← U(n, k), for k = 1, . . . , M and n = 0, 1, . . . , N - 1
5: U_max ← max_n(U(n, 1 : M)), U_min ← min_n(U(n, 1 : M))
6: q_n ← [u_1(n), u_2(n), . . . , u_M(n)], for all n
7: If Map = 1, q_n ← q_n Λ̄^{-1/2}, end
8: If Map = 2, q_n ← q_n (I - Λ̄)^t, end
9: y_n ←
     q_n,                              for S = 0,
     q_n / ||q_n||_2,                  for S = 1,
     sign(q_n),                        for S = 2,
     sign(q_n - (U_max + U_min)/2),    for S = 3,
     (q_n - U_min)./(U_max - U_min),   for S = 4
10: Y ← y_n, as the rows of Y
11: Z ← X, for P = 0, or Z ← Y, for P = 1
12: ColorMap ← Constant, for C = 0, or ColorMap ← (Y + 1)/2, for C = 1
13: GraphPlot(W, Z, ColorMap)
14: Cluster the vertices according to Y and refine using the k-means algorithm (Remark 30) or the normalized cut recalculation algorithm (Remark 33)

Output:
• New graph
• Subsets of vertex clusters

Comments on the Algorithm: For the normalized Laplacian, Line 2 should be replaced by L ← I - D^{-1/2} W D^{-1/2}, while for the generalized eigenvectors Line 3 should be replaced by [U, Λ] ← eig(L, D), see also Table 1. The indicator of vertex positions in the output graph is: P = 0, for the original vertex space, and P = 1, for the spectral vertex space. The indicator of mapping is: Map = 1, for the commute time mapping (the matrix Λ̄ is obtained from Λ by omitting the trivial element λ_0 = 0), and Map = 2, for the diffusion mapping (in this case the generalized eigenvectors must be used in Line 3, [U, Λ] ← eig(L, D), and the diffusion step t should be given as an additional input parameter); otherwise Map = 0. The indicator of the eigenvector normalization is: S = 0, for the case without normalization, S = 1, for two-norm normalization, S = 2, for binary normalization, S = 3, for binary normalization with the mean as a reference, and S = 4, for marginal normalization. The indicator of vertex coloring is: C = 0, when the same color is used for all vertices, and C = 1, when the spectral vector defines the vertex colors.
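The computational core of Algorithm 3 (Lines 1-8 and the clustering refinement of Line 14) may be sketched in Python as follows. This is our minimal illustration under several assumptions: the plotting and the normalization options S are omitted, scipy's kmeans2 stands in for the k-means refinement of Remark 30, and the function names and the small test graph are hypothetical.

import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def laplacian_eigenmap(W, M=2, variant="laplacian", t=1):
    # Spectral vertex vectors q_n (rows), following the structure of Algorithm 3
    D = np.diag(W.sum(axis=1))
    L = D - W
    if variant == "laplacian":
        lam, U = eigh(L)                      # Line 3: [U, Lambda] = eig(L)
    elif variant == "generalized":
        lam, U = eigh(L, D)                   # Line 3 replaced by eig(L, D)
    elif variant == "normalized":
        Dih = np.diag(1.0 / np.sqrt(np.diag(D)))
        lam, U = eigh(np.eye(len(W)) - Dih @ W @ Dih)
    else:
        raise ValueError(variant)
    Q = U[:, 1:M + 1]                         # smoothest nontrivial eigenvectors
    lam = lam[1:M + 1]
    Q_commute = Q / np.sqrt(lam)              # Line 7 (Map = 1), commute time scaling
    Q_diffusion = Q * (1 - lam) ** t          # Line 8 (Map = 2), diffusion scaling
    return Q, Q_commute, Q_diffusion

def spectral_clusters(W, n_clusters=2, M=2):
    # Cluster vertices from their spectral vectors, refined by k-means (Line 14)
    Q, _, _ = laplacian_eigenmap(W, M=M)
    _, labels = kmeans2(Q, n_clusters, minit="++")
    return labels

# Hypothetical graph: two triangles joined by a single edge
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
print(spectral_clusters(W, n_clusters=2))

The printed labels separate the two triangles, which mirrors the role of the Fiedler vector in the spectral clustering discussed earlier in this part.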
References

[1] N. Christofides, Graph theory: An algorithmic approach, Academic Press, 1975.
[2] F. Afrati, A. G. Constantinides, The use of graph theory in binary block code construction, in: Proceedings of the International Conference on Digital Signal Processing, 1978, pp. 228–233.
[3] O. J. Morris, M. d. J. Lee, A. G. Constantinides, Graph theory for image analysis: An approach based on the shortest spanning tree, IEE Proceedings F-Communications, Radar and Signal Processing 133 (2) (1986) 146–152.
[4] L. J. Grady, J. R. Polimeni, Discrete calculus: Applied analysis on graphs for computational science, Springer Science & Business Media, 2010.
[5] S. S. Ray, Graph theory with algorithms and its applications: in Applied Science and Technology, Springer Science & Business Media, 2012.
[6] A. Marques, A. Ribeiro, S. Segarra, Graph signal processing: Fundamentals and applications to diffusion processes, in: Proc. Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2017, IEEE, 2017.
[7] H. Krim, A. B. Hamza, Geometric methods in signal and image analysis, Cambridge University Press, 2015.
[8] M. I. Jordan, Learning in graphical models, Vol. 89, Springer Science & Business Media, 1998.
[9] A. Bunse-Gerstner, W. B. Gragg, Singular value decompositions of complex symmetric matrices, Journal of Computational and Applied Mathematics 21 (1) (1988) 41–54.
[10] D. S. Grebenkov, B.-T. Nguyen, Geometrical structure of Laplacian eigenfunctions, SIAM Review 55 (4) (2013) 601–667.
[11] R. Bapat, The Laplacian matrix of a graph, Mathematics Student-India 65 (1) (1996) 214–223.
[12] S. O'Rourke, V. Vu, K. Wang, Eigenvectors of random matrices: A survey, Journal of Combinatorial Theory, Series A 144 (2016) 361–442.
[13] K. Fujiwara, Eigenvalues of Laplacians on a closed Riemannian manifold and its nets, Proceedings of the American Mathematical Society 123 (8) (1995) 2585–2594.
[14] S. U. Maheswari, B. Maheswari, Some properties of Cartesian product graphs of Cayley graphs with arithmetic graphs, International Journal of Computer Applications 138 (3) (2016) 26–29.
[15] M. I. Jordan, et al., Graphical models, Statistical Science 19 (1) (2004) 140–155.
[16] J. M. Moura, Graph signal processing, in: Cooperative and Graph Signal Processing, P. Djuric and C. Richard, Editors, Elsevier, 2018, pp. 239–259.
[17] M. Vetterli, J. Kovačević, V. Goyal, Foundations of Signal Processing, Cambridge University Press, 2014.
[18] A. Sandryhaila, J. M. Moura, Discrete signal processing on graphs, IEEE Transactions on Signal Processing 61 (7) (2013) 1644–1656.
[19] V. N. Ekambaram, Graph-structured data viewed through a Fourier lens, University of California, Berkeley, 2014.
[20] A. Sandryhaila, J. M. Moura, Discrete signal processing on graphs: Frequency analysis, IEEE Transactions on Signal Processing 62 (12) (2014) 3042–3054.
[21] A. Sandryhaila, J. M. Moura, Big data analysis with signal processing on graphs: Representation and processing of massive data sets with irregular structure, IEEE Signal Processing Magazine 31 (5) (2014) 80–90.
[22] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, P. Vandergheynst, The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains, IEEE Signal Processing Magazine 30 (3) (2013) 83–98.
[23] R. Hamon, P. Borgnat, P. Flandrin, C. Robardet, Extraction of temporal network structures from graph-based signals, IEEE Transactions on Signal and Information Processing over Networks 2 (2) (2016) 215–226.
[24] S. Chen, A. Sandryhaila, J. M. Moura, J. Kovačević, Signal denoising on graphs via graph filtering, in: Proc. 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2014, pp. 872–876.
[25] A. Gavili, X.-P. Zhang, On the shift operator, graph frequency, and optimal filtering in graph signal processing, IEEE Transactions on Signal Processing 65 (23) (2017) 6303–6318.
[26] M. J. Wainwright, M. I. Jordan, et al., Graphical models, exponential families, and variational inference, Foundations and Trends in Machine Learning 1 (1–2) (2008) 1–305.
[27] D. M. Cvetković, M. Doob, H. Sachs, Spectra of graphs: Theory and application, Vol. 87, Academic Press, 1980.
[28] D. M. Cvetković, M. Doob, Developments in the theory of graph spectra, Linear and Multilinear Algebra 18 (2) (1985) 153–181.
[29] D. M. Cvetković, I. Gutman, Selected topics on applications of graph spectra, Matematički Institut SANU (Serbian Academy of Sciences and Arts), 2011.
[30] A. E. Brouwer, W. H. Haemers, Spectra of graphs, Springer-Verlag New York, 2012.
[31] F. Chung, Spectral graph theory, AMS, Providence, RI, 1997.
[32] O. Jones, Spectra of simple graphs [Online]. Available: https://www.whitman.edu/Documents/Academics/Mathematics/Jones.pdf, Whitman College, 2013.
[33] D. Mejia, O. Ruiz-Salguero, C. A. Cadavid, Spectral-based mesh segmentation, International Journal on Interactive Design and Manufacturing (IJIDeM) 11 (3) (2017) 503–514.
[34] L. Stanković, E. Sejdić, M. Daković, Vertex-frequency energy distributions, IEEE Signal Processing Letters 25 (3) (2017) 358–362.
[35] L. Stanković, M. Daković, E. Sejdić, Vertex-frequency energy distributions, in: L. Stanković, E. Sejdić (Eds.), Vertex-Frequency Analysis of Graph Signals, Springer, 2019, pp. 377–415.
[36] H. Lu, Z. Fu, X. Shu, Non-negative and sparse spectral clustering, Pattern Recognition 47 (1) (2014) 418–426.
[37] X. Dong, P. Frossard, P. Vandergheynst, N. Nefedov, Clustering with multi-layer graphs: A spectral perspective, IEEE Transactions on Signal Processing 60 (11) (2012) 5820–5831.
[38] R. Horaud, A short tutorial on graph Laplacians, Laplacian embedding, and spectral clustering, [Online], Available: http://csustan.csustan.edu/~tom/Lecture-Notes/Clustering/GraphLaplacian-tutorial.pdf (2009).
[39] R. Hamon, P. Borgnat, P. Flandrin, C. Robardet, Relabelling vertices according to the network structure by minimizing the cyclic bandwidth sum, Journal of Complex Networks 4 (4) (2016) 534–560.
[40] M. Masoumi, A. B. Hamza, Spectral shape classification: A deep learning approach, Journal of Visual Communication and Image Representation 43 (2017) 198–211.
[41] M. Masoumi, C. Li, A. B. Hamza, A spectral graph wavelet approach for nonrigid 3D shape retrieval, Pattern Recognition Letters 83 (2016) 339–348.
[42] L. Stanković, M. Daković, E. Sejdić, Vertex-frequency analysis: A way to localize graph spectral components [lecture notes], IEEE Signal Processing Magazine 34 (4) (2017) 176–182.
[43] L. Stanković, E. Sejdić, M. Daković, Reduced interference vertex-frequency distributions, IEEE Signal Processing Letters 25 (9) (2018) 1393–1397.
[44] L. Stankovic, D. Mandic, M. Dakovic, I. Kisil, E. Sejdic, A. G. Constantinides, Understanding the basis of graph signal processing via an intuitive example-driven approach, IEEE Signal Processing Magazine, arXiv preprint arXiv:1903.11179, November.
[45] F. R. Chung, R. P. Langlands, A combinatorial Laplacian with vertex weights, Journal of Combinatorial Theory, Series A 75 (2) (1996) 316–327.
[46] A. Duncan, Powers of the adjacency matrix and the walk matrix, The Collection (2004) 1–11.
[47] S. Saito, D. P. Mandic, H. Suzuki, Hypergraph p-Laplacian: A differential geometry view, in: Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[48] E. R. Van Dam, W. H. Haemers, Which graphs are determined by their spectrum?, Linear Algebra and Its Applications 373 (2003) 241–272.
[49] S. E. Schaeffer, Graph clustering, Computer Science Review 1 (1) (2007) 27–64.
[50] J. N. Mordeson, P. S. Nair, Fuzzy graphs and fuzzy hypergraphs, Vol. 46, Physica, 2012.
[51] J. Kleinberg, E. Tardos, Algorithm design, Pearson Education India, 2006.
[52] O. Morris, M. d. J. Lee, A. Constantinides, Graph theory for image analysis: An approach based on the shortest spanning tree, IEE Proceedings F (Communications, Radar and Signal Processing) 133 (2) (1986) 146–152.
[53] S. Khuller, Approximation algorithms for finding highly connected subgraphs, Tech. rep. (1998).
[54] A. K. Jain, Data clustering: 50 years beyond k-means, Pattern Recognition Letters 31 (8) (2010) 651–666.
[55] I. S. Dhillon, Y. Guan, B. Kulis, Kernel k-means: Spectral clustering and normalized cuts, in: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2004, pp. 551–556.
[56] M. Stoer, F. Wagner, A simple min-cut algorithm, Journal of the ACM (JACM) 44 (4) (1997) 585–591.
[57] G. Kron, Diakoptics: The piecewise solution of large-scale systems, Vol. 2, MacDonald, 1963.
[58] L. Hagen, A. B. Kahng, New spectral methods for ratio cut partitioning and clustering, IEEE Trans. Computer-Aided Design of Int. Circuits and Systems 11 (9) (1992) 1074–1085.
[59] J. Shi, J. Malik, Normalized cuts and image segmentation, Departmental Papers (CIS) (2000) 107.
[60] B. Mohar, Isoperimetric numbers of graphs, Journal of Combinatorial Theory, Series B 47 (3) (1989) 274–291.
[61] M. Fiedler, Algebraic connectivity of graphs, Czechoslovak Mathematical Journal 23 (2) (1973) 298–305.
[62] J. Malik, S. Belongie, T. Leung, J. Shi, Contour and texture analysis for image segmentation, International Journal of Computer Vision 43 (1) (2001) 7–27.
[63] A. Y. Ng, M. I. Jordan, Y. Weiss, On spectral clustering: Analysis and an algorithm, in: Proc. Advances in Neural Information Processing Systems, 2002, pp. 849–856.
[64] D. A. Spielman, S.-H. Teng, Spectral partitioning works: Planar graphs and finite element meshes, Linear Algebra and its Applications 421 (2-3) (2007) 284–305.
[65] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation 15 (6) (2003) 1373–1396.
[66] F. Chung, Laplacians and the Cheeger inequality for directed graphs, Annals of Combinatorics 9 (1) (2005) 1–19.
[67] L. Trevisan, Lecture notes on expansion, sparsest cut, and spectral graph theory (2013).
[68] Y. Weiss, Segmentation using eigenvectors: A unifying view, in: Proceedings of the Seventh IEEE International Conference on Computer Vision, Vol. 2, IEEE, 1999, pp. 975–982.
[69] P. Perona, W. Freeman, A factorization approach to grouping, in: Proc. of European Conference on Computer Vision, Springer, 1998, pp. 655–670.
[70] G. L. Scott, H. C. Longuet-Higgins, Feature grouping by relocalisation of eigenvectors of the proximity matrix, in: Proc. of the British Machine Vision Conference (BMVC), 1990, pp. 1–6.
[71] Z. Wang, E. P. Simoncelli, A. C. Bovik, Multiscale structural similarity for image quality assessment, in: Proc. of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Vol. 2, 2003, pp. 1398–1402.
[72] M. Mijalkov, E. Kakaei, J. B. Pereira, E. Westman, G. Volpe, BRAPH: A graph theory software for the analysis of brain connectivity, PLOS ONE 12 (8) (2017) e0178798. doi:10.1371/journal.pone.0178798.
[73] M. Rubinov, O. Sporns, Complex network measures of brain connectivity: Uses and interpretations, NeuroImage 52 (3) (2010) 1059–1069, Computational Models of the Brain. doi:10.1016/j.neuroimage.2009.10.003.
[74] H. Qiu, E. R. Hancock, Clustering and embedding using commute times, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (11) (2007) 1873–1890.
[75] R. Horaud, A short tutorial on graph Laplacians, Laplacian embedding, and spectral clustering (2012).
[76] A. K. Chandra, P. Raghavan, W. L. Ruzzo, R. Smolensky, P. Tiwari, The electrical resistance of a graph captures its commute and cover times, Computational Complexity 6 (4) (1996) 312–340.
[77] R. R. Coifman, S. Lafon, Diffusion maps, Applied and Computational Harmonic Analysis 21 (1) (2006) 5–30.
[78] J. Leskovec, C. Faloutsos, Sampling from large graphs, in: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2006, pp. 631–636.
[79] M. Tammen, I. Kodrasi, S. Doclo, Complexity reduction of eigenvalue decomposition-based diffuse power spectral density estimators using the power method, in: Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2018, pp. 451–455.
