GraphSigProc Part I v18 NowFnT
GraphSigProc Part I v18 NowFnT
Ljubiša Stankovića , Danilo Mandicb , Miloš Dakovića , Miloš Brajovića , Bruno Scalzob , Anthony G. Constantinidesb
a University of Montenegro, Podgorica, Montenegro
b Imperial College London, London, United Kingdom
Abstract
The area of Data Analytics on graphs promises a paradigm shift, as we approach information processing of new classes of
data which are typically acquired on irregular but structured domains (social networks, various ad-hoc sensor networks).
Yet, despite the long history of Graph Theory, current approaches mostly focus on the optimization of graphs themselves,
rather than on directly inferring learning strategies, such as detection, estimation, statistical and probabilistic inference,
clustering and separation from signals and data acquired on graphs. To fill this void, we first revisit graph topologies
from a Data Analytics point of view, to establish a taxonomy of graph networks through a linear algebraic formalism
of graph topology (vertices, connections, directivity). This serves as a basis for spectral analysis of graphs, whereby
the eigenvalues and eigenvectors of graph Laplacian and adjacency matrices are shown to convey physical meaning
related to both graph topology and higher-order graph properties, such as cuts, walks, paths, and neighborhoods.
Through a number of carefully chosen examples, we demonstrate that the isomorphic nature of graphs enables both
the basic properties of data observed on graphs and their descriptors (features) to be preserved throughout the data
analytics process, even in the case of reordering of graph vertices, where classical approaches fail. Next, to illustrate
the richness and flexibility of estimation strategies performed on graph signals, spectral analysis of graphs is introduced
through eigenanalysis of mathematical descriptors of graphs and in a generic way. Finally, benefiting from enhanced
degrees of freedom associated with graph representations, a framework for vertex clustering and graph segmentation is
established based on graph spectral representation (eigenanalysis) which demonstrates the power of graphs in various
data association tasks, from image clustering and segmentation trough to low-dimensional manifold representation. The
supporting examples demonstrate the promise of Graph Data Analytics in modeling structural and functional/semantic
inferences. At the same time, Part I serves as a basis for Part II and Part III which deal with theory, methods and
applications of processing Data on Graphs and Graph Topology Learning from data.
(b)
2. Graph Definitions and Properties
Graph theory has been established for almost three Figure 1: Basic graph structures. (a) Undirected graph and (b)
centuries as a branch in mathematics, and has become Directed graph.
a staple methodology in science and engineering areas in-
cluding chemistry, operational research, electrical and civil Regarding the directionality of vertex connections, a
engineering, social networks, and computer sciences. The graph can be undirected and directed, as illustrated re-
beginning of graph theory applications in electrical engi- spectively in Fig. 1(a) and Fig. 1(b).
neering can be traced back to the mid-XIX century with Definition: A graph is undirected if the edge connecting a
the introduction of Kirchoff’s laws. Fast forward two cen- vertex m to a vertex n also connects the vertex n to the
turies or so, the analytics of data acquired on graphs has vertex m, for all m and n.
become a rapidly developing research paradigm in Signal In other words, for an undirected graph, if (n, m) ∈ B
Processing and Machine Learning [4, 5, 6, 7]. then also (m, n) ∈ B, as in the case, for example, with
edges (1, 2) and (2, 1) in Fig. 1(a). For directed graphs,
2.1. Basic Definitions in general, this property does not hold, as shown in Fig.
Definition: A graph G = {V, B} is defined as a set of 1(b). Observe, for example, that the edge (2, 1) does not
vertices, V, which are connected by a set of edges, B ⊂ V × exist, although the edge (1, 2) connects vertices 1 and 2.
3
Therefore, undirected graphs can be considered as a special 0.23
0 1
case of directed graphs. 0.
74
0.
24 0.35
For a given set of vertices and edges, a graph can be 2
0.23
formally represented by its adjacency matrix, A, which 0.26
3 0.2
describes the vertex connectivity; for N vertices A is an 4
N × N matrix. 4
0.32
4
Definition: The elements Amn of the adjacency matrix 0.1
0.5
1
A assume values Amn ∈ {0, 1}. The value Amn = 0 is 6 0.32 0.15 5
assigned if the vertices m and n are not connected with an 7
edge, and Amn = 1 if these vertices are connected, that is
( Figure 2: Example of a weighted graph.
def 1, if (m, n) ∈ B
Amn =
0, if (m, n) ∈
/ B. the original and renumerated graphs, A1 and A2 respec-
Therefore, the respective adjacency matrices, Aun and tively, is straightforwardly defined using an appropriate
Adir , for the undirected and directed graphs from Fig. 1(a) permutation matrix, P, in the form
and (b) are given by
A2 = P A1 PT . (3)
0
0 1 1 1 0 0 0 0
Recall that each row and in each column of a permutation
1 1 0 1 0 1 0 0 0 matrix has exactly one nonzero element equal to unity.
In general, the edges can also convey information about
2 1 1 0 1 1 0 0 0
a relative importance of their connection, through a weighted
3 1 0 1 0 0 0 1 0
Aun = , (1) graph.
4
0 1 1 0 0 1 0 1
5
0 0 0 0 1 0 0 1 Remark 3: The set of weights, W, corresponds morpho-
6 0 0 0 1 0 0 0 1 logically to the set of edges, B, so that a weighted graph
7 0 0 0 0 1 1 1 0 represents a generic extension of an unweighted graph. It
0 1 2 3 4 5 6 7 is commonly assumed that edge weights are nonnegative
real numbers; therefore, if weight 0 is associated with a
nonexisting edge, then the graph can be described by a
0
0 1 0 0 0 0 0 0
weight matrix, W, similar to the description by the adja-
1 0 0 1 0 0 0 0 0 cency matrix A.
2
1 0 0 1 1 0 0 1 Definition: A nonzero element in the weight matrix W,
Adir = 3
1 0 0 0 0 0 0 0. (2) Wmn ∈ W, designates both an edge between the vertices m
4
0 1 1 0 0 1 0 0 and n and the corresponding weight. The value Wmn = 0
5
0 0 0 0 0 0 0 1 indicates no edge connecting the vertices m and n. The
6 0 0 0 1 0 0 0 1 elements of a weight matrix are nonnegative real numbers.
7 0 0 1 0 0 0 1 0 Fig. 2 shows an example of a weighted undirected
Adjacency matrices not only fully reflect the structure graph, with the corresponding weight matrix given by
arising from the topology of data acquisition, but also they
admit the usual feature analysis through linear algebra, 0
0 0.23 0.74 0.24 0 0 0 0
and can be sparse, or exhibit some other interesting and 1 0.23 0 0.35 0 0.23 0 0 0
useful matrix properties.
2 0.74 0.35 0 0.26 0.24 0 0 0
Remark 1: The adjacency matrix of an undirected graph 3 0.24 0 0.26 0 0 0 0.32 0
W= .
4 0 0.23 0.24 0 0 0.51 0 0.14
is symmetric, that is,
5 0 0 0 0 0.51 0 0 0.15
A = AT . 6 0 0 0 0.32 0 0 0 0.32
7 0 0 0 0 0.14 0.15 0.32 0
0 1 2 3 4 5 6 7
Since a graph is fully determined by its adjacency ma- (4)
trix, defined over a given set of vertices, any change in In this sense, the adjacency matrix, A, can be consid-
vertex ordering will cause the corresponding changes in ered as a special case of the weight matrix, W, whereby
the adjacency matrix. all nonzero weights are equal to unity. It then follows that
the weight matrix of undirected graphs is also symmetric
Remark 2: Observe that a vertex indexing scheme does
not change the graph itself (graphs are isomorphic do- W = WT , (5)
mains), so that the relation between adjacency matrices of
4
while, in general, for directed graphs this property does For practical reasons, it is often advantageous to use
not hold. the normalized Laplacian, defined as
Definition: A degree matrix, D, for an undirected graph def
is a diagonal matrix with elements, Dmm , which are equal LN = D−1/2 (D − W)D−1/2 = I − D−1/2 WD−1/2 . (9)
to the sum of weights of all edges connected to the vertex
m, that is, the sum of elements in its m-th row Remark 5: For undirected graphs, the normalized Lapla-
cian matrix is symmetric, and has all diagonal values equal
def
N
X −1 to 1, with its trace equal to the number of vertices N .
Dmm = Wmn . Other interesting properties, obtained through Lapla-
n=0
cian normalization, shall be described later in the various
Remark 4: For an unweighted and undirected graph, the application contexts.
value of the element Dmm is equal to the number of edges One more form of the graph Laplacian is the so called
connected to the m-th vertex. random-walk Laplacian, defined as
Vertex degree centrality. The degree centrality of a def
LRW = D−1 L = I − D−1 W. (10)
vertex is defined as the number of vertices connected to
the considered vertex with a single edge, and in this way The random-walk graph Laplacian is rarely used, since
it models the importance of a given vertex. For undirected it has lost the symmetry property of the original graph
and unweighted graphs, the vertex degree centrality of a Laplacian for undirected graphs, LRW 6= LTRW .
vertex is equal to the element, Dmm , of the degree matrix. Vertex-weighted graphs. Most of the applications of
Example 1: For the undirected weighted graph from Fig. 2, graph theory are based on edge-weighted graphs, where
the degree matrix is given by edge-weighting is designated by the weight matrix, W.
Note that weighting can be also introduced into graphs
0
1.21 0 0 0 0 0 0 0
based on vertex-weighted approaches (although rather rarely),
1 0 0.81 0 0 0 0 0 0 whereby a weight is assigned to each vertex of a graph.
To this end, we can use a diagonal matrix, V, to define
2 0 0 1.59 0 0 0 0 0
3 0 0 0 0.82 0 0 0 0
. the vertex weights vi , i = 0, 1, . . . , N − 1, with one pos-
D= (6)
4 0
0 0 0 1.12 0 0 0 sible (the Chung/Langlands, [45]) version of the vertex-
5 0 0 0 0 0 0.66 0 0 weighted graph Laplacian, given by
6 0 0 0 0 0 0 0.64 0
7 0 0 0 0 0 0 0 0.61 def
LV = V1/2 LV1/2 . (11)
0 1 2 3 4 5 6 7
Observe that for V = D−1 , the vertex-weighted graph
Another important descriptor of graph connectivity is
Laplacian in (11) reduces to the standard edge-weighted
the graph Laplacian matrix, L, which combines the weight
normalized graph Laplacian in (9).
matrix and the degree matrix.
Definition: The graph Laplacian matrix is defined as 2.2. Some Frequently Used Graph Topologies
def
When dealing with graphs, it is useful to introduce a
L = D − W, (7) taxonomy of graph topologies, as follows.
where W is the weight matrix P and D the diagonal degree 1. Complete graph. A graph is complete if there ex-
matrix with elements Dmm = n Wmn . The elements of ists an edge between every pair of its vertices. There-
a Laplacian matrix are therefore nonnegative real numbers fore, the adjacency matrix of a complete graph has
at the diagonal positions, and nonpositive real numbers at elements Amn = 1 for all m 6= n, and Amm = 0, that
the off-diagonal positions. is, no self-connections are present. Fig. 3(a) gives an
For an undirected graph, the Laplacian matrix is sym- example of a complete graph.
metric, taht is, L = LT . For example, the graph Laplacian
for the weighted graph from Fig. 2 is given by
2. Bipartite graph. A graph for which the vertices, V,
1.21 −0.23 −0.74 −0.24 0 0 0 0 can be partitioned into two disjoint subsets, E and H,
−0.23 0.81 −0.35 0 −0.23 0 0 0
−0.74 −0.35 1.59 −0.26 −0.24
whereby V = E ∪H and E ∩H = ∅, such that there are
0 0 0
−0.24
no edges between the vertices within the same subset
0 −0.26 0.82 0 0 −0.32 0
L=
0 −0.23 −0.24 0 1.12 −0.51
.
0 −0.14 E or H, is referred to as a bipartite graph. Fig. 3(b)
0 0 0 0 −0.51 0.66
0 −0.15 gives an example of a bipartite undirected graph with
E = {0, 1, 2} and H = {3, 4, 5, 6}, whereby all edges
0 0 0 −0.32 0 0 0.64 −0.32
0 0 0 0 −0.14 −0.15 −0.32 0.61 designate only connections between the sets E and H.
(8) Observe also that the graph in Fig. 3(b) is a complete
5
bipartite graph, since all possible edges between the
0 1 sets E and H are present.
3
For convenience of mathematical formalism, if vertex
0
7 2 ordering is performed in a such way that all vertices
4 belonging to E are indexed before the vertices be-
1 longing to H, then the resulting adjacency matrix
5 can be written in a block form
6 3
2
0 AEH
6
A= , (12)
AHE 0
5 4
where the submatrices AEH and AHE define the re-
(a) Complete graph (b) Bipartite graph
spective connections between the vertices belonging
to the disjoint sets E and H. Observe that for an
0 1 undirected bipartite graph, AEH = ATHE . Bipartite
1 graphs are also referred to as Kuratowski graphs,
7 2
denoted by KNE ,NH , where NE and NH are the re-
7 2 spective numbers of vertices in the sets E and H.
It is important to mention that a complete bipar-
0
6 3
tite graph with three vertices in each of the sets, H
6 3
and E, is referred to as the first Kuratowski graph,
denoted by K3,3 , which may be used to define con-
5 4 ditions for a graph to be planar (more detail is given
5 4 in the sequel).
(c) Regular graph (d) Star graph Multipartite graph. A generalization of the con-
cept of bipartite graph is a multipartite (M -partite)
graph for which the vertices are partitioned into M
0
0 1 subsets, whereby each edge connects only vertices
that belong to one of M different subsets.
1
7 2
3. Regular graph. An unweighted graph is said to be
2
regular (or J -regular) if all its vertices exhibit the
same degree of connectivity, J , which is defined as
6 3
3 the number of edges connected to each vertex is J .
An example of a regular graph with J = 4 is given
5 4
4
in Fig. 3(c). From (7) and (9), the Laplacian and
the normalized Laplacian of a J -regular graph are
(e) Circular graph (f) Path graph
1
L=J I−A and LN = I − A. (13)
0 J
0 1
1
7 2 4. Planar graph. A graph that can be drawn on a
two-dimensional plane without the crossing of any
2 of its edges is called planar.
6 3 For example, if the edges (0, 2), (2, 4), (4, 6), and
3 (6, 0) in the regular graph from Fig. 3(c) are plotted
as arches outside the circle defined by the vertices, all
5 4 instances of edge crossing will be avoided and such
4
(g) Directed circular graph (h) Directed path graph graph presentation will be planar.
6
considered as a special case of a complete bipartite 2.3. Properties of Graphs and Associated Matrices
graph, with only one vertex in the first set, E. The The notions from graph analysis that are most relevant
vertex degree centrality for the central vertex of a to the processing of data on graphs are:
star graph with N vertices is therefore N − 1.
M1 : Symmetry: For an undirected graph, the matrices
A, W, and L are all symmetric.
6. Circular (ring) graph. A graph is said to be cir-
cular if the degree of its every vertex is J = 2. This M2 : A walk between a vertex m and a vertex n is a con-
graph is also a regular graph with J = 2. An ex- nected sequence of edges and vertices that begins at
ample of a circular graph with 8 vertices is given in the vertex m and ends at the vertex n. Edges and
Fig. 3(e). vertices can be included in a walk more than once-
There is also more than one walk between vertices
m and n.
7. Path graph. A series of connected vertices defines
a path graph, whereby the first and the last vertex The length of a walk is equal to the number of in-
are of connectivity degree J = 1, while all other cluded edges in unweighted graphs. The number of
vertices are of the connectivity degree J = 2. An walks of length K, between a vertex m and a ver-
tex n, is equal to the value of the mn-th element of
example of a path graph with 5 vertices is presented
in Fig. 3(f). the matrix AK , which can be proved through math-
ematical induction, as follows [46]:
(i) The elements, Amn , of the adjacency matrix A,
8. Directed circular graph. A directed graph is said by definition, indicate the existence of a walk of
to be circular if each vertex is related to only one length K = 1 (an edge, in this case) between the
predecessor vertex and only one successor vertex. An vertices m and n in a graph;
example of a circular directed graph with 8 vertices
is given in Fig. 3(g), with the adjacency matrix (ii) Assume that the elements of matrix AK−1 are
equal to the number of walks of length K−1, between
two arbitrary vertices m and n;
0 0 0 0 0 0 0 0 1 (iii) The number of walks of length K between two
1 1 0 0 0 0 0 0 0 vertices, m and n, is then equal to the number of
2 0 1 0 0 0 0 0 0 all walks of length K − 1, between the vertex m and
3 0 0 1 0 0 0 0 0 an intermediate vertex s, s ∈ V, which itself is indi-
A= . (14)
4 0 0 0 1 0 0 0 0 cated by the element at the position ms of the matrix
5
0 0 0 0 1 0 0 0 AK−1 , according to (ii), for all s for which there is
6 0 0 0 0 0 1 0 0 an edge from vertex s to the destination vertex n. If
7 0 0 0 0 0 0 1 0 an edge between the intermediate vertex s and the
0 1 2 3 4 5 6 7 final vertex n exists, then Asn = 1. This means that
the number of walks of length K between the ver-
Remark 6: The adjacency matrix of any directed tices m and n is obtained as the inner product of the
or undirected circular graph is a circulant matrix. m-th row of AK−1 with the n-th column in A, to
yield the element mn of matrix AK−1 A = AK .
Example 2: Consider the vertex 0 and the vertex 4 in
9. Directed path graph. A directed path graph con- the graph from Fig. 4, and only the walks of length K =
sists of a series of vertices connected in only one di- 2. The adjacency matrix for this graph is given in (1).
rection, whereby the first and the last vertex do not There are two such walks (0 → 1 → 4 and 0 → 2 → 4),
have a respective predecessor or successor. An ex- so that the element A204 in the first row and the fifth
ample of a directed path graph with 5 vertices is column of matrix A2 , is equal to 2, as designated in bold
presented in Fig. 3(h). font in the matrix A2 below,
7
0 1 0 1
2 2
3 3
4 4
6 5 6 5
7 7
2
thus indicating K = 2 walks between these vertices.
3
4
M3 : The number of walks of length not higher than K,
between the vertices m and n, is given by the mn-th
element of the matrix 6 5
7
BK = A + A2 + · · · + AK , (16)
(a)
that is, by a value in its m-th row and n-th column.
In other words, the total number of walks is equal to Figure 5: The K-neighborhoods of vertex 0 for the graph from Fig.
4, where: (a) K = 1 and (b) K = 2. The neighboring vertices are
the sum of all walks, which are individually modeled shaded.
by Ak , k = 1, 2, . . . , K, as stated in property M2 .
2 2
3 3
4 4
6 5 6 5
7 (a) 7
0.23
0 1
0.
Figure 7: A disconnected graph which consists of two sub-graphs.
74
24 0.35
0. 2
0.23
0.1
4 disjoint components (subgraphs) of a graph, these
0.5
6 0.32 0.15 5
7 (b)
A1 0 · · · 0
0 A2 · · · 0
Figure 6: Concept of the spanning tree for graphs. (a) A spanning
A= . . .. (17)
tree for the unweighted graph from Fig. 1(a). (b) A spanning tree . . . .
. . . .
for the weighted graph from Fig. 2, designated by thick blue edges.
The graph edges in thin blue lines are not included in this spanning 0 0 · · · AM
tree.
L1 0 · · · 0
0 L2 · · · 0
L= . . .. . (18)
M10 : Spanning Tree and Minimum Spanning Tree. The .. .. . .. .
spanning tree of a graph is a subgraph that is tree- 0 0 · · · LM
shaped and connects all its vertices together. A tree
does not have cycles and cannot be disconnected. Note that this block diagonal form is obtained only if
The cost of the spanning tree represents the sum of the vertex numbering follows the subgraph structure.
the weights of all edges in the tree. The minimum
spanning tree is a spanning tree for which the cost Example 3: Consider a graph derived from Fig. 1(a) by
removing some edges, as shown in Fig. 7. The adjacency
is minimum among all possible spanning trees in a
matrix for this graph is given by
graph. Spanning trees are typically used in graph
clustering analysis.
In the classical literature on graph theory, it is com- 0 0 1 1 1 0 0 0 0
1 1 0 1 0 0 0 0 0
monly assumed that the values of edge weights in
2 1 1 0 1 0 0 0 0
weighted graphs are proportional to the standard 3
1 0 1 0 0 0 0 0
vertex distance, rmn . However, this is not the case A= (19)
4 0 0 0 0 0 1 0 1
in data analytics on graphs, where the edge weights 5
0 0 0 0 1 0 0 1
are typically defined as a function of vertex distance,
6 0 0 0 0 0 0 0 1
for example, through a Gaussian kernel, Wmn ∼ 7 0 0 0 0 1 1 1 0
2
exp(−rmn ), or some other data similarity metric. 0 1 2 3 4 5 6 7
The cost function to minimize for the Minimum Span-
with the corresponding Laplacian in the form
ning Tree (MST) can then be defined as a log-sum
of distances, rmn = −2 ln Wmn . A spanning tree for
3 −1 −1 −1 0 0 0 0
the graph from Fig. 2 is shown in Fig. 6. The cost −1 2 −1 0 0 0 0 0
for this spanning tree, calculated as a sum of all dis-
−1 −1 3 −1 0 0 0 0
tances (log-weights), rmn , is 15.67. −1 0 −1 2 0 0 0 0
L= .
0 0 0 0 2 −1 0 −1
0
0 0 0 −1 2 0 −1
M11 : An undirected graph is called connected if there exists 0 0 0 0 0 0 1 −1
a walk between each pair of its vertices. 0 0 0 0 −1 −1 −1 3
(20)
Observe that, as elaborated above, these matrices are in
M12 : If the graph is not connected, then it consists of two a block-diagonal form with the two constituent blocks
or more disjoint but locally connected subgraphs (graph clearly separated. Therefore, for an isolated vertex in a
9
graph, the corresponding row and column of the matrices 2a 2b
A and L will be zero-valued.
2 4a 4b
A = A1 + A2 .
0a 0b
To maintain the binary values in the resultant ad-
jacency matrix, Amn ∈ {0, 1}, a logical (Boolean) Figure 8: Kronecker (tensor) product of two graphs.
summation rule, e.g., 1 + 1 = 1, may be used for
matrix addition. In this monograph, the arithmetic 3
summation rule is assumed in data analytics algo-
5
rithms, as for example, in equation (16) in property
M3 .
2 a b c =
4
M14 : The Kronecker (tensor) product of two disjoint graphs
G1 = (V1 , B1 ) and G2 = (V2 , B2 ) yields a new graph 1
G = (V, B) where V = V1 × V2 is a direct product
of the sets V1 and V2 , and ((n1 , m1 ), (n2 , m2 )) ∈ B
only if (n1 , n2 ) ∈ B1 and (m1 , m2 ) ∈ B2 . 3a 3b 3c
A = A1 ⊗ A2 . 4a 4b 4c
1a 1b 1c
An illustration of the Kronecker product for two sim-
ple graphs is given in Fig. 8.
Figure 9: Cartesian product of two graphs.
Definition: The minimal polynomial of the considered ad- λ ∈ {−1.5616, −1.4812, −1, −1, 0, 0.3111, 2.1701, 2.5616}.
jacency matrix, A, is obtained from its characteristic poly-
Observe that the eigenvalue λ = −1 is of multiplicity higher
nomial by reducing the algebraic multiplicities of all eigen-
than 1 (multiplicity of 2), so that the corresponding minimal
values to unity, and has the form polynomial becomes
Pmin (λ) = (λ − µ1 )(λ − µ2 ) · · · (λ − µNm ). Pmin (λ) = λ7 − λ6 − 8λ5 + 2λ4 + 19λ3 + 7λ2 − 4λ.
3.1.1. Properties of the characteristic and minimal poly- Although this graph is disconnected, the largest eigenvalue of
nomial its adjacency matrix, λmax = 2.5616, is of multiplicity 1. Re-
lation between the graph connectivity and the multiplicity of
P1 : The order of the characteristic polynomial is equal
eigenvalues will be discussed later.
to the number of vertices in the considered graph.
P2 : For λ = 0, P (0) = det(A) = −λ0 (−λ1 ) · · · (−λN −1 ). 3.2. Spectral Graph Theory
P3 : The sum of all the eigenvalues is equal to the sum If all the eigenvalues of A are distinct (of algebraic
of the diagonal elements of the adjacency matrix, A, multiplicity 1), then the N equations in the eigenvalue
that is, its trace, tr{A}. For the characteristic poly- problem in (21), that is, Auk = λk uk , k = 0, 1, . . . , N − 1,
nomial of the adjacency matrix, P (λ), this means can be written in a compact form as one matrix equation
that the value of c1 in (22) is c1 = tr{A} = 0. with respect to the adjacency matrix, as
U−1 = UT . 0
1
2 4
3
1 2 5 6
0 3 4 7 5
T
Remark 10: For directed graphs, in general, A 6= A . 6 7
1 3 4 5 6
value is equal to its geometrical multiplicity. 7
5
6
For some directed graphs, the eigenvalues of their ad-
jacency matrix may be with algebraic multiplicity higher 0
1
than one, and the matrix A may not be diagonalizable. 2
3 4
In such cases, the algebraic multiplicity of the considered 0 1 5 6 7
2 3 4
eigenvalue is higher than its geometric multiplicity and the 6 7
5
1 4 6 7
Remark 11: The spectral theory of graphs studies proper- 7
5
6
ties of graphs through the eigenvalues and eigenvectors of
their associated adjacency and graph Laplacian matrices. 0
1
Example 7: For the graph presented in Fig. 1(a), the graph 3
2 4
0 3 6 7
adjacency spectrum is given by λ ∈ {−2, −1.741, −1.285, −0.677, 1 2 4 5 5
−0.411, 1.114, 1.809, 3.190}, and is shown in Fig. 10(top). 6 7
12
3.2.1. The DFT basis functions as a special case of eigen-
vectors of the adjacency matrix
For continuity with standard spectral analysis, we shall
first consider directed circular graphs, as this graph topol-
0 1 2 3 4
ogy encodes the standard time and space domains.
5 6 7 Eigenvalue decomposition for the directed circular graph
in Fig. 3(g), assuming N vertices, follows from the defini-
tion Auk = λk uk , and the form of the adjacency matrix in
3
2 (14). Then, the elements of vector Auk are uk (n − 1), as
4
1 3 6
5 1
effectively matrix A here represents a shift operator, while
0 2 4 5 7 0 the elements of vector λk uk are λk uk (n), to give
6 7
uk (n − 1) = λk uk (n), (24)
3
2
0 2 4 6
5
4 1 where uk (n) are the elements of the eigenvector uk for
1 3 5 7 0 given vertex indices n = 0, 1, . . . , N − 1, and k is the index
6 7
of an eigenvector, k = 0, 1, . . . , N − 1. This is a first-
order linear difference equation, whose general form for a
3
2 discrete signal x(n) is x(n) = ax(n − 1), for which the
3 4 7
5
4 1 solution is
0 1 2 5 6 0
1
uk (n) = √ ej2πnk/N and λk = e−j2πk/N ,
7
6
(25)
N
3
2
4
with k = 0, 1, . . . , N −1. It is straightforward to verify that
5 1
0 2 3 6 7 this solution satisfies the difference equation (24). Since
1 4 5 0
6 7 the considered graph is circular, the eigenvectors also ex-
hibit circular behavior, that is, uk (n) = uk (n + N ). For
3 convenience, a unit energy condition is used to find the
2
5
4 1
constants within the general solution of this first-order lin-
0
1 2
3 4 5
6 7
ear difference equation. Observe that the eigenvectors in
0
6 7 (25) correspond exactly to the standard harmonic basis
functions in DFT.
3
2 Remark 13: Classic DFT analysis may be obtained as a
3 5 6 7
5
4 1 special case of the graph spectral analysis in (25), when
0 1 2 4 0 considering directed circular graphs. Observe that for cir-
7
6
cular graphs, the adjacency matrix plays the role of a shift
operator, as seen in (24), with the elements of Auk equal
3
2 to uk (n − 1). This property will be used to define the shift
4
0 1 6 7
5 1
operator on a graph in the following sections.
2 3 4 5 0
6 7
3.2.2. Decomposition of graph product adjacency matrices
3 We have already seen in Fig. 8 and Fig. 9 that complex
2
5
4 1
graphs, for example those with a three-dimensional vertex
0 1 2 3 4 5 6 7
space, may be obtained as a Kronecker (tensor) product
0
6 7 or a Cartesian (graph) product of two disjoint graphs G1
and G2 . Their respective adjacency matrices, A1 and A2 ,
are correspondingly combined into the adjacency matrices
Figure 11: Eigenvalues, λk , for spectral indices (eigenvalue number)
k = 0, 1, . . . , N − 1, and elements of the corresponding eigenvectors, of the Kronecker graph product, A⊗ = A1 ⊗ A2 and the
uk (n), as a function of the vertex index n = 0, 1, . . . , N − 1, for the Cartesian graph product, A⊕ = A1 ⊕ A2 , as described in
adjacency matrix, A, of the undirected graph presented in Fig. 1(a) properties M14 and M15 .
with index reordering according to the scheme [0, 1, 2, 3, 4, 5, 6, 7] → Graph Kronecker product. For the eigendecomposi-
[3, 2, 4, 5, 1, 0, 6, 7]. The distinct eigenvectors are shown both on the
vertex index axis, n, (left) and on the graph itself (right). Compare tion of the Kronecker product of matrices A1 and A2 , the
with the results for the original vertex ordering in Fig. 10. following holds
A⊗ = A1 ⊗ A2 = (U1 Λ1 UH H
1 ) ⊗ (U2 Λ2 U2 )
13
or in other words, the eigenvectors of the adjacency ma-
trix of the Kronecker product of graphs are obtained by
a Kronecker product of the eigenvectors of the adjacency
(A ) (A )
matrices of individual graphs, as uk+lN1 = uk 1 ⊗ ul 2 ,
k = 0, 1, 2, . . . , N1 − 1, l = 0, 1, 2, . . . , N2 − 1.
Remark 14: The eigenvectors of the individual graph ad-
(A ) (A )
jacency matrices, uk 1 and uk 2 , are of much lower di-
mensionality than those of the adjacency matrix of the
resulting graph Kronecker product. This property can be
used to reduce computational complexity when analyzing
data observed on this kind of graph. Recall that the eigen-
values of the resulting graph adjacency matrix are equal
to the product of the eigenvalues of adjacency matrices of
Figure 12: Graph Cartesian product of two planar circular un-
the constituent graphs, G2 and G2 , that is, weighted graphs, with N = 8 vertices, produces a three-dimensional
(A1 ) (A2 )
torus topology.
λk+lN1 = λk λl .
3 4
its corresponding constant unit √ energy√eigenvector is 0 2 5 6
1 3 4 7
given by u0 = [1, 1, . . . , 1]T / N = 1/ N . 6 7
5
0.26
3
0 1 2 3 4 5 6 7
4
4
0.1
0.5
1
0
1
6 0.32 0.15 5 2
7 3 4
0 1 2 3 4 5 6 7 5
6 7
Figure 14: A disconnected weighted graph which consists of two
sub-graphs.
0
1
2 4
3
graph Laplacian matrix and any graph. In this sense, 0 1 2 3 4 5 6 7 5
the graph Laplacian matrix carries more physical mean- 6 7
4
connected. If λ2 > 0, then there are exactly two in- 4 5
3
0 1 2 3 6 7
dividually connected but globally disconnected com- 6 7
5
0 1 2 3 5 6 5
6 7
is equal to 0, since λ = 0 is an eigenvalue for the
Laplacian matrix.
0
1
For unweighted graphs, the coefficient c1 is equal 3
2 4
0 1 3
to the number of edges multiplied by −2. This is 2 4 5 6 7 5
straightforward to show following the relations from 6 7
16
with the eigenvalues λ ∈ {0, 5.5616, 5, 4, 4, 3, 1, 1}. Ob- L8 : The eigenvalues and eigenvectors of the normalized
serve that the eigenvalues λ = 1 and λ = 4 are of multi- Laplacian of a bipartite graph, with the disjoint sets
plicity higher than one. The minimal polynomial there- of vertices E and H, satisfy the relation, referred to
fore becomes Pmin (λ) = λ6 − 19λ5 + 139λ4 − 485λ3 + as the graph spectrum folding, given by
796λ2 − 480λ.
λk = 2 − λN −k (28)
For the disconnected graph in Fig. 7, the characteristic
polynomial of the Laplacian is given by uE uE
uk = and uN −k = , (29)
8 7 6 5 4 3 2
uH −uH
P (λ) = λ −18λ +131λ −490λ +984λ −992λ +384λ ,
where uk designates the k-th eigenvector of a bipar-
with the eigenvalues λ ∈ {0, 0, 1, 2, 3, 4, 4, 4}. The eigen- tite graph, uE is its part indexed on the first set of
value λ = 0 is of algebraic multiplicity 2 and the eigen- vertices, E, while uH is the part of the eigenvector
value λ = 4 of algebraic multiplicity 3, so that the mini- uk indexed on the second set of vertices, H.
mal polynomial takes the form
In order to prove this property, we shall write the
5 4 3 2
Pmin (λ) = λ − 10λ + 35λ − 50λ + 24λ adjacency and the normalized Laplacian matrices of
an undirected bipartite graph in their block forms
Since the eigenvalue λ = 0 is of algebraic multiplicity
2, property L2 indicates that this graph is disconnected, 0 AEH I LEH
A= and L N = .
with two disjoint sub-graphs as its constituent compo- ATEH 0 LTEH I
nents. The eigenvalue relation, LN uk = λk uk , can now be
evaluated as
uE + LEH uH u
L5 : Graphs with identical spectra are called isospectral LN uk = T = λk E .
or cospectral graphs. However, isospectral graphs LEH uE + uH uH
are not necessary isomorphic, and construction of From there, we have uE +LEH uH = λk uE and LTEH uE +
isospectral graphs that are not isomorphic is an im- uH = λk uH , resulting in LEH uH = (λk − 1)uE and
portant topic in graph theory. LTEH uE = (λk − 1)uH , to finally yield
Remark 19: A complete graph is uniquely deter- uE uE
LN = (2 − λk ) .
mined by its Laplacian spectrum [48]. The Laplacian −uH −uH
spectrum of a complete unweighted graph, with N
This completes the proof.
vertices, is λk ∈ {0, N, N, . . . , N }. Therefore, two
complete isospectral graphs are also isomorphic. Since for the graph Laplacian λ0 = 0 always holds
(see the property L1 ), from λk = 2 − λN −k in (28),
it then follows that the largest eigenvalue is λN = 2,
L6 : For a J -regular graph, as in Fig. 3(c), the eigen- which also proves the property L7 for a bipartite
vectors of the graph Laplacian and the adjacency graph.
matrices are identical, with the following relation for
3.3.2. Fourier analysis as a special case of the Laplacian
the eigenvalues,
spectrum
λk
(L)
= J − λk ,
(A) Consider the undirected circular graph from Fig. 3(e).
Then, from the property L1 , the eigendecomposition rela-
where the superscript L designates the Laplacian and tion for the Laplacian of this graph, Lu = λu, admits a
superscript A the corresponding adjacency matrix. simple form
This follows directly from UT LU = UT (J I − A)U. −u(n − 1) + 2u(n) − u(n + 1) = λu(n). (30)
This is straightforward to show by inspecting the Lapla-
L7 : The eigenvalues of the normalized graph Laplacian, cian for the undirected circular graph from Fig. 3(e), with
LN = I−D−1/2 AD−1/2 , are nonnegative and upper- N = 8 vertices for which the eigenvalue analysis is based
bounded by on
0 ≤ λ ≤ 2.
2 −1 0 0 0 0 0 −1
u(0)
−1 2 −1 0 0 0 0 0
The equality for the upper bound holds if and only if u(1)
the graph is a bipartite graph, as in Fig. 3(b). This 0 −1 2 −1 0 0 0 0 u(2)
will be proven within the next property. 0 0 −1 2 −1 0 0 0
u(3) .
Lu = 0
0 0 −1 2 −1 0 0 u(4)
0
0 0 0 −1 2 −1 0
u(5)
0 0 0 0 0 −1 2 −1 u(6)
−1 0 0 0 0 0 −1 2 u(7)
(31)
17
This directly gives the term −u(n − 1) + 2u(n) − u(n + into nonverlapping vertex subsets, with data in each sub-
1), while a simple inspection of the values u(0) and u(N ) set expected to exhibit relative similarity in same sense,
illustrates the circular nature of the eigenvectors; see also the segmentation of a graph refers to its partitioning into
Remark 6. The solution to the second order difference nonoverlapping graph segments (components).
equation in (30) is uk (n) = cos( 2πknN + φk ), with λk = The notion of vertex similarity metrics and their use
2πk to accordingly cluster the vertices into sets, Vi , of “re-
2(1−cos( N )). Obviously, for every eigenvalue, λk (except
for λ0 and for the last eigenvalue, λN −1 , for an even N ), lated” vertices in graphs, has been a focus of significant
we can choose to have two orthogonal eigenvectors with, research effort in machine learning and pattern recogni-
for example, φk = 0 and φk = π/2. This means that tion; this has resulted in a number of established vertex
most of the eigenvalues are of algebraic multiplicity 2, i.e., similarity measures and corresponding methods for graph
λ1 = λ2 , λ3 = λ4 , and so on. This eigenvalue multiplicity clustering [49]. These can be considered within two main
of two can be formally expressed as categories (i) clustering based on graph topology and (ii)
( spectral (eigenvector-based) methods for graph clustering.
2 − 2 cos(π(k + 1)/N ), for odd k = 1, 3, 5, . . . , Notice that in traditional clustering, a vertex is as-
λk =
2 − 2 cos(πk/N ), for even k = 2, 4, 6, . . . . signed to one cluster only. The type of clustering where
a vertex may belong to more than one cluster is referred
For an odd N , λN −2 = λN −1 , whereas for an even N we to as fuzzy clustering [49, 50], an approach that is not yet
have λN −1 = 2 which is of algebraic multiplicity 1. widely accepted in the context of graphs.
The corresponding eigenvectors u0 , u1 , . . . , uN −1 , then
have the form 4.1. Clustering based on graph topology
Among many such existing methods, the most popular
sin(π(k + 1)n/N, ) for odd k, k < N − 1
ones are based on:
uk (n) = cos(πkn/N ), for even k • Finding the minimum set of edges whose removal
cos(πn), for odd k, k = N − 1, would disconnect a graph in some “optimal” or “least
(32) disturbance” way (minimum cut based clustering).
18
E = {0, 1, 2, 3} 4.1.2. Maximum-flow minimum-cut approach
0.23 This approach to the minimum cut problem employs
0 1
0.
74 the framework of flow networks.
0.
24 0.35
2
Definition: A flow network is a directed graph with an
0.23
3
0.26 0.2 arbitrary number of vertices, N ≥ 3, but which involves
4
two given vertices (nodes) called the source vertex, s,
) = 0.7
9 4
ut(E , H and the sink vertex, t, whereby the capacity of edges (arcs)
0.32
C
4 is defined by their weights. The flow (of information, wa-
0.1
0.5
1
0.32 ter, traffic, ...) through an edge cannot exceed its capacity
6 0.15 5
7 (the value of edge weight). For any vertex in the graph the
H = {4, 5, 6, 7}
sum of all input flows is equal to the sum of all its output
flows (except for the source and sink vertices).
Figure 16: A cut for the weighted graph from Fig. 2, with the disjoint
subsets of vertices defined by E = {0, 1, 2, 3} and H = {4, 5, 6, 7}. Problem formulation. The maximum-flow minimum-
The edges between the sets E and H are designated by thin red lines. cut solution to the graph partitioning aims to find the
The cut, Cut(E, H), is equal to the sum of the weights that connect maximum value of flow that can be passed through the
sets E and H, and has the value Cut(E, H) = 0.32+0.24+0.23 = 0.79.
graph (network flow) from the source vertex, s, to the sink
vertex, t. The solution is based on the max-flow min-cut
theorem which states that the maximum flow through a
graph from a given source vertex, s, to a given sink vertex,
Remark 20: For clarity, we shall focus on the case with
t, is equal to the minimum cut, that is, the minimum sum
k = 2 disjoint subsets of vertices. However, the analysis
of those edge weights (capacities) which, if removed, would
can be straightforwardly generalized to k ≥ 2 disjoint sub-
disconnect the source, s from the sink, t (minimum cut ca-
sets of vertices and the corresponding minimum k-cuts.
pacity) [51, 57]. Physical interpretation of this theorem is
Example 12: Consider the graph in Fig. 2, and the sets of obvious, since the maximum flow is naturally defined by
vertices E = {0, 1, 2, 3} and H = {4, 5, 6, 7}, shown in Fig. the graph flow bottleneck between the source and sink ver-
16. Its cut into the two components (sub-graphs), E and H, tices. The capacity of the bottleneck (maximum possible
involves the weights of all edges which exist between these two
flow) will then be equal to the minimum capacity (weight
sets, that is, Cut(E, H) = 0.32 + 0.24 + 0.23 = 0.79. Such edges
values) of the edges which, if removed, would disconnect
are shown by thin red lines in Fig. 16.
the graph into two parts, one containing vertex s and the
Definition: A cut which exhibits the minimum value of other containing vertex t. Therefore, the problem of max-
the sum of weights between the disjoint subsets E and H, imum flow is equivalent to the minimum cut (capacity)
considering all possible divisions of the set of vertices, V, problem, under the assumption that the considered ver-
is referred to as the minimum cut. Finding the minimum tices, s and t, must belong to different disjoint subsets of
cut of a graph in this way is a combinatorial problem. vertices E and H. This kind of cut, with predefined vertices
Remark 21: The number of all possible combinations to s and t, is called the (s, t) cut.
split an even number of vertices, N , into two disjoint sub- Remark 22: In general, if the source and sink vertices are
sets is given by not given, the maximum flow algorithm should be repeated
N N
N
N
for all combinations of the source and sink vertices in order
C= + + ··· + + /2. to find the minimum cut of a graph.
1 2 N/2 − 1 N/2
The most widely used approach to solve the minimum-
To depict the computational burden associated with this cut maximum-flow problem is the Ford–Fulkerson method
“brute force” graph cut approach, even for a relatively [51, 57].
small graph with N = 50 vertices, the number of combina- Example 14: Consider the weighted graph from Fig. 2, with
tions to split the vertices into two subsets is C = 5.6 ·1014 . the assumed source and sink vertices, s = 0 and t = 6, as
shown in Fig. 17(a). The Ford–Fulkerson method is based on
Example 13: The minimum cut for the graph from Fig. 16 is
the analysis of paths and the corresponding flows between the
Cut(E, H) = 0.32 + 0.14 + 0.15 = 0.61 source and sink vertex. One such possible path between s and t,
0 → 1 → 4 → 5 → 7 → 6, is designated by the thick line in Fig.
for E = {0, 1, 2, 3, 4,5} and
H= {6,
7}. This can be confirmed 17(a). Recall that the maximum flow, for a path connecting the
by considering all 81 + 82 + 83 + 84 /2 = 127 possible cuts, that vertices s = 0 and t = 6, is restricted by the minimum capacity
is, all combinations of the subsets E and H for this small size (equal to the minimum weight) along the considered path. For
graph or by using, for example, the Stoer-Wagner algorithm the considered path 0 → 1 → 4 → 5 → 7 → 6 the maximum
[56]. flow from s = 0 to t = 6 is therefore equal to
19
since the minimum weight along this path is that connecting
vertices 5 and 7, W57 = 0.15. The value of this maximum flow is
then subtracted from each capacity (weight) in the considered
path, with the new residual edge capacities (weights) designated
in red in the residual graph in Fig. 17(a). The same procedure
0.08
0.23 is repeated for the remining possible paths 0 → 3 → 6, 0 →
s 0 1
0.
74
2 → 4 → 7 → 6, and 0 → 2 → 3 → 6, with appropriate
24 0.35
0. 2 corrections to the capacities (edge weights) after consideration
0.23
0.08
of each path. The final residual form of the graph, after zero-
0.26 0.2
3 4 capacity edges are obtained in such a way that no new path with
4 nonzero flow from s to t can be defined, is given in Fig. 17(b).
For example, if we consider the path 0 → 1 → 2 → 3 → 6
0.32
0.3
4
0.1 0.5 (or any other path), in the residual graph, then its maximum
6
1
6 0.
32 5
0.1 5
flow would be 0, since the residual weight in the edge 3 → 6 is
t 0.17 7 0 equal to 0. The minimum cut has now been obtained as that
(a) which separates the sink vertex, t = 6, and its neighborhood
from the the source vertex, s = 0, through the remaining zero-
E = {0, 1, 2, 3, 4, 5} capacity (zero-weight) edges. This cut is shown in Fig. 17(b),
0.08 and separates the vertices H = {6, 7} from the rest of vertices
s 0 1
0. by cutting the edges connecting vertices 3 → 6, 4 → 7, and
52
0 0.35 5 → 7. The original total weights of these edges are Cut(E, H)
2
0.08
)= directed graph with every edge being split into a pair of edges
0.61 0
0.3
6 0.03 0 5
7
tions. After an edge is used in one direction (for example, edge
t 5 − 7 in Fig. 17(a)) with a flow equal to its maximum capac-
H = {6, 7} ity of 0.15 in the considered direction, the other flow direction
(b)
(sister edge) becomes 0.30, as shown in Fig. 17(c). The edge
with opposite direction could be used (up the algebraic sum of
0.38
s 0
flows in both directions being equal to the total edge capacity)
1
0.08 to form another path (if possible) from the source to the sink
2 vertex. More specifically, the capacity of an edge (from the
0.08
0.38
residual capacities for the path from Fig. 17(a) are given in
0.6
6
0.17 0 Fig. 17(c). For clarity, the edge weights which had not been
6 5 changed by this flow are not shown in Fig. 17(c).
t 7
0.47 0.30 (c)
Figure 17: Principle of the maximum flow minimum cut method.
4.1.3. Normalized (ratio) minimum cut
(a) The weighted graph from Fig. 2, with the assumed source ver- A number of optimization approaches may be employed
tex s = 0 and sink vertex t = 6, and a path between these two to enforce some desired properties on graph clusters. One
vertices for which the maximum flow is equal to the minimum ca- such approach is the normalized minimum cut, which is
pacity (weight) along this path, W57 = 0.15. This maximum flow
value, W57 = 0.15, is then subtracted from all the original edge ca- commonly used in graph theory, and is introduced by pe-
pacities (weights) to yield the new residual edge capacities (weights) nalizing the value of Cut(E, H) by an additional term (cost)
which are shown in red. (b) The final edge capacities (weights) af- to enforce the subsets E and H to be simultaneously as
ter the maximum flows are subtracted for all paths 0 → 3 → 6,
large as possible. An obvious form of the normalized cut
0 → 2 → 4 → 7 → 6, and 0 → 2 → 3 → 6, between vertices s = 0
and t = 6, with the resulting minimum cut now crossing only the (ratio cut) is given by [58]
zero-capacity (zero-weight) edges with its value equal to the sum of 1
their initial capacities (weights), shown in Panel (a) in black. (c) A 1 X
directed form of the undirected graph from (a), with the same path
CutN (E, H) = + Wmn , (33)
NE NH
and the residual capacities (weights) given for both directions. m∈E
n∈H
20
0 1 4.1.5. Other forms of the normalized cut
In addition to the two presented forms of the normal-
2
ized cut, based on the number of vertices and volume,
3 other frequently used forms are:
4 1. The sparsity of a cut which is defined as
1 X
ρ(E) = Wmn , (35)
6 5 NE NV−E
7 m∈E
n∈V−E
Figure 18: A clustering scheme based on the minimum normalized where V − E is the set difference of V and E. The
cut of the vertices in the graph from Fig. 2 into two vertex clusters, sparsity of a cut, ρ(E), is related to the normalized
E = {0, 1, 2, 3} and H = {4, 5, 6, 7}. This cut corresponds to the
arbitrarily chosen cut presented in Fig. 16. cut as N ρ(E) = CutN (E, H), since H = V − E and
NE + NV−E = N . The sparsity of a graph is then
defined as the minimum sparsity of a cut. It then
Example 15: Consider again Example 12, and the graph from follows that the cut which exhibits minimum sparsity
Fig. 16. For the sets of vertices, E = {0, 1, 2, 3} and H = and the minimum normalized cut in (33) produce the
{4, 5, 6, 7}, the normalized cut is calculated as CutN (E, H) = same set E.
(1/4 + 1/4)0.79 = 0.395. This cut also represents the minimum
normalized cut for this graph; this can be confirmed by checking 2. The edge expansion of a subset, E ⊂ V, is defined by
all possible cut combinations of E and H in this (small) graph.
Fig. 18 illustrates the clustering of vertices according to the 1 X
α(E) = Wmn , (36)
minimum normalized cut. Notice, however, that in general the NE
m∈E
minimum cut and the minimum normalized cut do not produce n∈V−E
the same vertex clustering into E and H.
with NE ≤ N/2. Observe a close relation of edge
Graph separability. Relevant to this section, the mini-
expansion to the normalized cut in (33).
mum cut value admits a physical interpretation as a mea-
sure of graph separability. An ideal separability is possible 3. The Cheeger ratio of a subset, E ⊂ V, is defined as
if the minimum cut is equal to zero, meaning that there
is no edges between subsets E and H. In Example 15, 1 X
φ(E) = Wmn . (37)
the minimum cut value was CutN (E, H) = 0.395, which is min{VE , VV−E }
m∈E
not close to 0, and indicates that the segmentation of this n∈V−E
22
Figure 19: Illustration of the concept of smoothness of the graph Laplacian eigenvectors for three different graphs: The graph from Fig. 2
(left), a path graph corresponding to classic temporal data analysis (middle), and an example of a more complex graph with N = 64 vertices
(right). (a) Constant eigenvector, u0 (n), shown on the three considered graphs. This is the smoothest possible eigenvector for which the
smoothness index is λ0 = 0. (b) Slow-varying Fiedler eigenvector (the smoothest eigenvector whose elements are not constant), u1 (n), for
the three graphs considered. (c) Fast-varying eigenvectors, for k = 5 (left), and k = 30 (middle and right). Graph vertices are denoted by
black circles, and the values of elements of the eigenvectors, uk (n), by red lines, for n = 0, 1, . . . , N − 1. The smoothness index, λk , is also
given for each case.
0, 1, 2, . . . , N − 1, as shown in Fig. 20(a). This, in turn, about vertex similarity in the spectral space, or about the
means that a set of elements, u0 (n), u1 (n), u2 (n), . . . , uN −1 (n), spectrum based graph cut, segmentation, and vertex clus-
is assigned to every vertex n, as shown in Fig. 20(b). For tering.
every vertex, n, we can then group these elements into an An analogy with classical signal processing would be to
N -dimensional spectral vector assign a vector of harmonic basis function values at a time
instant (vertex) n, to “describe“ this instant, that is, to
def
qn = [u0 (n), u1 (n), . . . , uN −1 (n)], assign the n-th column of the Discrete Fourier transform
matrix to the instant n. This intuition is illustrated in Fig.
which is associated with the vertex n. Since the elements of 20(a) and 20(b).
the first eigenvector, u0 , are constant, they do not convey The spectral vectors shall next be used to define spec-
any spectral difference to the graph vertices. Therefore, tral similarity of vertices.
the elements of u0 are commonly omitted from the spectral
vector for vertex n, to yield Definition: Two vertices, m and n, are called spectrally
similar if their distance in the spectral space is within
qn = [u1 (n), . . . , uN −1 (n)], (41) a small predefined threshold. The spectral similarity be-
tween vertices m and n is typically measured through the
as illustrated in Fig. 20(b).
Euclidean norm of their spectral space distance, given by
Vertex dimensionality in the spectral space. Now
that we have associated a unique spectral vector qn in (41), def
dmn = kqm − qn k2 .
to every vertex n = 0, 1, . . . , N − 1, it is important to note
that this (N − 1)-dimensional representation of every ver- Spectral Manifold. Once graph is characterized by the
tex in a graph (whereby the orthogonal graph Laplacian original (N − 1)-dimensional spectral vectors, the so ob-
eigenvectors, u1 , u2 , . . . , uN −1 , serve as a basis of that tained vertex positions in spectral vertex representation
representation) does not affect the graph itself; this just may reside near some well defined surface (commonly a
means that the additional degrees of freedom introduced hyperplane) called a spectral manifold which is of a re-
through spectral vectors facilitate more sophisticated and duced dimensionality M < (N − 1). The aim of spec-
efficient graph analysis. For example, we may now talk tral vertex mapping is then to map each spectral vertex
23
representation from the original N -dimensional spectral 4.2.3. Indicator vector
vector space to a new spectral manifold which lies in a re- Remark 21 shows that the combinatorial approach to
duced M -dimensional spectral space, to a position closest minimum cut problem is computationally infeasible, as
to its original (N − 1)-dimensional spectral position. This even for a graph with only 50 vertices we have 5.6 · 1014
principle is related to the Principal Component Analysis such potential cuts.
(PCA) method, and this relation will be discussed later To break this Curse of Dimensionality it would be
in this section. An analogy with classical Discrete Fourier very convenient to relate the problem of the minimiza-
Transform analysis would imply a restriction of the spec- tion of the normalized cut in (33) and (34) to that of
tral analysis from the space of N harmonics to the reduced eigenanalysis of graph Laplacian. To this end, we shall
space of the M slowest-varying harmonics (excluding the introduce the notion of indicator vector x on a graph,
constant one). for which the elements take subgraph-wise constant val-
These spectral dimensionality reduction considerations ues within each disjoint subset (cluster) of vertices, with
suggest to restrict the definition of spectral similarity to these constants taking different values for different clus-
only a few lower-order (smooth) eigenvectors in the spec- ters of vertices (subset-wise constant vector ). While this
tral space of reduced dimensionality. For example, if the does not immediately reduce the computational burden
spectral similarity is restricted to the two smoothest eigen- (the same number of combinations remains as in the brute
vectors, u1 and u2 (omitting the constant u0 ), then the force method), the elements of x now uniquely reflect the
spectral vector for a vertex n would become assumed cut of the graph into disjoint subsets E, H ⊂ V.
Establishing a further link with only the smoothest eigen-
qn = [u1 (n), u2 (n)], vector of the graph Laplacian will convert the original,
computationally intractable, combinatorial minimum cut
as illustrated in Fig. 20(c) and Fig. 21(a). If for two
problem into a manageable algebraic eigenvalue problem,
vertices, m and n, the values of u1 (m) are close to u1 (n) for which the computation complexity is of the O(N 3 ) or-
and the values of u2 (m) are close to u2 (n), then these der. By casting the problem into the linear algebra frame-
two vertices are said to be spectrally similar, that is, they work, complexity of calculation can be additionally re-
exhibit a small spectral distance, dmn = kqm − qn k2 .
duced through efficient eigenanalysis methods, such as the
Finally, the simplest spectral description uses only one
Power Method which sequentially computes the desired
(smoothest nonconstant) eigenvector to describe the spec-
number of largest eigenvalues and the corresponding eigen-
tral content of a vertex, so that the spectral vector reduces vectors, at an affordable O(N 2 ) computations per itera-
to a spectral scalar tion, as shown in the Appendix.
However, unlike the indicator vector, x, the smoothest
qn = [qn ] = [u1 (n)].
eigenvector (corresponding to the smallest nonzero eigen-
whereby the so reduced spectral space is a one-dimensional value) of graph Laplacian is not subset-wise constant, and
line. so such solution would be approximate, but computation-
Example 17: The two-dimensional and three-dimensional spec-
ally feasible.
tral vectors, qn = [u1 (n), u2 (n)] and qn = [u1 (n), u2 (n), u3 (n)], Remark 24: The concept of indicator vector can be in-
of the graph from Fig. 2 are shown in Fig. 21, for n = 2 and troduced through the analysis with an ideal minimum cut
n = 6. of a graph, given by
Spectral embedding: The mapping from the reduced
X
Cut(E, H) = Wmn = 0,
dimensionality spectral space back onto the original ver- m∈E
tices is referred to as spectral Embedding. n∈H
We can proceed in two ways with the reduced spectral that is, when considering an already disjoint graph for
vertex space representation: (i) to assign the reduced di- which Cut(E, H) = 0 indicates that there exist no edges
mension spectral vectors to the original vertex positions, between the subsets E and H, that is, Wmn = 0 for m ∈ E,
for example, in the form of vertex coloring, as a basis and n ∈ H. Obviously, this ideal case can be solved with-
for subsequent vertex clustering (Section 4.2.3), or (ii) to out resorting to the combinatorial approach, since this
achieve new vertex positioning in the reduced dimension- graph is already in the form of two disconnected sub-
ality space of eigenvectors (reduced spectral space), using graphs, defined by the sets of vertices E and H. For such
eigenmaps (Section 4.4). Both yield similar information a disconnected graph, the second eigenvalue of the graph
and can be considered as two sides of the same coin [65]. Laplacian is λ1 = 0, as established by the graph Laplacian
For visualization purposes, we will use colors of the RGB property L2 . When λ1 = 0, then
system to represent the spectral vector values in a reduced N −1 N −1 2
(one, two, or three) dimensional spectral space. Vertices
X X
2uT1 Lu1 = Wmn u1 (n) − u1 (m) = 2λ1 = 0,
at the original graph positions will be colored according to m=0 n=0
the spectral vector values.
which follows from (38) and (40). Since all terms in the
last sum are nonnegative, this implies that they must be
24
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
(a) (b)
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
(c) (d)
Figure 20: Illustration of spectral vectors for the graph from Fig. 2, with N = 8 vertices. For an intuitive analogy with the classical Discrete
Fourier Transform, notice that the complex harmonic basis functions within the DFT would play the role of eigenvectors in graph spectral
representation, uk , k = 0, 1, . . . , 7. Then, the spectral vectors, qn , n = 0, 1, . . . , 7, would be analogous to the basis functions of the inverse
Discrete Fourier transform (excluding the first constant element).
Since all terms in the last sum are nonnegative, they must all be zero-valued, that is, the eigenvector u1 is subset-wise constant, with u1(n) = u1(m) = c1 for m, n ∈ E and u1(n) = u1(m) = c2 for m, n ∈ H. Since the eigenvector u1 is orthogonal to the constant eigenvector u0, then Σ_{n=0}^{N−1} u1(n) = 0. A possible solution for u1(n), which satisfies the subset-wise constant form and has zero mean, is u1(n) = c1 = 1/NE for n ∈ E and u1(n) = c2 = −1/NH for n ∈ H. We can conclude that the problem of finding an ideal minimum cut can indeed be solved by introducing an indicator vector x = u1, such that x(n) = 1/NE for n ∈ E and x(n) = −1/NH for n ∈ H. The membership of a vertex, n, to either the subset E or H of the ideal minimum cut is therefore uniquely defined by the sign of the indicator vector x = u1. This form of x is not normalized to unit energy, as its scaling by any constant would not influence the solution for vertex clustering into the subsets E or H.

For a general graph, and following the above reasoning, we here consider two specific subset-wise constant forms of the indicator vector, x, which are based on:
Figure 21: Illustration of the spectral vectors, qn = [u1 (n), u2 (n)] and qn = [u1 (n), u2 (n), u3 (n)], for the Laplacian matrix of the graph
in Fig. 2. (a) Two-dimensional spectral vectors, q2 = [u1 (2), u2 (2)] and q6 = [u1 (6), u2 (6)]. (b) Three-dimensional spectral vectors,
q2 = [u1 (2), u2 (2), u3 (2)] and q6 = [u1 (6), u2 (6), u3 (6)]. For clarity, the spectral vectors are shown on both the vertex index axis and directly
on graph.
(i) The number of vertices in the disjoint subgraphs,

x(n) = 1/NE for n ∈ E, and x(n) = −1/NH for n ∈ H,    (42)

where NE is the number of vertices in E, and NH is the number of vertices in H, and

(ii) The volumes of the disjoint subgraphs,

x(n) = 1/VE for n ∈ E, and x(n) = −1/VH for n ∈ H,    (43)

where the volumes of the sets, VE and VH, are defined as the sums of all vertex degrees, Dnn, in the corresponding subsets, VE = Σ_{n∈E} Dnn and VH = Σ_{n∈H} Dnn.

Before proceeding further with the analysis of these two forms of the indicator vector (in the next two remarks), it is important to note that if we can find the vector x which minimizes the normalized cut, CutN(E, H) in (33), then the elements of vector x (their signs, sign(x(n)) = 1 for n ∈ E and sign(x(n)) = −1 for n ∈ H) may be used to decide whether to associate a vertex, n, with either the set E or H of the minimum normalized cut.

Remark 25: The normalized cut, CutN(E, H), defined in (33), for the indicator vector x with the elements x(n) = 1/NE for n ∈ E and x(n) = −1/NH for n ∈ H, is equal to the Rayleigh quotient of the matrix L and the vector x, that is,

CutN(E, H) = (x^T L x) / (x^T x).    (44)
To prove this relation we shall rewrite (40) as

x^T L x = (1/2) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} Wmn (x(n) − x(m))^2.    (45)

For all vertices m and n within the same subgraph, that is, such that m ∈ E and n ∈ E, the elements of the vector x are the same and equal to x(m) = x(n) = 1/NE. In turn, this means that the terms (x(n) − x(m))^2 in (45) are zero-valued. The same holds for any two vertices belonging to the set H, that is, for m ∈ H and n ∈ H. Therefore, only the terms corresponding to the edges which define the cut, when m ∈ E and n ∈ H, and vice versa, remain in the sum, and they are constant and equal to (x(n) − x(m))^2 = (1/NE − (−1/NH))^2, to yield

x^T L x = (1/NE + 1/NH)^2 Σ_{m∈E, n∈H} Wmn = (1/NE + 1/NH) CutN(E, H),    (46)

where the normalized cut, CutN(E, H), is defined in (33). Finally, from the energy of the indicator vector, x^T x = e_x^2, given by

x^T x = ||x||_2^2 = e_x^2 = NE/NE^2 + NH/NH^2 = 1/NE + 1/NH,    (47)

the division of (46) by (47) proves (44).

The same analysis holds if the indicator vector is normalized to unit energy, whereby x(n) = 1/(NE e_x) for n ∈ E and x(n) = −1/(NH e_x) for n ∈ H, with e_x defined in (47) as e_x = ||x||_2.

We can therefore conclude that the indicator vector, x, which solves the problem of minimization of the normalized cut, is also a solution to (44). This minimization problem, for the unit energy form of the indicator vector, can also be written as

min{x^T L x} subject to x^T x = 1.    (48)

In general, this is again a combinatorial problem, since all possible combinations of the subsets of vertices, E and H, together with the corresponding indicator vectors, x, should be considered.

For a moment we shall put aside the very specific (subset-wise constant) form of the indicator vector and consider the general minimization problem in (48). This problem can be solved using the method of Lagrange multipliers, with the corresponding cost function

L(x) = x^T L x − λ(x^T x − 1).

From ∂L(x)/∂x^T = 0, it follows that Lx = λx, which is precisely the eigenvalue/eigenvector relation for the graph Laplacian L, the solution of which is λ = λk and x = uk, for k = 0, 1, . . . , N − 1. In other words, upon replacing x by uk in the term min{x^T L x} above, we obtain min_k{uk^T L uk} = min_k{λk}. After neglecting the trivial solution λ0 = 0, since it produces a constant eigenvector u0, we arrive at min_k{λk} = λ1 and x = u1. Note that this solution yields a general form of the vector x that minimizes (44). However, such a form does not necessarily correspond to a subset-wise constant indicator vector, x.

4.2.4. Bounds on the minimum cut

In general, the subset-wise constant indicator vector, x, may be written as a linear combination of the graph Laplacian eigenvectors, uk, k = 1, 2, . . . , N − 1, to give

x = α1 u1 + α2 u2 + · · · + αN−1 uN−1.    (49)

This kind of vector expansion onto the set of eigenvectors shall be addressed in more detail in Part 2 of this monograph. Note that the constant eigenvector u0 is omitted since the indicator vector is zero-mean by definition (orthogonal to a constant vector). The calculation of the coefficients αi would require the indicator vector (that is, the sets E and H) to be known, leading again to the combinatorial problem of vertex set partitioning. It is interesting to note that the quadratic form of the indicator vector, x, given by (49) is equal to x^T L x = α1^2 λ1 + α2^2 λ2 + · · · + αN−1^2 λN−1, and that it assumes the minimum value for α1 = 1, α2 = · · · = αN−1 = 0, that is, when x = u1, which corresponds to imposing the normalized energy condition, x^T x = α1^2 + α2^2 + · · · + αN−1^2 = 1. In other words, we now arrive at a physically meaningful bound

λ1 ≤ x^T L x = CutN(E, H).

Observe that this inequality corresponds to the lower Cheeger bound for the minimum normalized cut in (33).

Remark 26: If the space of approximative solutions for the indicator vector, x, is relaxed to allow for vectors that are not subset-wise constant (while omitting the constant eigenvector of the graph Laplacian, u0), the approximative solution becomes x = u1 (as previously shown and illustrated in Example 18). The above analysis indicates that this solution is quasi-optimal; however, despite its simplicity, the graph cut based on only the second graph Laplacian eigenvector, u1, typically produces a good approximation to the optimal (minimum normalized) cut.

It has been shown that the value of the true normalized minimum cut in (33), when the indicator vector x is subset-wise constant, is bounded on both sides (upper and lower) by constants which are proportional to the smallest nonzero eigenvalue, u1^T L u1 = λ1, of the graph Laplacian. The simplest form of these bounds (Cheeger's bounds) for the cut defined by (37) has the form [66, 67]

λ1/2 ≤ φ(V) = min_{E⊂V}{φ(E)} ≤ √(2 λ1).    (50)
This shows that the eigenvalue λ1 is also a good measure of graph separability and, consequently, of the quality of spectral clustering in the sense of a minimum normalized cut. The value of the minimum normalized cut of a graph (also referred to as Cheeger's constant, conductivity, or the isoperimetric number of a graph) may also be considered as a numerical measure of the presence of a "bottleneck" in a graph.

4.2.5. Indicator vector for the normalized graph Laplacian

We shall now address the cut based on the normalized graph Laplacian, in light of the above analysis.

Remark 27: The volume normalized cut, CutV(E, H), defined in (34), is equal to

CutV(E, H) = (x^T L x) / (x^T D x),    (51)

where the corresponding, subset-wise constant, indicator vector has the values x(n) = 1/VE for n ∈ E and x(n) = −1/VH for n ∈ H, while the volumes of the sets, VE and VH, are defined in (34).

The proof is identical to that given in Remark 25. For the normalized indicator vector, we have x^T D x = 1, so that the minimization problem in (51) reduces to

min{x^T L x} subject to x^T D x = 1.    (52)

If the solution space is restricted to the space of generalized eigenvectors of the graph Laplacian, defined by

L uk = λk D uk,

then the solution to (52) becomes

x = u1,

where u1 is the generalized eigenvector of the graph Laplacian that corresponds to the lowest nonzero eigenvalue.

The eigenvectors of the normalized Laplacian, LN = D^{−1/2} L D^{−1/2}, may also be used in optimal cut approximations, since the minimization problem in (51) can be rewritten using the normalized Laplacian through a change of variable, to yield

uk = D^{−1/2} vk.

It is important to note that, in general, the clustering results based on the three forms of eigenvectors,

(i) the smoothest graph Laplacian eigenvector,
(ii) the smoothest generalized eigenvector of the Laplacian, and
(iii) the smoothest eigenvector of the normalized Laplacian,

are different. While method (i) favors clustering into subsets with an (almost) equal number of vertices, methods (ii) and (iii) favor subsets with (almost) equal volumes (defined as the sums of the vertex degrees in the subsets). Also note that methods (i) and (ii) approximate the indicator vector in different eigenvector subspaces. All three methods will produce the same clustering result for unweighted regular graphs, for which the volumes of the subsets are proportional to the number of their corresponding vertices, while the eigenvectors for all three Laplacian forms are the same in regular graphs, as shown in (13).

Generalized eigenvectors of the graph Laplacian and eigenvectors of the normalized Laplacian. Recall that the matrix D^{−1/2} is of a diagonal form, with positive elements. Then, the solution to (52), which is equal to the generalized eigenvector of the graph Laplacian, and the solution to (53), which is equal to the eigenvector of the normalized Laplacian, are related as sign(y) = sign(x), or sign(v1) = sign(u1). This indicates that if the sign of the corresponding eigenvector is used for the minimum cut approximation (clustering), both results are the same.

4.3. Spectral clustering implementation

Spectral clustering is most conveniently implemented using only low-dimensional spectral vectors, the simplest case being when only a one-dimensional spectral vector is used as the indicator vector. More degrees of freedom can be achieved by clustering schemes which use two or three Laplacian eigenvectors, as discussed next.
Normalizing each element of the one-dimensional spectral vector, qn = [u1(n)], by its magnitude yields a two-level form of the spectral vector

yn = [u1(n)/||u1(n)||2] = [sign(u1(n))],

and represents a step before clustering, as proposed in [63]. This is justified based on the original form of the indicator vector, whose sign indicates the vertex association with either subset E or H. For an illustrative representation of the normalized spectral vector, we may use a simple two-level colormap and assign one of two colors to each vertex. Such a simple algorithm for clustering is given in Algorithm 1 (for an algorithm with more options for clustering and representation see the Appendix (Algorithm 3) and Remarks 30 and 33).

Algorithm 1. Clustering using the graph Laplacian.
Input:
• Graph vertices V = {0, 1, . . . , N − 1}
• Graph Laplacian L
1: [U, Λ] ← eig(L)
2: yn ← U(2, n)
3: E ← {n | yn > 0}, H ← {n | yn ≤ 0}
Output:
• Vertex clusters E and H
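As a rough, non-authoritative Python rendering of Algorithm 1 (variable and function names are illustrative only), the two clusters follow directly from the sign of the second Laplacian eigenvector:

import numpy as np

def cluster_two_sets(W):
    # W: symmetric, nonnegative weight matrix of the graph
    L = np.diag(W.sum(axis=1)) - W       # graph Laplacian
    _, U = np.linalg.eigh(L)             # eigenvectors as columns, eigenvalues ascending
    y = U[:, 1]                          # the smoothest non-constant eigenvector, u1
    E = np.flatnonzero(y > 0)            # cluster E
    H = np.flatnonzero(y <= 0)           # cluster H
    return E, H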
Example 18: Consider the graph from Fig. 2 and its Laplacian eigenvector, u1, from Fig. 13. The elements of this single eigenvector, u1, are used to encode the vertex colormap, as shown in Fig. 23(a). Here, the minimum element of u1 was used to select the red color (vertex 7), while the white color at vertex 0 was designated by the maximum value of this eigenvector. Despite its simplicity, this scheme immediately allows us to threshold u1 and identify two possible graph clusters, {0, 1, 2, 3} and {4, 5, 6, 7}, as illustrated in Fig. 23(b). The same result would be obtained if the sign of u1 was used to color the vertices, and this would correspond to the minimum normalized cut clustering in Fig. 18.

The true indicator vector, x, for the minimum normalized cut of this graph is presented in Fig. 22(a). This vector is obtained by checking all the 127 possible cut combinations of E and H in this small graph, together with the corresponding x(n). The signs of the elements of this vector indicate the optimal clustering into the subsets E = {0, 1, 2, 3} and H = {4, 5, 6, 7}, while the minimum cut value is CutN(E, H) = x^T L x = 0.395. Fig. 22(b) shows an approximation of the indicator vector within the space of the graph Laplacian eigenvector, u1. The quadratic form of this eigenvector is equal to u1^T L u1 = λ1 = 0.286. As shown in (49), note that the true indicator vector, x, can be decomposed into the set of all graph Laplacian eigenvectors, uk, and written as their linear combination.

The generalized Laplacian eigenvector, u1 = [0.37, 0.24, 0.32, 0.13, −0.31, −0.56, −0.34, −0.58], which is an approximation of the indicator vector for the minimum volume normalized cut in (34), is presented in Fig. 22(c). In this case, the generalized eigenvector indicates the same clustering subsets, E = {0, 1, 2, 3} and H = {4, 5, 6, 7}. The eigenvector of the normalized Laplacian, v1, is shown in Fig. 22(d).

Figure 22: Principle of the minimum normalized cut based clustering and its spectral (graph Laplacian eigenvector) based approximation; all vectors are plotted against the vertex index n. (a) The ideal indicator vector for a minimum normalized cut, CutN(E, H), normalized to unit energy. (b) The graph Laplacian eigenvector, u1. (c) The generalized eigenvector of the Laplacian, u1. (d) The eigenvector of the normalized Laplacian, v1. The eigenvectors in (c) and (d) are related as u1 = D^{−1/2} v1. In this case, the signs of the indicator vector and the eigenvectors, sign(x), sign(u1), and sign(v1), are the same in all four vectors. The signs of these vectors may then all be used to define the minimum normalized cut based clustering into E and H, that is, the association of a vertex, n, with either the subset E or the subset H.

Example 19: Consider the graph from Fig. 2, with the weight matrix, W, in (4), and the graph Laplacian eigenvector u1 (shown in Fig. 13, Fig. 19(b)(left), and Fig. 22(b)). When this eigenvector is thresholded to only two intensity levels, sign(u1), two graph clusters are obtained, as shown in Fig. 23 (right).
Figure 23: Vertex coloring for the graph from Fig. 2, with its spectrum shown in Fig. 13. (a) The eigenvector, u1, of the Laplacian matrix of this graph, given in (8), is normalized and used to define the red color intensity levels within the colormap for every vertex. For this example, u1 = [0.42, 0.38, 0.35, 0.15, −0.088, −0.34, −0.35, −0.54]^T. The largest element of this eigenvector is u1(0) = 0.42 at vertex 0, which indicates that this vertex should be colored by the lowest red intensity (white), while the smallest element is u1(7) = −0.54, so that vertex 7 is colored with the strongest red color intensity. (b) Simplified two-level coloring based on the sign of the elements of the eigenvector u1.

In an ideal case, these clusters may even be considered as independent graphs (graph segmentation being the strongest form of clustering); this can be achieved by redefining the weights as Ŵnm = 0, if m and n are in different clusters, and Ŵnm = Wnm otherwise [63], for the corresponding disconnected (segmented) graph, whose weight matrix, Ŵ, is given by

Ŵ =
[ 0     0.23  0.74  0.24  0     0     0     0
  0.23  0     0.35  0     0     0     0     0
  0.74  0.35  0     0.26  0     0     0     0
  0.24  0     0.26  0     0     0     0     0
  0     0     0     0     0     0.51  0     0.14
  0     0     0     0     0.51  0     0     0.15
  0     0     0     0     0     0     0     0.32
  0     0     0     0     0.14  0.15  0.32  0 ].    (55)

4.3.2. "Closeness" of the segmented and original graphs

The issue of how "close" the behavior of the weight matrix of the segmented graph, Ŵ, in (55) (and the corresponding L̂) is to the original W and L, in (4) and (8), is usually considered within matrix perturbation theory. It can be shown that a good measure of this "closeness" is the so-called eigenvalue gap, δ = λ2 − λ1 [63], that is, the difference between the eigenvalue λ1 associated with the eigenvector u1, which is used for segmentation, and the next eigenvalue, λ2, in the graph spectrum of the normalized graph Laplacian (for additional explanation see Example 23). For the obvious reason of analyzing the eigenvalue gap at an appropriate scale, we suggest considering the relative eigenvalue gap

δr = (λ2 − λ1)/λ2 = 1 − λ1/λ2.    (56)

The relative eigenvalue gap value range is within the interval 0 ≤ δr ≤ 1, since the eigenvalues are nonnegative real-valued numbers sorted into a nondecreasing order. The value of this gap may be considered as large if it is close to the maximum eigengap value, δr = 1.

Example 20: The Laplacian eigenvalues for the graph in Fig. 23 are λ ∈ {0, 0.29, 0.34, 0.79, 1.03, 1.31, 1.49, 2.21}, with the relative eigenvalue gap, δr = (λ2 − λ1)/λ2 = 0.15, which is not large and indicates that the segmentation in Example 19 is not "close".

As an illustration, consider three hypothetical but practically relevant scenarios: (i) λ2 = 0 and λ3 = 1, (ii) λ2 = 0 and λ3 = ε, and (iii) λ2 = 1 and λ3 = 1 + ε, where ε is a small positive number, close to 0. According to Remark 18, the graph in case (i) consists of exactly two disconnected components, and the subsequent clustering and segmentation is appropriate, with δr = 1. For case (ii), the graph consists of more than two almost disconnected components and the clustering into two sets can be performed in various ways, with δr = 1/ε. Finally, in the last scenario the relative gap is very small, δr = ε, thus indicating that the behavior of the segmented graph is not "close" to the original graph, that is, L̂ is not "close" to L, and thus any segmentation into two disconnected subgraphs would produce inadequate results.

Remark 28: The thresholding of the elements of the Fiedler vector, u1, of the normalized graph Laplacian, LN = D^{−1/2} L D^{−1/2}, performed in order to cluster the graph, is referred to as the Shi–Malik algorithm [59, 68]. Note that similar results would have been obtained if the clustering was based on the thresholding of the elements of the smoothest eigenvector corresponding to the second largest eigenvalue of the normalized weight matrix, WN = D^{−1/2} W D^{−1/2} (the Perona–Freeman algorithm [69, 68]). This becomes clear after recalling that the relation between the normalized weight and graph Laplacian matrices is given by

LN = D^{−1/2} L D^{−1/2} = I − D^{−1/2} W D^{−1/2},  that is,  LN = I − WN.    (57)

The eigenvalues of these two matrices are therefore related as λk^(LN) = 1 − λk^(WN), while they share the same eigenvectors.

4.3.3. Clustering based on more than one eigenvector

More complex clustering schemes can be achieved when using more than one Laplacian eigenvector. In turn, vertices with similar values of several slow-varying eigenvectors, uk, would exhibit high spectral similarity.

The concept of using more than one eigenvector in vertex clustering, and possible subsequent graph segmentation, was first introduced by Scott and Longuet-Higgins [70]. They used k eigenvectors of the weight matrix W to form a new N × k matrix V, for which a further row normalization was performed. Vertex clustering is then performed based on the elements of the matrix VV^T.

For the normalized weight matrix, WN, the Scott and Longuet-Higgins algorithm reduces to the corresponding analysis with k eigenvectors of the normalized graph Laplacian, LN. Since WN and LN are related by (57), they have the same eigenvectors.
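Returning to the "closeness" measure of Section 4.3.2, the relative eigenvalue gap in (56) is straightforward to evaluate; the following is a minimal sketch (the helper name and the connectivity assumption, λ2 > 0, are ours, not the monograph's):

import numpy as np

def relative_eigengap(W):
    # delta_r = (lambda_2 - lambda_1) / lambda_2 = 1 - lambda_1 / lambda_2, as in (56)
    L = np.diag(W.sum(axis=1)) - W
    lam = np.linalg.eigvalsh(L)          # lambda_0 <= lambda_1 <= lambda_2 <= ...
    return 1.0 - lam[1] / lam[2]         # assumes lambda_2 > 0

For the eigenvalues listed in Example 20 this gives 1 − 0.29/0.34 ≈ 0.15, matching the value quoted there.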
Example 21: Consider two independent normalized cuts of a graph, where the first cut splits the graph into the sets of vertices E1 and H1, and the second cut further splits all vertices into the sets E2 and H2, and define this two-level cut as

CutN2(E1, H1, E2, H2) = CutN(E1, H1) + CutN(E2, H2),    (58)

where both CutN(Ei, Hi), i = 1, 2, are defined by (33).

If we now introduce two indicator vectors, x1 and x2, for the two respective cuts, then, from (44), we may write

CutN2(E1, H1, E2, H2) = (x1^T L x1)/(x1^T x1) + (x2^T L x2)/(x2^T x2).    (59)

As mentioned earlier, finding the indicator vectors, x1 and x2, which minimize (59) is a combinatorial problem. However, if the space of solutions for the indicator vectors is now relaxed from the subset-wise constant form to the space spanned by the eigenvectors of the graph Laplacian, then the approximative minimum value of the two cuts, CutN2(E1, H1, E2, H2), is obtained for x1 = u1 and x2 = u2, since u1 and u2 are maximally smooth but not constant (for the proof see (63)-(64) and for the illustration see Example 22).

For the case of two independent cuts, for convenience, we may form the N × 2 indicator matrix Y = [x1, x2], so that the corresponding matrix of the solution (within the graph Laplacian eigenvector space) to the two normalized cuts minimization problem has the form

Q = [u1, u2].

The rows of this matrix, qn = [u1(n), u2(n)], are the spectral vectors which are assigned to each vertex, n.

The same reasoning can be followed for the cases of three or more independent cuts, to obtain an N × M indicator matrix Y = [x1, x2, . . . , xM] with the corresponding eigenvector approximation, Q, the rows of which are the spectral vectors qn = [u1(n), u2(n), . . . , uM(n)].

Remark 29: Graph clustering in the spectral domain may be performed by assigning the spectral vector,

qn = [u1(n), . . . , uM(n)]

in (41), to each vertex, n, and subsequently grouping the vertices with similar spectral vectors into the corresponding clusters [63, 65].

Low-dimensional spectral vectors (up to M = 3) can be represented by the color coordinates of, for example, the standard RGB coloring system. To this end, it is common to use different vertex colors, which represent different spectral vectors, for the visualization of spectral domain clustering.

Example 22: Fig. 24 illustrates several spectral vector clustering schemes for the graph in Fig. 19 (right), based on the three smoothest eigenvectors u1, u2, and u3. Clustering based on the eigenvector u1, with qn = [u1(n)], is shown in Fig. 24(b), clustering using the eigenvector u2 only, with qn = [u2(n)], is shown in Fig. 24(d), while Fig. 24(e) illustrates the clustering based on the eigenvector u3, with qn = [u3(n)]. Clustering based on the combination of the two smoothest eigenvectors, u1 and u2, with spectral vectors qn = [u1(n), u2(n)], is shown in Fig. 24(g), while Fig. 24(h) illustrates clustering based on the three smoothest eigenvectors, u1, u2, and u3, whereby the spectral vector is qn = [u1(n), u2(n), u3(n)]. In all cases, two-level colormaps were used for each eigenvector. The smallest eigenvalues were λ0 = 0, λ1 = 0.0286, λ2 = 0.0358, λ3 = 0.0899, λ4 = 0.104, and λ5 = 0.167, so that the largest relative gap was obtained when u1 and u2 were used for clustering, with the corresponding eigenvalue gap of δr = 1 − λ2/λ3 = 0.6.

Remark 30: k-means algorithm. The above clustering schemes are based on the quantized levels of spectral vectors. These can be refined using the k-means algorithm, that is, through postprocessing in the form of unsupervised learning, in the following way:

(i) After an initial vertex clustering is performed by grouping the vertices into Vi, i = 1, 2, . . . , k, nonoverlapping vertex subsets, a new spectral vector centroid, ci, is calculated as

ci = mean_{n∈Vi}{qn},

for each cluster of vertices Vi;

(ii) Every vertex, n, is then reassigned to its nearest (most similar) spectral domain centroid, i, where the spectral distance (spectral similarity) is calculated as ||qn − ci||2.

This two-step algorithm is iterated until no vertex changes clusters. Finally, all vertices in one cluster are colored based on the corresponding common spectral vector ci (or, visually, a color representing ci). Clustering refinement using the k-means algorithm is illustrated later in Example 29, and a short code sketch is given below, after Fig. 24.

Example 23: Graphs represent quite a general mathematical formalism, and we will here provide only one possible physical interpretation of graph clustering. Assume that each vertex represents one out of a set of N images, which exhibit both common elements and individual differences. If the edge weights are calculated so as to represent the mutual similarities between these images, then spectral vertex analysis can be interpreted as follows. If the set is complete and with very high similarity among all vertices, then Wmn = 1, and λ0 = 0, λ1 = N, λ2 = N, . . . , λN−1 = N, as shown in Remark 19. The relative eigenvalue gap is then δr = (λ2 − λ1)/λ2 = 0 and the segmentation is not possible.

Assume now that the considered set of images consists of two connected subsets with the respective numbers of N1 and N2 ≥ N1 very similar photos within each subset. In this case, the graph consists of two complete components (sub-graphs). According to Remarks 18 and 19, the graph Laplacian eigenvalues are now λ0 = 0, λ1 = 0, λ2 = N1, . . . , λN1 = N1, λN1+1 = N2, . . . , λN−1 = N2. Then, this graph may be well segmented into two components (sub-graphs) since the relative eigenvalue gap is now large, δr = (λ2 − λ1)/λ2 = 1. Therefore, this case can be used for collaborative data processing within each of these subsets. The analysis can be continued and further refined for cases with more than one eigenvector and more than two subsets of vertices. Note that segmentation represents a "hard-thresholding" operation of cutting the connections between vertices in different subsets, while clustering refers to just a grouping of vertices which exhibit some similarity into subsets, while keeping their mutual connections.

Example 24: For enhanced intuition, we next consider a real-world dataset with 8 images, shown in Fig. 25. The connectivity weights were calculated using the structural similarity index
Figure 24: Spectral vertex clustering schemes for the graph from Fig. 19. (a) The eigenvector, u1 , of the Laplacian matrix (plotted in red
lines on vertices designated by black dots) is first normalized and is then used to designate (b) a two-level blue colormap intensity (through
its signs) for every vertex (blue-white circles). (c) The eigenvector, u2 , of the Laplacian matrix is normalized and is then used to provide
(d) a two-level green colormap intensity for every vertex. (e) The eigenvector, u3 , of the Laplacian matrix is normalized and used as (f) a
two-level red colormap intensity for every vertex. (g) Clustering based on the combination of the eigenvectors u1 and u2 . (h) Clustering
based on the combination of the eigenvectors u1 , u2 , and u3 . Observe an increase in degrees of freedom with the number of eigenvectors
used; this is reflected in the number of detected clusters, starting from two clusters in (b) and (d), via four clusters in (g), to 8 clusters in (h).
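Following Remark 30, the sketch below (our illustrative helper, not code from the monograph) forms the M-dimensional spectral vectors qn from the M smoothest non-constant Laplacian eigenvectors and refines an initial grouping with the two k-means steps, (i) centroid update and (ii) nearest-centroid reassignment:

import numpy as np

def spectral_kmeans(W, M=2, k=4, n_iter=100, seed=0):
    L = np.diag(W.sum(axis=1)) - W
    _, U = np.linalg.eigh(L)
    Q = U[:, 1:M + 1]                               # rows are the spectral vectors q_n
    rng = np.random.default_rng(seed)
    centroids = Q[rng.choice(len(Q), k, replace=False)]
    for _ in range(n_iter):
        # (ii) reassign every vertex to its nearest spectral centroid
        labels = np.argmin(np.linalg.norm(Q[:, None] - centroids[None], axis=2), axis=1)
        # (i) recompute each centroid c_i as the mean spectral vector of cluster V_i
        new_centroids = np.array([Q[labels == i].mean(axis=0) if np.any(labels == i)
                                  else centroids[i] for i in range(k)])
        if np.allclose(new_centroids, centroids):   # stop when no centroid moves
            break
        centroids = new_centroids
    return labels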
Figure 26: Graph topology for the real-world images from Fig. 25.
Figure 28: Vertex coloring in the benchmark Minnesota road-
map graph using the three smoothest Laplacian eigenvectors
{u2 ,u3 ,u4 }, as coordinates in the standard RGB coloring system
(a three-dimensional spectral space with the spectral vector qn =
[u2 (n), u3 (n), u4 (n)] for every vertex, n). The vertices with similar
colors are therefore also considered spectrally similar. Observe three
different clusters, characterized by the shades of predominantly red,
green, and blue color, that correspond to intensities defined by the
eigenvectors u2 (n), u3 (n), and u4 (n).
Figure 30: Illustration of spectral dimensionality reduction through an example of exam marks for a cohort of students. (a) Each of the 70
columns (students) represents a 40-dimensional vector with student marks. Therefore the dimensionality of the original representation space
is L = 40. (b) Average mark per student. (c) Average mark per course. (d) Two-dimensional graph representation of the matrix in a), where
the individual students are represented by randomly positioned vertices in the plane. To perform vertex (student) dimensionality reduction
we can use spectral vectors to reduce their original L = 40 dimensional representation space to (e) M = 3, (f) M = 2, and (g) M = 1
dimensional spectral representation spaces. (h) Vertices from path graph g) positioned on a circle (by connecting the ends of the line) which
allows us to also show the edges.
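As a rough illustration of the reduction shown in Fig. 30, the sketch below maps each vertex's L-dimensional data vector (for instance, the 40 exam marks per student) to an M-dimensional spectral vector. The Gaussian similarity used to form the weights is our own assumption for illustration; the monograph does not prescribe it here.

import numpy as np

def eigenmap_from_data(X, M=2, sigma=1.0):
    # X: N x L matrix, one L-dimensional data vector per vertex (e.g., marks per student)
    d2 = ((X[:, None, :] - X[None, :, :])**2).sum(axis=2)    # squared Euclidean distances
    W = np.exp(-d2 / (2.0 * sigma**2))                        # assumed similarity weights
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W
    _, U = np.linalg.eigh(L)
    return U[:, 1:M + 1]        # rows are the new M-dimensional vertex positions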
The minimum of the sum of the weighted squared distances between the new one-dimensional vertex positions, taken over all pairs of vertices, m and n, that is,

(1/2) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} ||q(m) − q(n)||_2^2 Wmn = (1/2) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} (uk(m) − uk(n))^2 Wmn = λk,

will be obtained with the new positions of vertices designated by qn = [u1(n)], and for k = 1, since min_{k, λk≠0}{λk} = λ1 is the smallest nonzero eigenvalue.

Two-dimensional case. If we desire to reduce the original L-dimensional vertex representation space to a two-dimensional spectral space, designated by qn = [uk(n), ul(n)] and defined through any two eigenvectors of the graph Laplacian, uk and ul, then the minimum sum of the weighted squared distances between all vertices, m and n, given by

(1/2) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} ||qm − qn||_2^2 Wmn = (1/2) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} (uk(m) − uk(n))^2 Wmn + (1/2) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} (ul(m) − ul(n))^2 Wmn = uk^T L uk + ul^T L ul = λk + λl,    (63)

will be obtained with the new spectral positions, qn = [uk(n), ul(n)], such that qn = [u1(n), u2(n)], since

min_{k,l, k≠l, kl≠0}{λk + λl} = λ1 + λ2    (64)

for nonzero k and l, keeping in mind that λ1 ≤ λ2 ≤ λ3 ≤ · · · ≤ λN−1. The same reasoning holds for new three- and higher-dimensional spectral representation spaces for the vertices, which yields (62) as the optimal vertex positions in the reduced M-dimensional vertex space.

The same relations hold for both the generalized eigenvectors of the Laplacian, defined by Luk = λk Duk, and the eigenvectors of the normalized Laplacian, defined by D^{−1/2}LD^{−1/2}vk = λk vk. The only difference is in their respective normalization conditions, uk^T D uk and vk^T vk. The relation between the eigenvectors of the normalized graph Laplacian, vk, and the generalized eigenvectors of the graph Laplacian, uk, in the form uk = D^{−1/2}vk, follows from their definitions (see Remark 27). Since the elements u1(n) and u2(n) are obtained by multiplying the elements v1(n) and v2(n) by the same positive value (the n-th diagonal element of D^{−1/2}), the normalized forms of uk and vk are identical,

qn/||qn||2 = [u1(n), u2(n)]/||[u1(n), u2(n)]||2 = [v1(n), v2(n)]/||[v1(n), v2(n)]||2.

4.4.2. Examples of graph analysis in the spectral space

Example 28: The graph from Fig. 2, where the vertices reside in a two-dimensional plane, is shown in Fig. 31(a), while Fig. 31(b) illustrates the same graph represented in a reduced single-dimensional vertex space (a line). The vertex positions on the line are defined by the spectral vector, qn = [u1(n)], with u1 = [0.42, 0.38, 0.35, 0.15, −0.088, −0.34, −0.35, −0.54]^T.

Figure 31: Principle of vertex dimensionality reduction based on the spectral vectors. (a) The weighted graph from Fig. 2 with its vertices in a two-dimensional space. (b) The graph from (a) with its vertices located along a line (one-dimensional vertex space), whereby the positions on the line are defined by the one-dimensional spectral vector, qn = [u1(n)], with u1 = [0.42, 0.38, 0.35, 0.15, −0.088, −0.34, −0.35, −0.54]^T. Observe that this dimensionality reduction method may be used for clustering, based on the vertex position on the line.

Remark 33: After the vertices are reordered according to the Fiedler eigenvector, u1, Example 28 indicates the possibility of clustering refinement through a recalculation of normalized cuts. For the set of vertices V = {0, 1, 2, . . . , N − 1}, Fig. 31(b) illustrates their ordering along a line, with the new order {v1, v2, . . . , vN} = {7, 6, 5, 4, 3, 2, 1, 0}. Instead of using the sign of u1 to cluster the vertices, we can recalculate the normalized cuts, CutN(Ep, Hp), with this sequential vertex order, where Ep = {v1, v2, . . . , vp} and Hp = {vp+1, vp+2, . . . , vN}, for p = 1, 2, . . . , N − 1. The estimate of the minimum normalized cut then becomes

(Ep, Hp) = arg min_p {CutN(Ep, Hp)}.

This method is computationally efficient since only (N − 1) cuts, CutN(Ep, Hp), need to be calculated. In addition, the cuts CutN(Ep, Hp) can be calculated recursively, using the previous CutN(Ep−1, Hp−1) and the connectivity parameters (degree, Dpp, and weights, Wpm) of vertex p. Any normalized cut form presented in Section 4.1 can also be used instead of CutN(Ep, Hp). When the Cheeger ratio, defined in (37), is used in this minimization, an upper bound on the normalized cut can be obtained as [67]

min_p {φ(Ep)} ≤ √(2 λ1) ≤ 2 √(φ(V)),    (65)

where φ(V) denotes the combinatorial (true) minimum cut, with bounds given in (50).
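The sequential search of Remark 33 can be sketched as follows (a simple, non-recursive version, under the assumption that the normalized cut takes the form CutN(E, H) = Cut(E, H)(1/NE + 1/NH) from (33); names are illustrative):

import numpy as np

def sweep_cut(W):
    # Order vertices by the Fiedler vector and evaluate the N-1 sequential cuts
    N = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    _, U = np.linalg.eigh(L)
    order = np.argsort(U[:, 1])                    # vertices reordered by u1
    best_p, best_cut = 1, np.inf
    for p in range(1, N):
        E, H = order[:p], order[p:]
        cut = W[np.ix_(E, H)].sum()                # Cut(E_p, H_p)
        cutN = cut * (1.0 / len(E) + 1.0 / len(H)) # CutN(E_p, H_p)
        if cutN < best_cut:
            best_p, best_cut = p, cutN
    return order[:best_p], order[best_p:], best_cut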
Example 29: We shall now revisit the graph in Fig. 24 and examine the clustering schemes based on (i) the standard Laplacian eigenvectors (Fig. 32), (ii) the generalized eigenvectors of the graph Laplacian (Fig. 33), and (iii) the eigenvectors of the normalized Laplacian (Fig. 34). Fig. 32(b) illustrates Laplacian eigenmaps based dimensionality reduction for the graph from Fig. 24(g), with the two eigenvectors, u1 and u2, serving as new vertex coordinates, and using the same vertex coloring scheme as in Fig. 24(g). While both the original and the new vertex space are two-dimensional, we can clearly see that in the new vertex space the vertices belonging to the same clusters are also spatially closer, which is both physically meaningful and exemplifies the practical value of the eigenmaps. Fig. 32(c) is similar to Fig. 32(b) but is presented using the normalized spectral space coordinates, qn = [u1(n), u2(n)]/||[u1(n), u2(n)]||2. In Fig. 32(d) the clusters are refined using the k-means algorithm, as per Remark 30. The same representations are repeated and shown in Fig. 33(a)-(d) for the representation based on the generalized eigenvectors of the graph Laplacian, obtained as a solution to Luk = λk Duk. Finally, in Fig. 34(a)-(d), the Laplacian eigenmaps and clustering are produced based on the eigenvectors of the normalized graph Laplacian, LN = D^{−1/2}LD^{−1/2}. As expected, the eigenmaps obtained using the generalized Laplacian eigenvectors, in Fig. 33(b), and the eigenvectors of the normalized Laplacian, in Fig. 34(b), are different; however, they reduce to the same eigenmaps after spectral vector normalization, as shown in Fig. 33(c) and Fig. 34(c). After the k-means based clustering refinement was applied, in all three cases two vertices switched their initial color (cluster), as shown in Fig. 32(d), Fig. 33(d), and Fig. 34(d).

Observe that the eigenmaps obtained with the normalized forms of the generalized eigenvectors of the Laplacian and the eigenvectors of the normalized Laplacian are the same, and in this case their clustering performances are similar to those based on the eigenmaps produced with the eigenvectors of the original Laplacian.

Figure 32: Principle of Laplacian eigenmaps and clustering based on the eigenvectors of the graph Laplacian, L. (a) The original graph from Fig. 24, with the spectral vector qn = [u1(n), u2(n)], defined by the graph Laplacian eigenvectors {u1, u2}, which is used to cluster (color) the vertices. (b) Two-dimensional vertex positions obtained through Laplacian eigenmaps, with the spectral vector qn = [u1(n), u2(n)] serving as the vertex coordinates (the 2D Laplacian eigenmap). While both the original and this new vertex space are two-dimensional, the new eigenmaps-based space is advantageous in that it emphasizes vertex spectral similarity in a spatial way (physical closeness of spectrally similar vertices). (c) The graph from (b) but produced using the normalized spectral space coordinates qn = [u1(n), u2(n)]/||[u1(n), u2(n)]||2, as in (54). (d) The graph from (c) with clusters refined using the k-means algorithm, as per Remark 30. The centroids of clusters are designated by squares of the same color. The complexity of graph presentation is also significantly reduced through eigenmaps, with most of the edges between strongly connected vertices being very short and located along a circle.

Remark 34: In general, an independent quantization of the two smoothest eigenvectors of the graph Laplacian, u1 and u2, will produce four clusters. However, that will not be the case if we analyze a graph with an almost ideal eigenvalue gap (unit value) between λ2 and λ3. In other words, when the gap δr = 1 − λ2/λ3 tends to 1, that is, λ2 → 0 and λ1 < λ2 → 0, then this case corresponds to a graph with exactly three disjoint subgraph components, with vertices belonging to the disjoint sets E, H, and K. Without loss of generality, assume NE > NH > NK. The minimum normalized cut, CutN(E, H ∪ K), is then obtained with the first indicator vector x1(n) = c11 for n ∈ E and x1(n) = c12 for n ∈ H ∪ K. The second indicator vector will produce the next minimum normalized cut, CutN(E ∪ K, H), with x2(n) = c21 for n ∈ E ∪ K and x2(n) = c22 for n ∈ H.

Following the same analysis as in the case of one indicator vector and the cut of a graph into two disjoint subsets of vertices, we can immediately conclude that the two smoothest eigenvectors, u1 and u2, which correspond to λ2 → 0 and λ1 → 0, can be used to form an indicator matrix Y = [x1, x2], so that the corresponding matrix of the solution (within the graph Laplacian eigenvector space) to the minimization problem of the two normalized cuts has the form [sign(u1), sign(u2)]. The elements of these indicator vectors, [sign(u1(n)), sign(u2(n))], therefore have a subset-wise constant vector form, assuming exactly three different vector values that correspond to the individual disjoint sets E, H, and K.

This procedure can be generalized up to every individual vertex becoming a cluster (no clustering). To characterize N independent disjoint sets we will need (N − 1) spectral vectors, if the constant eigenvector, u0, is omitted.

Example 30: The two-dimensional Laplacian eigenmap for the benchmark Minnesota roadmap graph (with M = 2) is given in Fig. 35. In this new space, the spectral vectors qn = [u2(n), u3(n)] are used as the coordinates of the new vertex positions. Here, two vertices with similar slow-varying eigenvectors are located close to one another in the new coordinate system defined by u2 and u3. This illustrates that the eigenmaps can be considered as a basis for "scale-wise" graph representation.

Example 31: The Laplacian eigenmaps of the Brain Atlas graph are illustrated in Fig. 36, Fig. 37, and Fig. 38.
Figure 36: Brain atlas representation based on normalized spectral vectors. (a) A two-dimensional Laplacian eigenmap based on the generalized Laplacian eigenvectors. The original L = 3 dimensional graph from Fig. 29 is reduced to a two-dimensional representation based on the two smoothest eigenvectors, u1(n) and u2(n), which both serve as spectral coordinates and define color templates in the colormap, as in Fig. 29. (b) Eigenmaps from (a) but in the space of the normalized spectral space coordinates, qn = [u2(n), u3(n)]/||[u2(n), u3(n)]||2, with the complexity of graph representation now significantly reduced. Observe that most edges exist only between strongly connected vertices located along the circle.

The pseudo-inverse of the graph Laplacian, L+, is defined as a matrix that satisfies the property

L L+ = [ 0, 0_{1×(N−1)} ; 0_{(N−1)×1}, I_{(N−1)×(N−1)} ].    (66)

The eigenvalues of the graph Laplacian pseudo-inverse are therefore the inverses of the original eigenvalues, {0, 1/λ1, . . . , 1/λN−1}, while it shares the same eigenvectors with the original graph Laplacian, u0, u1, . . . , uN−1. The eigenmaps for which the spectral coordinates are scaled based on the eigenvalues of the pseudo-inverse of the graph Laplacian can be interpreted within the Principal Component Analysis (PCA) framework in the following way. Notice that the M-dimensional eigenmaps based on the pseudo-inverse of the Laplacian are the same as those for the original graph Laplacian, since they share the same eigenvectors. If the spectral vectors qn = [u1(n), u2(n), . . . , uM(n)] are scaled with the square roots of the eigenvalues of the Laplacian pseudo-inverse, we obtain

qn = [u1(n)/√λ1, u2(n)/√λ2, . . . , uM(n)/√λM].    (67)

4.5.1. Commute time mapping

The physical meaning of the new vector positions in the spectral space, defined by (67), is related to the notion of commute time, which is a property of a diffusion process on a graph [74, 75]. The commute time, CT(m, n), between vertices m and n is defined as the expected time for a random walk to reach vertex n starting from vertex m, and then to return. The commute time is therefore proportional to the Euclidean distance between these two vertices, with the vertex positions in the new spectral space defined by qn in (67), that is,

CT^2(m, n) = VV ||qm − qn||_2^2 = VV Σ_{i=1}^{N−1} (qi(m) − qi(n))^2,

where VV is the volume of the whole graph, VV = Σ_{n=0}^{N−1} Dnn.

To put this into perspective, in a graph representation of a resistive electric circuit/network, for which the edge weights are equal to the conductances (inverse resistances, see Part 3), the commute time, CT(m, n), is defined as the equivalent resistance between the electric circuit nodes (vertices) m and n [76].

The covariance matrix of the scaled spectral vectors in (67) is given by

S = (1/N) Σ_{n=0}^{N−1} qn^T qn = (1/N) Λ̄^{−1}.

In other words, the principal directions in the reduced dimensionality space of M eigenvectors, u1, u2, . . . , uM, correspond to the maximum variance of the graph embedding, since 1/λ1 > 1/λ2 > · · · > 1/λM. This, in turn, directly corresponds to principal component analysis (PCA).
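A minimal sketch of this scaling (illustrative helper, with the usual assumption of a connected graph so that λk > 0 for k ≥ 1):

import numpy as np

def commute_time_embedding(W, M=2):
    # Spectral coordinates u_k(n) / sqrt(lambda_k), k = 1, ..., M, as in (67)
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)
    return U[:, 1:M + 1] / np.sqrt(lam[1:M + 1])

With M = N − 1, the squared Euclidean distance between two rows of this embedding, multiplied by the graph volume VV, gives the squared commute time CT^2(m, n) stated above.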
Figure 37: Generalized eigenvectors, uk , k = 1, 2, 3, 4, 5, 6, of the graph Laplacian of the Brain Atlas graph, shown using vertex coloring in
the original three-dimensional vertex space. Each panel visualizes a different uk , k = 1, 2, 3, 4, 5, 6.
Figure 38: Laplacian eigenmaps of the Brain Atlas graph in the reduced two-dimensional space defined by the two smoothest generalized
eigenvectors of the graph Laplacian, u1 and u2 . The panels each visualize a different generalized eigenvector, uk , k = 1, 2, 3, 4, 5, 6.
Figure 39: Laplacian eigenmaps based dimensionality reduction for the Swiss roll graph. (a) Vertex locations for the Swiss roll graph in the original L = 3 dimensional space with N = 500 points (vertices). (b) The Swiss roll graph with edges whose weights are calculated based on the Euclidean distances between vertices. (c) The Swiss roll graph with vertices colored using the normalized graph Laplacian eigenvectors, u1(n) and u2(n), as a colormap. (d) The same vectors used as the new coordinates (spectral vectors) in a reduced two-dimensional Laplacian eigenmap vertex space (M = 2). The vertices with high similarity (similar values of the smoothest eigenvectors) are located close to one another, thus visually indicating the expected similarity of data observed at these vertices. (e) Clustering of the Swiss roll graph, in the original L = 3 dimensional space, using the two smoothest eigenvectors, u1(n) and u2(n). (f) Clustering of the Swiss roll graph using the two smoothest eigenvectors, u1(n) and u2(n), presented in the M = 2 eigenmap space, where for every vertex its spatial position (quadrant of the coordinate system) indicates the cluster to which it belongs.
Remark 35: Two-dimensional case comparison. The two-dimensional spectral space of the standard graph Laplacian eigenvectors is defined by u1 and u2, while the spectral vector in this space is given by

qn = [u1(n), u2(n)].    (68)

In the case of the commute time mapping, the two-dimensional spectral domain of the vertices becomes

qn = [u1(n)/√λ1, u2(n)/√λ2],    (69)

that is, the commute time mapping is related to the graph Laplacian mapping through axis scaling by 1/√λk.

We can conclude that when λ1 ≈ λ2, the two mappings in (68) and (69) are almost the same, when normalized. However, when λ1 ≪ λ2, the relative eigenvalue gap between the one-dimensional and two-dimensional spectral space is large, since δr = 1 − λ1/λ2 is close to 1. This means that the segmentation into two disjoint subgraphs will be "close" to the original graph, while at the same time this also indicates that the eigenvector u2 does not contribute to a new "closer" segmentation (in the sense of Section 4.3.2), since its gap δr = 1 − λ2/λ3 is not small. Therefore, the influence of u2 should be reduced, as compared to the standard spectral vector of the graph Laplacian where both u1 and u2 employ unit weights to give qn = [u1(n), u2(n)]. Such downscaling of the influence of the almost irrelevant eigenvector, u2, when λ1 ≪ λ2, is equivalent to the commute time mapping, since

qn = [u1(n)/√λ1, u2(n)/√λ2] = (1/√λ1) [u1(n), u2(n) √(λ1/λ2)] ∼ [u1(n), 0].

For example, for the graph from Example 29, shown in Fig. 32(a), the commute time mapping will produce the same vertex presentation as in Fig. 32(b), which is obtained with the eigenvectors of the graph Laplacian, when the vertical axis, u2, is scaled by

√(λ1/λ2) = √(0.0286/0.0358) = 0.8932.

This eigenmap will also be very close to the eigenmap in Fig. 32(b), produced based on the graph Laplacian eigenvectors and the spectral vector qn = [u1(n), u2(n)].

4.5.2. Diffusion (Random Walk) Mapping

Finally, we shall now relate the commute time mapping to the diffusion mapping.

Definition: Diffusion on a graph deals with the problem of propagation along the edges of a graph, whereby at the initial step, t = 0, the random walk starts at a vertex n. At the next step, t = 1, the walker moves from its current vertex n to one of its neighbors, l, chosen at random. The probability of going from vertex n to vertex l is equal to the ratio of the weight Wnl and the sum of all possible edge weights from the vertex n, that is,

Pnl = Wnl / Σ_l Wnl = (1/Dnn) Wnl.    (70)

When considering all vertices together, such probabilities can be written in matrix form, through the random walk weight matrix, defined as in (10), by

P = D^{−1} W.    (71)

Diffusion distance. The diffusion distance between the vertices m and n, denoted by Df(m, n), is equal to the distance between the vector (N-dimensional ordered set) of probabilities for a random walk to move from a vertex m to all other vertices (as in (70)), given by

pm = [Pm0, Pm1, . . . , Pm(N−1)],

and the corresponding vector of probabilities for a random walk to move from a vertex n to all other vertices, given by

pn = [Pn0, Pn1, . . . , Pn(N−1)],

that is,

Df^2(m, n) = ||(pm − pn) D^{−1/2}||_2^2 VV = Σ_{i=0}^{N−1} (Pmi − Pni)^2 (1/Dii) VV,

where VV = Σ_{n=0}^{N−1} Dnn is constant for a given graph and is equal to the sum of the degrees (volume) of all graph vertices in V.

Example 33: For the graph from Fig. 2, with its weight matrix, W, and the degree matrix, D, given respectively in (4) and (6), the random walk weight matrix in (71) is of the form

P =
[ 0     0.19  0.61  0.20  0     0     0     0
  0.28  0     0.43  0     0.28  0     0     0
  0.47  0.22  0     0.16  0.15  0     0     0
  0.29  0     0.32  0     0     0     0.39  0
  0     0.21  0.21  0     0     0.46  0     0.12
  0     0     0     0     0.77  0     0     0.23
  0     0     0     0.50  0     0     0     0.50
  0     0     0     0     0.23  0.25  0.52  0 ],    (72)

with VV = 7.46. Therefore, the diffusion distance between, for example, the vertices m = 1 and n = 3, for the t = 1 step, is

Df(1, 3) = ||(p1 − p3) D^{−1/2}||_2 √VV = 1.54,

while the diffusion distance between the vertices m = 6 and n = 3 is Df(6, 3) = 2.85. From this simple example, we can see that the diffusion distance is larger for the vertices m = 6 and n = 3 than for the neighboring vertices m = 1 and n = 3. This result is in perfect accordance with the clustering scheme (expected similarity) in Fig. 23(b), where the vertices m = 1 and n = 3 are grouped into the same cluster, while the vertices m = 6 and n = 3 belong to different clusters.

The probability vectors, pn, are called diffusion clouds (in this case for step t = 1), since they resemble a cloud around a vertex n.
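The quantities used in Example 33 follow directly from the definitions above; a minimal sketch (illustrative helper names, and the graph weight matrix W is assumed to be available as a NumPy array):

import numpy as np

def diffusion_distance(W, m, n, t=1):
    D = W.sum(axis=1)                                   # vertex degrees D_nn
    P = np.linalg.matrix_power(W / D[:, None], t)       # P^t = (D^{-1} W)^t, as in (71)
    VV = D.sum()                                        # graph volume V_V
    diff = P[m] - P[n]
    return np.sqrt(VV * np.sum(diff**2 / D))            # ||(p_m - p_n) D^{-1/2}||_2 * sqrt(V_V)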
The diffusion distance can then be considered as a distance between the diffusion clouds (sets of data) around a vertex m and a vertex n. If the vertices are well connected (approaching a complete graph structure) then this distance is small, while for vertices with long paths between them, this distance is large.

The diffusion analysis can be easily generalized to any value of the diffusion step, t, whereby after t steps the matrix of probabilities in (71) becomes

Pt = (D^{−1} W)^t.

The elements of this matrix, denoted by Pmn^(t), are equal to the probabilities that a random walker moves from a vertex m to a vertex n in t steps. The t-step diffusion distance between the vertices m and n is accordingly defined as

Df^(t)(m, n) = ||(pm^(t) − pn^(t)) D^{−1/2}||_2 √VV,

where

pm^(t) = [Pm0^(t), Pm1^(t), . . . , Pm(N−1)^(t)]  and  pn^(t) = [Pn0^(t), Pn1^(t), . . . , Pn(N−1)^(t)].

It can be shown that the diffusion distance is equal to the Euclidean distance between the considered vertices when they are presented in a new space of their generalized Laplacian eigenvectors, which are then scaled by their corresponding eigenvalues; this new space is referred to as the diffusion maps (cf. eigenmaps).

The eigenanalysis relation for the random walk weight matrix for the state t = 1 now becomes

(D^{−1} W) uk = λk^(P) uk.

Since the weight matrix can be written as W = D − L, this yields D^{−1}(D − L) uk = λk^(P) uk, or

(I − D^{−1} L) uk = λk^(P) uk,

to finally produce the generalized graph Laplacian equation,

L uk = λk D uk,

with λk^(P) = 1 − λk. This relation indicates that a one-step diffusion mapping is directly obtained from the corresponding generalized graph Laplacian mapping.

After t steps, the random walk matrix (of probabilities) becomes Pt = (D^{−1} W)^t, for which the eigenvalues are (1 − λk)^t, while the (right) eigenvectors remain the same as for the graph Laplacian, see (26).

The spectral space for the vertices, for a t-step diffusion process (diffusion mapping), is then defined based on the spectral vector

qn = [u1(n), u2(n), . . . , uN−1(n)] (I − Λ̄)^t,

and is equal to the generalized Laplacian spectral space mapping, whereby the axis vectors qn = [u1(n), u2(n), . . . , uN−1(n)] are multiplied by the corresponding eigenvalues, (1 − λk)^t.

It can be shown that the diffusion distance between vertices in the new diffusion map space is equal to their Euclidean distance [77], that is,

Df^(t)(m, n) = √VV ||qm − qn||_2.    (73)

Example 34: For the graph from Fig. 2, whose weight matrix, W, and degree matrix, D, are defined in (4) and (6), the diffusion distance between the vertices m = 1 and n = 3 can be calculated using (73) as

Df^(1)(1, 3) = √VV ||q1 − q3||_2 = 1.54,

where the spectral vectors, q1 = [u1(1)(1 − λ1)^1, . . . , uN−1(1)(1 − λN−1)^1] and q3 = [u1(3)(1 − λ1)^1, . . . , uN−1(3)(1 − λN−1)^1], are obtained using the generalized graph Laplacian eigenvectors, uk, and the corresponding eigenvalues, λk, from Luk = λk Duk. This is the same diffusion distance value, Df(1, 3), as in Example 33.

Dimensionality reduced diffusion maps. The dimensionality of the vertex representation space can be reduced in diffusion maps by keeping only the eigenvectors that correspond to the M most significant eigenvalues, (1 − λk)^t, k = 1, 2, . . . , M, in the same way as for the Laplacian eigenmaps. For example, the two-dimensional spectral domain of the vertices in the diffusion mapping is defined as

qn = [u1(n)(1 − λ1)^t, u2(n)(1 − λ2)^t].

While the analysis and intuition for the diffusion mapping are similar to those for the commute time mapping, presented in Remark 35, diffusion maps have an additional degree of freedom, the step t.

Example 35: For the graph in Fig. 25, which corresponds to a set of real-world images, the commute time two-dimensional spectral vectors in (69), normalized through a multiplication of their coordinates by √λ1, assume the form

qn = [u1(n), (√λ1/√λ2) u2(n)] = [u1(n), 0.62 u2(n)].

The corresponding vertex colors designate the diffusion-based clustering, as shown in Fig. 40(a). Fig. 40(b) shows the vertices of this graph, colored with the two-dimensional diffusion map spectral vectors, which are normalized by (1 − λ1), to yield

qn = [u1(n), ((1 − λ2)/(1 − λ1)) u2(n)] = [u1(n), 0.09 u2(n)].
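A hedged sketch of such a dimensionality-reduced diffusion map (using SciPy's generalized symmetric eigensolver; the function name is illustrative):

import numpy as np
from scipy.linalg import eigh

def diffusion_map(W, M=2, t=1):
    D = W.sum(axis=1)
    L = np.diag(D) - W
    lam, U = eigh(L, np.diag(D))                     # generalized problem L u = lambda D u
    return U[:, 1:M + 1] * (1.0 - lam[1:M + 1])**t   # rows are q_n, scaled by (1 - lambda_k)^t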
The summed mapping also corresponds to the cumulative diffusion distance, given by

Dc(n, l) = ∑_{t=0}^{∞} Df^{(t)}(n, l).

The diffusion eigenmaps can therefore be obtained by an appropriate axis scaling of the standard eigenmaps produced by the generalized eigenvectors of the graph Laplacian.

Remark 36: The commute time and the diffusion process mappings are related in the same way as the mappings based on the graph Laplacian eigenvectors and the generalized eigenvectors of the graph Laplacian.

Figure 40: Graph structure for the images from Fig. 25, with vertex color embedding which corresponds to the two-dimensional normalized spectral vectors in (a) the commute time representation, qn = [u1(n), 0.62 u2(n)], and (b) the spectral eigenvectors of the diffusion process, qn = [u1(n), 0.09 u2(n)], with t = 1. For the commute time representation in (a), the graph Laplacian eigenvectors, u1 and u2, are used, while for the diffusion process representation in (b), the generalized Laplacian eigenvectors, u1 and u2, are used.

4.6. Summary of Embedding Mappings

A summary of the considered embedding mappings is given in Table 1. Notice that various normalization schemes may be used to obtain the axis vectors, yn, from the spectral vectors, qn (see Algorithm 3).

These examples of dimensionality reduction reveal close connections with spectral clustering algorithms developed in standard machine learning and computer vision; in this sense, the notions of dimensionality reduction and clustering can be considered as two sides of the same coin [65]. In addition to the reduction of dimensionality for visualization purposes, the resulting spectral vertex space of lower dimensionality may be used to mitigate the complexity and accuracy issues experienced with classification algorithms, or in other words to bypass the curse of dimensionality.

5. Graph Sampling Strategies

In the case of extremely large graphs, subsampling and down-scaling of graphs is a prerequisite for their analysis [78]. For a given large (in general directed) graph, G, with N vertices, its resampling aims to produce a much simpler graph which retains most of the properties of the original graph, but is both less complex and more physically and computationally meaningful. The similarity between the original large graph, G, and the down-scaled graph, S, with M vertices, where M ≪ N, is defined with respect to the set of parameters of interest, such as, for example, the connectivity or a distribution on the graph. Such criteria may also be related to the spectral behavior of graphs.

Several methods exist for graph down-scaling, of which some are listed below.

• The simplest method for graph down-sampling is the random vertex or random node (RN) selection method, whereby a random subset of vertices is used for the analysis and representation of large graphs and data observed on such large graphs. Even though the vertices are here selected with equal probabilities, this method produces good results in practical applications.

• Different from the RN method, where the vertices are selected with a uniform probability, the random degree vertex/node (RDN) selection method is based on a probability of vertex selection that is proportional to the vertex degree. In other words, vertices with more connections, and thus a larger Dn = ∑_m Wnm, are selected with higher probability (see the sketch after this list). This makes the RDN approach biased towards highly connected vertices.

• The PageRank method is similar to the RDN, and is based on the vertex rank. The PageRank is defined by the importance of the vertices connected to the considered vertex n. The probability that a vertex n will be used in a down-scaled graph is then proportional to the PageRank of this vertex. This method is also known as the random PageRank vertex (RPN) selection, and is biased towards highly connected vertices (those with a high PageRank).

• A method based on a random selection of the edges that will remain in the simplified graph is called the random edge (RE) method. This method may lead to graphs that are not well connected and which exhibit large diameters.

• The RE method may be combined with random vertex selection to yield the combined RNE method, whereby the initial random vertex selection is followed by a random selection of one of the edges connected to the selected vertex.

• In addition to these methods, more sophisticated methods based on random vertex selection and random walk (RW) analysis may be defined. For example, we can randomly select a small subset of vertices and form several random walks starting from each selected vertex. The Random Walk (RW), Random Jump (RJ), and Forest Fire graph down-scaling strategies are all defined in this way.
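As a concrete illustration of the degree-proportional selection described above, the following NumPy sketch (ours, not part of the original text; a symmetric weight matrix W is assumed) selects M vertices with probabilities proportional to their degrees Dn = ∑m Wnm and returns the induced subgraph. Passing p=None to rng.choice would instead recover the uniform RN strategy.

```python
import numpy as np

def rdn_downscale(W, M, rng=None):
    """Random degree-node (RDN) down-scaling sketch: keep M vertices, degree-proportionally."""
    rng = np.random.default_rng() if rng is None else rng
    degrees = W.sum(axis=1)                        # D_n = sum_m W_nm
    p = degrees / degrees.sum()                    # selection probability proportional to degree
    kept = np.sort(rng.choice(W.shape[0], size=M, replace=False, p=p))
    return kept, W[np.ix_(kept, kept)]             # retained vertices and induced weight matrix
```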
Mapping | Eigen-analysis relation | Reduced dimensionality spectral vector
Graph Laplacian mapping | L uk = λk uk | qn = [u1(n), u2(n), . . . , uM(n)]
Generalized eigenvectors of the Laplacian mapping | L uk = λk D uk | qn = [u1(n), u2(n), . . . , uM(n)]
Normalized Laplacian mapping | (D^{−1/2} L D^{−1/2}) uk = λk uk | qn = [u1(n), u2(n), . . . , uM(n)]
Commute time mapping | L uk = λk uk | qn = [u1(n), u2(n), . . . , uM(n)] Λ̄^{−1/2}
Diffusion mapping | L uk = λk D uk | qn = [u1(n), u2(n), . . . , uM(n)] (I − Λ̄)^t
Cumulative diffusion mapping | L uk = λk D uk | qn = [u1(n), u2(n), . . . , uM(n)] Λ̄^{−1}

Table 1: Summary of graph embedding mappings: the Graph Laplacian mapping, the generalized eigenvectors of the Laplacian mapping, the normalized Laplacian mapping, the commute time mapping, the diffusion mapping, and the cumulative diffusion mapping.
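All of the mappings in Table 1 share the same computational core and differ only in the eigen-analysis relation and in the scaling of the axes; the remaining degree of freedom is the normalization used to obtain the axis vectors yn from the spectral vectors qn, as in Line 9 of Algorithm 3 (given further below). The following NumPy sketch of that normalization step is our own illustration: Q is assumed to hold the vectors qn as its rows, and the maxima and minima are taken per spectral dimension.

```python
import numpy as np

def axis_vectors(Q, S=0):
    """Sketch of the normalization step (Line 9 of Algorithm 3): axis vectors y_n from q_n."""
    if S == 0:                       # no normalization
        return Q
    if S == 1:                       # two-norm normalization of each q_n
        return Q / np.linalg.norm(Q, axis=1, keepdims=True)
    if S == 2:                       # binary normalization
        return np.sign(Q)
    Mx, mn = Q.max(axis=0), Q.min(axis=0)
    if S == 3:                       # binary normalization with the mean of the extremes as reference
        return np.sign(Q - (Mx + mn) / 2)
    return (Q - mn) / (Mx - mn)      # S = 4, marginal normalization to [0, 1]
```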
6. Conclusion

Although graphs have been present within the data analytics paradigm, in various forms, for centuries, the advantages of the graph framework for data analytics on graphs, as opposed to the optimization of the graphs themselves, have been recognized only recently. In order to provide a comprehensive and Data Analytics friendly introduction to graph data analytics, an overview of graphs from this specific, practitioner-friendly signal processing point of view is a prerequisite.

In this part of our article, we have introduced graphs as irregular signal domains, together with their properties that are relevant for data analytics applications which rest upon the estimation of signals on graphs. This has been achieved in a systematic and example-rich way, and by highlighting links with classic matrix analysis and linear algebra. Spectral analysis of graphs has been elaborated upon in detail, as this is the main underpinning methodology for efficient data analysis, the ultimate goal in Data Science. Both the adjacency matrix and the Laplacian matrix have been used in this context, along with their spectral decompositions. Finally, we have highlighted important aspects of graph segmentation and Laplacian eigenmaps, and have emphasized their role as the foundation for advances in Data Analytics and unsupervised learning on graphs.

Part 2 of this monograph will address the theory and methods of processing data on graphs, while Part 3 is devoted to unsupervised graph topology learning from the observed data.

7. Appendix: Power Method for Eigenanalysis

The computational complexity of the eigenvalue and eigenvector calculation for a symmetric matrix is of the order of O(N^3), which is prohibitive for very large graphs, especially when only a few of the smoothest eigenvectors are needed, as in spectral graph clustering. To mitigate this computational bottleneck, an efficient iterative approach, called the Power Method, may be employed.

Consider the normalized weight matrix,

WN = D^{−1/2} W D^{−1/2},

and assume that its eigenvalues are ordered as |λ1| > |λ2| > · · · > |λM−1|, with the corresponding eigenvectors, u1, u2, . . . , uM−1. Consider also an arbitrary linear combination of the eigenvectors, un, through the coefficients αn,

x = α1 u1 + α2 u2 + · · · + αM−1 uM−1.

A multiplication of the vector x by the normalized weight matrix, WN, results in

WN x = α1 WN u1 + α2 WN u2 + · · · + αM−1 WN uM−1
     = α1 λ1 u1 + α2 λ2 u2 + · · · + αM−1 λM−1 uM−1.

A repetition of this multiplication k times yields

WN^k x = α1 λ1^k u1 + α2 λ2^k u2 + · · · + αM−1 λM−1^k uM−1
       = λ1^k (α1 u1 + α2 (λ2/λ1)^k u2 + · · · + αM−1 (λM−1/λ1)^k uM−1)
       ≈ α1 λ1^k u1,

since |λi/λ1|^k → 0 for i ≥ 2 as k grows. In other words, we have just calculated the first eigenvector of WN, given by

u1 = WN^k x / ||WN^k x||2.
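A minimal NumPy sketch of this procedure is given below; it is our own illustration rather than part of the original text. The fixed iteration count, the random starting vector (which ensures α1 ≠ 0 almost surely), and the Rayleigh-quotient estimate of λ1 are our choices. Each iteration costs a single matrix-vector product, i.e., O(N^2) for a dense W or O(|E|) for a sparse one, in contrast to the O(N^3) of a full eigendecomposition.

```python
import numpy as np

def power_method(W, k=100, seed=0):
    """Power method sketch for the dominant eigenpair of W_N = D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    Ds = np.diag(1.0 / np.sqrt(d))                 # D^{-1/2}, assuming no isolated vertices
    WN = Ds @ W @ Ds                               # normalized weight matrix
    x = np.random.default_rng(seed).standard_normal(W.shape[0])   # arbitrary start vector
    for _ in range(k):
        x = WN @ x
        x /= np.linalg.norm(x)                     # u_1 ~ W_N^k x / ||W_N^k x||_2
    return x, x @ WN @ x                           # eigenvector estimate and Rayleigh quotient
```

Subsequent eigenvectors can be obtained by deflating WN or by orthogonalizing against the eigenvectors already found.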
Algorithm 3. Graph Laplacian Based Eigenmaps.

Input:
• Vertex set V = {0, 1, . . . , N − 1}, with the vertex positions given as the rows of X
• Weight matrix W, with elements Wmn
• Laplacian eigenmap dimensionality, M
• Position, mapping, normalization, and coloring indicators P, Map, S, C

1: D ← diag(Dnn = ∑_{m=0}^{N−1} Wmn, n = 0, 1, . . . , N − 1)
2: L ← D − W
3: [U, Λ] ← eig(L)
4: uk(n) ← U(n, k), for k = 1, . . . , M, n = 0, 1, . . . , N − 1
5: M ← maxn(U(n, 1 : M)), m ← minn(U(n, 1 : M))
6: qn ← [u1(n), u2(n), . . . , uM(n)], for all n
7: If Map = 1, qn ← qn Λ̄^{−1/2}, end
8: If Map = 2, qn ← qn (I − Λ̄)^t, end
9: yn ← { qn, for S = 0;  qn/||qn||2, for S = 1;  sign(qn), for S = 2;  sign(qn − (M + m)/2), for S = 3;  (qn − m)./(M − m), for S = 4 }
10: Y ← yn, as the rows of Y
11: Z ← { X, for P = 0;  Y, for P = 1 }
12: ColorMap ← { Constant, for C = 0;  (Y + 1)/2, for C = 1 }
13: GraphPlot(W, Z, ColorMap)
14: Cluster the vertices according to Y and refine using the k-means algorithm (Remark 30) or the normalized cut recalculation algorithm (Remark 33).

Output:
• New graph
• Subsets of vertex clusters

Comments on the Algorithm: For the normalized Laplacian, Line 2 should be replaced by L ← I − D^{−1/2} W D^{−1/2}, while for the generalized eigenvectors Line 3 should be replaced by [U, Λ] ← eig(L, D), see also Table 1. The indicator values of the vertex positions in the output graph are: P = 0, for the original vertex space, and P = 1, for the spectral vertex space. The indicator of mapping is: Map = 1, for the commute time mapping (the matrix Λ̄ is obtained from Λ by omitting the trivial element λ0 = 0), and Map = 2, for the diffusion mapping (in this case the generalized eigenvectors must be used in Line 3, [U, Λ] ← eig(L, D), and the diffusion step t should be given as an additional input parameter), otherwise Map = 0. The indicator of the eigenvector normalization is: S = 0, for the case without normalization, S = 1, for two-norm normalization, S = 2, for binary normalization, S = 3, for binary normalization with the mean as a reference, and S = 4, for marginal normalization. The indicator of vertex coloring is: C = 0, when the same color is used for all vertices, and C = 1, when the spectral vector defines the vertex colors.

[10] D. S. Grebenkov, B.-T. Nguyen, Geometrical structure of Laplacian eigenfunctions, SIAM Review 55 (4) (2013) 601–667.
[11] R. Bapat, The Laplacian matrix of a graph, Mathematics Student-India 65 (1) (1996) 214–223.
[12] S. O'Rourke, V. Vu, K. Wang, Eigenvectors of random matrices: A survey, Journal of Combinatorial Theory, Series A 144 (2016) 361–442.
[13] K. Fujiwara, Eigenvalues of Laplacians on a closed Riemannian manifold and its nets, Proceedings of the American Mathematical Society 123 (8) (1995) 2585–2594.
[14] S. U. Maheswari, B. Maheswari, Some properties of Cartesian product graphs of Cayley graphs with arithmetic graphs, International Journal of Computer Applications 138 (3) (2016) 26–29.
[15] M. I. Jordan, et al., Graphical models, Statistical Science 19 (1) (2004) 140–155.
[16] J. M. Moura, Graph signal processing, in: Cooperative and Graph Signal Processing, P. Djuric and C. Richard, Editors, Elsevier, 2018, pp. 239–259.
[17] M. Vetterli, J. Kovačević, V. Goyal, Foundations of Signal Processing, Cambridge University Press, 2014.
[18] A. Sandryhaila, J. M. Moura, Discrete signal processing on graphs, IEEE Transactions on Signal Processing 61 (7) (2013) 1644–1656.
[19] V. N. Ekambaram, Graph-structured data viewed through a Fourier lens, University of California, Berkeley, 2014.
[20] A. Sandryhaila, J. M. Moura, Discrete signal processing on graphs: Frequency analysis, IEEE Transactions on Signal Processing 62 (12) (2014) 3042–3054.
[21] A. Sandryhaila, J. M. Moura, Big data analysis with signal processing on graphs: Representation and processing of massive data sets with irregular structure, IEEE Signal Processing Magazine 31 (5) (2014) 80–90.
[22] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, P. Vandergheynst, The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains, IEEE Signal Processing Magazine 30 (3) (2013) 83–98.
[23] R. Hamon, P. Borgnat, P. Flandrin, C. Robardet, Extraction of temporal network structures from graph-based signals, IEEE Transactions on Signal and Information Processing over Networks 2 (2) (2016) 215–226.
[24] S. Chen, A. Sandryhaila, J. M. Moura, J. Kovačević, Signal denoising on graphs via graph filtering, in: Proc. 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2014, pp. 872–876.
[25] A. Gavili, X.-P. Zhang, On the shift operator, graph frequency, and optimal filtering in graph signal processing, IEEE Transactions on Signal Processing 65 (23) (2017) 6303–6318.
[26] M. J. Wainwright, M. I. Jordan, et al., Graphical models, exponential families, and variational inference, Foundations and Trends in Machine Learning 1 (1–2) (2008) 1–305.
[27] D. M. Cvetković, M. Doob, H. Sachs, Spectra of graphs: Theory and application, Vol. 87, Academic Press, 1980.
[28] D. M. Cvetković, M. Doob, Developments in the theory of graph spectra, Linear and Multilinear Algebra 18 (2) (1985) 153–181.
[29] D. M. Cvetković, I. Gutman, Selected topics on applications of graph spectra, Matematički Institut SANU (Serbian Academy of Sciences and Arts), 2011.
[30] A. E. Brouwer, W. H. Haemers, Spectra of graphs, Springer-Verlag New York, 2012.
[31] F. Chung, Spectral graph theory, AMS, Providence, RI, 1997.
[32] O. Jones, Spectra of simple graphs [Online]. Available: https://www.whitman.edu/Documents/Academics/Mathematics/Jones.pdf, Whitman College, 2013.
[33] D. Mejia, O. Ruiz-Salguero, C. A. Cadavid, Spectral-based mesh segmentation, International Journal on Interactive Design and Manufacturing (IJIDeM) 11 (3) (2017) 503–514.
[34] L. Stanković, E. Sejdić, M. Daković, Vertex-frequency energy distributions, IEEE Signal Processing Letters 25 (3) (2017) 358–362.
[35] L. Stanković, M. Daković, E. Sejdić, Vertex-frequency energy distributions, in: L. Stanković, E. Sejdić (Eds.), Vertex-Frequency Analysis of Graph Signals, Springer, 2019, pp. 377–415.
[36] H. Lu, Z. Fu, X. Shu, Non-negative and sparse spectral clustering, Pattern Recognition 47 (1) (2014) 418–426.
[37] X. Dong, P. Frossard, P. Vandergheynst, N. Nefedov, Clustering with multi-layer graphs: A spectral perspective, IEEE Transactions on Signal Processing 60 (11) (2012) 5820–5831.
[38] R. Horaud, A short tutorial on graph Laplacians, Laplacian embedding, and spectral clustering, [Online]. Available: http://csustan.csustan.edu/~tom/Lecture-Notes/Clustering/GraphLaplacian-tutorial.pdf (2009).
[39] R. Hamon, P. Borgnat, P. Flandrin, C. Robardet, Relabelling vertices according to the network structure by minimizing the cyclic bandwidth sum, Journal of Complex Networks 4 (4) (2016) 534–560.
[40] M. Masoumi, A. B. Hamza, Spectral shape classification: A deep learning approach, Journal of Visual Communication and Image Representation 43 (2017) 198–211.
[41] M. Masoumi, C. Li, A. B. Hamza, A spectral graph wavelet approach for nonrigid 3D shape retrieval, Pattern Recognition Letters 83 (2016) 339–348.
[42] L. Stanković, M. Daković, E. Sejdić, Vertex-frequency analysis: A way to localize graph spectral components [lecture notes], IEEE Signal Processing Magazine 34 (4) (2017) 176–182.
[43] L. Stanković, E. Sejdić, M. Daković, Reduced interference vertex-frequency distributions, IEEE Signal Processing Letters 25 (9) (2018) 1393–1397.
[44] L. Stanković, D. Mandic, M. Daković, I. Kisil, E. Sejdić, A. G. Constantinides, Understanding the basis of graph signal processing via an intuitive example-driven approach, IEEE Signal Processing Magazine, arXiv preprint arXiv:1903.11179, November.
[45] F. R. Chung, R. P. Langlands, A combinatorial Laplacian with vertex weights, Journal of Combinatorial Theory, Series A 75 (2) (1996) 316–327.
[46] A. Duncan, Powers of the adjacency matrix and the walk matrix, The Collection (2004) 1–11.
[47] S. Saito, D. P. Mandic, H. Suzuki, Hypergraph p-Laplacian: A differential geometry view, in: Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[48] E. R. Van Dam, W. H. Haemers, Which graphs are determined by their spectrum?, Linear Algebra and Its Applications 373 (2003) 241–272.
[49] S. E. Schaeffer, Graph clustering, Computer Science Review 1 (1) (2007) 27–64.
[50] J. N. Mordeson, P. S. Nair, Fuzzy graphs and fuzzy hypergraphs, Vol. 46, Physica, 2012.
[51] J. Kleinberg, E. Tardos, Algorithm design, Pearson Education India, 2006.
[52] O. Morris, M. d. J. Lee, A. Constantinides, Graph theory for image analysis: An approach based on the shortest spanning tree, IEE Proceedings F (Communications, Radar and Signal Processing) 133 (2) (1986) 146–152.
[53] S. Khuller, Approximation algorithms for finding highly connected subgraphs, Tech. rep. (1998).
[54] A. K. Jain, Data clustering: 50 years beyond k-means, Pattern Recognition Letters 31 (8) (2010) 651–666.
[55] I. S. Dhillon, Y. Guan, B. Kulis, Kernel k-means: Spectral clustering and normalized cuts, in: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2004, pp. 551–556.
[56] M. Stoer, F. Wagner, A simple min-cut algorithm, Journal of the ACM (JACM) 44 (4) (1997) 585–591.
[57] G. Kron, Diakoptics: The piecewise solution of large-scale systems, Vol. 2, MacDonald, 1963.
[58] L. Hagen, A. B. Kahng, New spectral methods for ratio cut partitioning and clustering, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 11 (9) (1992) 1074–1085.
[59] J. Shi, J. Malik, Normalized cuts and image segmentation, Departmental Papers (CIS) (2000) 107.
[60] B. Mohar, Isoperimetric numbers of graphs, Journal of Combinatorial Theory, Series B 47 (3) (1989) 274–291.
[61] M. Fiedler, Algebraic connectivity of graphs, Czechoslovak Mathematical Journal 23 (2) (1973) 298–305.
[62] J. Malik, S. Belongie, T. Leung, J. Shi, Contour and texture analysis for image segmentation, International Journal of Computer Vision 43 (1) (2001) 7–27.
[63] A. Y. Ng, M. I. Jordan, Y. Weiss, On spectral clustering: Analysis and an algorithm, in: Proc. Advances in Neural Information Processing Systems, 2002, pp. 849–856.
[64] D. A. Spielman, S.-H. Teng, Spectral partitioning works: Planar graphs and finite element meshes, Linear Algebra and its Applications 421 (2–3) (2007) 284–305.
[65] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation 15 (6) (2003) 1373–1396.
[66] F. Chung, Laplacians and the Cheeger inequality for directed graphs, Annals of Combinatorics 9 (1) (2005) 1–19.
[67] L. Trevisan, Lecture notes on expansion, sparsest cut, and spectral graph theory (2013).
[68] Y. Weiss, Segmentation using eigenvectors: A unifying view, in: Proceedings of the Seventh IEEE International Conference on Computer Vision, Vol. 2, IEEE, 1999, pp. 975–982.
[69] P. Perona, W. Freeman, A factorization approach to grouping, in: Proc. of the European Conference on Computer Vision, Springer, 1998, pp. 655–670.
[70] G. L. Scott, H. C. Longuet-Higgins, Feature grouping by relocalisation of eigenvectors of the proximity matrix, in: Proc. of the British Machine Vision Conference (BMVC), 1990, pp. 1–6.
[71] Z. Wang, E. P. Simoncelli, A. C. Bovik, Multiscale structural similarity for image quality assessment, in: Proc. of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, Vol. 2, 2003, pp. 1398–1402.
[72] M. Mijalkov, E. Kakaei, J. B. Pereira, E. Westman, G. Volpe, BRAPH: A graph theory software for the analysis of brain connectivity, PLOS ONE 12 (8) (2017) e0178798, doi:10.1371/journal.pone.0178798.
[73] M. Rubinov, O. Sporns, Complex network measures of brain connectivity: Uses and interpretations, NeuroImage 52 (3) (2010) 1059–1069, Computational Models of the Brain, doi:10.1016/j.neuroimage.2009.10.003.
[74] H. Qiu, E. R. Hancock, Clustering and embedding using commute times, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (11) (2007) 1873–1890.
[75] R. Horaud, A short tutorial on graph Laplacians, Laplacian embedding, and spectral clustering (2012).
[76] A. K. Chandra, P. Raghavan, W. L. Ruzzo, R. Smolensky, P. Tiwari, The electrical resistance of a graph captures its commute and cover times, Computational Complexity 6 (4) (1996) 312–340.
[77] R. R. Coifman, S. Lafon, Diffusion maps, Applied and Computational Harmonic Analysis 21 (1) (2006) 5–30.
[78] J. Leskovec, C. Faloutsos, Sampling from large graphs, in: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2006, pp. 631–636.
[79] M. Tammen, I. Kodrasi, S. Doclo, Complexity reduction of eigenvalue decomposition-based diffuse power spectral density estimators using the power method, in: Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2018, pp. 451–455.