
Neural Comput & Applic
DOI 10.1007/s00521-013-1439-2

REVIEW

The latest research progress on spectral clustering


Hongjie Jia • Shifei Ding • Xinzheng Xu • Ru Nie

Received: 3 January 2013 / Accepted: 23 May 2013


© Springer-Verlag London 2013

H. Jia • S. Ding (✉) • X. Xu • R. Nie
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
e-mail: [email protected]

S. Ding
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Science, Beijing 100190, China

Abstract  Spectral clustering is a clustering method based on algebraic graph theory. It has aroused extensive attention in academia in recent years, due to its solid theoretical foundation as well as its good clustering performance. This paper introduces the basic concepts of graph theory and reviews the main matrix representations of a graph, then compares the objective functions of typical graph cut methods and explores the nature of the spectral clustering algorithm. We also summarize the latest research achievements of spectral clustering and discuss several key issues in spectral clustering, such as how to construct the similarity matrix and the Laplacian matrix, how to select eigenvectors, how to determine the cluster number, and the applications of spectral clustering. At last, we propose several valuable research directions in light of the deficiencies of spectral clustering algorithms.

Keywords  Spectral clustering • Graph theory • Graph cut • Laplacian matrix • Eigen-decomposition

1 Introduction

Clustering is an important research field in data mining. The purpose of clustering is to divide a dataset into natural groups so that data points in the same group are similar, while data points in different groups are dissimilar to each other [56]. Traditional clustering methods, such as the k-means algorithm [30, 41], the FCM algorithm [21], and the EM algorithm [13], are simple, but lack the ability to handle complex data structures. When the sample space is non-convex, these algorithms easily fall into local optima [29].

In recent years, spectral clustering has attracted more and more attention in academia, due to its good clustering performance and solid theoretical foundation [47]. Spectral clustering does not make any assumptions on the global structure of the data. It can converge to the global optimum and performs well for sample spaces of arbitrary shape, being especially suitable for non-convex datasets [16]. The idea of spectral clustering is based on spectral graph theory. It treats the data clustering problem as a graph partitioning problem and constructs an undirected weighted graph with each point in the dataset being a vertex and the similarity value between any two points being the weight of the edge connecting the two vertices [8]. Then, we can decompose the graph into connected components by a certain graph cut method and call those components clusters.


There are a variety of traditional graph cut methods, such as minimum cut, ratio cut, normalized cut and min/max cut. The optimal clustering results can be obtained by minimizing or maximizing the objective function of the graph cut method. However, for the various graph cut methods, seeking the optimal solution of the objective function is often NP-hard. With the help of spectral methods, the original problem can be solved in polynomial time by relaxing the original discrete optimization problem to the real domain [17]. In graph partitioning, a point can be considered as belonging partly to subset A and partly to subset B, rather than strictly belonging to one cluster. It can be proved that the classification information of the vertices is contained in the eigenvalues and eigenvectors of the graph Laplacian matrix, and we can get good clustering results if we make full use of this classification information during the clustering process [35]. Spectral clustering obtains the relaxed solution of the graph cut objective function, which is an approximate optimal solution.

The earliest study on spectral clustering began in 1973. Donath and Hoffman put forward that graph partitioning can be built based on the eigenvectors of the adjacency matrix [18]. In the same year, Fiedler proved that the bipartition of a graph is closely related to the second eigenvector of the Laplacian matrix [23]. He suggested using this eigenvector to conduct graph partitioning. Hagen and Kahng found the relations among clustering, graph partitioning and the eigenvectors of the similarity matrix, and were the first to construct a practical algorithm [25]. They proposed the ratio cut method in 1992. In 2000, Shi and Malik proposed normalized cut [55]. This method considers not only the external connections between clusters but also the internal connections within a cluster, so it can produce balanced clustering results. Then, Ding et al. [14] proposed min/max cut, and Ng et al. [51] proposed the classic NJW algorithm. These algorithms classify data points based on matrix spectral theory, so they are called spectral clustering. Since 2000, spectral clustering has gradually become a research hotspot in data mining. At present, spectral clustering has been successfully applied in many fields, such as computer vision [42, 76], integrated circuit design [2], load balancing [20, 27], biological information [33, 52], and text classification [69]. Spectral clustering algorithms provide a new idea for solving the clustering problem and can effectively deal with many practical problems, so their research has great scientific value and application potential.

This paper is organized as follows: Sect. 2 introduces the basic concepts of algebraic graph theory, compares the objective functions of typical graph cut methods, and explores the nature of the spectral clustering algorithm; Sect. 3 summarizes the latest research achievements of spectral clustering and discusses several key issues in spectral clustering, such as how to construct the similarity matrix and the Laplacian matrix, how to select eigenvectors, how to determine the cluster number, and the applications of spectral clustering; finally, several valuable research directions are proposed in light of the deficiencies of spectral clustering algorithms.

2 Basic concepts of spectral clustering

2.1 Algebraic graph theory

Graph theory, which originated in the famous problem of the Konigsberg Seven Bridges, is an important branch of mathematics. It is the study of theories and methods about graphs. Algebraic graph theory is a cross-field combining graph theory, linear algebra, and matrix computation theory. As one of the branches of graph theory, research on algebraic graph theory began in the 1850s, aiming to use algebraic methods to study graphs, convert graph characteristics into algebraic characteristics, and then use these algebraic characteristics and methods to deduce theorems about graphs [19]. In fact, the main content of algebraic graph theory is the spectrum. Here, the spectrum refers to the eigenvalues of a matrix and their multiplicities. The earliest research on algebraic graph theory was made by Fiedler [23]. He derived an algebraic criterion for the connectivity of a graph: whether a graph is connected or not can be judged by the second smallest eigenvalue of its Laplacian matrix. Later, the eigenvector corresponding to the second smallest eigenvalue was named the Fiedler vector, which contains the instruction information for dividing a graph into two parts.

The adjacency matrix (denoted as A) and the Laplacian matrix (denoted as L) are commonly used representations of a graph. The adjacency matrix of a weighted graph uses real numbers to reflect the different relations between vertices. The Laplacian matrix is L = D - A, where D is a diagonal matrix whose diagonal values equal the absolute row sums of A and whose non-diagonal elements are 0. Most spectral clustering algorithms split graphs based on the spectrum of the Laplacian matrix. There are two kinds of Laplacian matrices: the un-normalized Laplacian matrix (L) and the normalized Laplacian matrix. The normalized Laplacian matrix includes a symmetric form (denoted as Ls) and a random walk form (denoted as Lr). Their expressions are shown in Table 1.

Table 1 Graph matrices for spectral clustering

Graph matrix | Expression | Bipartition of a graph | Multi-partition of a graph
Laplacian matrix | L = D - A; Lr = D^{-1} L; Ls = D^{-1/2} L D^{-1/2} | Based on the Fiedler vector | Based on multiple main eigenvectors
Probability transition matrix | P = D^{-1} A | Based on the eigenvector of the second largest eigenvalue | Based on multiple main eigenvectors
Modularity matrix | B = A - d d^T / (2m) | Based on the eigenvector of the largest eigenvalue | Based on multiple main eigenvectors
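To make the matrix forms in Table 1 concrete, the following is a minimal sketch (Python with NumPy) of how the un-normalized and normalized Laplacian matrices can be built from a weighted adjacency matrix; the small four-vertex graph and all variable names are illustrative assumptions, not taken from the paper.

import numpy as np

# Illustrative weighted adjacency matrix of a small undirected graph
# (two strongly connected pairs of vertices, weakly linked to each other).
A = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.8],
              [0.0, 0.1, 0.8, 0.0]])

d = A.sum(axis=1)                          # vertex degrees (row sums of A)
D = np.diag(d)

L = D - A                                  # un-normalized Laplacian
L_rw = np.linalg.inv(D) @ L                # random walk form, Lr = D^-1 L
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = D_inv_sqrt @ L @ D_inv_sqrt        # symmetric form, Ls = D^-1/2 L D^-1/2

# The second smallest eigenvalue of L indicates connectivity, and the signs
# of the corresponding eigenvector (the Fiedler vector) suggest a bipartition.
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]
print(np.round(eigvals, 3), np.sign(fiedler))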


Mohar introduced some characteristics of the un-normalized Laplacian matrix [46]. Shi and Malik studied the characteristics of the normalized Laplacian matrix [55]. The spectrum of the Laplacian matrix provides very useful information for graph partitioning. Based on the Fiedler vector, we can divide a graph into two parts; based on multiple main eigenvectors, we can divide a graph into k parts. Luxburg made a comprehensive summary of the characteristics of the Laplacian matrix [60]. Whether we should use the un-normalized or the normalized Laplacian matrix is still under discussion. Ng et al. [51] provided evidence that the normalized Laplacian matrix is more suitable for spectral clustering, which means that the performance of normalized spectral clustering is better than that of un-normalized spectral clustering. Higham and Kibble pointed out that under certain conditions, un-normalized spectral clustering can produce better clustering results [28]. However, Luxburg et al. [40] proved that, from the view of statistical consistency theory, the normalized Laplacian matrix is superior to the un-normalized Laplacian matrix.

Table 1 also shows the expression of the probability transition matrix (denoted as P). The probability transition matrix is essentially the normalized form of the similarity matrix. Since the row sums of the normalized similarity matrix are 1, the elements of P can be understood as Markov transition probabilities. The larger the transition probability between two nodes, the greater the possibility that they belong to the same cluster. The spectrum of the probability transition matrix also contains the information necessary to split a graph, but it is slightly different from the spectrum of the Laplacian matrix. In the probability transition matrix, the eigenvector corresponding to the second largest eigenvalue can indicate the bipartition of a graph, and multiple eigenvectors corresponding to the main eigenvalues can indicate the multi-partition of a graph [43].

Another novel matrix is the modularity matrix (denoted as B), which comes from the study of community structure in complex networks [34, 48, 49]. It has a clear physical meaning, and its expression is shown in Table 1. In the expression, d represents the column vector whose elements are the degrees of the nodes; m represents the total weight of the graph edges; each element of B is the difference between the actual edge number and the expected edge number of the corresponding pair of nodes, which represents the extent to which the actual edge number exceeds the expected edge number. Therefore, this matrix leads directly to an objective function stating that the optimal partition should make the edges within communities (corresponding to "clusters") as dense as possible, preferably beyond expectation. As for matrix characteristics, the modularity matrix and the Laplacian matrix have some similarities; for example, their row sums (and column sums) are 0, and 0 is an eigenvalue. But they also have an obvious difference: the modularity matrix is not a positive semi-definite matrix, so some of its eigenvalues may be negative. As for graph partitioning, the bipartition of a network is based on the eigenvector of the largest eigenvalue, and the multi-partition of a network is based on multiple main eigenvectors.

2.2 Graph cut methods

Spectral clustering uses the similarity graph to deal with the problem of clustering. Its final purpose is to find a partition of the graph such that the edges between different groups have very low weights, which means that points in different clusters are dissimilar from each other, and the edges within a group have high weights, which means that points within the same cluster are similar to each other. The prototype of graph cut clustering is the minimum spanning tree (MST) method [59, 72]. The MST clustering method was proposed by Zahn [72]. This algorithm builds the minimum spanning tree from the adjacency matrix of the graph, and then removes the edges with large weights from the minimum spanning tree to get a set of connected components. This method is successful in detecting clearly separated clusters, but if the density of nodes changes, its performance deteriorates. Another disadvantage is that Zahn's research assumes the cluster structure (such as separating clusters, contact clusters, density clusters, etc.) is known in advance.

Cutting a graph means dividing it into multiple connected components by removing certain edges, and the sum of the weights of the removed edges is called the cut. Bames first proposed the minimum cut clustering criterion [6]. Its basic idea is to seek the minimum cut while dividing a graph into k connected sub-graphs. Then Alpert and Yao put forward a spectral method to solve the minimum cut criterion, which laid an important foundation for the later development of spectral clustering [3]. Wu and Leahy applied minimum cut to image segmentation and calculated the minimum cut based on maximum network flow theory [66]. Minimum cut clustering is successful in some applications of image segmentation, but its biggest problem is that it may lead to a seriously uneven split, such as a "solitary point" or a "small cluster". In order to solve this problem, Wei and Cheng proposed ratio cut [65], Sarkar and Soundararajan proposed average cut [54], Shi and Malik proposed normalized cut [55], and Ding et al. [14] proposed min/max cut. The expressions of their objective functions are shown in Table 2. These optimization objectives are able to produce a more balanced split.

Take graph bipartition for example. Assume V is a given set of data points, A represents a subset of V, and B represents V\A. For illustrative purposes, we define the following four terms (a small numerical illustration follows Table 2):

1. Cut(A, B) = Σ_{i∈A, j∈B} w_ij denotes the sum of connection weights between clusters A and B.
2. Cut(A, A) = Σ_{i∈A, j∈A} w_ij denotes the sum of connection weights within cluster A.
3. Vol(A) = Σ_{i∈A} d_i denotes the total degree of the vertices in cluster A.
4. |A| = n_A denotes the number of vertices in cluster A, which is used to describe the size of cluster A.

Table 2 Comparison of graph cut methods

References | Graph cut | Objective function | Advantage | Disadvantage
Bames [6] | Minimum cut | Mcut(A, B) = Cut(A, B) | The algorithm is simple and easy to implement | Does not consider the cluster size; may lead to a seriously uneven split
Wei and Cheng [65] | Ratio cut | Rcut(A, B) = Cut(A, B) / (|A| |B|) | Introduces the size of clusters, which reduces the possibility of over-splitting | Only focuses on reducing the similarity between clusters
Sarkar and Soundararajan [54] | Average cut | Acut(A, B) = Cut(A, B)/|A| + Cut(A, B)/|B| | Can produce more accurate classification | Only takes into account the connections between clusters, while ignoring the connections within a cluster
Shi and Malik [55] | Normalized cut | Ncut(A, B) = Cut(A, B)/Vol(A) + Cut(A, B)/Vol(B) | Takes into account both inter-cluster and intra-cluster connections | The algorithm efficiency is low, and it is unable to deal with the clustering of big data
Ding et al. [14] | Min/Max cut | Mmcut(A, B) = Cut(A, B)/Cut(A, A) + Cut(A, B)/Cut(B, B) | Tends to produce balanced clusters and can avoid clusters containing only a few vertices | The algorithm complexity is relatively high, with slow running speed
Newman [50] | Modularity | Q = (1/(2m)) Σ_{i,j} [A_ij - k_i k_j/(2m)] (v_i v_j + 1)/2 | Suitable for the partition of complex networks, and can efficiently find communities | Does not perform well when there is serious overlap between clusters


The study of complex networks has attracted great attention in the past decade. Complex networks possess a rich, multi-scale structure reflecting the dynamical and functional organization of the systems they model [44]. Newman systematically studied the spectral algorithm for community structure in non-weighted networks, weighted networks, as well as directed networks [34, 48, 50]. He uses the modularity function to detect communities. The modularity criterion is a novel idea. Take a non-weighted graph for example: only if the real proportion of edges within communities is greater than the "expected" proportion is the split considered to be reasonable. The "expected" edge number is derived from a random graph model, which is based on the configuration model. This is obviously different from the starting point of traditional graph cut clustering methods. The modularity function is shown in Table 2, where Q represents modularity, m represents the number of edges contained in the graph, k_i represents the degree of node i (k_j similarly), and v_i and v_j take the value -1 or 1: v_i ≠ v_j indicates that nodes i and j belong to different communities, and v_i = v_j indicates that nodes i and j belong to the same community.

Minimizing ratio cut, average cut, or normalized cut, and maximizing modularity, are all NP-hard discrete optimization problems. Fortunately, the spectral method can provide a relaxed solution within polynomial time for these optimization problems. Here, "relaxed" means relaxing the discrete optimization problem to the real number field, and then using some heuristic approach to re-convert the result to a discrete solution. The essence of graph partitioning can be summarized as the minimization or maximization of a matrix trace, and completing these minimization or maximization tasks relies on the spectral clustering algorithm.

Usually, most spectral clustering algorithms are formed of the following three stages: preprocessing, spectral representation, and clustering [26]. First, construct the graph and the similarity matrix to represent the dataset; then form the associated Laplacian matrix, compute the eigenvalues and eigenvectors of the Laplacian matrix, and map each point to a lower-dimensional representation based on one or more eigenvectors; at last, assign points to two or more classes based on the new representation. In order to partition a dataset or graph into k (k > 2) clusters, there are two basic approaches: recursive 2-way partitioning and k-way partitioning. The comparison of these two methods is shown in Table 3, and a minimal k-way sketch is given after the table.

Table 3 Comparison of recursive 2-way partitioning and k-way partitioning

Partition method | Basic idea | Advantage | Disadvantage
Recursive 2-way partitioning | Divide the graph into two parts by a certain 2-way partitioning algorithm, and then recursively apply the same procedure to the sub-graphs in a hierarchical way, until the number of clusters is enough or the recursive conditions are violated | The idea of this algorithm is simple and easy to realize by programming | This method is unstable, and costs a large amount of computation with low efficiency; it only utilizes the information of a single eigenvector (such as the Fiedler vector)
k-way partitioning | First select several main eigenvectors of the Laplacian matrix that contain classification information in a heuristic way, and then use these eigenvectors to map the original data points to a spectral space to conduct the clustering | Makes full use of the information of multiple eigenvectors; it has lower computational complexity and the clustering results are quite satisfactory | The optimization of its objective function is usually more difficult; it is not easy to select the appropriate eigenvectors
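The following is a minimal sketch of the k-way route described above (Python with NumPy/SciPy), in the spirit of NJW-style algorithms: build a similarity graph, form the normalized Laplacian, embed the points with k eigenvectors, and run k-means on the embedding. The Gaussian similarity, the parameter values and the use of scikit-learn's KMeans are illustrative assumptions, not the specific algorithm of any one paper surveyed here.

import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    # 1) Preprocessing: Gaussian similarity matrix (no self-loops).
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # 2) Spectral representation: symmetric normalized Laplacian
    #    Ls = D^-1/2 (D - W) D^-1/2, and its k smallest eigenvectors.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    _, U = eigh(L_sym, subset_by_index=[0, k - 1])

    # Row-normalize the embedding (as in NJW) before clustering.
    U = U / np.linalg.norm(U, axis=1, keepdims=True)

    # 3) Clustering: k-means in the spectral embedding.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)

# Example usage (X would be an n-by-d data matrix, e.g. two non-convex clusters):
# labels = spectral_clustering(X, k=2)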


3 The latest development of spectral clustering

Spectral clustering is a large family of grouping methods, and its research is very active in machine learning and data mining, because of the universality, efficiency, and theoretical support of spectral analysis. Next, we will discuss the latest development of spectral clustering from the following aspects: constructing the similarity matrix, forming the Laplacian matrix, selecting eigenvectors, the number of clusters, and the applications of spectral clustering.

3.1 Construct similarity matrix

The key to spectral clustering is to select a good distance measurement which can describe the intrinsic structure of the data points well. Data in the same group should have high similarity and follow space consistency. The similarity measurement is crucial to the performance of spectral clustering [62]. The Gaussian kernel function is usually adopted as the similarity measure. However, with a fixed scaling parameter σ, the similarity between two data points is only determined by their Euclidean distance and is not adaptive to their surroundings. When dealing with complex datasets, a similarity based simply on Euclidean distance cannot reflect the data distribution accurately, which in turn results in poor performance of spectral clustering.

Zhang et al. [74] propose a local density adaptive similarity measure, CNN (Common-Near-Neighbor), which uses the local density between two data points to scale the Gaussian kernel function. The CNN method is based on the following observation: if two points are distributed in the same cluster, they are in the same region, which has a relatively high density. It has the effect of amplifying intra-cluster similarity, thus making the affinity matrix clearly block diagonal. Experimental results show that the spectral clustering algorithm with the local density adaptive similarity measure outperforms the traditional spectral clustering algorithm, the path-based spectral clustering algorithm and the self-tuning spectral clustering algorithm.

Yang et al. [70] develop a density sensitive distance measure. This measure defines an adjustable line segment length, which can adjust the distance in regions with different density. It squeezes the distances in high density regions while widening them in low density regions. With this distance measure, they design a new similarity function for spectral clustering. Compared with spectral clustering based on the conventional Euclidean distance or Gaussian kernel function, the proposed algorithm with the density sensitive similarity measure can obtain desirable clusters with high performance on both synthetic and real life datasets.

Wang et al. [64] present spectral multi-manifold clustering (SMMC), based on the analysis that spectral clustering is able to work well when the similarity values of points belonging to different clusters are relatively low. In their model, the data are assumed to lie on or close to multiple smooth low-dimensional manifolds, where some data manifolds are separated but some are intersecting. Then, local geometric information of the sampled data is incorporated to construct a suitable similarity matrix. Finally, the spectral method is applied to this similarity matrix to group the data. SMMC achieves good performance over a broad range of parameter settings and is able to handle intersections, but its robustness remains to be improved.

In order to better describe the data distribution, Zhang and You propose a random walk based approach to process the Gaussian kernel similarity matrix [75]. In this method, the pairwise similarity between two data points is related not only to the two points but also to their neighbors. Li and Guo develop a new affinity matrix generation method using the neighbor relation propagation principle [36]. The generated affinity matrix can increase the similarity of point pairs that should be in the same cluster and can detect the structure of the data well. Blekas and Lagaris introduce Newtonian spectral clustering based on Newton's equations of motion [7]. They build an underlying interaction model for trajectory analysis and employ Newtonian preprocessing to gain valuable affinity information, which can be used to enrich the affinity matrix.
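As a concrete baseline for this subsection, the following sketch contrasts the fixed-σ Gaussian similarity with a simple locally scaled variant in which each point's scale is taken from the distance to its K-th nearest neighbor (in the spirit of the self-tuning spectral clustering mentioned above as a comparison method). It is an illustrative assumption, not the CNN measure of [74] or the density sensitive measure of [70], which are only summarized here; all names are hypothetical.

import numpy as np
from scipy.spatial.distance import cdist

def gaussian_similarity(X, sigma=1.0):
    # Fixed global scale: similarity depends only on Euclidean distance.
    D2 = cdist(X, X, "sqeuclidean")
    W = np.exp(-D2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def locally_scaled_similarity(X, K=7):
    # Per-point scale sigma_i = distance to the K-th nearest neighbor,
    # so similarities adapt to the local density around each point.
    D = cdist(X, X)
    sigma = np.sort(D, axis=1)[:, K]        # K-th neighbor distance (index 0 is the point itself)
    W = np.exp(-(D ** 2) / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(W, 0.0)
    return W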


3.2 Form Laplacian matrix

After the similarity matrix is constructed, the next step is to form the corresponding Laplacian matrix according to the chosen graph cut method. There are three forms of the traditional Laplacian matrix, which are, respectively, suitable for different clustering conditions. The selection of the graph cut method and the construction of the Laplacian matrix both have an important impact on the performance of spectral clustering algorithms.

As a natural nonlinear generalization of the graph Laplacian, the p-Laplacian has recently been applied to two-class cases. Luo et al. [39] propose a full eigenvector analysis of the p-Laplacian and obtain a natural global embedding for multi-class clustering problems, instead of using the greedy search strategy implemented by previous researchers. An efficient gradient descent optimization approach is introduced to obtain the p-Laplacian embedding space, which is guaranteed to converge to feasible local solutions. Empirical results suggest that the greedy search method often fails in many real-world applications with non-trivial data structures, while this approach consistently gets robust clustering results and preserves the local smooth manifold structures of real-world data in the embedding space.

Yang et al. [71] propose a new image clustering algorithm, referred to as clustering using local discriminant models and global integration (LDMGI). A new Laplacian matrix is learnt in LDMGI by exploiting both manifold structure and local discriminant information. This algorithm constructs a local clique for each data point sampled from a nonlinear manifold, and uses a local discriminant model to evaluate the clustering performance of samples within the local clique. Then a unified objective function is proposed to globally integrate the local models of all the local cliques. Compared with normalized cut, LDMGI is more robust to its algorithmic parameter and is more appealing for real image clustering applications, in which the algorithmic parameters are generally not available for tuning.

Most graph Laplacians are based on the Euclidean distance, which does not necessarily reflect the inherent distribution of the data. So Xie et al. [68] propose a method to directly optimize the normalized graph Laplacian by using pairwise constraints. The learned graph is consistent with equivalence and non-equivalence pairwise relationships, and can therefore better represent the similarity between samples. Meanwhile, this approach automatically determines the scaling parameter during the optimization. The learned normalized Laplacian matrix can be directly applied in spectral clustering and semi-supervised learning algorithms.

Frederix and Van Barel use linear algebra techniques to solve the eigenvalue problem of a graph Laplacian and propose a novel sparse spectral clustering method [24]. This method exploits the structure of the Laplacian to construct an approximation, not in terms of a low-rank approximation but in terms of capturing the structure of the matrix. With this approximation, the size of the eigenvalue problem can be reduced. Chen and Feng present a novel k-way spectral clustering algorithm called discriminant cut (Dcut) [10]. It normalizes the similarity matrix with the corresponding regularized Laplacian matrix. Dcut can reveal the internal relationships among data and produce exciting clustering results.

3.3 Select eigenvectors

The eigenvalues and eigenvectors of the Laplacian matrix can be obtained by eigen-decomposition. An analysis of the characteristics of the eigenspace shows that: (a) not every eigenvector of a Laplacian matrix is informative and relevant for clustering; (b) eigenvector selection is critical, because using uninformative or irrelevant eigenvectors could lead to poor clustering results; (c) the corresponding eigenvalues cannot be used to select relevant eigenvectors given a realistic dataset. The NJW algorithm partitions data using the largest k eigenvectors of the normalized Laplacian matrix derived from the dataset. However, some experiments demonstrate that the top k eigenvectors cannot always detect the structure of the data in real pattern recognition problems. So it is necessary to find a better way to select eigenvectors for spectral clustering.

Xiang and Gong propose the concept of "eigenvector relevance", which differs from previous approaches in that only informative and relevant eigenvectors are employed for determining the number of clusters and performing clustering [67]. The key element of their algorithm is a simple but effective relevance learning method, which measures the relevance of an eigenvector according to how well it can separate the dataset into different clusters. Experimental results show that this algorithm is able to estimate the cluster number correctly and reveal natural groupings of the input data/patterns even given sparse and noisy data.

Zhao et al. [77] propose an eigenvector selection method based on entropy ranking for spectral clustering (ESBER). In this method, all the eigenvectors are first ranked according to their importance for clustering, and then a suitable eigenvector combination is obtained from the ranking list. There are two strategies for selecting eigenvectors from the ranking list: one is to directly adopt the first k eigenvectors in the ranking list, which are the most important eigenvectors among all the eigenvectors and differ from the largest k eigenvectors of the NJW method; the other is to search for a suitable eigenvector combination among the first km (km > k) eigenvectors in the ranking list, which can reflect the structure of the original data. The ESBER method is more robust than the NJW method and can obtain satisfying clustering results in most cases.

Rebagliati and Verri identify a fundamental working hypothesis of the NJW algorithm, namely that the optimal partition into k clusters can be obtained from the largest k eigenvectors of the matrix Lsym only if the gap between the k-th and the (k+1)-th eigenvalue of Lsym is sufficiently large. If the gap is small, a perturbation may swap the corresponding eigenvectors and the results can be very different from the optimal ones. So they suggest a weaker working hypothesis: the optimal partition into k clusters can be obtained from a k-dimensional subspace of the first m (m > k) eigenvectors, where m is a parameter chosen by the user [53]. The bound on m is based upon the gap between the (m+1)-th and the k-th eigenvalue and ensures the stability of the solution. This algorithm is robust to small changes of the eigenvalues, and gives satisfying results on real-world graphs by selecting correct k-dimensional subspaces of the linear span of the first m eigenvectors.
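The eigengap condition described above can be checked numerically; the following sketch ranks candidate values of k by the gap between consecutive eigenvalues of the normalized Laplacian. This eigengap heuristic is a common rule of thumb consistent with the hypothesis discussed above, not the specific procedure of [53]; the function and parameter names are illustrative.

import numpy as np

def eigengap_candidates(L_sym, k_max=10):
    # Eigenvalues of the symmetric normalized Laplacian, in ascending order.
    eigvals = np.sort(np.linalg.eigvalsh(L_sym))[:k_max + 1]
    # Gap between the k-th and (k+1)-th eigenvalue for k = 1 .. k_max.
    gaps = np.diff(eigvals)
    # Larger gaps suggest more stable k-dimensional spectral embeddings.
    order = np.argsort(gaps)[::-1] + 1
    return list(order), gaps

# Usage (L_sym built as in the earlier pipeline sketch):
# candidates, gaps = eigengap_candidates(L_sym)
# print("candidate k values ranked by eigengap:", candidates[:3])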


3.4 The number of clusters

In most spectral clustering algorithms, the number of clusters must be set manually, and the result is very sensitive to initialization. How to accurately estimate the number of clusters is one of the major challenges faced by spectral clustering. Some existing approaches attempt to get the optimal cluster number by minimizing some distance-based dissimilarity measure within clusters.

Wang proposes a novel selection criterion, whose key idea is to select the number of clusters as the one maximizing the clustering stability [61]. Since maximizing the clustering stability is equivalent to minimizing the clustering instability, an estimation scheme for the clustering instability is developed based on modified cross-validation. The idea of this scheme is to split the data into two training datasets and one validation dataset, where the two training datasets are used to construct two clusterings and the validation dataset is used to measure the clustering instability. This selection criterion is applicable to all kinds of clustering algorithms, including distance-based and non-distance-based algorithms. However, the data splitting reduces the sizes of the training datasets, so the effectiveness of the cross-validation method remains to be further studied.

The concept of clustering stability can measure the robustness of any given clustering algorithm. Inspired by Wang's method [61], Fang and Wang develop a new estimation scheme for the clustering instability based on the bootstrap, in which the number of clusters is selected so that the corresponding estimated clustering instability is minimized [22]. The implementation of the bootstrap method is straightforward, and it has a number of advantages. First, the bootstrap samples are of the same size as the original data, so the bootstrap method is more efficient. Second, the bootstrap estimate of the clustering instability is the nonparametric maximum likelihood estimate (MLE). Third, the bootstrap method can provide the instability path of a clustering algorithm for any given number of clusters.

Tepper et al. [57] introduce a perceptually driven clustering method, in which the number of clusters is automatically determined by setting a parameter ε that controls the average number of false detections. The detection thresholds are well adapted to accept or reject non-clustered data and can help find the right number of clusters. This method only takes into account inter-point distances and has no random steps. Besides, it is independent of the original data dimensionality, which means that its running time is not affected by an increase in dimensionality. The combination of this method with normalized cuts performs well on both synthetic and real-world datasets, and the detected clusters are perceptually significant.
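To make the stability idea concrete, here is a generic sketch of bootstrap-style instability estimation: for each candidate k, cluster two resamples, extend each clustering to the full dataset by nearest-centroid assignment, and measure how often the two clusterings disagree about whether a pair of points shares a cluster. This is only an illustration of the general principle, not the exact schemes of [61] or [22]; the helper names and the use of k-means are assumptions.

import numpy as np
from sklearn.cluster import KMeans

def pair_instability(X, k, n_rounds=10, rng=np.random.default_rng(0)):
    n = len(X)
    scores = []
    for _ in range(n_rounds):
        labels = []
        for _ in range(2):
            # Bootstrap resample of the same size as the original data.
            idx = rng.integers(0, n, size=n)
            km = KMeans(n_clusters=k, n_init=10).fit(X[idx])
            # Extend the clustering to all points via nearest centroid.
            labels.append(km.predict(X))
        same0 = labels[0][:, None] == labels[0][None, :]
        same1 = labels[1][:, None] == labels[1][None, :]
        # Fraction of point pairs on which the two clusterings disagree.
        scores.append(np.mean(same0 != same1))
    return np.mean(scores)

# Usage: choose the k with the smallest estimated instability.
# best_k = min(range(2, 8), key=lambda k: pair_instability(X, k))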


3.5 The applications of spectral clustering

Nowadays, spectral clustering has been successfully applied in many areas, such as data analysis, speech separation, video indexing, character recognition, and image processing. In such applications, the number of data points to cluster can be enormous. Indeed, a small 256 × 256 gray level image leads to a dataset of 65,536 points, while 4 s of speech sampled at 5 kHz leads to more than 20,000 spectrogram samples. In addition, real-world datasets often contain many outliers and noise points with complex data structures [38]. All these issues should be carefully considered when dealing with specific clustering problems.

Bach and Jordan use spectral clustering to solve the problem of speech separation and present a blind separation algorithm, which separates speech mixtures from a single microphone without requiring models of specific speakers [5]. It works within a time-frequency representation (a spectrogram) and exploits the sparsity of speech signals in this representation. That is, although two speakers might speak simultaneously, there is relatively little overlap in the time-frequency plane if the speakers are different. Thus, speech separation can be formulated as a segmentation problem in the time-frequency plane. Bach et al. have successfully demixed speech signals from two speakers using this approach.

Video indexing requires the efficient segmentation of video into scenes. A scene can be regarded as a series of semantically correlated shots. The visual content of each shot can be represented by one or multiple frames, called key-frames. Chasanis et al. [9] develop a new approach for video scene segmentation. To cluster the shots into groups, they propose an improved spectral clustering method that both estimates the number of clusters and employs the fast global k-means algorithm in the clustering stage after the eigenvector computation of the similarity matrix. The shot similarity is computed based only on visual features, and a label is assigned to each shot according to the group it belongs to. Then, a sequence alignment algorithm is applied to detect when the pattern of shot labels changes, providing the final scene segmentation result. This scene detection method can efficiently summarize the content of each shot and detect most of the scene boundaries accurately, while preserving a good tradeoff between recall and precision.

Zeng et al. [73] apply spectral clustering to recognize handwritten numerals and obtain satisfying results. They first select the Zernike moment features of handwritten numerals based on the principle that the distinction degree of inside-cluster features should be small and the division of features between clusters should be large; they then construct the similarity matrix between handwritten numerals using a similarity measure based on grey relational analysis and apply a transitivity transformation to the similarity matrix for better block symmetry after reformation; finally, they perform spectral decomposition on the Laplacian matrix derived from the reformed similarity matrix, and recognize the handwritten numerals with the eigenvectors corresponding to the second smallest eigenvalue of the Laplacian matrix as the spectral features. This algorithm is robust to outliers, and its recognition is also very effective.

Spectral clustering has been broadly used in image segmentation. Liu et al. [37] take into account the spatial information of the pixels in an image and propose a novel non-local spatial spectral clustering algorithm, which is robust to noise and other imaging artifacts. Nowadays, High-Definition (HD) images are widely used in television broadcasting and movies. Segmenting these high resolution images presents a grand challenge because of the significant computational demands. Wang and Dong develop a multi-level low-rank approximation-based spectral clustering method, which can effectively segment high resolution images [63]. They also develop a fast sampling strategy to select sufficient data samples, leading to accurate approximation and segmentation. In order to deal with large images, Tung et al. [58] propose an enabling scalable spectral clustering algorithm, which combines a blockwise segmentation strategy with stochastic ensemble consensus. The purpose of using stochastic ensemble consensus is to integrate both global and local image characteristics in determining the pixel classifications.

Ding et al. [15] focus on the controlled islanding problem and use spectral clustering to find a suitable islanding solution for preventing the initiation of wide area blackouts. The objective function used in this controlled islanding algorithm is the minimal power-flow disruption. It is demonstrated that this algorithm is computationally efficient when solving the controlled islanding problem, particularly in the case of a large power system. Adefioye et al. [1] develop a multi-view spectral clustering algorithm for chemical compound clustering. The tensor-based spectral methods provide chemically appropriate and statistically significant results when attempting to cluster compounds from multiple data sources. Experiments show that compounds of extremely different chemotypes cluster together, which can help reveal the internal relations of these compounds.

4 Conclusion and prospect

Spectral clustering is an elegant and powerful approach to clustering, which has been widely used in many fields. Especially in the graph and network areas, many specialized and improved algorithms have emerged. There are three main reasons why spectral clustering attracts so many researchers: firstly, it has a solid theoretical foundation, algebraic graph theory; secondly, for complex cluster structures, it can obtain a globally relaxed solution; thirdly, it can solve the problem within polynomial time. However, as a relatively new clustering method, spectral clustering is still in the development stage, and there are many problems worthy of further study.

1. Semi-supervised spectral clustering

Traditional spectral clustering is an unsupervised learning method that does not take into account the clustering intention of the user. User intention is actually a form of prior knowledge, also known as supervised information. A clustering algorithm guided by supervised information is called semi-supervised clustering. Limited prior knowledge, such as pairwise constraints between samples, can be easily obtained in practice. A large number of studies have shown that making full use of prior knowledge in the process of searching for clusters can significantly improve the performance of the clustering algorithm [11, 32]. Therefore, it will be very meaningful to combine prior knowledge with spectral clustering and carry out research on semi-supervised spectral clustering.

2. Fuzzy spectral clustering

Most of the existing spectral clustering algorithms are hard partition methods. They strictly divide each object into one class, and the class of an object is either this or that, which belongs to the scope of classical set theory. In fact, most objects do not have such strict properties. These objects are intermediary in form and class, and are suitable for soft division [45]. Classical set theory often cannot completely solve classification problems with ambiguity. Fuzzy set theory, proposed by Zadeh, provides a powerful tool for this soft division. Applying fuzzy set theory to spectral clustering and studying effective fuzzy spectral clustering algorithms is also very significant.


3. Kernel spectral clustering

Spectral clustering algorithms are mostly based on some similarity measure to classify the samples, so that similar samples are gathered into the same cluster, while dissimilar samples are separated into different clusters. Thus, the clustering process mainly depends on the characteristic differences between the samples. If the distribution of the samples is very complex, conventional methods may not be able to get ideal clustering results. Using kernel methods can enlarge the useful features of the samples by mapping them to a high dimensional feature space, so that the samples are easier to cluster and the convergence speed of the algorithm is accelerated [4]. Therefore, when dealing with complex clustering problems, the combination of kernel methods and spectral techniques can be considered a valuable research direction.

4. Clustering of large datasets

Spectral clustering algorithms involve the calculation of eigenvalues and eigenvectors. The underlying eigen-decomposition takes cubic time and quadratic space with regard to the dataset size [12]. These costs can be reduced by the Nystrom method, which samples only a subset of columns from the matrix. However, the manipulation and storage of these sampled columns can still be expensive when the dataset is large. Time and space complexity has become an obstacle to the generalization of spectral clustering in practical applications. So, it is worthy of deep study to explore effective methods that reduce the computational complexity of spectral clustering algorithms and make them suitable for massive learning problems.
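As an illustration of the Nystrom idea mentioned above, the following sketch approximates the leading eigenvectors of a large affinity matrix from a random subset of its columns; the sampling scheme, the helper names and the lack of any regularization are simplifying assumptions for illustration, not a production implementation.

import numpy as np
from scipy.spatial.distance import cdist

def nystrom_embedding(X, n_landmarks=200, sigma=1.0, rng=np.random.default_rng(0)):
    n = len(X)
    idx = rng.choice(n, size=min(n_landmarks, n), replace=False)
    # Affinities between all points and the landmarks, and among landmarks.
    K_nm = np.exp(-cdist(X, X[idx], "sqeuclidean") / (2 * sigma ** 2))
    K_mm = K_nm[idx, :]
    # Eigen-decompose only the small m x m block ...
    vals, vecs = np.linalg.eigh(K_mm)
    keep = vals > 1e-12
    # ... and extend its eigenvectors to all n points (Nystrom extension).
    U = (K_nm @ vecs[:, keep]) / vals[keep]
    return U, vals[keep]

# The columns of U (suitably normalized) can replace the exact eigenvectors
# in the spectral representation stage when n is too large for a full
# eigen-decomposition.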


5. Spectral clustering ensemble

Traditional spectral clustering algorithms are sensitive to the scaling parameter and have inherent randomness. In order to overcome these problems, a clustering ensemble strategy can be introduced into spectral clustering [31], because it is possible to get better clusters by searching over combinations of multiple clustering results. A clustering ensemble can make full use of the results of learning algorithms obtained under different conditions and find cluster combinations that cannot be obtained by a single clustering algorithm. Thus, the spectral clustering ensemble algorithm is able to improve the quality and stability of the clustering results, with strong robustness to noise, outliers, and sample changes.

Acknowledgments  This work is supported by the National Key Basic Research Program of China (No. 2013CB329502), and the Fundamental Research Funds for the Central Universities (No. 2013XK10).

References

1. Adefioye AA, Liu XH, Moor BD (2013) Multi-view spectral clustering and its chemical application. Int J Comput Biol Drug Des 6(1–2):32–49
2. Alpert CJ, Kahng AB (1995) Multi-way partitioning via geometric embeddings, orderings and dynamic programming. IEEE Trans Comput-Aid Des Integr Circuits Syst 14(11):1342–1358
3. Alpert CJ, Yao SZ (1995) Spectral partitioning: the more eigenvectors, the better. In: Proceedings of the 32nd annual ACM/IEEE design automation conference. ACM, New York, pp 195–200
4. Alzate C, Suykens JAK (2012) Hierarchical kernel spectral clustering. Neural Netw 35:21–30
5. Bach FR, Jordan MI (2006) Learning spectral clustering, with application to speech separation. J Mach Learn Res 7:1963–2001
6. Bames ER (1982) An algorithm for partitioning the nodes of a graph. SIAM J Algebraic Discrete Methods 17(5):541–550
7. Blekas K, Lagaris IE (2013) A spectral clustering approach based on Newton's equations of motion. Int J Intell Syst 28(4):394–410
8. Cai XY, Dai GZ, Yang LB (2008) Survey on spectral clustering algorithms. Comput Sci 35(7):14–18
9. Chasanis VT, Likas AC, Galatsanos NP (2009) Scene detection in videos using shot clustering and sequence alignment. IEEE Trans Multimed 11(1):89–100
10. Chen WF, Feng GC (2012) Spectral clustering with discriminant cuts. Knowl-Based Syst 28:27–37
11. Chen WF, Feng GC (2012) Spectral clustering: a semi-supervised approach. Neurocomputing 77(1):229–242
12. Chen WY, Song YQ, Bai HJ et al (2011) Parallel spectral clustering in distributed systems. IEEE Trans Patt Anal Mach Intell 33(3):568–586
13. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B Stat Methodol 39(1):1–38
14. Ding CHQ, He X, Zha H et al (2001) A min-max cut algorithm for graph partitioning and data clustering. In: Proceedings of IEEE international conference on data mining (ICDM 2001), pp 107–114
15. Ding L, Gonzalez-Longatt FM, Wall P, Terzija V (2013) Two-step spectral clustering controlled islanding algorithm. IEEE Trans Power Syst 28(1):75–84
16. Ding SF, Jia HJ, Zhang LW et al (2012) Research of semi-supervised spectral clustering algorithm based on pairwise constraints. Neural Comput Appl. doi:10.1007/s00521-012-1207-8
17. Ding SF, Qi BJ, Jia HJ et al (2013) Research of semi-supervised spectral clustering based on constraints expansion. Neural Comput Appl 22(Suppl 1):S405–S410
18. Donath WE, Hoffman AJ (1973) Lower bounds for the partitioning of graph. IBM J Res Dev 17(5):420–425
19. Dong XW, Frossard P, Vandergheynst P, Nefedov N (2012) Clustering with multi-layer graphs: a spectral perspective. IEEE Trans Sig Process 60(11):5820–5831
20. Driessche RV, Roose D (1995) An improved spectral bisection algorithm and its application to dynamic load balancing. Parallel Comput 21(1):29–48
21. Dunn JC (1974) Well-separated clusters and the optimal fuzzy partitions. J Cybern 4(1):95–104
22. Fang YX, Wang JH (2012) Selection of the number of clusters via the bootstrap method. Comput Stat Data Anal 56(3):468–477
23. Fiedler M (1973) Algebraic connectivity of graphs. Czechoslov Math J 23(2):298–305
24. Frederix K, Van Barel M (2013) Sparse spectral clustering method based on the incomplete Cholesky decomposition. J Comput Appl Math 237(1):145–161
25. Hagen L, Kahng AB (1992) New spectral methods for ratio cut partitioning and clustering. IEEE Trans Comput-Aid Des Integr Circuits Syst 11(9):1074–1085
26. Hamad D, Biela P (2008) Introduction to spectral clustering. In: 3rd international conference on information and communication technologies: from theory to applications, 1–5, pp 490–495
27. Hendrickson B, Leland R (1995) An improved spectral graph partitioning algorithm for mapping parallel computations. SIAM J Sci Comput 16(2):452–459
28. Higham DJ, Kibble M (2004) A unified view of spectral clustering. In: University of Strathclyde Mathematics Research Report 02
29. Huang Z (1997) A fast clustering algorithm to cluster very large categorical data sets in data mining. In: Proceedings of the SIGMOD workshop on research issues on data mining and knowledge discovery. Tucson, pp 146–151
30. Huang Z (1998) Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2(3):283–304
31. Jia JH, Xiao X, Liu BX, Jiao LC (2011) Bagging-based spectral clustering ensemble selection. Patt Recogn Lett 32(10):1456–1467
32. Jiao LC, Shang FH, Wang F, Liu YY (2012) Fast semi-supervised clustering with enhanced spectral embedding. Patt Recogn 45(12):4358–4369
33. Kluger Y, Basri R, Chang JT et al (2003) Spectral biclustering of microarray data: coclustering genes and conditions. Genome Res 13(4):703–716
34. Leicht EA, Newman MEJ (2008) Community structure in directed networks. Phys Rev Lett 100(11):118703
35. Li JY, Zhou JG, Guan JH et al (2011) A survey of clustering algorithms based on spectra of graphs. CAAI Trans Intell Syst 6(5):405–414
36. Li XY, Guo LJ (2012) Constructing affinity matrix in spectral clustering based on neighbor propagation. Neurocomputing 97:125–130
37. Liu HQ, Jiao LC, Zhao F (2010) Non-local spatial spectral clustering for image segmentation. Neurocomputing 74(1–3):461–471
38. Liu HQ, Zhao F, Jiao LC (2012) Fuzzy spectral clustering with robust spatial information for image segmentation. Appl Soft Comput 12(11):3636–3647
39. Luo DJ, Huang H, Ding C, Nie FP (2010) On the eigenvectors of p-Laplacian. Mach Learn 81(1):37–51
40. Luxburg U, Belkin M, Bousquet O (2008) Consistency of spectral clustering. Ann Stat 36(2):555–586
41. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley symposium on mathematical statistics, 1, pp 281–297
42. Malik J, Belongie S, Leung T et al (2001) Contour and texture analysis for image segmentation. Int J Comput Vis 43(1):7–27
43. Meila M, Shi JB (2001) Learning segmentation by random walks. Advances in neural information processing systems. MIT Press, Cambridge, pp 873–879
44. Michoel T, Nachtergaele B (2012) Alignment and integration of complex networks by hypergraph-based spectral clustering. Phys Rev E 86(5):056111
45. Mirkin B, Nascimento S (2012) Additive spectral method for fuzzy cluster analysis of similarity data including community structure and affinity matrices. Inf Sci 183(1):16–34
46. Mohar B (1997) Some applications of Laplace eigenvalues of graphs. Graph Symmetry Algebraic Methods Appl 497(22):227–275
47. Nascimento MCV, de Carvalho ACPLF (2011) Spectral methods for graph clustering: a survey. Eur J Oper Res 211(2):221–231
48. Newman MEJ (2004) Analysis of weighted networks. Phys Rev E 70(5):056131
49. Newman MEJ (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):036104
50. Newman MEJ (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103(23):8577–8582
51. Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 14:849–856
52. Paccanaro A, Chennubhotla C, Casbon JA (2006) Spectral clustering of protein sequences. Nucl Acids Res 34(5):1571–1580
53. Rebagliati N, Verri A (2011) Spectral clustering with more than K eigenvectors. Neurocomputing 74(9):1391–1401
54. Sarkar S, Soundararajan P (2000) Supervised learning of large perceptual organization: graph spectral partitioning and learning automata. IEEE Trans Patt Anal Mach Intell 22(5):504–525
55. Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Patt Anal Mach Intell 22(8):888–905
56. Sun JG, Liu J, Zhao LY (2008) Clustering algorithms research. J Softw 19(1):48–61
57. Tepper M, Muse P, Almansa A, Mejail M (2011) Automatically finding clusters in normalized cuts. Patt Recogn 44(7):1372–1386
58. Tung F, Wong A, Clausi DA (2010) Enabling scalable spectral clustering for image segmentation. Patt Recogn 43(12):4069–4076
59. Urquhart R (1982) Graph theoretical clustering based on limited neighborhood sets. Pattern Recogn 15(3):173–187
60. von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
61. Wang JH (2010) Consistent selection of the number of clusters via cross validation. Biometrika 97(4):893–904
62. Wang L, Bo LF, Jiao LC (2007) Density-sensitive spectral clustering. Acta Electronica Sinica 35(8):1577–1581
63. Wang LJ, Dong M (2012) Multi-level low-rank approximation-based spectral clustering for image segmentation. Patt Recogn Lett 33(16):2206–2215
64. Wang Y, Jiang Y, Wu Y, Zhou ZH (2011) Spectral clustering on multiple manifolds. IEEE Trans Neural Netw 22(7):1149–1161
65. Wei YC, Cheng CK (1989) Toward efficient hierarchical designs by ratio cut partitioning. In: IEEE international conference on CAD. New York, pp 298–301
66. Wu Z, Leahy R (1993) An optimal graph theoretic approach to data clustering: theory and its application to image segmentation. IEEE Trans Patt Anal Mach Intell 15(11):1101–1113
67. Xiang T, Gong S (2008) Spectral clustering with eigenvector selection. Patt Recogn 41(3):1012–1029
68. Xie B, Wang M, Tao DC (2011) Toward the optimization of normalized graph Laplacian. IEEE Trans Neural Netw 22(4):660–666
69. Xie YK, Zhou YQ, Huang XJ (2009) A spectral clustering based coreference resolution method. J Chin Inf Process 23(3):10–16
70. Yang P, Zhu QS, Huang B (2011) Spectral clustering with density sensitive similarity function. Knowl-Based Syst 24(5):621–628
71. Yang Y, Xu D, Nie FP, Yan SC, Zhuang YT (2010) Image clustering using local discriminant models and global integration. IEEE Trans Image Process 19(10):2761–2773
72. Zahn CT (1971) Graph-theoretic methods for detecting and describing gestalt clusters. IEEE Trans Comput 20(1):68–86
73. Zeng S, Sang N, Tong XJ (2011) Hand-written numeral recognition based on spectrum clustering. In: MIPPR 2011: pattern recognition and computer vision, Proceedings of SPIE, p 8004
74. Zhang XC, Li JW, Yu H (2011) Local density adaptive similarity measurement for spectral clustering. Patt Recogn Lett 32(2):352–358
75. Zhang XC, You QZ (2011) An improved spectral clustering algorithm based on random walk. Frontiers Comput Sci China 5(3):268–278
76. Zhang XR, Jiao LC, Liu F (2008) Spectral clustering ensemble applied to SAR image segmentation. IEEE Trans Geosci Rem Sens 46(7):2126–2136
77. Zhao F, Jiao LC, Liu HQ et al (2010) Spectral clustering with eigenvector selection based on entropy ranking. Neurocomputing 73(10–12):1704–1717
