Learning Structure Aware Deep Spectral Embedding


Hira Yaseen, Arif Mahmood
This is a draft version of the paper accepted in IEEE Transactions on Image Processing, 2023

arXiv:2305.08215v1 [cs.CV] 14 May 2023

H. Yaseen and A. Mahmood are with the Department of Computer Science, Information Technology University (ITU), 346-B, Ferozepur Road, Lahore, Pakistan. E-mails: [email protected], [email protected]

Abstract—Spectral Embedding (SE) has often been used to map data points from non-linear manifolds to linear subspaces for the purpose of classification and clustering. Despite significant advantages, the subspace structure of data in the original space is not preserved in the embedding space. To address this issue, subspace clustering has been proposed by replacing the SE graph affinity with a self-expression matrix. It works well if the data lies in a union of linear subspaces; however, the performance may degrade in real-world applications where data often spans non-linear manifolds. To address this problem, we propose a novel structure-aware deep spectral embedding by combining a spectral embedding loss and a structure preservation loss. To this end, a deep neural network architecture is proposed that simultaneously encodes both types of information and aims to generate structure-aware spectral embedding. The subspace structure of the input data is encoded by using attention-based self-expression learning. The proposed algorithm is evaluated on six publicly available real-world datasets. The results demonstrate the excellent clustering performance of the proposed algorithm compared to the existing state-of-the-art methods. The proposed algorithm has also exhibited better generalization to unseen data points and it is scalable to larger datasets without requiring significant computational resources.

Index Terms—Unsupervised learning, Subspace clustering, Spectral clustering, Deep spectral embedding, Self-expression learning.

I. INTRODUCTION

HIGH-DIMENSIONAL data often spans low-dimensional manifolds instead of being uniformly distributed across the ambient space. Recovering these low-dimensional manifolds reduces the computational cost, memory requirements, and the effect of noise, and thus improves the performance of learning, inference, and recognition tasks. Subspace clustering refers to the problem of separating data according to their underlying manifolds. Subspace clustering algorithms have a wide range of applications in computer vision such as image clustering [1, 8, 11, 54], motion segmentation [53, 70], co-segmentation of 3D bodies [18, 69], DNA sequencing [58, 64], omics data clustering [9, 57], and gene expression [66]. The structure of data is often encoded by using a self-expression matrix, which is based on the observation that a data point in a union of subspaces can be efficiently represented by a combination of other data points in the same manifold. Over the years, many variants of subspace clustering have been proposed, such as LSR [39], LRR [37], SSC [11], LS3C [46], Kernel SSC [47], BDR [40], SC-LALRG [75], and S3COMP-C [8]. In these methods, the self-expression matrix has been computed using conventional optimization techniques. Recently, some algorithms have also used deep neural networks for the computation of self-expression matrices, such as DSC [20], PSSC [41], NCSC [83], S2ConvSCN [81], MLRDSC [25], MLRDSC-DA [1], DASC [84], SENet [82], and ODSC [59]. In all of these algorithms, the computation of the self-expression matrix has been improved by using various constraints, and this matrix is then utilized to construct the affinity matrix for spectral clustering. This arrangement works well if the data spans a union of linear subspaces; however, the performance may degrade if the underlying space is a nonlinear manifold. Local neighborhood relationships, which were originally used for the construction of a fully connected graph, are also ignored, and thus the strength of spectral embedding is not fully utilized. In the current work, we propose to integrate the strength of spectral embedding with subspace structure preservation using the self-expression matrix. For this purpose, we propose a deep neural network-based architecture that learns an embedding by simultaneously minimizing a spectral embedding-based loss and ensuring the self-expression property in the latent space.

The objective function of spectral clustering is to find an embedding of the data points by eigendecomposition of the Laplacian matrix encoding pairwise similarities. That data representation is then clustered to assign data points to different categories. Despite many advantages, the subspace structure of data in the original space is not preserved in the embedding space. Structure preservation during data transformation aims to keep a set of embeddings in a common subspace when the corresponding data points share the same subspace in the ambient space. To address this, many subspace clustering algorithms have been proposed. The assumption of data spanning linear subspaces made by these existing subspace clustering methods often gets violated in many real-world applications. The data may be corrupted by errors or because of missing trajectories, occlusion, shadows, and specularities [11]. These algorithms compute the self-expression coefficient matrix using either the ℓ1-norm, the ℓ2-norm, the nuclear norm, or a combination of these norms to preserve data structure. However, local neighborhood information may be lost in these methods because the graph structure and local node connectivity are not included, resulting in sub-optimal clustering performance [65].

In the current work, we explicitly encode both the local and the global input data structures. For encoding the local structure, the supervision of spectral embedding is used to train a deep network preserving local neighborhood graph affinities of data points. For global data structure preservation, the learned embedding is constrained to minimize a self-expression loss. The proposed neural architecture is trained in a batch-by-batch fashion by computing both spectral supervision and self-expression in batches. Compared to full data supervision as used by many existing techniques, our proposed approach saves computational time and memory resources.

The existing self-expressiveness sparse representations may limit connectivity between data points belonging to the same subspace, which may not form a single connected component [8]. To handle this issue, an ℓ2-norm based dense solution has been proposed; however, it requires the underlying subspaces to be independent [77]. In the current work, we propose a self-attention-based global structure encoding technique. For this purpose, we use two multi-layer fully connected networks including a query net and a key net. These networks are learned by minimizing an elastic-net-constrained self-expression loss. The learned structure encoding matrix is made sparse by using a nearest-neighbor-based approach removing less probable links in latent space representations.

Overall, the proposed algorithm consists of an end-to-end deep neural architecture that learns structure-aware deep spectral embedding via simultaneous minimization of a Laplacian eigenvector-based loss and a self-attention-based structure encoding loss. As the network gets trained, it iteratively learns the local and global structures of the input data. The proposed algorithm is applied on six publicly available datasets including EYaleB [12], COIL-100 [44], MNIST [28], ORL [52], CIFAR-100 [27], and ImageNet-10 [7], and compared with 51 SOTA methods including deep learning-based methods as well as traditional spectral and subspace clustering techniques. The proposed algorithm has consistently shown improved performance over the compared methods. The following are the main contributions of the current work:
• We propose to learn non-linear spectral embedding by using the supervision of eigenvectors of the graph Laplacian matrix.
• The learned embedding is constrained to be structure-aware by using a self-expression-based loss. For this purpose, a self-attention-based structure encoding is exploited.
• To reduce the complexity, both the graph Laplacian and the self-expression matrices are computed batch by batch.
• The proposed deep embedding network is capable of finding effective representations for unseen data points, thus enabling better generalization than the existing methods.

The rest of the paper is organized as follows: Section II contains related work, Section III presents the proposed methodology, and experiments are given in Section IV. Conclusions and future directions follow in Section V.

II. RELATED WORK

Due to numerous applications of subspace clustering in computer vision and related fields, several researchers have aimed to improve it in various dimensions, such as reducing time complexity [17, 31] and memory complexity [34, 56], or learning a self-expressive coefficient matrix to preserve the linear structure of data [20], while some others have intended to generalize it to unseen data [22, 55], and some others have tried to make it scalable [2, 8, 22]. Chen et al. [8] proposed random dropout in a self-expressive model to deal with the over-segmentation issues in traditional SSC algorithms. They also used a consensus algorithm to produce a scalable solution over a set of small-scale problems using orthogonal matching pursuit. You et al. [76] introduced scalability by dealing with class-imbalanced data using a greedy algorithm that selects an exemplar subset of data using sparse subspace clustering. With the resurgence of deep learning in subspace clustering, many deep network-based methods have also been proposed.

Ji et al. [20] proposed a convolutional auto-encoder for reconstruction and introduced an additional fully connected self-expressive layer between latent-space data points. Zhang et al. [81] further extended the concept of a self-expressive trainable layer by incorporating spectral clustering to compute pseudo labels, which are then used to train a classification layer using the latent space representation. Zhang et al. [83] collaboratively used two affinity matrices, one from a trainable self-expressive layer and another from a binary classifier applying softmax on latent representations, to further improve the self-expression layer. Kheirandishfard et al. [24] implicitly trained a DNN to impose a low-rank constraint on the latent space, and the self-expressive layer is then used to compute the affinity matrix. Valanarasu et al. [59] used over-complete and under-complete auto-encoder networks to get latent representations to input into a trainable self-expressive layer. Lv et al. [41] used a weighted reconstruction loss and a learnable self-expressive layer to compute spectral clustering pseudo labels to be compared with the predictions of a classification layer on top of latent representations. These methods have reported excellent results; however, learning the self-expressive coefficient matrix using the full dataset requires high memory complexity. Also, if new unseen data points are added, these methods require computing a full self-expressive matrix. Therefore, such methods are neither scalable to larger datasets nor generalizable to unseen data points.

Peng et al. [48] computed a prior self-expressive matrix from input data and used it to train an auto-encoder for structure preservation in the latent space. This method preserves structure; however, it requires high computational and memory complexity to compute the self-expressive matrix and train the network using the full training dataset. Shaham et al. [55] proposed a DNN to embed input data points into the eigenspace of the associated graph Laplacian matrix. Despite using deep learning, orthogonality in the latent space is ensured via QR decomposition instead of using a learning-based framework. Also, input data structure preservation is not explicitly enforced in the latent space.

In contrast to the existing algorithms, we propose to train our network using local neighborhood information from a fully connected graph, in addition to the global structure of the data captured by the self-expressive matrix and the reconstruction loss. We use batch-wise training of a fully connected network to produce a subspace-preserving self-expressive matrix in the input space, enabling scalability to larger datasets and generalization to unseen data points.

III. PROPOSED STRUCTURE PRESERVING DEEP SPECTRAL CLUSTERING

In order to implement the proposed solution, we directly train an end-to-end network in a batch-by-batch fashion that learns structure-aware spectral representations of data points. Once this network is trained, new unseen data points can be input into the network, and the corresponding embedding is computed. Since the network is supervised by a spectral clustering-based loss, a brief overview of spectral clustering is given below.
A. Batch-Based Spectral Clustering

Traditionally, spectral clustering has been performed on all data simultaneously; however, we propose spectral clustering to be performed on mini-batches and the results to be combined using our proposed deep neural network. Compared to the existing methods, such an approach significantly reduces the computational and memory complexity. For this purpose, we divide a given dataset into a large number of random batches, each having m data points. Let X = {x_i}_{i=1}^m ∈ R^{n×m} be a batch data matrix such that x_i ∈ R^n is a data point spanning an n-dimensional non-linear manifold. The data batch is mapped to a batch graph G_b having adjacency matrix A_b ∈ R^{m×m} computed as follows:

A_b(i,j) = \begin{cases} \exp\left(-\frac{d_{i,j}^2}{2\sigma^2}\right) & \text{if } i \neq j, \\ 0 & \text{if } i = j, \end{cases} \qquad (1)

where d_{i,j} is some distance measure between data points x_i and x_j within the same batch, and in our work we consider d_{i,j} = ||x_i − x_j||_2. The graph G_b contains an edge between all nodes (m_i, m_j) representing the local neighborhood, and the parameter σ controls the width of the neighbourhood [63]. The Laplacian matrix for the graph G_b is computed as L_b = D_b − A_b, where D_b ∈ R^{m×m} is a batch-based degree matrix defined as:

D_b(i,j) = \begin{cases} \sum_{j=1}^{m} A_b(i,j) & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases} \qquad (2)

The Laplacian matrix L_b is semi-positive definite with at least one zero eigenvalue for a fully connected graph. The eigenvalue decomposition of L_b is given by L_b = U_b Λ_b U_b^T, where U_b = {u_i}_{i=1}^m is a matrix of eigenvectors of L_b such that the u_i ∈ R^m are arranged in decreasing order of the eigenvalues v_m ≥ v_{m−1} ≥ ··· ≥ v_2 ≥ v_1, and Λ_b is a matrix having these eigenvalues on the diagonal. For dividing the graph into k partitions, only the k eigenvectors corresponding to the minimum non-zero k eigenvalues are considered. If in a particular batch the actual number of clusters is less than k, even then k eigenvectors are selected. Since the number of clusters is not directly used in the loss function, a varying number of clusters across batches has no effect on the training process.

Let U_k = {u_i}_{i=1}^k ∈ R^{m×k} be the matrix of these eigenvectors. The columns of U_k^T represent an embedding of the original data in a k-dimensional space such that the embedding space is linear, and therefore any linear clustering algorithm, such as K-means, will be able to reveal the groups in the original data. Thus spectral clustering can be considered a projection of data from high-dimensional nonlinear manifolds to low-dimensional linear subspaces. Since we perform spectral clustering batch by batch, to combine all batches into a unified solution we propose to train a deep neural network to simulate spectral embedding. Such a spectral clustering method would be scalable to significantly larger datasets without incurring significant computational or memory costs.
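The per-batch computation of Eqs. (1)-(2) and of the spectral targets U_k is straightforward to implement. The following is a minimal NumPy/SciPy sketch of one way to produce these targets for a single batch; the function name and the use of scipy.linalg.eigh are our own illustration rather than code released with the paper, and skipping the (near-)zero eigenvalue of the connected batch graph is one possible reading of the selection rule described above.

import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def batch_spectral_targets(X_b, k, sigma=1.0):
    """Spectral supervision targets for one batch.

    X_b : (n, m) array with one data point per column, as in Eq. (1).
    Returns U_k of shape (m, k): eigenvectors of the batch Laplacian
    associated with the k smallest non-zero eigenvalues.
    """
    d = cdist(X_b.T, X_b.T)                       # pairwise distances d_ij
    A_b = np.exp(-(d ** 2) / (2.0 * sigma ** 2))  # Gaussian affinities, Eq. (1)
    np.fill_diagonal(A_b, 0.0)                    # A_b(i, i) = 0
    D_b = np.diag(A_b.sum(axis=1))                # degree matrix, Eq. (2)
    L_b = D_b - A_b                               # un-normalized Laplacian
    _, U = eigh(L_b)                              # eigenvalues in ascending order
    # Skip the first eigenvector (eigenvalue approximately zero for a
    # connected batch graph) and keep the next k as the embedding targets.
    return U[:, 1:k + 1]

In the full algorithm these targets are computed once per batch and reused as the supervisory signal for the encoder described next.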
B. Structure Aware Spectral Embedding

For larger datasets having millions of data points, the size of the graph and the corresponding Laplacian matrix grows as O(p^2), where p is the total number of data points. Eigendecomposition of such large matrices incurs a high computational cost. Therefore, traditional spectral clustering lacks scalability to larger datasets [38]. Similarly, applications with online data arrival require computing an embedding in the low-dimensional space without going through the complete process, which is also not possible with most of the existing methods. Our proposed solution addresses both of these issues and also incorporates the structure information within the spectral embedding. Fig. 1 shows all the important steps in our proposed Structure Aware Deep Spectral Embedding (SADSE) algorithm.

Fig. 1: The proposed Structure Aware Deep Spectral Embedding (SADSE) network is trained using self-expressiveness and spectral supervision. The batch data matrix X_b is input and the latent space matrix Z_b is constrained to minimize the spectral loss and the structural losses. The graph G_b is computed batch-wise and is used to compute the Laplacian matrix L_b and the eigenvector matrix U_b. The latent space of the network is k-dimensional, therefore the k smallest eigenvectors are selected in U_k. The QN and KN networks are used to compute S_b.

A simple strategy to simulate spectral clustering with an auto-encoder is to use the spectral embedding as a direct supervisory signal and train the network to minimize the error:

\min_{\theta_e} \; \|U_k^\top - Z_b\|_F^2, \qquad (3)

where Z_b = {z_i}_{i=1}^m ∈ R^{k×m} is the k-dimensional latent network embedding such that k << n, and θ_e are the encoder network parameters. We empirically observe that instead of training only an encoder network, simultaneously training a pair of encoder and decoder networks provides better initialization and improves accuracy. The encoder and decoder pair, referred to as an Auto-Encoder, is used to project high-dimensional data x_i ∈ R^n to a low-dimensional latent space z_i ∈ R^k and then back to the original space. The encoder may be considered a nonlinear projector from a higher-dimensional to a lower-dimensional space. Auto-Encoders are often trained to minimize the reconstruction error over a training dataset X_b:

L_R := \min_{\theta_e,\theta_d} \sum_{i=1}^{m} \|x_i - \hat{x}_i\|_2, \qquad (4)

where \hat{x}_i is the back-projected output of the decoder and θ_d are the parameters of the decoder network. The reconstruction loss aims to preserve the locality of the input data space. We propose to train the deep auto-encoder such that the latent space Z_b is, in some sense, close to the low-dimensional spectral embedding. In the current work, instead of (3), we minimize both ℓ1 and ℓ2 losses between the latent space and the spectral embedding as given below:

L_S := \min_{\theta_e,\theta_d} \sum_{i=1}^{m} \big( \|x_i - \hat{x}_i\|_2 + \lambda_1 \|z_i - U_k^\top(i)\|_1 + \lambda_2 \|z_i - U_k^\top(i)\|_2 \big), \qquad (5)

where U_k^T(i) denotes a column of U_k^T consisting of the corresponding coefficients from the set of selected k eigenvectors, λ_1 and λ_2 are hyperparameters assigning relative weights to the different loss terms, ||·||_1 denotes the ℓ1 norm, and ||·||_2 is the ℓ2 norm. The ℓ1 loss ensures the error is sparse, while the ℓ2 loss minimizes the distance between the latent space representation z_i and the spectral embedding U_k^T(i) in the least squares sense. Due to the convexity enforced by elastic-net regularization, the optimization of the proposed loss is more stable and robust in the presence of noise.

Orthogonality on the rows of the matrix Z_b may also be enforced during network training. Since each row of Z_b corresponds to an eigenvector of L_b, each row of Z_b is normalized to unit norm and constrained to be orthogonal to the other rows:

L_O := \min_{\theta_e,\theta_d} \sum_{i=1}^{m} \big( \|x_i - \hat{x}_i\|_2 + \lambda_1 \|z_i - U_k^\top(i)\|_1 + \lambda_2 \|z_i - U_k^\top(i)\|_2 + \lambda_3 \|Z_b^\top z_i - I_b(i)\|_F^2 \big). \qquad (6)

The spectral embedding does not ensure that the input space structure is preserved in the latent space. To this end, we propose a self-expression matrix-based loss function to be simultaneously minimized with the spectral embedding-based loss. In manifold learning, it has been observed that manifold properties may be invariant to some projection spaces [51]. We aim to find a spectral projection preserving the input data structure. For this purpose, a structure-preserving loss is minimized simultaneously along with the spectral embedding loss. A pre-computed batch-based self-expressive matrix S_b in the input space is utilized for this purpose. The spectral embedding is forced to preserve the input data structure by minimizing the following loss:

L_H := \min_{\theta_e,\theta_d} \sum_{i=1}^{m} \big( \|x_i - \hat{x}_i\|_2 + \lambda_1 \|z_i - U_k^\top(i)\|_1 + \lambda_2 \|z_i - U_k^\top(i)\|_2 + \lambda_3 \|Z_b^\top z_i - I_b(i)\|_F^2 + \lambda_4 \|z_i - Z_b S_b(i)\|_F^2 \big), \qquad (7)

where S_b(i) is a column of S_b, which is the self-expressive representation of data point x_i in the input space.
C. Attention-Based Self-expressive Matrix Learning

Inspired by the self-attention model in transformer networks, a batch-based self-expressive matrix H_b is learned using two fully connected learnable networks QN and KN [61, 82]. Given a query data point x_i which needs to be synthesized using the remaining key data points x_j in that batch, where j ≠ i, we forward x_i through QN: x̄_i = QN(x_i) ∈ R^t and all x_j through KN: x̄_j = KN(x_j) ∈ R^t. The attention score between x̄_i and x̄_j is used to get the self-expressive coefficients: H_b(i,j) = x̄_i^T x̄_j. To learn the parameters of QN and KN, the following objective function is minimized:

\min_{\theta_Q,\theta_K} \; \gamma \|x_i - X_i H_b(i)\|_2^2 + \beta \|H_b(i)\|_1 + (1-\beta)\|H_b(i)\|_2, \qquad (8)

where X_i = [x_1, x_2, ···, x_{i−1}, 0, x_{i+1}, ···, x_m] contains the batch data except x_i, H_b(i) is the i-th column of H_b, γ > 0, and 1 ≥ β ≥ 0. In Eq. (8) an elastic-net regularizer [77] is used to avoid over-segmentation in H_b. Once H_b is learned, a sparse binary coefficient matrix S_b is computed using the KNN algorithm. For each query x_i, only the coefficients corresponding to the few nearest neighbors are retained as 1.00, while the remaining coefficients are suppressed to 0.00. Thus S_b is made sparse, enabling only a few nearest neighbors of x_i to contribute. Our choice of batch-wise training of QN and KN for the computation of H_b is scalable to larger datasets and also computationally efficient, and the memory requirement is reduced compared to full-scale implementations.
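The query/key construction and the KNN-style binarization described above can be sketched as follows. The layer sizes {1024, 1024, 1024}, the values γ = 200 and β = 0.9, and the use of 3 nearest neighbors are taken from Section IV-A; the activation function inside QN and KN, the row/column convention for H_b, and the top-k selection used to realize the nearest-neighbor sparsification are our assumptions for illustration only.

import torch
import torch.nn as nn

class AttentionSelfExpression(nn.Module):
    """Sketch of the query/key networks of Section III-C (our layer choices)."""

    def __init__(self, dim_in, dim_t=1024):
        super().__init__()
        def mlp():
            return nn.Sequential(nn.Linear(dim_in, 1024), nn.Tanh(),
                                 nn.Linear(1024, 1024), nn.Tanh(),
                                 nn.Linear(1024, dim_t))
        self.QN, self.KN = mlp(), mlp()

    def forward(self, x):                        # x: (m, n) batch, one point per row
        q, k = self.QN(x), self.KN(x)            # (m, t) queries / keys
        H_b = q @ k.t()                          # H_b(i, j) = <q_i, k_j>
        H_b = H_b - torch.diag(torch.diag(H_b))  # a point must not represent itself
        return H_b

def self_expression_loss(x, H_b, gamma=200.0, beta=0.9):
    """Elastic-net objective of Eq. (8), summed over the batch."""
    recon = ((x - H_b @ x) ** 2).sum()           # ||x_i - X_i H_b(i)||_2^2 terms
    l1 = H_b.abs().sum()                         # sum of column-wise l1 norms
    l2 = H_b.norm(dim=0).sum()                   # sum of column-wise l2 norms
    return gamma * recon + beta * l1 + (1.0 - beta) * l2

def sparsify(H_b, n_neighbors=3):
    """Binary S_b: keep the n_neighbors largest coefficients per data point."""
    idx = H_b.abs().topk(n_neighbors, dim=1).indices
    S_b = torch.zeros_like(H_b)
    S_b.scatter_(1, idx, 1.0)
    return S_b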
IV. EXPERIMENTAL EVALUATIONS

We extensively evaluated the proposed structure-aware deep spectral embedding (SADSE) algorithm on six publicly available datasets including EYaleB [12], Coil-100 [44], MNIST [28], ORL [52], CIFAR-100 [27], and ImageNet-10 [7], and compare it with fifty-one existing state-of-the-art approaches including EDESC [5], PSSC [41], SENet [82], NCSC [83], DCFSC [54], S3COMP-C [8], ODSC [59], SR-SSC [2], SSCOMP [78], EnSC-ORGEN [77], DSC [20], DEPICT [14], Struct-AE [48], DASC [84], S2Conv-SCN [81], MLRDSC [25], MLRDSC-DA [1], S5C [43], SC-LALRG [75], KCRSC [67], SpecNet [55], ACC_CN [32], DLRSC [24], RGRL-L2 [23], MESC-Net [49], RED-SC [74], RCFE [36], FTRR [42], Cluster-GAN [13], SSRSC [72], S2ESC [85], DSC-DL [19], DAE [62], IDEC [15], DCGAN [50], DeCNN [80], VAE [26], ADC [16], AE [3], DEC [71], DAC [7], IIC [21], DCCM [68], PICA [19], CC [33], SPICE [45], JULE [73], DDC [6], SCAN [60], PCL [30], and TCL [35].

For all datasets, we experimented with two settings: using the full data for both training and testing, referred to as SADSEF, and using an unseen 20% test split, referred to as SADSET. The proposed approach is compared with SOTA using the measures used by the original authors, including classification accuracy (Acc.) and normalized mutual information (NMI). A detailed ablation study is also performed to show the effectiveness of each proposed component.

TABLE I: Comparison of the proposed SADSEF algorithm with existing SOTA on the EYaleB and MNIST datasets using full data as train and test.

Methods | EYaleB Acc. | EYaleB NMI | MNIST Acc. | MNIST NMI
S5C [43] | 60.70 | - | 59.60 | -
SSCOMP [78] | 77.59 | 83.25 | - | -
SC-LALRG [75] | 79.66 | 84.52 | 78.20 | 76.01
KCRSC [67] | 81.40 | 88.10 | 64.70 | 64.30
S3COMP-C [8] | 87.41 | 86.32 | 96.32 | -
FTRR [42] | - | - | 70.70 | 66.72
PSSCl [41] | - | - | 78.50 | 72.76
PSSC [41] | - | - | 84.30 | 76.76
DCFSC [54] | 93.87 | - | - | -
Struct-AE [48] | 94.70 | - | 65.70 | 68.98
DEC [71] | - | - | 84.30 | -
IDEC [15] | - | - | 88.06 | 86.72
SR-SSC [2] | - | - | 91.09 | 93.06
EDESC [5] | - | - | 91.30 | 86.20
EnSC-ORGEN [77] | - | - | 93.79 | -
NCSC [83] | - | - | 94.09 | 86.12
DSC-Net-L1 [20] | 96.67 | - | - | -
ACC_CN [32] | 97.31 | 99.34 | 78.60 | 74.21
DSC-Net-L2 [20] | 97.33 | - | - | -
DLRSC [24] | 97.53 | - | - | -
RGRL-L2 [23] | 97.53 | 96.61 | 81.40 | 75.52
ODSC [59] | 97.78 | - | 81.20 | -
MESC-Net [49] | 98.03 | 97.27 | 81.11 | 82.26
Cluster-GAN [13] | - | - | 96.40 | 92.10
DEPICT [14] | - | - | 96.50 | 91.70
SENet [82] | - | - | 96.80 | 91.80
SpecNet [55] | - | - | 97.10 | 92.40
S2Conv-SCN-L2 [81] | 98.44 | - | - | -
S2Conv-SCN-L1 [81] | 98.48 | - | - | -
RED-SC [74] | 98.52 | - | 74.34 | 73.16
DASC [84] | 98.56 | 98.01 | 80.40 | 78.00
MLRDSC [25] | 98.64 | - | - | -
DSC-DL [19] | 98.90 | 97.40 | 81.20 | 76.10
MLRDSC-DA [1] | 99.18 | - | - | -
SADSEF | 99.95 | 99.95 | 97.35 | 92.81

A. Experimental Settings

In all of our experiments, we used a four-layer encoder-decoder network with the 'tanh' activation function. First of all, the full network is trained for 100 epochs using only the reconstruction loss; then the encoder network is trained using the remaining proposed losses with the Adadelta [79] optimizer and a learning rate of 1e−3. For all experiments, the encoder network is further trained for 1000 epochs. The hyperparameters in Eq. (7) are empirically set to λ1 = λ3 = 0.002 and λ2 = λ4 = 0.02 in all experiments. Each of QN and KN is an FC network with three layers beyond the input layer, of sizes {1024, 1024, 1024}, and is trained batch by batch. The KNN algorithm is then used with 3 nearest neighbors for all datasets. The Adam optimizer with a learning rate of 1e−3, β = 0.9, and γ = 200 is used for the training of QN and KN in all experiments. As the linear clustering method, K-means is employed.

B. Evaluations on Different Datasets

The EYaleB dataset contains 64 images of size 192×168 of each of the 38 subjects under 9 different illumination conditions [29]. Following the other SOTA methods, we consider only 2432 frontal face images, which are then randomly split into 1946/486 train/test splits in SADSET. Deep features are extracted from the second-last layer of DenseNet-201 and then PCA is used to reduce the dimensionality to 784. A batch size of 486 is used. The proposed SADSEF algorithm has obtained the best accuracy of 99.95%, outperforming the compared methods as shown in Table I. Fig. 2 shows the visual comparison of the proposed SADSE algorithm with the compared methods on this dataset. All 2432 images are plotted using the t-SNE algorithm by assigning each cluster a different color. The clusters obtained by the proposed SADSE algorithm are more compact than those of the compared methods.

Fig. 2: Visual cluster compactness comparison of the proposed SADSEF algorithm with SOTA methods using t-SNE on the EYaleB dataset. (a) DLRSC, (b) DSC, (c) SADSEF.
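Throughout this section, clustering performance is reported as classification accuracy (Acc.) and normalized mutual information (NMI). For completeness, the sketch below shows the standard way these measures are computed from predicted cluster indices, with accuracy obtained through Hungarian matching of clusters to ground-truth classes; this is a generic implementation, not the authors' evaluation script.

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Best accuracy over all one-to-one matchings of clusters to classes.

    Both arguments are integer label arrays starting at 0.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                          # co-occurrence counts
    row, col = linear_sum_assignment(-cost)      # maximize matched counts
    return cost[row, col].sum() / y_true.size

# NMI is available directly:
# nmi = normalized_mutual_info_score(y_true, y_pred)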
TABLE II: Comparison of the proposed SADSEF algorithm with existing SOTA methods on the Coil-100 dataset.

Methods | Acc. | NMI
S5C [43] | 54.10 | -
DSC-Net-L1 [20] | 66.38 | -
DSC-Net-L2 [20] | 69.04 | -
EnSC-ORGEN [77] | 69.24 | -
DLRSC [24] | 71.86 | -
MESC-Net [49] | 71.88 | 90.76
S2Conv-SCN-L2 [81] | 72.17 | -
DCFSC [54] | 72.70 | -
S2Conv-SCN-L1 [81] | 73.33 | -
MLRDSC [25] | 76.72 | -
S3COMP-C [8] | 78.89 | -
MLRDSC-DA [1] | 79.33 | -
RCFE [36] | 79.63 | 96.23
SADSEF | 84.95 | 93.91

Fig. 3: Visual cluster compactness comparison of the proposed SADSEF with SOTA methods using t-SNE on the Coil-100 dataset. (a) DLRSC, (b) DSC, (c) SADSEF.

The Coil-100 dataset has 7200 gray-scale images of size 128×128 of 100 different objects taken at pose intervals of 5 degrees. Deep features are extracted from the second-last layer of DenseNet-201 and then PCA is used to reduce the dimensionality to 3000. We used a batch size of 720 and a random train/test split of 5760/1440 in SADSET. The proposed SADSEF has obtained an accuracy of 84.95%, outperforming SOTA methods as shown in Table II. Fig. 3 shows a visual comparison of cluster compactness for different algorithms.

Training stability of the SADSE algorithm: In Fig. 4, we compare the training performance stability of different SOTA methods on the EYaleB and Coil-100 datasets, respectively. The existing compared methods were trained using a single batch over the full dataset in each epoch, while SADSE is trained using 5 batches for EYaleB and 10 batches for Coil-100 in each epoch. Despite batch-based training, the proposed method does not fluctuate much compared to the SOTA methods. The stability of SADSE demonstrates better convergence as the model gets trained.

Fig. 4: Stability comparison of SADSEF with different SOTA methods during training on EYaleB and Coil-100.

The MNIST dataset consists of 70000 grayscale images of size 28×28 with 10 different classes [28]. For this dataset, we computed scattered convolutional features using [4] as a pre-processing step, as used by other SOTA [8]. PCA is then used to reduce the dimensionality to 2000 and a batch size of 500 is used. The standard 60000/10000 train/test split is used in SADSET. The proposed SADSEF has obtained an accuracy of 97.35% compared to SOTA methods as shown in Table I.

The ORL dataset contains 400 face images of size 112×92 of 40 different subjects, where each subject presents 10 images of different facial expressions under varying light conditions [52]. The dataset exhibits face images with open or closed eyes, wearing glasses or not, and having a smile or not. A random train/test split of 320/80 is used in SADSET. Deep features are extracted from the second-last layer of DenseNet-121 and PCA is used to reduce the dimensionality to 400, with a batch size of 400. The proposed SADSEF has obtained an accuracy of 90.75%, outperforming SOTA methods as shown in Table III.

TABLE III: Comparison of the proposed SADSEF algorithm with existing SOTA methods on the ORL dataset.

Methods | Acc. | NMI
KCRSC [67] | 72.30 | 86.30
SSRSC [72] | 78.25 | -
DCFSC [54] | 85.20 | -
PSSCl [41] | 85.25 | 92.58
DSC-Net-L1 [20] | 85.75 | -
DSC-Net-L2 [20] | 86.00 | -
RED-SC [74] | 86.13 | 91.16
PSSC [41] | 86.75 | 93.49
DASC [84] | 88.25 | 93.15
S2Conv-SCN-L2 [81] | 88.75 | -
MLRDSC [25] | 88.75 | -
S2ESC [85] | 89.00 | 93.52
S2Conv-SCN-L1 [81] | 89.50 | -
SADSEF | 90.75 | 94.66

TABLE IV: Performance comparison of the proposed SADSET algorithm over the unseen test split and the full dataset used for both training and testing (SADSEF).

Datasets | Methods | Acc. | NMI | F1 score | Precision
EYaleB | SADSEF | 99.95 | 99.95 | 99.95 | 99.95
EYaleB | SADSET | 99.95 | 99.95 | 99.95 | 99.95
Coil-100 | SADSEF | 84.95 | 93.91 | 84.95 | 84.57
Coil-100 | SADSET | 86.04 | 94.17 | 86.04 | 86.26
MNIST | SADSEF | 97.35 | 92.81 | 97.35 | 97.35
MNIST | SADSET | 97.43 | 93.23 | 97.43 | 97.43
ORL | SADSEF | 90.75 | 94.66 | 90.75 | 92.61
ORL | SADSET | 87.50 | 95.91 | 87.50 | 88.75
CIFAR-100 | SADSEF | 47.75 | 45.77 | 47.75 | 47.86
CIFAR-100 | SADSET | 47.79 | 46.47 | 47.79 | 52.30
ImageNet-10 | SADSEF | 91.69 | 87.53 | 91.69 | 91.78
ImageNet-10 | SADSET | 90.43 | 87.33 | 90.43 | 90.55

The CIFAR-100 dataset contains 60000 different object images of size 32×32 of 100 different subjects, categorized into 20 super-classes which are considered as ground truth [27]. A random train/test split of 50000/10000 is used in SADSET. Deep features of dimension 512 are extracted from the second-last layer of the baseline algorithm, which is Contrastive Clustering (CC) [33] for this dataset. A batch size of 1000 is used in these experiments. The proposed SADSEF has obtained an accuracy of 47.75%, which is 4.85% better than the baseline CC. It is also better than SPICE, PICA, IIC, and many other methods as shown in Table V. Some methods such as TCL, PCL, and SCAN have obtained even better performance, which may be attributed to careful fine-tuning after contrastive clustering.
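Several of the dataset pipelines above rely on deep features taken from the penultimate layer of a pretrained DenseNet, followed by PCA to the per-dataset dimensionality. A possible preprocessing sketch using torchvision and scikit-learn is shown below; the exact layer, image preprocessing, and PCA settings used by the authors may differ, so treat this only as an illustration.

import torch
import torch.nn.functional as F
from torchvision import models
from sklearn.decomposition import PCA

@torch.no_grad()
def densenet201_features(images):
    """Penultimate-layer DenseNet-201 features for a batch of images.

    images: (b, 3, 224, 224) tensor normalized with ImageNet statistics.
    Returns a (b, 1920) tensor of pooled convolutional features.
    """
    net = models.densenet201(weights="DEFAULT").eval()
    feats = net.features(images)             # convolutional feature maps
    feats = F.relu(feats)
    feats = F.adaptive_avg_pool2d(feats, 1)  # global average pooling
    return torch.flatten(feats, 1)

# Example: reduce EYaleB features to the reported 784 dimensions.
# X_784 = PCA(n_components=784).fit_transform(X_features.numpy())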
The ImageNet-10 dataset [7] contains 13000 different object images of size 224×224 of 10 objects chosen from ILSVRC2012 1K [10]. A random train/test split of 10000/3000 is used in SADSET, while SADSEF is trained/tested on the full dataset. The same deep features and batch size are employed as in the CIFAR-100 dataset. The proposed SADSEF has obtained an accuracy of 91.69%, outperforming SOTA methods as shown in Table V.

TABLE V: Comparison of the proposed SADSEF with existing SOTA methods on the CIFAR-100 and ImageNet-10 datasets. "()" denotes that pre-training was done using labeled data.

Methods | CIFAR-100 Acc. | CIFAR-100 NMI | ImageNet-10 Acc. | ImageNet-10 NMI
DAE [62] | 15.1 | 11.1 | 30.4 | 20.6
DCGAN [50] | 15.1 | 12.0 | 34.6 | 22.5
DeCNN [80] | 13.3 | 9.2 | 31.3 | 18.6
JULE [73] | 13.7 | 10.3 | 30.0 | 17.5
VAE [26] | 15.2 | 10.8 | 33.4 | 19.3
ADC [16] | 16.0 | - | - | -
AE [3] | 16.5 | 10.00 | 31.7 | 21.0
DEC [71] | 18.5 | 13.6 | 38.1 | 28.2
DAC [7] | 23.8 | 18.5 | 52.7 | 39.4
IIC [21] | 25.7 | - | - | -
DCCM [68] | 32.7 | 28.5 | 71.0 | 60.8
PICA [19] | 33.7 | 31.0 | 87.0 | 80.2
CC [33] | 42.9 | 43.1 | 89.3 | 85.9
SPICE [45] | 46.8 | 44.8 | (96.9) | (92.7)
SCAN [60] | 48.3 | 48.5 | - | -
PCL [30] | 52.6 | 52.8 | 90.7 | 84.1
TCL [35] | 53.1 | 52.9 | 89.5 | 87.5
SADSEF | 47.75 | 45.77 | 91.69 | 87.53

C. Generalization to Unseen Data Sets

We evaluate our proposed algorithm in train/test splits as SADSET, and with the full dataset used for both training and testing as SADSEF. During testing, we do not need to repeat any training step, and SADSET has achieved better test accuracy on all datasets. Many existing methods such as Struct-AE, DSC, and S3COMP-C have not reported results on unseen test data. Overall, we used four measures to demonstrate the generalization of the proposed SADSE algorithm to unseen test data points, as shown in Table IV. In the ORL dataset, the test/train split is 80/320. Due to the very small training data size, SADSET performance is less than SADSEF. For the other datasets, its performance is the same as or better than with the full training dataset.

D. Scalability to Varying Training Data Size

Due to batch-based training, both SADSEF and SADSET are scalable to larger datasets. To demonstrate this, we have used an increase of 10000 data points, starting from 10000/60000 train/test splits, on the MNIST dataset. We observed an increase in the accuracy of SADSEF as training data points are increased, and when testing the same models for SADSET, the accuracy on unseen test data splits also increases accordingly (Fig. 5a). We observed an increase in the execution time of SADSEF as the training dataset becomes larger, and also for SADSET when tested on the same number of unseen data points (Fig. 5b). The memory consumption of SADSEF increases linearly with an increasing dataset as the model gets trained, and also for SADSET when tested on the same number of unseen data points (Fig. 5c). The scalability of our proposed method is exhibited when the accuracy of SADSET reaches 95% even when the training data set is only 10000 and all remaining data is used for testing. In Table VI we have also compared the execution time (minutes) and memory consumption (GB) of SADSEF with SOTA methods. Due to batch-based training, our proposed method has consumed fewer memory resources and smaller execution time.

Fig. 5: Scalability of the SADSE algorithm on the MNIST dataset is evaluated by observing the variation of accuracy, execution time, and memory consumption with increasing dataset size. This experiment shows the SADSE algorithm is scalable to larger datasets. (a) accuracy, (b) execution time, (c) memory consumption.

TABLE VI: Time (min.) and memory (GB) comparison of different SOTA methods with SADSE.

Datasets | DSC Time | DSC Mem | Struct-AE Time | Struct-AE Mem | DLRSC Time | DLRSC Mem | SADSEF Time | SADSEF Mem
EYaleB | 201.0 | 32.1 | 175.2 | 10.66 | 570.4 | 0.62 | 107.1 | 2.18
Coil-100 | 1653.8 | 50.6 | - | - | 3307.6 | 3.25 | 458.5 | 3.58

E. Ablation Study

To evaluate the contribution of each loss term in the SADSE algorithm, we have performed a detailed ablation study. We used different combinations of loss terms to train our proposed model and report the accuracy of SADSEF in Table VII. We observe that each loss term has contributed an increase in the accuracy, and overall the loss term LH provides the best accuracy.

TABLE VII: Ablation study of the proposed SADSEF algorithm in terms of classification accuracy. The addition of each loss term has caused an increase in the accuracy of the SADSEF algorithm.

Loss terms | EYaleB | Coil-100 | MNIST | ORL
LR + LS | 99.95 | 83.59 | 95.61 | 88.75
LO | 99.95 | 84.75 | 96.58 | 89.00
LH | 99.95 | 84.95 | 97.35 | 90.75

Ablation on learning of the self-expressive matrix: Many existing methods compute the self-expressive matrix H_s using Lasso for sparse subspace clustering [11, 48]. We implemented the same in a batch fashion to compute H_s as follows:

\min_{H_s} \|X_b - X_b H_s\|_F^2 + \lambda \|H_s\|_1 \quad \text{s.t.} \quad H_s(i,i) = 0, \qquad (9)

and compared it with the self-attention-based S_b. Table VIII shows accuracy results on all datasets when S_b in (7) is replaced by H_s.

TABLE VIII: Accuracy comparison using S_b and H_s on different datasets.

Datasets | S_b | H_s
EYaleB | 99.95 | 99.95
MNIST | 97.35 | 97.01
Coil-100 | 84.95 | 82.26
ORL | 90.75 | 89.25
CIFAR-100 | 47.75 | 46.70
ImageNet-10 | 91.69 | 92.25

Hyperparameter tuning: In order to select the best values of λi, i = {1, 2, 3, 4}, we have performed experiments on the MNIST dataset and observed the accuracy. We observe that for λ1 = λ2 = λ3 = λ4 = 1 the accuracy of SADSEF is 96.23%, and with λ1 = λ2 = λ3 = λ4 = 0.02 the accuracy is 96.05%. When we make λ1 and λ3 ten times smaller than the others (λ1 = λ3 = 0.002, λ2 = λ4 = 0.02), the accuracy increases to 97.35%. So for all datasets, we used the hyperparameter values λ1 = λ3 = 0.002 and λ2 = λ4 = 0.02, though further fine-tuning may increase the accuracy of the proposed algorithm. Performance variation is observed by varying k = 1-6 in KNN on the EYaleB dataset, and the same performance is observed for 3-6 neighbors. Therefore, for all datasets, k = 3 is used, though fine-tuning may have further improved the results.

V. CONCLUSION

A structure-aware deep spectral embedding (SADSE) algorithm is proposed to learn the spectral representation of input data spanning non-linear manifolds. The proposed SADSE algorithm is based on deep neural networks which are trained by using direct supervision of spectral embedding while preserving the input data structure. The trained network simulates a spectral clustering algorithm, including the eigenvector computation of the Laplacian matrix. Therefore, the learned representations need not be subjected once again to traditional spectral clustering, as is often done by the existing SOTA methods. Thus the learned representations are directly clustered using a linear clustering method such as k-means. To make the learned spectral representation structure-aware, a self-expression matrix is batch-wise computed on the input data. To this end, a self-attention-based global structure encoding technique is proposed using deep neural networks. The learned self-expression matrix is made sparse by using a nearest-neighbor-based approach. The SADSE algorithm is made scalable to larger datasets by applying the loss terms in a batch-wise fashion. Our trained network can also estimate the spectral representation for unseen data points coming from distributions similar to the training data. Experiments are performed on six publicly available datasets and compared with existing SOTA methods. Our experiments demonstrate the excellent performance of the proposed SADSE algorithm compared to the existing methods.

REFERENCES

[1] Mahdi Abavisani, Alireza Naghizadeh, Dimitris Metaxas, and Vishal Patel. Deep subspace clustering with data augmentation. Adv. in Neural Info. Processing Systems, 33, 2020.
[2] Maryam Abdolali, Nicolas Gillis, and Mohammad Rahmati. Scalable and robust sparse subspace clustering using randomized clustering and multilayer graphs. Signal Processing, 163:166–180, 2019.
[3] Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. Greedy layer-wise training of deep networks. Adv. in Neural Info. Processing Systems, 19, 2006.
[4] Joan Bruna and Stéphane Mallat. Invariant scattering convolution networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1872–1886, 2013.
[5] Jinyu Cai, Jicong Fan, Wenzhong Guo, Shiping Wang, Yunhe Zhang, and Zhao Zhang. Efficient deep embedded subspace clustering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1–10, 2022.
[6] Jianlong Chang, Yiwen Guo, Lingfeng Wang, Gaofeng Meng, Shiming Xiang, and Chunhong Pan. Deep discriminative clustering analysis. arXiv preprint arXiv:1905.01681, 2019.
[7] Jianlong Chang, Lingfeng Wang, Gaofeng Meng, Shiming Xiang, and Chunhong Pan. Deep adaptive image clustering. In Proceedings of the IEEE International Conference on Computer Vision, pages 5879–5887, 2017.
[8] Ying Chen, Chun-Guang Li, and Chong You. Stochastic sparse subspace clustering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4155–4164, 2020.
[9] Madalina Ciortan and Matthieu Defrance. Optimization algorithm for omic data subspace clustering. In The 12th International Conference on Computational Systems-Biology and Bioinformatics, pages 69–89, 2021.
[10] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[11] E. Elhamifar and R. Vidal. Sparse subspace clustering: Kamangar. Multi-level representation learning for deep
Algorithm, theory, and applications. IEEE Transactions subspace clustering. In Proceedings of the IEEE/CVF
on Pattern Analysis and Machine Intelligence, 2013. 1, Winter Conference on Applications of Computer Vision,
8 pages 2039–2048, 2020. 1, 5, 6
[12] Athinodoros S. Georghiades, Peter N. Belhumeur, and [26] Diederik P Kingma and Max Welling. Auto-encoding
David J. Kriegman. From few to many: Illumination variational bayes. arXiv preprint arXiv:1312.6114, 2013.
cone models for face recognition under variable lighting 5, 7
and pose. IEEE Trans on Pattern Analysis and Mach. [27] Alex Krizhevsky, Geoffrey Hinton, et al. Learning
Int., 23(6):643–660, 2001. 2, 5 multiple layers of features from tiny images. 2009. 2, 5,
[13] Kamran Ghasedi, Xiaoqian Wang, Cheng Deng, and 6
Heng Huang. Balanced self-paced learning for generative [28] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick
adversarial clustering network. In Proc. of the Confer- Haffner, et al. Gradient-based learning applied to
ence on Computer Vision and Pattern Recognition, pages document recognition. Proceedings of the IEEE,
4391–4400, 2019. 5 86(11):2278–2324, 1998. 2, 5, 6
[14] Kamran Ghasedi Dizaji, Amirhossein Herandi, Cheng [29] Kuang-Chih Lee, Jeffrey Ho, and David J Kriegman.
Deng, Weidong Cai, and Heng Huang. Deep cluster- Acquiring linear subspaces for face recognition under
ing via joint convolutional autoencoder embedding and variable lighting. IEEE Transactions on Pattern Analysis
relative entropy minimization. In Proceedings of the & Machine Intelligence, (5):684–698, 2005. 5
IEEE international conference on computer vision, pages [30] Junnan Li, Pan Zhou, Caiming Xiong, and Steven CH
5736–5745, 2017. 5 Hoi. Prototypical contrastive learning of unsupervised
[15] Xifeng Guo, Long Gao, Xinwang Liu, and Jianping Yin. representations. arXiv preprint arXiv:2005.04966, 2020.
Improved deep embedded clustering with local structure 5, 7
preservation. In IJCAI, pages 1753–1759, 2017. 5 [31] Mu Li, Xiao-Chen Lian, James T Kwok, and Bao-Liang
[16] Philip Haeusser, Johannes Plapp, Vladimir Golkov, Elie Lu. Time and space efficient spectral clustering via
Aljalbout, and Daniel Cremers. Associative deep clus- column sampling. In CVPR 2011, pages 2297–2304.
tering: Training a classification network with no labels. IEEE, 2011. 2
In Pattern Recognition: 40th German Conference, Oct. [32] Xuelong Li, Rui Zhang, Qi Wang, and Hongyuan Zhang.
9-12,, pages 18–32, 2019. 5, 7 Autoencoder constrained clustering with adaptive neigh-
[17] Li He, Nilanjan Ray, Yisheng Guan, and Hong Zhang. bors. IEEE transactions on neural networks and learning
Fast large-scale spectral clustering via explicit feature systems, 32(1):443–449, 2020. 5
mapping. IEEE transactions on cybernetics, 49(3):1058– [33] Yunfan Li, Peng Hu, Zitao Liu, Dezhong Peng,
1071, 2018. 2 Joey Tianyi Zhou, and Xi Peng. Contrastive clustering.
[18] R. Hu, L. Fan, and L. Liu. Co‚ Aesegmentation of In Proceedings of the AAAI Conference on Artificial
3d shapes via subspace clustering. Computer graphics Intelligence, volume 35, pages 8547–8555, 2021. 5, 6, 7
forum, 31(5):1703–1713, 2012. 1 [34] Yeqing Li, Junzhou Huang, and Wei Liu. Scalable se-
[19] J Huang, S Gong, and X Zhu. Deep semantic clustering quential spectral clustering. In Thirtieth AAAI conference
by partition confidence maximisation. In Pro. Conference on artificial intelligence, 2016. 2
on Computer Vision and Pattern Recognition, 2020. 5, 7 [35] Yunfan Li, Mouxing Yang, Dezhong Peng, Taihao Li,
[20] P. Ji, T. Zhang, H. Li, M. Salzmann, and I. Reid. Deep Jiantao Huang, and Xi Peng. Twin contrastive learning
subspace clustering networks. In Proc. NIPS, 2017. 1, for online clustering. International Journal of Computer
2, 5, 6 Vision, 130(9):2205–2221, 2022. 5, 7
[21] Xu Ji, Joao F Henriques, and Andrea Vedaldi. Invariant [36] Zhihui Li, Feiping Nie, Xiaojun Chang, Liqiang Nie,
information clustering for unsupervised image classifica- Huaxiang Zhang, and Yi Yang. Rank-constrained spectral
tion and segmentation. In Proceedings of the IEEE/CVF clustering with flexible embedding. IEEE transactions
International Conference on Computer Vision, pages on neural networks and learning systems, 29(12):6073–
9865–9874, 2019. 5, 7 6082, 2018. 5, 6
[22] Zhao Kang, Zhiping Lin, Xiaofeng Zhu, and Wenbo Xu. [37] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma. Robust
Structured graph learning for scalable subspace cluster- recovery of subspace structures by low-rank representa-
ing: From single view to multiview. IEEE Transactions tion. IEEE NLM, 35(1):171–184, 2013. 1
on Cybernetics, 2021. 2 [38] He J. Liu, W. and S. F. Chang. Large graph construction
[23] Zhao Kang, Xiao Lu, Jian Liang, Kun Bai, and Zenglin for scalable semi-supervised learning. In ICML-10, pages
Xu. Relation-guided representation learning. Neural 679–686, 2010. 3
Networks, 131:93–102, 2020. 5 [39] C. Y. Lu, H. Min, Z. Q. Zhao, L. Zhu, D. S. Huang, and
[24] Mohsen Kheirandishfard, Fariba Zohrizadeh, and Farhad S. Yan. Robust and efficient subspace segmentation via
Kamangar. Deep low-rank subspace clustering. In least squares regression. In Proc. ECCV, pages 347–360,
Proceedings of the IEEE/CVF Conference on Computer 2012. 1
Vision and Pattern Recognition Workshops, pages 864– [40] Feng J. Lin Z. Mei T. Lu, C. and S. Yan. Subspace
865, 2020. 2, 5, 6 clustering by block diagonal representation. IEEE trans-
[25] Mohsen Kheirandishfard, Fariba Zohrizadeh, and Farhad actions on pattern analysis and machine intelligence
10

(PAMI), 41(2):487–501, 2019. 1 622–631, 2016. 2


[41] Juncheng Lv, Zhao Kang, Xiao Lu, and Zenglin Xu. [57] Qianqian Shi, Bing Hu, Tao Zeng, and Chuanchao
Pseudo-supervised deep subspace clustering. IEEE Zhang. Multi-view subspace clustering analysis for
Transactions on Image Processing, 30:5252–5263, 2021. aggregating multiple heterogeneous omics data. Frontiers
1, 2, 5, 6 in genetics, page 744, 2019. 1
[42] Zhengrui Ma, Zhao Kang, Guangchun Luo, Ling Tian, [58] Alain B Tchagang, Fazel Famili, and Youlian Pan. Sub-
and Wenyu Chen. Towards clustering-friendly repre- space clustering of dna microarray data: Theory, evalua-
sentations: subspace clustering via graph filtering. In tion, and applications. International Journal of Compu-
Proceedings of the 28th ACM International Conference tational Models and Algorithms in Medicine (IJCMAM),
on Multimedia, pages 3081–3089, 2020. 5 4(2):1–52, 2014. 1
[43] Shin Matsushima and Maria Brbic. Selective sampling- [59] Jeya Maria Jose Valanarasu and Vishal M Patel. Over-
based scalable sparse subspace clustering. Advances complete deep subspace clustering networks. In Proceed-
in Neural Information Processing Systems, 32:12416– ings of the IEEE/CVF Winter Conference on Applications
12425, 2019. 5, 6 of Computer Vision, pages 746–755, 2021. 1, 2, 5
[44] SA Nene, SK Nayar, and H Murase. Columbia university [60] Wouter Van Gansbeke, Simon Vandenhende, Stamatios
image library (coil-20). Technical Report CUCS-005-96, Georgoulis, Marc Proesmans, and Luc Van Gool. Scan:
1996. 2, 5 Learning to classify images without labels. In Com-
[45] Chuang Niu, Hongming Shan, and Ge Wang. Spice: puter Vision–ECCV 2020: 16th European Conference,
Semantic pseudo-labeling for image clustering. IEEE Glasgow, UK, August 23–28, 2020, Proceedings, Part X,
Transactions on Image Processing, 31:7264–7278, 2022. pages 268–285. Springer, 2020. 5, 7
5, 7 [61] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob
[46] V. M. Patel, H. Van Nguyen, and R. Vidal. Latent space Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser,
sparse subspace clustering. In Proc. ICCV, pages 225– and Illia Polosukhin. Attention is all you need. Advances
232, 2013. 1 in neural information processing systems, 30, 2017. 4
[47] V. M. Patel and R. Vidal. Kernel sparse subspace [62] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua
clustering. In Proc. ICIP, pages 2849–2853, 2014. 1 Bengio, Pierre-Antoine Manzagol, and Léon Bottou.
[48] X. Peng, J. Feng, S. Xiao, W. Y. Yau, J. T. Zhou, and S. Stacked denoising autoencoders: Learning useful rep-
Yang. Structured autoencoders for subspace clustering. resentations in a deep network with a local denoising
IEEE Transactions on Image Processing, 27(10):5076– criterion. Journal of machine learning research, 11(12),
5086, 2018. 2, 5, 8 2010. 5, 7
[49] Zhihao Peng, Yuheng Jia, Hui Liu, Junhui Hou, and [63] Ulrike Von Luxburg. A tutorial on spectral clustering.
Qingfu Zhang. Maximum entropy subspace clustering Statistics and computing, 17(4):395–416, 2007. 3
network. IEEE Transactions on Circuits and Systems for [64] Tim Wallace, Ali Sekmen, and Xiaofei Wang. Appli-
Video Technology, 2021. 5, 6 cation of subspace clustering in dna sequence analy-
[50] Alec Radford, Luke Metz, and Soumith Chintala. Un- sis. Journal of Computational Biology, 22(10):940–952,
supervised representation learning with deep convolu- 2015. 1
tional generative adversarial networks. arXiv preprint [65] Tong Wang, Junhua Wu, Zhenquan Zhang, Wen Zhou,
arXiv:1511.06434, 2015. 5, 7 Guang Chen, and Shasha Liu. Multi-scale graph attention
[51] Sam T Roweis and Lawrence K Saul. Nonlinear dimen- subspace clustering network. Neurocomputing, 459:302–
sionality reduction by locally linear embedding. science, 314, 2021. 1
290(5500):2323–2326, 2000. 4 [66] Tongxin Wang, Jie Zhang, and Kun Huang. Generalized
[52] Ferdinando S Samaria and Andy C Harter. Parameterisa- gene co-expression analysis via subspace clustering using
tion of a stochastic model for human face identification. low-rank representation. BMC bioinformatics, 20(7):17–
In Proceedings of 1994 IEEE workshop on applications 27, 2019. 1
of computer vision, pages 138–142. IEEE, 1994. 2, 5, 6 [67] Xiaobo Wang, Zhen Lei, Hailin Shi, Xiaojie Guo, Xi-
[53] Ali Sekmen and Akram Aldroubi. Subspace and motion angyu Zhu, and Stan Z Li. Co-referenced subspace
segmentation via local subspace estimation. In 2013 clustering. In 2018 IEEE International Conference on
IEEE Workshop on Robot Vision (WORV), pages 27–33. Multimedia and Expo (ICME), pages 1–6. IEEE, 2018.
IEEE, 2013. 1 5, 6
[54] Junghoon Seo, Jamyoung Koo, and Taegyun Jeon. Deep [68] Jianlong Wu, Keyu Long, Fei Wang, Chen Qian, Cheng
closed-form subspace clustering. In Proceedings of Li, Zhouchen Lin, and Hongbin Zha. Deep compre-
the IEEE International Conference on Computer Vision hensive correlation mining for image clustering. In
Workshops, pages 0–0, 2019. 1, 5, 6 Proceedings of the IEEE/CVF international conference
[55] Uri Shaham, Kelly Stanton, Henry Li, Boaz Nadler, on computer vision, pages 8150–8159, 2019. 5, 7
Ronen Basri, and Yuval Kluger. Spectralnet: Spectral [69] Zizhao Wu, Yunhai Wang, Ruyang Shou, Baoquan Chen,
clustering using deep neural networks. ICLR, 2018. 2, 5 and Xinguo Liu. Unsupervised co-segmentation of 3d
[56] Jie Shen, Ping Li, and Huan Xu. Online low-rank shapes via affinity aggregation spectral clustering. Com-
subspace clustering by basis dictionary pursuit. In puters & Graphics, 37(6):628–637, 2013. 1
International Conference on Machine Learning, pages [70] Guiyu Xia, Huaijiang Sun, Lei Feng, Guoqing Zhang,
11

and Yazhou Liu. Human motion segmentation via robust and Hongdong Li. Neural collaborative subspace cluster-
kernel sparse subspace clustering. IEEE Transactions on ing. In International Conference on Machine Learning,
Image Processing, 27(1):135–150, 2017. 1 pages 7384–7393. PMLR, 2019. 1, 2, 5
[71] Junyuan Xie, Ross Girshick, and Ali Farhadi. Unsu- [84] Pan Zhou, Yunqing Hou, and Jiashi Feng. Deep adver-
pervised deep embedding for clustering analysis. In sarial subspace clustering. In Proceedings of the IEEE
International conference on machine learning, pages Conference on Computer Vision and Pattern Recognition,
478–487, 2016. 5, 7 pages 1596–1604, 2018. 1, 5, 6
[72] Jun Xu, Mengyang Yu, Ling Shao, Wangmeng Zuo, [85] Wenjie Zhu, Bo Peng, and Chunchun Chen. Self-
Deyu Meng, Lei Zhang, and David Zhang. Scaled supervised embedding for subspace clustering. In Pro-
simplex representation for subspace clustering. IEEE ceedings of the 30th ACM International Conference on
Transactions on Cybernetics, 51(3):1493–1505, 2019. 5, Information & Knowledge Management, pages 3687–
6 3691, 2021. 5, 6
[73] Jianwei Yang, Devi Parikh, and Dhruv Batra. Joint
unsupervised learning of deep representations and image
clusters. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 5147–
5156, 2016. 5, 7
[74] Shuai Yang, Wenqi Zhu, and Yuesheng Zhu. Residual
encoder-decoder network for deep subspace clustering.
In 2020 IEEE International Conference on Image Pro-
cessing (ICIP), pages 2895–2899. IEEE, 2020. 5, 6
[75] Ming Yin, Shengli Xie, Zongze Wu, Yun Zhang, and Hira Yaseen is a Ph.D. fellow in the Center for
Robot Vision, Department of Computer Science, In-
Junbin Gao. Subspace clustering via learning an adaptive formation Technology University, Lahore, Pakistan.
low-rank graph. IEEE Transactions on Image Processing, Previously she did her BS in Computer Engineering
27(8):3716–3728, 2018. 1, 5 from the University of Engineering and Technology
[76] Chong You, Chi Li, Daniel P Robinson, and René Lahore and her Master’s in Computer Science from
Comsats University Islamabad. Her research areas
Vidal. Scalable exemplar-based subspace clustering on include Computer Vision and Machine Learning.
class-imbalanced data. In Proceedings of the European During her Ph.D., she is working on unsupervised
Conference on Computer Vision (ECCV), pages 67–83, representation learning, data clustering, and object
classification.
2018. 2
[77] Chong You, Chun-Guang Li, Daniel P Robinson, and
René Vidal. Oracle based active set algorithm for
scalable elastic net subspace clustering. In Proceedings
of the IEEE conference on computer vision and pattern
recognition, pages 3928–3937, 2016. 2, 4, 5, 6
[78] Chong You, Daniel Robinson, and René Vidal. Scalable
sparse subspace clustering by orthogonal matching pur-
suit. In Proceedings of the IEEE conference on computer
vision and pattern recognition, pages 3918–3927, 2016. Arif Mahmood is a Professor in the Computer
5 Science Department, and Dean Faculty of Sciences
[79] Matthew D. Zeiler. Adadelta: An adaptive learning rate at the Information Technology University and also
Director of the Center for Robot Vision. His current
method. ArXiv, abs/1212.5701, 2012. 5 research directions in Computer Vision are person
[80] Matthew D Zeiler, Dilip Krishnan, Graham W Taylor, pose detection and segmentation, crowd counting
and Rob Fergus. Deconvolutional networks. In 2010 and flow detection, background-foreground model-
IEEE Computer Society Conference on computer vision ing in complex scenes, object detection, human-
object interaction detection, and abnormal events
and pattern recognition, pages 2528–2535. IEEE, 2010. detection. He is also actively working in diverse Ma-
5, 7 chine Learning applications including cancer grading
[81] Junjian Zhang, Chun-Guang Li, Chong You, Xianbiao and prognostication using histology images, predictive auto-scaling of services
hosted on the cloud and the fog infrastructures, and environmental monitoring
Qi, Honggang Zhang, Jun Guo, and Zhouchen Lin. Self- using remote sensing. He has also worked as a Research Assistant Professor
supervised convolutional subspace clustering network. In with the School of Mathematics and Statistics, University of Western Australia
Proceedings of the IEEE Conference on Computer Vision (UWA) where he worked on Complex Network Analysis. Before that, he
was a Research Assistant Professor at the School of Computer Science and
and Pattern Recognition, pages 5473–5482, 2019. 1, 2, Software Engineering, UWA, and performed research on face recognition,
5, 6 object classification, and action recognition.
[82] Shangzhi Zhang, Chong You, René Vidal, and Chun-
Guang Li. Learning a self-expressive network for
subspace clustering. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition,
pages 12393–12403, 2021. 1, 4, 5
[83] Tong Zhang, Pan Ji, Mehrtash Harandi, Wenbing Huang,
