Deep Clustering
Abstract—Spectral Embedding (SE) has often been used to map data points from non-linear manifolds to linear subspaces for the purpose of classification and clustering. Despite significant advantages, the subspace structure of the data in the original space is not preserved in the embedding space. To address this issue, subspace clustering has been proposed, replacing the SE graph affinity with a self-expression matrix. It works well if the data lies in a union of linear subspaces; however, the performance may degrade in real-world applications where data often spans non-linear manifolds. To address this problem, we propose a novel structure-aware deep spectral embedding that combines a spectral embedding loss and a structure preservation loss. To this end, a deep neural network architecture is proposed that simultaneously encodes both types of information and aims to generate structure-aware spectral embedding. The subspace structure of the input data is encoded by using attention-based self-expression learning. The proposed algorithm is evaluated on six publicly available real-world datasets. The results demonstrate the excellent clustering performance of the proposed algorithm compared to the existing state-of-the-art methods. The proposed algorithm also exhibits better generalization to unseen data points and is scalable to larger datasets without requiring significant computational resources.

Index Terms—Unsupervised learning, Subspace clustering, Spectral clustering, Deep spectral embedding, Self-expression learning.

H. Yaseen and A. Mahmood are with the Department of Computer Science, Information Technology University (ITU), 346-B, Ferozepur Road, Lahore, Pakistan. E-mails: [email protected], [email protected]

I. INTRODUCTION

High-dimensional data often spans low-dimensional manifolds instead of being uniformly distributed across the ambient space. Recovering these low-dimensional manifolds reduces the computational cost, memory requirements, and the effect of noise, and thus improves the performance of learning, inference, and recognition tasks. Subspace clustering refers to the problem of separating data according to their underlying manifolds. Subspace clustering algorithms have a wide range of applications in computer vision such as image clustering [1, 8, 11, 54], motion segmentation [53, 70], co-segmentation of 3D bodies [18, 69], DNA sequencing [58, 64], omics data clustering [9, 57], and gene expression [66]. The structure of the data is often encoded by using a self-expression matrix, which is based on the observation that a data point in a union of subspaces can be efficiently represented by a combination of other data points in the same manifold. Over the years, many variants of subspace clustering have been proposed, such as LSR [39], LRR [37], SSC [11], LS3C [46], Kernel SSC [47], BDR [40], SC-LALRG [75], and S3COMP-C [8]. In these methods, the self-expression matrix has been computed using conventional optimization techniques. Recently, some algorithms have also used deep neural networks for the computation of self-expression matrices, such as DSC [20], PSSC [41], NCSC [83], S2ConvSCN [81], MLRDSC [25], MLRDSC-DA [1], DASC [84], SENet [82], and ODSC [59]. In all of these algorithms, the computation of the self-expression matrix has been improved by using various constraints, and this matrix is then utilized to construct the affinity matrix for spectral clustering. This arrangement works well if the data spans a union of linear subspaces; however, the performance may degrade if the underlying space is a nonlinear manifold. Local neighborhood relationships, which were originally used for the construction of a fully connected graph, are also ignored, and thus the strength of spectral embedding is not fully utilized. In the current work, we propose to integrate the strength of spectral embedding with subspace structure preservation using the self-expression matrix. For this purpose, we propose a deep neural network-based architecture that learns an embedding by simultaneously minimizing a spectral embedding-based loss while ensuring the self-expression property in the latent space.

The objective function of spectral clustering is to find an embedding of the data points by eigendecomposition of the Laplacian matrix encoding pairwise similarities. That data representation is then clustered to assign the points to different categories. Despite many advantages, the subspace structure of the data in the original space is not preserved in the embedding space. Structure preservation during data transformation aims to keep the embeddings of points that share the same subspace in the ambient space within a common subspace. To address this, many subspace clustering algorithms have been proposed. The assumption of data spanning linear subspaces made by these existing subspace clustering methods often gets violated in many real-world applications. The data may be corrupted by errors or because of missing trajectories, occlusion, shadows, and specularities [11]. These algorithms compute the self-expression coefficient matrix using the ℓ1-norm, ℓ2-norm, nuclear norm, or a combination of these norms to preserve the data structure. However, local neighborhood information may be lost in these methods because the graph structure and local node connectivity are not included, resulting in sub-optimal clustering performance [65].

In the current work, we explicitly encode both the local and the global input data structures. For encoding the local structure, the supervision of spectral embedding is used to train a deep network preserving the local neighborhood graph affinities of data points. For global data structure preservation, the learned embedding is constrained to minimize a self-expression loss. The proposed neural architecture is trained in a batch-by-batch fashion by computing both the spectral supervision and the self-expression matrix in batches. Compared to the full data supervision used by many existing techniques, our proposed approach saves computational time and memory resources.
The existing self-expressiveness-based sparse representations may limit connectivity between data points belonging to the same subspace, which may not form a single connected component [8]. To handle this issue, an ℓ2-norm-based dense solution has been proposed; however, it requires the underlying subspaces to be independent [77]. In the current work, we propose a self-attention-based global structure encoding technique. For this purpose, we use two multi-layer fully connected networks, a query net and a key net. These networks are learned by minimizing an elastic-net-constrained self-expression loss. The learned structure encoding matrix is made sparse by using a nearest-neighbor-based approach that removes less probable links in the latent space representations.

Overall, the proposed algorithm consists of an end-to-end deep neural architecture that learns structure-aware deep spectral embedding via simultaneous minimization of a Laplacian eigenvector-based loss and a self-attention-based structure encoding loss. As the network gets trained, it iteratively learns the local and global structures of the input data. The proposed algorithm is applied on six publicly available datasets including EYaleB [12], COIL-100 [44], MNIST [28], ORL [52], CIFAR-100 [27], and ImageNet-10 [7], and compared with 51 SOTA methods including deep learning-based methods as well as traditional spectral and subspace clustering techniques. The proposed algorithm has consistently shown improved performance over the compared methods. The following are the main contributions of the current work:
• We propose to learn non-linear spectral embedding by using the supervision of eigenvectors of the graph Laplacian matrix.
• The learned embedding is constrained to be structure-aware by using a self-expression-based loss. For this purpose, a self-attention-based structure encoding is exploited.
• To reduce the complexity, both the graph Laplacian and the self-expression matrices are computed batch by batch.
• The proposed deep embedding network is capable of finding effective representations for unseen data points, thus enabling better generalization than the existing methods.
The rest of the paper is organized as follows: Section II contains related work, Section III presents the proposed methodology, and experiments are given in Section IV. Conclusions and future directions follow in Section V.

II. RELATED WORK

Due to numerous applications of subspace clustering in computer vision and related fields, several researchers have aimed to improve it in various dimensions, such as reducing time complexity [17, 31] and memory complexity [34, 56], or learning a self-expressive coefficient matrix that preserves the linear structure of data [20], while others have intended to generalize it to unseen data [22, 55] or make it scalable [2, 8, 22]. Chen et al. [8] proposed random dropout in a self-expressive model to deal with the over-segmentation issues in traditional SSC algorithms. They also used a consensus algorithm to produce a scalable solution over a set of small-scale problems using orthogonal matching pursuit. You et al. [76] introduced scalability by dealing with class-imbalanced data using a greedy algorithm that selects an exemplar subset of data using sparse subspace clustering. With the resurgence of deep learning in subspace clustering, many deep network-based methods have also been proposed.

Ji et al. [20] proposed a convolutional auto-encoder for reconstruction and introduced an additional fully connected self-expressive layer between latent-space data points. Zhang et al. [81] further extended the concept of a self-expressive trainable layer by incorporating spectral clustering to compute pseudo labels, which are then used to train a classification layer using the latent space representation. Zhang et al. [83] collaboratively used two affinity matrices, one from a trainable self-expressive layer and another from a binary classifier applying softmax on latent representations, to further improve the self-expression layer. Kheirandishfard et al. [24] implicitly trained a DNN to impose a low-rank constraint on the latent space, and the self-expressive layer is then used to compute the affinity matrix. Valanarasu et al. [59] used over-complete and under-complete auto-encoder networks to get latent representations to input into a trainable self-expressive layer. Lv et al. [41] used a weighted reconstruction loss and a learnable self-expressive layer to compute spectral clustering pseudo labels to be compared with the predictions of a classification layer on top of the latent representations. These methods have reported excellent results; however, learning the self-expressive coefficient matrix using a full dataset requires high memory complexity. Also, if new unseen data points are added, these methods require computing a full self-expressive matrix. Therefore, such methods are neither scalable to larger datasets nor generalizable to unseen data points.

Peng et al. [48] computed a prior self-expressive matrix from the input data and used it to train an auto-encoder for structure preservation in the latent space. This method preserves structure; however, it requires high computational and memory complexity to compute the self-expressive matrix and train the network using a full training dataset. Shaham et al. [55] proposed a DNN to embed input data points into the eigenspace of the associated graph Laplacian matrix. Despite using deep learning, orthogonality in the latent space is ensured via QR decomposition instead of a learning-based framework. Also, input data structure preservation is not explicitly enforced in the latent space.

In contrast to the existing algorithms, we propose to train our network using local neighborhood information from a fully connected graph, in addition to the global structure of the data captured by the self-expressive matrix and the reconstruction loss. We use batch-wise training of a fully connected network to produce a subspace-preserving self-expressive matrix in the input space, enabling scalability to larger datasets and generalization to unseen data points.

III. PROPOSED STRUCTURE PRESERVING DEEP SPECTRAL CLUSTERING

In order to implement the proposed solution, we directly train an end-to-end network in a batch-by-batch fashion that learns structure-aware spectral representations of data points.
Once this network is trained, new unseen data points can be input into the network, and the corresponding embedding is computed. Since the network is supervised by a spectral clustering-based loss, a brief overview of spectral clustering is given below.

A. Batch-Based Spectral Clustering

Traditionally, spectral clustering has been performed on all data simultaneously; however, we propose spectral clustering to be performed on mini-batches and to combine the results using our proposed deep neural network. Compared to the existing methods, such an approach significantly reduces the computational and memory complexity. For this purpose, we divide a given dataset into a large number of random batches, each having m data points. Let X = {x_i}_{i=1}^m ∈ R^{n×m} be a batch data matrix such that x_i ∈ R^n is a data point spanning an n-dimensional non-linear manifold. The data batch is mapped to a batch graph G_b having adjacency matrix A_b ∈ R^{m×m} computed as follows:

A_b(i, j) = exp(−d_{i,j}^2 / (2σ^2)) if i ≠ j, and A_b(i, j) = 0 if i = j,    (1)

where d_{i,j} is a distance measure between data points x_i and x_j within the same batch, and in our work we consider d_{i,j} = ||x_i − x_j||_2. The graph G_b contains an edge between all node pairs (m_i, m_j) representing the local neighborhood, and the parameter σ controls the width of the neighbourhood [63]. The Laplacian matrix for the graph G_b is computed as L_b = D_b − A_b, where D_b ∈ R^{m×m} is a batch-based degree matrix defined as:

D_b(i, j) = Σ_{j=1}^m A_b(i, j) if i = j, and D_b(i, j) = 0 otherwise.    (2)

The Laplacian matrix L_b is positive semi-definite with at least one zero eigenvalue for a fully connected graph. The eigenvalue decomposition of L_b is given by L_b = U_b Λ_b U_b^T, where U_b = {u_i}_{i=1}^m is the matrix of eigenvectors of L_b, such that the u_i ∈ R^m are arranged in decreasing order of eigenvalues v_m ≥ v_{m−1} ≥ ··· ≥ v_2 ≥ v_1, and Λ_b is a matrix having these eigenvalues on the diagonal. For dividing the graph into k partitions, only the k eigenvectors corresponding to the minimum non-zero k eigenvalues are considered. If in a particular batch the actual number of clusters is less than k, k eigenvectors are still selected. Since the number of clusters is not directly used in the loss function, a varying number of clusters across batches has no effect on the training process.

Let U_k = {u_i}_{i=1}^k ∈ R^{m×k} be the matrix of these eigenvectors. The columns of U_k^T represent an embedding of the original data in a k-dimensional space such that the embedding space is linear, and therefore any linear clustering algorithm, such as K-means, will be able to reveal the groups in the original data. Thus spectral clustering can be considered a projection of data from high-dimensional nonlinear manifolds to low-dimensional linear subspaces. Since we perform spectral clustering batch by batch, to combine all batches into a unified solution we propose to train a deep neural network to simulate spectral embedding. Such a spectral clustering method would be scalable to significantly larger datasets without incurring prohibitive computational or memory costs.

B. Structure Aware Spectral Embedding

For larger datasets having millions of data points, the size of the graph and the corresponding Laplacian matrix grows as O(p^2), where p is the total number of data points. Eigendecomposition of such large matrices incurs a high computational cost. Therefore, traditional spectral clustering lacks scalability to larger datasets [38]. Similarly, applications with online data arrival require computing an embedding in the low-dimensional space without going through the complete process, which is also not possible with most of the existing methods. Our proposed solution addresses both of these issues and also incorporates the structure information within the spectral embedding. Fig. 1 shows all the important steps in our proposed Structure Aware Deep Spectral Embedding (SADSE) algorithm.

A simple strategy to simulate spectral clustering with an auto-encoder is to use the spectral embedding as a direct supervisory signal and train the network to minimize the error:

min_{θ_e} ||U_k^T − Z_b||_F^2,    (3)

where Z_b = {z_i}_{i=1}^m ∈ R^{k×m} is the k-dimensional latent network embedding such that k << n, and θ_e are the encoder network parameters. We empirically observe that instead of training only an encoder network, simultaneously training a pair of encoder and decoder networks provides better initialization and improves accuracy. The encoder and decoder pair, referred to as an Auto-Encoder, is used to project high-dimensional data x_i ∈ R^n to the low-dimensional latent space z_i ∈ R^k and then back to the original space. The encoder may be considered a nonlinear projector from a higher-dimensional to a low-dimensional space. Auto-Encoders are often trained to minimize the reconstruction error over a training dataset X_b:

L_R := min_{θ_e, θ_d} Σ_{i=1}^m ||x_i − x̂_i||_2,    (4)

where x̂_i is the back-projected output of the decoder and θ_d are the parameters of the decoder network. The reconstruction loss aims to preserve the locality of the input data space. We propose to train the deep auto-encoder such that the latent space Z_b is, in some sense, close to the low-dimensional spectral embedding. In the current work, instead of (3), we minimize both ℓ1 and ℓ2 losses between the latent space and the spectral embedding as given below:

L_S := min_{θ_e, θ_d} Σ_{i=1}^m ( ||x_i − x̂_i||_2 + λ_1 ||z_i − U_k^T(i)||_1 + λ_2 ||z_i − U_k^T(i)||_2 ),    (5)

where U_k^T(i) is a column of U_k^T consisting of the corresponding coefficients from the set of selected k eigenvectors, λ_1 and λ_2 are hyperparameters assigning relative weights to the different loss terms, ||·||_1 denotes the ℓ1 norm, and ||·||_2 denotes the ℓ2 norm. The ℓ1 loss ensures the error is sparse, while the ℓ2 loss minimizes the distance between the latent space representation z_i and the spectral embedding U_k^T(i) in the least squares sense.
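For concreteness, the following is a minimal NumPy sketch of the batch-level spectral targets described above (Eqs. (1)–(2) and the eigenvector selection). It is an illustration under the stated conventions, not the authors' implementation; the function name and the choice of σ are assumptions.

```python
# Minimal sketch (not the authors' code): batch affinity A_b (Eq. 1),
# unnormalized Laplacian L_b (Eq. 2), and the k eigenvectors with the
# smallest eigenvalues used as spectral targets U_k.
import numpy as np

def batch_spectral_targets(Xb, k, sigma=1.0):
    """Xb: (n, m) batch of m points in R^n. Returns Uk of shape (m, k)."""
    # Pairwise squared Euclidean distances d_ij^2 between batch points.
    sq_norms = np.sum(Xb ** 2, axis=0)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Xb.T @ Xb
    d2 = np.maximum(d2, 0.0)
    # Gaussian affinity with zero diagonal (Eq. 1).
    Ab = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(Ab, 0.0)
    # Degree matrix and Laplacian L_b = D_b - A_b (Eq. 2).
    Db = np.diag(Ab.sum(axis=1))
    Lb = Db - Ab
    # L_b is symmetric PSD; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(Lb)
    # The paper keeps the k eigenvectors of the smallest (non-zero) eigenvalues;
    # for a connected graph one may additionally skip the constant eigenvector.
    Uk = eigvecs[:, :k]          # shape (m, k); row i is the target for point i
    return Uk
```

In this sketch, row i of Uk plays the role of the column U_k^T(i) that supervises the latent code z_i in Eq. (5).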
Fig. 1: The proposed Structure Aware Deep Spectral Embedding (SADSE) network is trained using self-expressiveness and spectral supervision. The batch data matrix X_b is input, and the latent space matrix Z_b is constrained to minimize the spectral loss and the structural losses. The graph G_b is computed batch-wise and is used to compute the Laplacian matrix L_b and the eigenvector matrix U_b. The latent space of the network is k-dimensional, therefore the k smallest eigenvectors are selected in U_k. The QN and KN networks are used to compute S_b.
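Fig. 1 annotates several loss terms on the latent code Z_b. As an illustration of how these terms can be combined into one training objective (see Eqs. (4)–(7)), a minimal PyTorch-style sketch follows. It is not the authors' released code: the layer widths, activations, and the row-wise tensor conventions are assumptions, while the λ defaults follow the values reported in Section IV-A.

```python
# Hedged sketch of an SADSE-style objective: reconstruction + l1/l2 spectral
# supervision + orthogonality + structure preservation (cf. Eqs. (4)-(7)).
import torch
import torch.nn as nn

class SADSEAutoEncoder(nn.Module):
    # The paper uses a four-layer encoder-decoder with tanh activations
    # (Section IV-A); the hidden width here is an assumption.
    def __init__(self, n, k, hidden=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n, hidden), nn.Tanh(),
                                     nn.Linear(hidden, k), nn.Tanh())
        self.decoder = nn.Sequential(nn.Linear(k, hidden), nn.Tanh(),
                                     nn.Linear(hidden, n), nn.Tanh())

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def sadse_loss(Xb, Uk, Sb, model, lam1=0.002, lam2=0.02, lam3=0.002, lam4=0.02):
    """Xb: (m, n) batch; Uk: (m, k) spectral targets; Sb: (m, m) sparse structure
    matrix whose column i holds the self-expression coefficients of x_i."""
    Zb, Xb_hat = model(Xb)                        # Zb: (m, k), rows are z_i
    m = Xb.shape[0]
    rec = (Xb - Xb_hat).norm(dim=1).sum()         # sum_i ||x_i - x_hat_i||_2   (Eq. 4)
    spec_l1 = (Zb - Uk).abs().sum()               # sum_i ||z_i - U_k(i)||_1
    spec_l2 = (Zb - Uk).norm(dim=1).sum()         # sum_i ||z_i - U_k(i)||_2    (Eq. 5)
    gram = Zb @ Zb.t()                            # (m, m), entries z_i^T z_j
    ortho = (gram - torch.eye(m, device=Xb.device)).pow(2).sum()   # Eq. (6) term
    struct = (Zb - Sb.t() @ Zb).pow(2).sum()      # sum_i ||z_i - Z_b S_b(i)||^2 (Eq. 7 term)
    return rec + lam1 * spec_l1 + lam2 * spec_l2 + lam3 * ortho + lam4 * struct
```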
Due to the convexity enforced by the elastic-net regularization, the optimization of the proposed loss is more stable and robust in the presence of noise.

Orthogonality of the rows of the matrix Z_b may also be enforced during network training. Since each row of Z_b corresponds to an eigenvector of L_b, each row of Z_b is normalized to unit norm and constrained to be orthogonal to the other rows:

L_O := min_{θ_e, θ_d} Σ_{i=1}^m ( ||x_i − x̂_i||_2 + λ_1 ||z_i − U_k^T(i)||_1 + λ_2 ||z_i − U_k^T(i)||_2 + λ_3 ||Z_b^T z_i − I_b(i)||_F^2 ).    (6)

The spectral embedding does not ensure that the input space structure is preserved in the latent space. To this end, we propose a self-expression matrix-based loss function to be simultaneously minimized with the spectral embedding-based loss. In manifold learning, it has been observed that manifold properties may be invariant to some projection spaces [51]. We aim to find a spectral projection that preserves the input data structure. For this purpose, a structure-preserving loss is minimized simultaneously with the spectral embedding loss. A pre-computed batch-based self-expressive matrix S_b in the input space is utilized for this purpose. The spectral embedding is forced to preserve the input data structure by minimizing the following loss:

L_H := min_{θ_e, θ_d} Σ_{i=1}^m ( ||x_i − x̂_i||_2 + λ_1 ||z_i − U_k^T(i)||_1 + λ_2 ||z_i − U_k^T(i)||_2 + λ_3 ||Z_b^T z_i − I_b(i)||_F^2 + λ_4 ||z_i − Z_b S_b(i)||_F^2 ),    (7)

where S_b(i) is a column of S_b, which is the self-expressive representation of the data point x_i in the input space.

C. Attention-Based Self-expressive Matrix Learning

Inspired by the self-attention model in transformer networks, a batch-based self-expressive matrix H_b is learned using two fully connected learnable networks, QN and KN [61, 82]. Given a query data point x_i which needs to be synthesized using the remaining key data points x_j in that batch, where j ≠ i, we forward x_i through QN: x̄_i = QN(x_i) ∈ R^t, and all x_j through KN: x̄_j = KN(x_j) ∈ R^t. The attention score between x̄_i and x̄_j is used to get the self-expressive coefficients: H_b(i, j) = x̄_i^T x̄_j. To learn the parameters of QN and KN, the following objective function is minimized:

min_{θ_Q, θ_K} γ ||x_i − X_i H_b(i)||_2^2 + β ||H_b(i)||_1 + (1 − β) ||H_b(i)||_2,    (8)

where X_i = [x_1, x_2, ··· , x_{i−1}, 0, x_{i+1}, ··· , x_m] contains the batch data except x_i, H_b(i) is the i-th column of H_b, γ > 0, and 1 ≥ β ≥ 0. In Eq. (8), an elastic-net regularizer [77] is used to avoid over-segmentation in H_b. Once H_b is learned, a sparse binary coefficient matrix S_b is computed using the KNN algorithm. For each query x_i, only the coefficients corresponding to the few nearest neighbors are retained as 1.00, while the remaining coefficients are suppressed to 0.00. Thus S_b is made sparse, enabling only a few nearest neighbors of x_i to contribute. Our choice of batch-wise training of QN and KN for the computation of H_b is scalable to larger datasets, is computationally efficient, and reduces the memory requirement compared to full-scale implementations.
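The attention-based construction of H_b and its sparsification into S_b can be sketched as follows. This is an illustrative implementation, not the released one: the hidden widths follow Section IV-A, but the activations, the row-wise indexing of H_b (coefficient of x_j when reconstructing x_i), and the top-coefficient pruning rule are assumptions.

```python
# Hedged sketch: query/key networks produce H_b(i, j) = qbar_i . kbar_j, trained
# with the elastic-net objective of Eq. (8); S_b keeps only a few neighbors.
import torch
import torch.nn as nn

class SelfExpressionQK(nn.Module):
    # Three fully connected layers of width 1024 for each of QN and KN, as in
    # Section IV-A; ReLU activations are an assumption.
    def __init__(self, n, t=1024):
        super().__init__()
        def mlp():
            return nn.Sequential(nn.Linear(n, t), nn.ReLU(),
                                 nn.Linear(t, t), nn.ReLU(),
                                 nn.Linear(t, t))
        self.qn, self.kn = mlp(), mlp()

    def forward(self, Xb):
        Q, K = self.qn(Xb), self.kn(Xb)           # (m, t) each
        Hb = Q @ K.t()                            # attention scores as coefficients
        m = Xb.shape[0]
        Hb = Hb * (1.0 - torch.eye(m, device=Xb.device))   # a point must not express itself
        return Hb

def elastic_net_self_expression_loss(Xb, Hb, gamma=200.0, beta=0.9):
    """Eq. (8)-style loss summed over the batch; gamma and beta follow Section IV-A."""
    Xb_hat = Hb @ Xb                              # row i: sum_j H_b(i, j) x_j (diag is zero)
    rec = gamma * (Xb - Xb_hat).pow(2).sum()
    l1 = beta * Hb.abs().sum()
    l2 = (1.0 - beta) * Hb.norm(dim=1).sum()
    return rec + l1 + l2

def sparsify_to_Sb(Hb, num_neighbors=3):
    """Binary S_b: keep only the largest coefficients per point (a KNN-style pruning;
    the paper retains 3 neighbors in the experiments)."""
    idx = Hb.abs().topk(num_neighbors, dim=1).indices
    Sb = torch.zeros_like(Hb)
    Sb.scatter_(1, idx, 1.0)
    return Sb
```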
IV. EXPERIMENTAL EVALUATIONS

We extensively evaluated the proposed structure-aware deep spectral embedding (SADSE) algorithm on six publicly available datasets including EYaleB [12], Coil-100 [44], MNIST [28], ORL [52], CIFAR-100 [27], and ImageNet-10 [7], and compared it with fifty-one existing state-of-the-art approaches including EDESC [5], PSSC [41], SENet [82], NCSC [83], DCFSC [54], S3COMP-C [8], ODSC [59], SR-SSC [2], SSCOMP [78], EnSC-ORGEN [77], DSC [20], DEPICT [14], Struct-AE [48], DASC [84], S2Conv-SCN [81], MLRDSC [25], MLRDSC-DA [1], S5C [43], SC-LALRG [75], KCRSC [67], SpecNet [55], ACC_CN [32], DLRSC [24], RGRL-L2 [23], MESC-Net [49], RED-SC [74], RCFE [36], FTRR [42], Cluster-GAN [13], SSRSC [72], S2ESC [85], DSC-DL [19], DAE [62], IDEC [15], DCGAN [50], DeCNN [80], VAE [26], ADC [16], AE [3], DEC [71], DAC [7], IIC [21], DCCM [68], PICA [19], CC [33], SPICE [45], JULE [73], DDC [6], SCAN [60], PCL [30], and TCL [35].

For all datasets, we experimented with two settings: using the full data for both training and testing, SADSEF, and using an unseen 20% test split, referred to as SADSET. The proposed approach is compared with SOTA using the measures used by the original authors, including classification accuracy (Acc.) and normalized mutual information (NMI). A detailed ablation study is also performed to show the effectiveness of each proposed component.

A. Experimental Settings

In all of our experiments, we used a four-layer encoder-decoder network with the 'tanh' activation function. First, the full network is trained for 100 epochs using only the reconstruction loss; then the encoder network is trained using the remaining proposed losses with the Adadelta [79] optimizer and a learning rate of 1e−3. For all experiments, the encoder network is further trained for 1000 epochs. The hyperparameters in Eq. (7) are empirically set to λ1 = λ3 = 0.002 and λ2 = λ4 = 0.02 in all experiments. Each of QN and KN is an FC network with three layers beyond the input layer, of size {1024, 1024, 1024}, and is trained batch by batch. The KNN algorithm is then used with 3 nearest neighbors for all datasets. The Adam optimizer with a learning rate of 1e−3, β = 0.9, and γ = 200 is used for the training of QN and KN in all experiments. As a linear clustering method, K-means is employed.

B. Evaluations on Different Datasets

EYaleB dataset contains 64 images of size 192×168 of each of the 38 subjects under 9 different illumination conditions [29]. Following the other SOTA methods, we consider only the 2432 frontal face images, which are then randomly split into 1946/486 train/test splits in SADSET. Deep features are extracted from the second last layer of DenseNet-201, and then PCA is used to reduce the dimensionality to 784. A batch size of 486 is used. The proposed SADSEF algorithm has obtained the best accuracy of 99.95%, outperforming the compared methods as shown in Table I. Fig. 2 shows the visual comparison of the proposed SADSE algorithm with the compared methods on this dataset. All 2432 images are plotted using the t-SNE algorithm by assigning each cluster a different color. The clusters obtained by the proposed SADSE algorithm are more compact than those of the compared methods.

TABLE I: Comparison of the proposed SADSEF algorithm with existing SOTA on the EYaleB and MNIST datasets using full data as train and test.
Method               EYaleB Acc.  EYaleB NMI  MNIST Acc.  MNIST NMI
S5C [43]             60.70        -           59.60       -
SSCOMP [78]          77.59        83.25       -           -
SC-LALRG [75]        79.66        84.52       78.20       76.01
KCRSC [67]           81.40        88.10       64.70       64.30
S3COMP-C [8]         87.41        86.32       96.32       -
FTRR [42]            -            -           70.70       66.72
PSSCl [41]           -            -           78.50       72.76
PSSC [41]            -            -           84.30       76.76
DCFSC [54]           93.87        -           -           -
Struct-AE [48]       94.70        -           65.70       68.98
DEC [71]             -            -           84.30       -
IDEC [15]            -            -           88.06       86.72
SR-SSC [2]           -            -           91.09       93.06
EDESC [5]            -            -           91.30       86.20
EnSC-ORGEN [77]      -            -           93.79       -
NCSC [83]            -            -           94.09       86.12
DSC-Net-L1 [20]      96.67        -           -           -
ACC_CN [32]          97.31        99.34       78.60       74.21
DSC-Net-L2 [20]      97.33        -           -           -
DLRSC [24]           97.53        -           -           -
RGRL-L2 [23]         97.53        96.61       81.40       75.52
ODSC [59]            97.78        -           81.20       -
MESC-Net [49]        98.03        97.27       81.11       82.26
Cluster-GAN [13]     -            -           96.40       92.10
DEPICT [14]          -            -           96.50       91.70
SENet [82]           -            -           96.80       91.80
SpecNet [55]         -            -           97.10       92.40
S2Conv-SCN-L2 [81]   98.44        -           -           -
S2Conv-SCN-L1 [81]   98.48        -           -           -
RED-SC [74]          98.52        -           74.34       73.16
DASC [84]            98.56        98.01       80.40       78.00
MLRDSC [25]          98.64        -           -           -
DSC-DL [19]          98.90        97.40       81.20       76.10
MLRDSC-DA [1]        99.18        -           -           -
SADSEF               99.95        99.95       97.35       92.81

Fig. 2: Visual cluster compactness comparison of the proposed SADSEF algorithm with SOTA methods using t-SNE on the EYaleB dataset. Panels: (a) DLRSC, (b) DSC, (c) SADSEF.
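The accuracy and NMI values reported in the tables are standard clustering metrics. The sketch below shows how such metrics are commonly computed (generic evaluation code, not taken from the paper): accuracy uses an optimal one-to-one matching between cluster labels and class labels via the Hungarian algorithm, and NMI comes from scikit-learn.

```python
# Generic clustering evaluation metrics (illustrative, not the authors' code).
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Best accuracy over all one-to-one matchings of cluster ids to class ids."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    D = max(y_pred.max(), y_true.max()) + 1
    count = np.zeros((D, D), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1                               # co-occurrence counts
    row, col = linear_sum_assignment(count.max() - count)   # maximize matched pairs
    return count[row, col].sum() / y_pred.size

def clustering_nmi(y_true, y_pred):
    return normalized_mutual_info_score(y_true, y_pred)
```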
Coil-100 dataset has 7200 gray-scale images of size 128×128 of 100 different objects taken at pose intervals of 5 degrees. Deep features are extracted from the second last layer of DenseNet-201, and then PCA is used to reduce the dimensionality to 3000. We used a batch size of 720 and a random train/test split of 5760/1440 in SADSET. The proposed SADSEF has obtained an accuracy of 84.95%, outperforming SOTA methods as shown in Table II. Fig. 3 shows a visual comparison of cluster compactness for different algorithms.

TABLE II: Comparison of the proposed SADSEF algorithm with existing SOTA methods on the Coil-100 dataset.
Method               Acc.   NMI
DLRSC [24]           71.86  -
MESC-Net [49]        71.88  90.76
S2Conv-SCN-L2 [81]   72.17  -
DCFSC [54]           72.70  -
S2Conv-SCN-L1 [81]   73.33  -
MLRDSC [25]          76.72  -
S3COMP-C [8]         78.89  -
MLRDSC-DA [1]        79.33  -
RCFE [36]            79.63  96.23
SADSEF               84.95  93.91

Fig. 3: Visual cluster compactness comparison of the proposed SADSEF with SOTA methods using t-SNE on the Coil-100 dataset. Panels: (a) DLRSC, (b) DSC, (c) SADSEF.

Training stability of SADSE algorithm: In Fig. 4, we compare the training performance stability of different SOTA methods on the EYaleB and Coil-100 datasets, respectively. The existing compared methods were trained using a single batch over the full dataset in each epoch, while SADSE is trained using 5 batches for EYaleB and 10 batches for Coil-100 in each epoch. Despite batch-based training, the proposed method does not fluctuate much compared to the SOTA methods. The stability of SADSE demonstrates better convergence as the model gets trained.

Fig. 4: Stability comparison of SADSEF with different SOTA methods during training on EYaleB and Coil-100.

MNIST dataset consists of 70000 grayscale images of size 28×28 with 10 different classes [28]. For this dataset, we computed scattered convolutional features using [4] as a pre-processing step, as used by other SOTA [8]. PCA is then used to reduce the dimensionality to 2000, and a batch size of 500 is used. The standard 60000/10000 train/test split is used in SADSET. The proposed SADSEF has obtained an accuracy of 97.35% compared to SOTA methods, as shown in Table I.

ORL dataset contains 400 face images of size 112×92 of 40 different subjects, where each subject presents 10 images of different facial expressions under varying light conditions [52]. The dataset exhibits face images with open or closed eyes, wearing glasses or not, and having a smile or not. A random train/test split of 320/80 is used in SADSET. Deep features are extracted from the second last layer of DenseNet-121, and PCA is used to reduce the dimensionality to 400, with a batch size of 400. The proposed SADSEF has obtained an accuracy of 90.75%, outperforming SOTA methods as shown in Table III.

TABLE III: Comparison of the proposed SADSEF algorithm with existing SOTA methods on the ORL dataset.
Method               Acc.   NMI
KCRSC [67]           72.30  86.30
SSRSC [72]           78.25  -
DCFSC [54]           85.20  -
PSSCl [41]           85.25  92.58
DSC-Net-L1 [20]      85.75  -
DSC-Net-L2 [20]      86.00  -
RED-SC [74]          86.13  91.16
PSSC [41]            86.75  93.49
DASC [84]            88.25  93.15
S2Conv-SCN-L2 [81]   88.75  -
MLRDSC [25]          88.75  -
S2ESC [85]           89.00  93.52
S2Conv-SCN-L1 [81]   89.50  -
SADSEF               90.75  94.66

CIFAR-100 dataset contains 60000 different object images of size 32×32 of 100 different subjects, categorized into 20 super-classes which are considered as ground-truth [27]. A random train/test split of 50000/10000 is used in SADSET. Deep features of dimension 512 are extracted from the second last layer of the baseline algorithm, which is Contrastive Clustering (CC) [33] for this dataset. A batch size of 1000 is used in these experiments. The proposed SADSEF has

TABLE IV: Performance comparison of the proposed SADSET algorithm over the unseen test split and the full dataset used for both training and testing (SADSEF).
Dataset       Method   Acc.   NMI    F1 score  Precision
EYaleB        SADSEF   99.95  99.95  99.95     99.95
EYaleB        SADSET   99.95  99.95  99.95     99.95
Coil-100      SADSEF   84.95  93.91  84.95     84.57
Coil-100      SADSET   86.04  94.17  86.04     86.26
MNIST         SADSEF   97.35  92.81  97.35     97.35
MNIST         SADSET   97.43  93.23  97.43     97.43
ORL           SADSEF   90.75  94.66  90.75     92.61
ORL           SADSET   87.50  95.91  87.50     88.75
CIFAR-100     SADSEF   47.75  45.77  47.75     47.86
CIFAR-100     SADSET   47.79  46.47  47.79     52.30
ImageNet-10   SADSEF   91.69  87.53  91.69     91.78
ImageNet-10   SADSET   90.43  87.33  90.43     90.55
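Several dataset paragraphs above use the same preprocessing recipe: deep features from the second-last layer of a DenseNet followed by PCA. The sketch below illustrates one way to implement such a pipeline with torchvision and scikit-learn; the choice of the pooled DenseNet feature map as the "second-last layer", the input resizing and normalization, and the function names are assumptions rather than the paper's exact preprocessing.

```python
# Hedged sketch of the DenseNet-feature + PCA preprocessing described above.
import torch
import torchvision.models as models
from sklearn.decomposition import PCA

def densenet_penultimate_features(images):
    """images: float tensor (N, 3, 224, 224), ImageNet-normalized.
    Returns (N, 1920) pooled DenseNet-201 features as a NumPy array."""
    weights = models.DenseNet201_Weights.IMAGENET1K_V1
    net = models.densenet201(weights=weights).eval()
    with torch.no_grad():
        fmap = net.features(images)                              # (N, 1920, H', W')
        fmap = torch.nn.functional.relu(fmap)
        feats = torch.nn.functional.adaptive_avg_pool2d(fmap, 1).flatten(1)
    return feats.numpy()

def reduce_with_pca(features, dim=784):
    """PCA to the dimensionality used for the corresponding dataset
    (e.g., 784 for EYaleB, 3000 for Coil-100, 400 for ORL)."""
    return PCA(n_components=dim).fit_transform(features)
```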
[Figure: three panels plotting Accuracy, Time (min.), and Memory (GB) against the number of training data points (10K to 70K).]
[11] E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
[12] Athinodoros S. Georghiades, Peter N. Belhumeur, and David J. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):643–660, 2001.
[13] Kamran Ghasedi, Xiaoqian Wang, Cheng Deng, and Heng Huang. Balanced self-paced learning for generative adversarial clustering network. In Proceedings of the Conference on Computer Vision and Pattern Recognition, pages 4391–4400, 2019.
[14] Kamran Ghasedi Dizaji, Amirhossein Herandi, Cheng Deng, Weidong Cai, and Heng Huang. Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization. In Proceedings of the IEEE International Conference on Computer Vision, pages 5736–5745, 2017.
[15] Xifeng Guo, Long Gao, Xinwang Liu, and Jianping Yin. Improved deep embedded clustering with local structure preservation. In IJCAI, pages 1753–1759, 2017.
[16] Philip Haeusser, Johannes Plapp, Vladimir Golkov, Elie Aljalbout, and Daniel Cremers. Associative deep clustering: Training a classification network with no labels. In Pattern Recognition: 40th German Conference, Oct. 9-12, pages 18–32, 2019.
[17] Li He, Nilanjan Ray, Yisheng Guan, and Hong Zhang. Fast large-scale spectral clustering via explicit feature mapping. IEEE Transactions on Cybernetics, 49(3):1058–1071, 2018.
[18] R. Hu, L. Fan, and L. Liu. Co-segmentation of 3D shapes via subspace clustering. Computer Graphics Forum, 31(5):1703–1713, 2012.
[19] J. Huang, S. Gong, and X. Zhu. Deep semantic clustering by partition confidence maximisation. In Proceedings of the Conference on Computer Vision and Pattern Recognition, 2020.
[20] P. Ji, T. Zhang, H. Li, M. Salzmann, and I. Reid. Deep subspace clustering networks. In Proc. NIPS, 2017.
[21] Xu Ji, Joao F. Henriques, and Andrea Vedaldi. Invariant information clustering for unsupervised image classification and segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9865–9874, 2019.
[22] Zhao Kang, Zhiping Lin, Xiaofeng Zhu, and Wenbo Xu. Structured graph learning for scalable subspace clustering: From single view to multiview. IEEE Transactions on Cybernetics, 2021.
[23] Zhao Kang, Xiao Lu, Jian Liang, Kun Bai, and Zenglin Xu. Relation-guided representation learning. Neural Networks, 131:93–102, 2020.
[24] Mohsen Kheirandishfard, Fariba Zohrizadeh, and Farhad Kamangar. Deep low-rank subspace clustering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 864–865, 2020.
[25] Mohsen Kheirandishfard, Fariba Zohrizadeh, and Farhad Kamangar. Multi-level representation learning for deep subspace clustering. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2039–2048, 2020.
[26] Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
[27] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
[28] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[29] Kuang-Chih Lee, Jeffrey Ho, and David J. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence, (5):684–698, 2005.
[30] Junnan Li, Pan Zhou, Caiming Xiong, and Steven C. H. Hoi. Prototypical contrastive learning of unsupervised representations. arXiv preprint arXiv:2005.04966, 2020.
[31] Mu Li, Xiao-Chen Lian, James T. Kwok, and Bao-Liang Lu. Time and space efficient spectral clustering via column sampling. In CVPR 2011, pages 2297–2304. IEEE, 2011.
[32] Xuelong Li, Rui Zhang, Qi Wang, and Hongyuan Zhang. Autoencoder constrained clustering with adaptive neighbors. IEEE Transactions on Neural Networks and Learning Systems, 32(1):443–449, 2020.
[33] Yunfan Li, Peng Hu, Zitao Liu, Dezhong Peng, Joey Tianyi Zhou, and Xi Peng. Contrastive clustering. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 8547–8555, 2021.
[34] Yeqing Li, Junzhou Huang, and Wei Liu. Scalable sequential spectral clustering. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[35] Yunfan Li, Mouxing Yang, Dezhong Peng, Taihao Li, Jiantao Huang, and Xi Peng. Twin contrastive learning for online clustering. International Journal of Computer Vision, 130(9):2205–2221, 2022.
[36] Zhihui Li, Feiping Nie, Xiaojun Chang, Liqiang Nie, Huaxiang Zhang, and Yi Yang. Rank-constrained spectral clustering with flexible embedding. IEEE Transactions on Neural Networks and Learning Systems, 29(12):6073–6082, 2018.
[37] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma. Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):171–184, 2013.
[38] W. Liu, J. He, and S. F. Chang. Large graph construction for scalable semi-supervised learning. In ICML-10, pages 679–686, 2010.
[39] C. Y. Lu, H. Min, Z. Q. Zhao, L. Zhu, D. S. Huang, and S. Yan. Robust and efficient subspace segmentation via least squares regression. In Proc. ECCV, pages 347–360, 2012.
[40] C. Lu, J. Feng, Z. Lin, T. Mei, and S. Yan. Subspace clustering by block diagonal representation. IEEE Transactions on Pattern Analysis and Machine Intelligence
… and Yazhou Liu. Human motion segmentation via robust kernel sparse subspace clustering. IEEE Transactions on Image Processing, 27(1):135–150, 2017.
[71] Junyuan Xie, Ross Girshick, and Ali Farhadi. Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning, pages 478–487, 2016.
[72] Jun Xu, Mengyang Yu, Ling Shao, Wangmeng Zuo, Deyu Meng, Lei Zhang, and David Zhang. Scaled simplex representation for subspace clustering. IEEE Transactions on Cybernetics, 51(3):1493–1505, 2019.
[73] Jianwei Yang, Devi Parikh, and Dhruv Batra. Joint unsupervised learning of deep representations and image clusters. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5147–5156, 2016.
[74] Shuai Yang, Wenqi Zhu, and Yuesheng Zhu. Residual encoder-decoder network for deep subspace clustering. In 2020 IEEE International Conference on Image Processing (ICIP), pages 2895–2899. IEEE, 2020.
[75] Ming Yin, Shengli Xie, Zongze Wu, Yun Zhang, and Junbin Gao. Subspace clustering via learning an adaptive low-rank graph. IEEE Transactions on Image Processing, 27(8):3716–3728, 2018.
[76] Chong You, Chi Li, Daniel P. Robinson, and René Vidal. Scalable exemplar-based subspace clustering on class-imbalanced data. In Proceedings of the European Conference on Computer Vision (ECCV), pages 67–83, 2018.
[77] Chong You, Chun-Guang Li, Daniel P. Robinson, and René Vidal. Oracle based active set algorithm for scalable elastic net subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3928–3937, 2016.
[78] Chong You, Daniel Robinson, and René Vidal. Scalable sparse subspace clustering by orthogonal matching pursuit. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3918–3927, 2016.
[79] Matthew D. Zeiler. Adadelta: An adaptive learning rate method. ArXiv, abs/1212.5701, 2012.
[80] Matthew D. Zeiler, Dilip Krishnan, Graham W. Taylor, and Rob Fergus. Deconvolutional networks. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 2528–2535. IEEE, 2010.
[81] Junjian Zhang, Chun-Guang Li, Chong You, Xianbiao Qi, Honggang Zhang, Jun Guo, and Zhouchen Lin. Self-supervised convolutional subspace clustering network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5473–5482, 2019.
[82] Shangzhi Zhang, Chong You, René Vidal, and Chun-Guang Li. Learning a self-expressive network for subspace clustering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12393–12403, 2021.
[83] Tong Zhang, Pan Ji, Mehrtash Harandi, Wenbing Huang, and Hongdong Li. Neural collaborative subspace clustering. In International Conference on Machine Learning, pages 7384–7393. PMLR, 2019.
[84] Pan Zhou, Yunqing Hou, and Jiashi Feng. Deep adversarial subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1596–1604, 2018.
[85] Wenjie Zhu, Bo Peng, and Chunchun Chen. Self-supervised embedding for subspace clustering. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 3687–3691, 2021.

Hira Yaseen is a Ph.D. fellow in the Center for Robot Vision, Department of Computer Science, Information Technology University, Lahore, Pakistan. Previously she did her BS in Computer Engineering from the University of Engineering and Technology Lahore and her Master's in Computer Science from Comsats University Islamabad. Her research areas include Computer Vision and Machine Learning. During her Ph.D., she is working on unsupervised representation learning, data clustering, and object classification.

Arif Mahmood is a Professor in the Computer Science Department, and Dean Faculty of Sciences at the Information Technology University and also Director of the Center for Robot Vision. His current research directions in Computer Vision are person pose detection and segmentation, crowd counting and flow detection, background-foreground modeling in complex scenes, object detection, human-object interaction detection, and abnormal events detection. He is also actively working in diverse Machine Learning applications including cancer grading and prognostication using histology images, predictive auto-scaling of services hosted on the cloud and the fog infrastructures, and environmental monitoring using remote sensing. He has also worked as a Research Assistant Professor with the School of Mathematics and Statistics, University of Western Australia (UWA), where he worked on Complex Network Analysis. Before that, he was a Research Assistant Professor at the School of Computer Science and Software Engineering, UWA, and performed research on face recognition, object classification, and action recognition.