Spectral approach for tabular and graph data clustering

Dr. Debasis Mohapatra
Assistant Professor
Department of Computer Science and Engineering,
PMEC, Berhampur, Odisha, India, 761003

National Seminar on "Emerging Applications of Artificial Intelligence and Data Science", Berhampur University, 26/09/2023

National Seminar(BU) 2023


Presentation Overview

▶ Introduction
▶ Tabular and Graph data
▶ Spectral Clustering for Tabular data
▶ Analysis on Spectral Clustering Algorithm
▶ Spectral Clustering for Tabular data (Illustration)
▶ Spectral Clustering for Graph data (Illustration)
▶ Measuring the strength of clustering
▶ Computation of modularity
▶ Implementation of Spectral Clustering for graph data
▶ Conclusions



Introduction

▶ What is clustering?
Clustering is an unsupervised machine-learning technique that
works on unlabeled data. It groups objects based on their
similarity: objects in the same group are similar to each other and
dissimilar to the objects in other groups.
▶ Why is clustering important?
It reveals patterns or structure in a data set that would otherwise
not be visible.
▶ Some popular clustering algorithms
▶ Tabular data: K-means clustering, hierarchical clustering, DBSCAN, etc.
▶ Graph data: Girvan-Newman, Louvain, Leiden, etc.



Tabular and Graph data
Tabular Data:
            Atr1   Atr2   ...   Atrn
Record1     ...    ...    ...   ...
Record2     ...    ...    ...   ...
   :
   :
Recordm     ...    ...    ...   ...

Graph Data:
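To make the two data forms concrete, here is a minimal Python sketch. The attribute values, the edge list and the helper `adjacency_matrix` are hypothetical illustrations, not taken from the slides: tabular data is held as a list of records with one value per attribute, and an undirected graph is held as an edge list that is converted into a 0/1 adjacency matrix.

```python
# Tabular data: records (rows), each with one value per attribute (columns).
# The values below are hypothetical, chosen only for illustration.
records = [
    [5.1, 3.5, 1.4],  # Record1: Atr1, Atr2, Atr3
    [4.9, 3.0, 1.4],  # Record2
    [6.2, 3.4, 5.4],  # Record3
]
m, n = len(records), len(records[0])  # m records, n attributes

# Graph data: an undirected, unweighted graph stored as an edge list
# over nodes 0 .. num_nodes - 1 (hypothetical example graph).
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]


def adjacency_matrix(edges, num_nodes):
    """Return the symmetric 0/1 adjacency matrix of an undirected graph."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        adj[u][v] = 1
        adj[v][u] = 1  # undirected: store both directions
    return adj


adj = adjacency_matrix(edges, num_nodes=4)
degrees = [sum(row) for row in adj]  # degree of each node
print(m, n, degrees)  # 3 3 [2, 2, 3, 1]
```

The node degrees sum to twice the number of edges (8 for these 4 edges), a property the modularity formula later relies on.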



Spectral Clustering of Tabular data

1. Given a dataset of n points, the first step is to construct a
similarity matrix $W$, whose entries represent the pairwise
similarity between the data points. Common choices include cosine
similarity and the Gaussian kernel similarity
$W_{ij} = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$, which converts the
Euclidean distance between two points into a similarity.
2. Next, the similarity matrix is transformed into a Laplacian matrix,
which encodes the connectivity between the data points. With the
degree matrix $D = \mathrm{diag}(d_1, \dots, d_n)$, $d_i = \sum_j W_{ij}$,
the common variants are the unnormalized Laplacian $L = D - W$, the
random-walk normalized Laplacian $L_{rw} = I - D^{-1} W$, and the
symmetric normalized Laplacian $L_{sym} = I - D^{-1/2} W D^{-1/2}$.
3. The eigenvalues and eigenvectors of the Laplacian matrix are
then computed. Only the eigenvectors of the k smallest eigenvalues
are needed; k, usually the desired number of clusters, is a
hyperparameter that must be chosen.
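Steps 1 and 2 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: it uses the Gaussian kernel with a hypothetical bandwidth `sigma`, sets the diagonal of W to zero (a common convention, not stated on the slide), and the function names `gaussian_similarity` and `laplacians` are our own. It returns the three Laplacian variants listed in step 2.

```python
import math


def gaussian_similarity(points, sigma=1.0):
    """Step 1: W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), with W[i][i] = 0."""
    n = len(points)
    return [[0.0 if i == j else
             math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]


def laplacians(w):
    """Step 2: return (L, L_rw, L_sym) for a similarity matrix w.

    L     = D - W                     (unnormalized)
    L_rw  = I - D^{-1} W              (random-walk normalized)
    L_sym = I - D^{-1/2} W D^{-1/2}   (symmetric normalized)
    Assumes every degree d_i = sum_j W[i][j] is positive.
    """
    n = len(w)
    d = [sum(row) for row in w]
    eye = [[float(i == j) for j in range(n)] for i in range(n)]
    lap = [[d[i] * eye[i][j] - w[i][j] for j in range(n)] for i in range(n)]
    lap_rw = [[eye[i][j] - w[i][j] / d[i] for j in range(n)] for i in range(n)]
    lap_sym = [[eye[i][j] - w[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)]
               for i in range(n)]
    return lap, lap_rw, lap_sym


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (4.0, 4.0)]
w = gaussian_similarity(points, sigma=1.0)
lap, lap_rw, lap_sym = laplacians(w)
# Every row of L and of L_rw sums to (numerically) zero.
print(max(abs(sum(row)) for row in lap + lap_rw) < 1e-12)  # True
```

Because W has a zero diagonal here, $L_{sym}$ has ones on its diagonal; $L$ and $L_{sym}$ are symmetric, while $L_{rw}$ generally is not.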



Spectral Clustering of Tabular data (Contd.)

4. The k selected eigenvectors are arranged as the columns of an
n × k matrix, and the rows of this matrix are used as the new
feature representations of the data points.
5. Finally, a clustering algorithm such as k-means is applied to these
new feature representations to obtain the final clustering.
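Putting steps 1 to 5 together, a minimal, dependency-free Python sketch might look as follows. Several choices are assumptions rather than part of the slides: the Gaussian kernel with bandwidth `sigma`, the symmetric normalized Laplacian, scaling each embedded row to unit length (the Ng-Jordan-Weiss variant), a cyclic Jacobi eigen-solver (in practice a library routine such as `numpy.linalg.eigh` would be used instead), and k-means with a deterministic farthest-point initialisation. All function names are hypothetical.

```python
import math


def jacobi_eigh(a, tol=1e-18, max_sweeps=100):
    """Eigen-decompose a real symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors) sorted by ascending eigenvalue.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes a[p][q].
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta ** 2 + 1))
                c = 1 / math.sqrt(t ** 2 + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- P^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    order = sorted(range(n), key=lambda i: a[i][i])
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


def kmeans(rows, k, max_iter=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centres = [rows[0]]
    while len(centres) < k:
        centres.append(max(rows, key=lambda r: min(math.dist(r, c) for c in centres)))
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: math.dist(r, centres[c])) for r in rows]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def spectral_clustering(points, k, sigma=1.0):
    n = len(points)
    # Step 1: Gaussian-kernel similarity matrix with zero diagonal.
    w = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    # Step 2: symmetric normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2}.
    d = [sum(row) for row in w]
    lap = [[float(i == j) - w[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)]
           for i in range(n)]
    # Step 3: eigenvectors belonging to the k smallest eigenvalues.
    _, vecs = jacobi_eigh(lap)
    # Step 4: n x k embedding; each row is scaled to unit length.
    rows = [[vecs[c][i] for c in range(k)] for i in range(n)]
    rows = [[x / (math.hypot(*r) or 1.0) for x in r] for r in rows]
    # Step 5: k-means on the new feature representations.
    return kmeans(rows, k)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(spectral_clustering(points, k=2))  # [0, 0, 0, 1, 1, 1]
```

The dense eigen-decomposition in step 3 is what drives the cubic cost discussed next; the k-means stage works on only n rows of length k.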



Analysis on Spectral Clustering Algorithm

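The time complexity of the spectral clustering algorithm is $O(n^3)$,
where n is the number of data points, i.e. the number of nodes in the
(similarity) graph. The cost is dominated by the full eigendecomposition
of the n × n Laplacian matrix.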



Spectral Clustering of Tabular data (Illustration)



Spectral Clustering of Graph data(Illustration)



Measuring the quality/strength of clustering
Modularity: Modularity measures the strength of the division of a
network (graph) into modules, also called groups, clusters, or
communities. Networks with high modularity have dense connections
between nodes within modules but sparse connections between nodes
in different modules.
$$ Q = \sum_{c=1}^{v} \left[ \frac{L_c}{m} - \gamma \left( \frac{K_c}{2m} \right)^2 \right] \qquad (1) $$
where the sum runs over all $v$ communities (clusters) $c$, $m$ is the
number of edges, $L_c$ is the number of intra-cluster links of cluster $c$,
$K_c$ is the sum of the degrees of the nodes in cluster $c$, and $\gamma$ is
the resolution parameter. The resolution parameter sets an arbitrary
trade-off between intra-group edges and inter-group edges; it is common
to simply use $\gamma = 1$.



Computation of modularity (Example)

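$$ Q = \sum_{c=1}^{v} \left[ \frac{L_c}{m} - \gamma \left( \frac{K_c}{2m} \right)^2 \right] $$

With $v = 2$, $\gamma = 1$ and $m = 7$, where each of the two clusters has $L_c = 3$ and $K_c = 7$:

$$ Q = \left[ \frac{3}{7} - \left( \frac{7}{14} \right)^2 \right] + \left[ \frac{3}{7} - \left( \frac{7}{14} \right)^2 \right] = (0.42857 - 0.25) \times 2 = 0.35714 $$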
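A minimal Python sketch of Eq. (1), assuming an undirected, unweighted graph given as an edge list and a dict mapping each node to a community label (both hypothetical input formats). Since the figure for the worked example is not available, the graph below is a hypothetical one chosen only to match its counts: two triangles joined by a single edge, which gives $m = 7$ and $L_c = 3$, $K_c = 7$ for each cluster.

```python
from collections import defaultdict


def modularity(edges, community, gamma=1.0):
    """Modularity Q of a partition of an undirected, unweighted graph (Eq. 1).

    edges     -- list of (u, v) pairs, each edge listed once
    community -- dict mapping every node to its community label
    gamma     -- resolution parameter (gamma = 1 is the common default)
    """
    m = len(edges)
    l_c = defaultdict(int)  # L_c: number of intra-cluster links of community c
    k_c = defaultdict(int)  # K_c: sum of the degrees of the nodes in community c
    for u, v in edges:
        k_c[community[u]] += 1  # each edge adds 1 to the degree of u ...
        k_c[community[v]] += 1  # ... and 1 to the degree of v
        if community[u] == community[v]:
            l_c[community[u]] += 1
    return sum(l_c[c] / m - gamma * (k_c[c] / (2 * m)) ** 2 for c in k_c)


# Hypothetical graph consistent with the example: two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, community), 5))  # 0.35714
```

Raising `gamma` above 1 penalises large communities more and so favours more, smaller ones; with $\gamma = 1$, placing every node in a single community always gives $Q = 0$.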



Spectral Clustering of Graph data (Implementation)

Implementation of Spectral Clustering of Graph data
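The slide's own implementation is not reproduced here. As a hedged stand-in, the following pure-Python sketch performs spectral clustering of a graph into two clusters: it builds the unnormalized Laplacian $L = D - A$ from an edge list and splits the nodes by the sign of the eigenvector of the second-smallest eigenvalue (the Fiedler vector). The helper names, the sign-split rule and the example graph are assumptions; `jacobi_eigh` is a hypothetical textbook Jacobi solver that a library routine such as `numpy.linalg.eigh` would normally replace. For k > 2 clusters one would instead run k-means on the first k eigenvectors, as in step 5 for tabular data.

```python
import math


def jacobi_eigh(a, tol=1e-18, max_sweeps=100):
    """Eigen-decompose a real symmetric matrix with cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta ** 2 + 1))
                c = 1 / math.sqrt(t ** 2 + 1)
                s = t * c
                for k in range(n):  # A <- A P
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- P^T A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # V <- V P
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    order = sorted(range(n), key=lambda i: a[i][i])
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


def spectral_bipartition(edges, num_nodes):
    """Split an undirected graph in two by the sign of its Fiedler vector."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    # Unnormalized graph Laplacian L = D - A.
    lap = [[(deg[i] if i == j else 0.0) - adj[i][j] for j in range(num_nodes)]
           for i in range(num_nodes)]
    _, vecs = jacobi_eigh(lap)
    fiedler = vecs[1]  # eigenvector of the second-smallest eigenvalue
    return [0 if x >= 0 else 1 for x in fiedler]


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(spectral_bipartition(edges, num_nodes=6))
```

Since an eigenvector's sign is arbitrary, the labels may come out as `[0, 0, 0, 1, 1, 1]` or `[1, 1, 1, 0, 0, 0]`; either way each triangle forms one cluster, and the quality of the split can then be scored with the modularity $Q$ from Eq. (1).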



Conclusion

▶ Standard spectral clustering does not scale well to large datasets, because the eigendecomposition of the n × n Laplacian takes $O(n^3)$ time.


▶ The idea is applicable in many fields. In computer science,
graph clustering is used in social network analysis, image
processing, natural language processing, etc. In the biological
sciences, it helps reveal the closeness among biological entities
such as organisms, cells, and proteins.
▶ Beyond these two fields, the concept is also widely applicable
in economics, sociology, political science, etc.



References
1. Despalatović, L., Vojković, T., Vukičević, D. (2014, May). Community
structure in networks: Girvan-Newman algorithm improvement. In 2014
37th International Convention on Information and Communication
Technology, Electronics and Microelectronics (MIPRO) (pp. 997-1002). IEEE.
2. Newman, M. E. J. (2010). Networks: An Introduction. Oxford University
Press, New York.
3. Blondel, V. D., Guillaume, J. L., Lambiotte, R., Lefebvre, E. (2008).
Fast unfolding of communities in large networks. Journal of Statistical
Mechanics: Theory and Experiment, 2008(10), P10008.
4. Girvan, M., Newman, M. E. J. (2002). Community structure in social
and biological networks. Proceedings of the National Academy of
Sciences, 99(12), 7821-7826.
5. Schubert, E., Hess, S., Morik, K. (2018). The relationship of DBSCAN
to matrix factorization and spectral clustering. In LWDA (pp. 330-334).



Thank You
