
Cluster Quality Based Performance Evaluation of Hierarchical Clustering Method

Nisha
Student
UIET, Panjab University
Chandigarh, India
[email protected]

Puneet Jai Kaur
Assistant Professor
UIET, Panjab University
Chandigarh, India
[email protected]
Abstract: Clustering is an important phase in data mining. A number of different clustering methods are used to perform cluster analysis: partitioning clustering, hierarchical clustering, grid-based clustering, model-based clustering, graph-based clustering, density-based clustering and so on. The hierarchical method clusters the data objects in the form of a tree known as a hierarchy, and each node in the hierarchy is known as a cluster. Hierarchical clustering can be performed in two ways: agglomerative clustering and divisive clustering. Agglomerative clustering is generally preferred. For a good cluster analysis, the quality of the clusters should be high. In this paper, we measure the quality of clusters with the help of three parameters: cohesion measurement, silhouette index and elapsed time.

Keywords: Data mining, clustering, hierarchical clustering, quality, quality parameters, cohesion, silhouette index, elapsed time.

I. INTRODUCTION

Cluster analysis is a method to detect the number of clusters in a given data set. A data set can be defined as a collection of objects or data [2]. In cluster analysis, the objects are arranged in such a way that they are similar to the objects within the same cluster and different from the objects lying outside it [2]. Clustering is unsupervised learning because it does not need any predefined clusters. Clustering is performed for a number of reasons; the two main ones are data interpretation, meaning that the data can be easily understood, and data compression, meaning that the data can be easily optimized and the data set used efficiently. The main applications of cluster analysis are detecting patterns, classification, data mining, grouping objects based on similarity or dissimilarity criteria, knowledge discovery, searching for objects and so on [3]. The major clustering techniques are classified into the following categories: partitioning clustering, hierarchical clustering, density-based clustering, grid-based clustering, model-based clustering etc. [4]. The main method discussed in this paper is hierarchical clustering, in which a number of clusters are nested in the form of a tree and at each level a cluster is obtained by the union of its sub-clusters. A method that obtains high quality clusters is always desirable; therefore the main idea of this paper is to discuss the various quality criteria used for cluster analysis. The quality criteria discussed later are cluster cohesion, silhouette index and elapsed time.

Cluster cohesion is becoming one of the most important aspects of data mining. It can be used to determine the quality of clusters. Cluster cohesion is the degree of association between the objects of a cluster, so a high value of cluster cohesion is desirable for cluster analysis [5]. A number of different metrics have been proposed over the years to measure cluster cohesion, but no consensus has yet been reached on which metric calculates cohesion best. Existing cohesion metrics are based on similarity measures between the objects of clusters [5]; that is, clustering methods group objects based on the interconnection between the objects of the class. As a result, high similarity means high cohesion between the objects of the clusters, and vice versa.

The main index for measuring the quality of a cluster in graphical form is the silhouette index. This index is preferred over all other graphical indices because the silhouette represents the clusters of a dataset in scattered form. It measures how similar an object is to the objects within its cluster compared to the objects outside the cluster [11].

One further quality criterion is the elapsed time. The elapsed time in cluster analysis is the amount of time taken to perform the clustering. A minimal elapsed time is always desirable for any clustering method; for large databases, the time taken is greater than for small datasets.
II. RELATED WORK

To improve software engineering, a number of data mining techniques can be applied, including association mining, classification, generalization, clustering, decision trees, pattern classification and so on [6]. Clustering (cluster analysis) is an important technique for improving software engineering. According to [7], clustering is defined as the grouping of the data objects in a data set in such a way that the objects in the same cluster are similar to each other but different from the objects outside the cluster. Hierarchical clustering in [8] is defined as a technique in which the data objects are grouped together to form a hierarchy known as a tree. According to [10], the time taken by all the data objects to form the final clusters is known as the elapsed time; for good cluster quality, the elapsed time should be low. In [11], cohesion is defined as the extent to which the different objects in the same cluster are associated with each other; the main motive for good quality clusters is to obtain more cohesive clusters. In [12], the silhouette index is a parameter used to plot the cluster quality in the form of scattered points. The silhouette index takes values in the range [-1, 1].
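As a minimal sketch of the silhouette width computation discussed above (the function name and the 1-D distance function are illustrative assumptions, not from the paper):

```python
def silhouette_width(i, own_cluster, other_clusters, dist):
    # a: average distance between object i and the other
    #    objects in its own cluster
    others = [p for p in own_cluster if p != i]
    a = sum(dist(i, p) for p in others) / len(others)
    # b: the minimum, over all other clusters, of the average
    #    distance between object i and that cluster's objects
    b = min(sum(dist(i, p) for p in c) / len(c) for c in other_clusters)
    # silhouette width lies in [-1, 1]
    return (b - a) / max(a, b)

dist = lambda p, q: abs(p - q)
# object 1.0 sits comfortably in its own cluster: width near +1
print(silhouette_width(1.0, [1.0, 2.0], [[8.0, 9.0]], dist))
```

A width near +1 indicates a well-placed object, near 0 a borderline one, and negative values an object that would fit better in another cluster.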
III. CLUSTERING METHOD

A. Hierarchical Clustering
Hierarchical clustering is a method in which clusters are formed in a tree, or hierarchy. Every node in the tree represents a different cluster, and the tree itself is known as a dendrogram. Hierarchical clustering can be performed in two ways, based on the splitting and merging of clusters: the divisive method and the agglomerative method.

The divisive method of hierarchical clustering is also known as the top-down approach: a large data set is given initially, and this data set is further divided into a number of smaller subsets (known as clusters) until a threshold is reached [7]. The agglomerative method works in the reverse direction of the divisive method. In this method, a number of clusters are given initially, and at each step the two most similar clusters are merged [7]. The clusters are merged together until one large cluster is formed; therefore this method is also known as the bottom-up approach.

The clusters are split or combined down to a specific level. To decide where a cluster should be split, or which two clusters should be combined, a measuring criterion known as the dissimilarity among the sets of data is required.
IV. QUALITY MEASUREMENT

To evaluate the main approach of our study, we consider three main parameters for cluster quality: cohesion measurement, silhouette index and elapsed time. These three parameters are discussed in detail below.

A. Elapsed Time
One of the important criteria in measuring the quality of clusters is the time taken to perform the cluster analysis. The total amount of time taken to form the clusters from the dataset is known as the elapsed time. The less time taken to form the clusters, the better the quality of the clusters.
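As a minimal sketch of how elapsed time can be measured (`cluster_data` below is a hypothetical stand-in for any clustering routine; only the timing pattern around it is the point):

```python
import random
import time

def cluster_data(points):
    # Stand-in for any clustering routine; sorting is just a
    # dummy workload so the timing pattern can be demonstrated.
    return sorted(points)

# a synthetic "population" of 200 rows x 3 attributes
points = [(random.random(), random.random(), random.random())
          for _ in range(200)]

start = time.perf_counter()
clusters = cluster_data(points)
elapsed = time.perf_counter() - start   # elapsed time, in seconds
print(f"elapsed time: {elapsed:.4f} s")
```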
B. Cohesion Measurement
In [11], a new measure of similarity between clusters, named cohesion, was introduced. Cohesion is a measurement criterion used to determine how well the objects of a dataset are combined to form good quality clusters. The main aim of the cohesion measurement is to determine the intra-cluster distance, which is the degree of association between the objects of a dataset within the same cluster. Therefore, a high value of cohesion within a cluster is required for good quality clusters. In our approach, the sum of squared errors is used to determine cohesion.

The sum of squared errors (SSE) is the most widely used and simplest method for cluster analysis. SSE measures the distance within the cluster as the sum of the squared distances between the objects in the same cluster and their mean:

SSE = Σi (xi - x̄)²

where x̄ is the mean of all the objects in a dataset, xi is the object whose difference from the mean is to be calculated, and the symbol Σ tells us to sum the squared differences (xi - x̄)² over all i.
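The SSE computation above can be sketched as follows (a minimal 1-D illustration; the function name is ours, not from the paper):

```python
def sse(objects):
    # Sum of squared errors: squared deviation of each object
    # from the mean of all objects (1-D values for simplicity).
    mean = sum(objects) / len(objects)
    return sum((x - mean) ** 2 for x in objects)

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sse(values))  # mean is 5.0, so SSE = 32.0
```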
C. Silhouette Index
The silhouette index is one of the most preferred quality criteria in cluster analysis. The silhouette index gives a graphical representation that shows how similar an object of a data set is to the other objects in the same cluster [12]. For each object, a value of the silhouette index known as the silhouette width is calculated; this value varies from -1 to +1. There are three cases depending on the value of the silhouette width:

- A silhouette width close to +1 means the object is in the correct cluster.
- A silhouette width close to 0 means the object could equally well belong to some other cluster.
- A silhouette width close to -1 means the object is in the wrong cluster.

The silhouette width of an object can be calculated as:

s(i) = (bi - ai) / max(ai, bi)

where ai is the average distance between object i and all the other objects in the same cluster, and bi is the minimum of the average dissimilarities between object i and the objects of every cluster other than its own. The silhouette index of a clustering is obtained by averaging s(i) over all n objects of the data set, where n is the total number of objects.

The silhouette index has an advantage over all other quality indices: it represents the clusters in a visually scattered form, and the clusters thus obtained are more accurate than with other indices.

V. EXPERIMENTAL DESIGN

Hierarchical clustering is an effective method to evaluate the clusters formed from a given dataset, as it combines a number of clusters into the form of a tree (called a dendrogram) in such a way that the sub-clusters are fundamentally similar to one another.

The first phase in making a dendrogram is to discover the distances between the objects of a dataset. When all the distances between the clusters have been discovered, the merging or splitting operation is applied to the given dataset. The agglomerative algorithm is generally preferred over the divisive algorithm. Therefore, a merge operation is applied to the two nearest sub-clusters (the sub-clusters with the minimum distance between them) to form a hub. The same procedure is then followed for the next two sub-clusters, and so on, until a single cluster is obtained. Note that once grouping begins, we work with genuine objects (e.g. a single value) and pseudo-objects that contain several genuine objects. Methods to determine the distances when dealing with pseudo-objects include centroid linkage, single linkage, complete linkage and normal (average) linkage.

A. Centroid Linkage Clustering
In centroid linkage clustering, the distance between the centers of the respective clusters is calculated, and the clusters with the minimum distance between them are combined to form a hub in the tree. The centroid distance between two clusters can be calculated as:

D(A, B) = || CA - CB ||

where D(A, B) is the distance to be calculated, CA is the center of cluster A and CB is the center of cluster B.

B. Single Linkage Clustering
Single linkage clustering is also known as the nearest neighbour technique because it defines the distance between the two closest objects of the clusters, known as the minimum distance [13]. Mathematically, the single linkage distance can be calculated as:

D(A, B) = min {d(i, j)}

where object i is in cluster A and object j is in cluster B.

C. Complete Linkage Clustering
Complete linkage clustering measures the distance between the two farthest objects of the clusters, known as the maximum cluster distance [13]. It is also known as the farthest neighbour technique. This distance can be calculated as:

D(A, B) = max {d(i, j)}

where object i is in cluster A and object j is in cluster B.

D. Normal Linkage Clustering
In normal linkage clustering, the average distance over all pairs of objects is calculated, known as the average distance. The mathematical expression is:

D(A, B) = average {d(i, j)}

where object i is in cluster A and object j is in cluster B.
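The four linkage distances described above can be sketched as follows (a minimal illustration under the definitions given; the function names are ours and Euclidean distance is assumed):

```python
from itertools import product

def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def single_linkage(A, B):
    # minimum pairwise distance (nearest neighbour)
    return min(euclidean(i, j) for i, j in product(A, B))

def complete_linkage(A, B):
    # maximum pairwise distance (farthest neighbour)
    return max(euclidean(i, j) for i, j in product(A, B))

def normal_linkage(A, B):
    # average of all pairwise distances
    return sum(euclidean(i, j) for i, j in product(A, B)) / (len(A) * len(B))

def centroid_linkage(A, B):
    # distance between the cluster centers
    ca = [sum(c) / len(A) for c in zip(*A)]
    cb = [sum(c) / len(B) for c in zip(*B)]
    return euclidean(ca, cb)

A = [(0.0, 0.0), (0.0, 1.0)]
B = [(3.0, 0.0), (4.0, 0.0)]
print(single_linkage(A, B))    # 3.0
print(complete_linkage(A, B))  # sqrt(17), about 4.123
```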
Algorithm 1: Hierarchical Clustering

I. Start with n sub-clusters at level L(0) = 0 and a counter C = 0.

II. Locate the nearest pair of clusters, i.e. the pair (A), (B) with the minimum distance, as indicated by D[(A),(B)] = min d[(i),(j)].

III. Increase the counter: C = C + 1. Merge clusters (A) and (B) into a single cluster, and set the level of this hierarchy to L(C) = D[(A),(B)].

IV. Update the similarity matrix by deleting the rows and columns corresponding to clusters (A) and (B) and adding a row and column corresponding to the newly formed cluster. The similarity between the new cluster (A,B) and an old cluster (k) can be calculated as D[(k),(A,B)] = min[ D[(k),(A)], D[(k),(B)] ].

V. Stop if only one cluster remains; otherwise, go to step II.
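Algorithm 1 can be sketched as a naive implementation (single linkage is assumed, matching the min rule in step IV; the names and the rebuilt cluster list, which stands in for the similarity-matrix update, are ours):

```python
def hierarchical_clustering(points, dist):
    # Agglomerative clustering following Algorithm 1: repeatedly
    # merge the two nearest clusters and record each merge level.
    clusters = [[p] for p in points]        # step I: n singleton clusters
    levels = [0.0]                          # L(0) = 0
    while len(clusters) > 1:                # step V: stop at one cluster
        # step II: locate the nearest pair of clusters (single linkage)
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # steps III-IV: merge (A) and (B); rebuilding the cluster list
        # plays the role of updating the similarity matrix
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        levels.append(d)                    # L(C) = D[(A),(B)]
    return clusters[0], levels

dist = lambda p, q: abs(p - q)
cluster, levels = hierarchical_clustering([1.0, 2.0, 8.0, 9.0], dist)
print(levels)  # [0.0, 1.0, 1.0, 6.0]
```

The recorded levels correspond to the heights at which the dendrogram's branches join.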

VI. METHODOLOGY AND RESULTS

The steps of the methodology followed in this paper are shown in Fig. 1.

Fig. 1. Steps of methodology used

The software required for our proposed work is MATLAB R2009b. To implement the results, we have taken population size as our dataset, and we performed cluster analysis on five different sizes of the same dataset. The first dataset is 100x3, where 100 represents the rows (i.e. the total number of values for each attribute) and 3 represents the total number of attributes in the dataset. The size of the dataset is then increased to 200, 300 and so on.

The next step is to evaluate all three quality parameters on these population sizes, from which the following results are obtained. According to the result analysis, the elapsed time increases with the volume of population, except for the first run. On the first run, the time taken to make the clusters is maximal because the memory locations of the clusters are also identified initially. The population volume with the minimum elapsed time is considered to yield the best quality clusters; in our result analysis, the time taken for the population volume of 200x3 is the minimum, so we can say that this volume is best suited for clustering. The second observation from our results is that the cohesion measurement alternates as the volume of population increases: cohesion is minimal at a volume of 100x3, increases at 200x3, decreases again at 300x3, and so on. We can therefore say that, in our case, cohesion depends only on the type of clusters formed by the hierarchical clustering method, not on the size or volume of the dataset. The main advantage of the cohesion measurement is that it gives us information about the association between the objects of a dataset, so a high value of cohesion results in good cluster quality. From this we observe that a population volume with 200 records is the best cluster volume for cluster analysis. The third observation is that the silhouette index for the hierarchical clustering method is the worst among most clustering methods, because it places most of the objects in wrong clusters; the only use of the silhouette index in the case of hierarchical clustering is that it gives us the relationship of an object with its neighbouring clusters and represents it in the form of scattered points. Our result analysis is shown below:

Table 1. Elapsed Time for Different Population Sizes

Cluster Volume    Elapsed Time
100x3             0.092
200x3             0.010
300x3             0.012
400x3             0.020
500x3             0.031

Table 2. Cohesion Measurement for Different Population Sizes

Cluster Volume    Cohesion Measurement
100x3             0.8210
200x3             0.8294
300x3             0.8219
400x3             0.8253
500x3             0.8215

No table is generated for the silhouette index because the silhouette index represents values only in graphical form. The corresponding graphical representations of the three parameters are shown in Fig. 2, Fig. 3 and Fig. 4.

Fig. 2. Graphical representation of Elapsed Time

Fig. 3. Graphical representation of Cohesion Measurement

Fig. 4. Silhouette plotting for different population sizes

VII. CONCLUSION

The main technique in data mining to improve software engineering is discussed in this paper. This paper explored the hierarchical clustering method to improve the quality of the clusters. The proposed work has been implemented using three quality parameters: cohesion measurement, silhouette index and elapsed time. All the existing cohesion metrics, such as LCOM (lack of cohesion metric), LCC (loose class cohesion), TCC (tight class cohesion) etc., are based on the attributes of the objects in a dataset. The cohesion measurement is directly or indirectly based on the closeness of the objects; therefore, we used cluster analysis to produce more efficient clusters of related objects in a dataset. The silhouette index is preferred over all other indices because only the silhouette index represents the quality of clusters graphically in the form of scattered data points. The elapsed time is taken as one of the parameters for identifying the quality of clusters because less time taken results in better quality clusters. So we can say that a cluster with high cohesion, zero or positive values of the silhouette index and a low elapsed time is a good quality cluster.

The achieved results show the relationship between all three parameters. The approach performed here can also be applied to various object-oriented systems to draw more general conclusions. In future work, we propose to improve the silhouette index values using another clustering method, since the silhouette index is worst for hierarchical clustering; we will then evaluate the quality parameters of that clustering method as well, and finally compare the results of both methods.

References

[1] T. Xie, S. Thummalapenta et al., "Data mining for software engineering," IEEE Computer, pp. 55-62, August 2009.
[2] N. Grira, M. Crucianu et al., "Unsupervised and Semi-supervised Clustering: a Brief Survey," INRIA Rocquencourt, B.P. 105, France, pp. 1-12, August 15, 2005.
[3] H. Wahidah, L. V. Pey et al., "Application of Data Mining Techniques for Improving Software Engineering," The 5th International Conference on Information Technology, vol. 2, pp. 1-5, 2011.
[4] R. R. Henrique, E. A. A. Ahmed, "Proposed Application of Data Mining Technique for Clustering Software Projects," INFOCOMP special edition, pp. 43-48, Jul 2010.
[5] C. Keith, A. Peter et al., "Towards Automating Class-Splitting Using Betweenness Clustering," IEEE/ACM International Conference on Automated Software Engineering, pp. 595-599, Nov 2009.
[6] K. J. Puneet, Pallavi, "Data Mining Techniques for Software Defect Prediction," International Journal of Software and Web Sciences, vol. 3, pp. 54-57, Feb 2013.
[7] J. Aastha, K. Rajneet, "Review: Comparative Study of Various Clustering Techniques in Data Mining," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, pp. 55-57, March 2013.
[8] R. Yogita, R. Harish, "A Study of Hierarchical Clustering Algorithm," International Journal of Information and Computation Technology, vol. 3, pp. 1225-1232, Nov 2013.
[9] M. Fionn, C. Pedro, "Methods of Hierarchical Clustering," pp. 1-21, May 3, 2011.
[10] K. Mandeep, K. Usvir, "Comparison Between K-mean and Hierarchical Algorithm Using Query Redirection," IJARCSSE, vol. 3, pp. 1454-1459, Jul 2013.
[11] S. Lazhar, B. Mourad et al., "Improving Class Cohesion Measurement: Towards a Novel Approach Using Hierarchical Clustering," Journal of Software Engineering and Applications, vol. 5, pp. 449-458, Jul 2012.
[12] S. Chatti, G. R. Krishna et al., "A Method to Find Optimum Number of Clusters Based on Fuzzy Silhouette on Dynamic Data Set," Procedia Computer Science, vol. 46, pp. 346-353, 2015.
[13] B. Ederson, F. G. Daniel et al., "Silhouette-Based Clustering Using an Immune Network," IEEE World Congress on Computational Intelligence, Brisbane, Australia, pp. 1-9, June 2012.
[14] P. K. Shraddha, M. Emmanuel, "Review and Comparative Study of Clustering Techniques," International Journal of Computer Science and Information Technologies, vol. 5, pp. 805-812, 2014.
[15] N. Thanh, B. Asim et al., "Automatic spike sorting by unsupervised clustering with diffusion maps and silhouettes," Neurocomputing, vol. 153, pp. 199-210, April 2015.
[16] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann/Elsevier, India.
