Chapter 6 - Hierarchical Clustering
Hierarchical Clustering
• Outline
• A hierarchical clustering method works by grouping data into a tree of clusters.
• Hierarchical clustering begins by treating every data point as a separate cluster.
[Figure: example dendrogram grouping animals into vertebrate and invertebrate clusters]
• with C_{t-1} ⊂ C_t for t = 2, ..., 5. We assume that A and B are merged before C and D.
• Number of Hierarchical Clusterings
The number of different nested or hierarchical clusterings corresponds to the number of different binary rooted trees (dendrograms) with n distinctly labeled leaves.
• Any tree with t nodes has t - 1 edges: edges = t - 1.
• Also, any rooted binary tree with m leaves has m-1 internal nodes.
• Thus, a dendrogram with m leaf nodes has a total of t = m + (m - 1) = 2m - 1 nodes: t = 2m - 1,
• and consequently t - 1 = 2m - 2 edges: edges = 2m - 2.
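As a quick sanity check of these counts, here is a minimal Python sketch (the function name dendrogram_counts is our own, not from the text):

```python
def dendrogram_counts(m: int) -> tuple[int, int]:
    """Return (nodes, edges) of a dendrogram with m leaf nodes."""
    t = 2 * m - 1      # m leaves + (m - 1) internal nodes
    return t, t - 1    # any tree with t nodes has t - 1 edges

for m in (2, 4, 8):
    t, e = dendrogram_counts(m)
    print(f"m = {m} leaves -> t = {t} nodes, {e} edges")
```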
• Exercise 1
• Given the number of leaf nodes m in each of the figures below, calculate the number of nodes t and the number of edges.
Dendrogram: Hierarchical Clustering
• A clustering is obtained by cutting the dendrogram at a desired level: each connected component forms a cluster.
• Determining clusters
• One of the problems with hierarchical clustering is that there is no objective way to say how many clusters there are.
• If we cut the single-linkage tree at the point shown below, we would say that there are two clusters.
• However, if we cut the tree lower, we might say that there is one cluster and two singletons (clusters containing a single leaf node).
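To make the effect of the cut height concrete, here is a small SciPy sketch (the toy 1-D data are made up for illustration). Cutting high in the tree gives two clusters; cutting lower gives one cluster and two singletons, as above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 1-D data: one tight group of three points plus two stray points.
X = np.array([[0.0], [0.2], [0.4], [5.0], [9.0]])

Z = linkage(X, method='single')   # single-linkage dendrogram

# Each connected component below the cut height becomes one cluster.
for height in (4.5, 2.0):
    labels = fcluster(Z, t=height, criterion='distance')
    print(f"cut at {height}: {labels}")
```

At height 4.5 the points form two clusters; at height 2.0 they form one cluster plus the two singletons {5.0} and {9.0}.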
Hierarchical Clustering algorithms
• Agglomerative (bottom-up, the merging approach):
• Starts with each object in a separate cluster.
• Clusters are formed by grouping objects into bigger and bigger clusters.
• Divisive (top-down, the splitting or partitioning approach):
• Starts with all the objects grouped in a single cluster. Clusters are divided or split until each object is in a separate cluster.
• Does not require the number of clusters k in advance
• Needs a termination/readout condition
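Most libraries implement only the agglomerative variant. A minimal scikit-learn sketch (assuming scikit-learn is available; the 2-D data are illustrative) shows how a distance threshold can act as the termination condition instead of a fixed k:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9]])

# With n_clusters=None, merging proceeds bottom-up and stops once every
# remaining inter-cluster distance exceeds distance_threshold, so the
# number of clusters k is not specified in advance.
model = AgglomerativeClustering(n_clusters=None, distance_threshold=1.0)
print(model.fit_predict(X))   # two clusters, e.g. [0 0 1 1] (ids may differ)
```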
Hierarchical Agglomerative Clustering (HAC)
How to Define Inter-Cluster Similarity
[Figure: proximity matrix over points p1, p2, p3, p4, p5]
• MIN
• MAX
• Group Average
• Distance Between Centroids
• Other methods driven by an objective function, e.g., Ward's Method uses squared error
• 1- The single linkage method is based on the minimum distance, or the nearest-neighbor approach.
• 2- The complete linkage method is based on the maximum distance, or the furthest-neighbor approach.
• 3- In the average linkage method, the distance between two clusters is defined as the average of the distances between all pairs of objects, one from each cluster.
Linkage Methods of Clustering
• Single Linkage: minimum distance between a point in Cluster 1 and a point in Cluster 2
• Complete Linkage: maximum distance between a point in Cluster 1 and a point in Cluster 2
• Average Linkage: average distance between points in Cluster 1 and points in Cluster 2
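The three rules differ only in how they aggregate the pairwise distances between the two clusters. A small NumPy sketch (with made-up points) makes the difference explicit:

```python
import numpy as np

# Two made-up clusters of 2-D points.
cluster1 = np.array([[0.0, 0.0], [1.0, 0.0]])
cluster2 = np.array([[4.0, 0.0], [6.0, 0.0]])

# Distances between every pair with one point from each cluster.
d = np.linalg.norm(cluster1[:, None, :] - cluster2[None, :, :], axis=2)

print("single   (minimum distance):", d.min())    # 3.0
print("complete (maximum distance):", d.max())    # 6.0
print("average  (average distance):", d.mean())   # 4.5
```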
Select a Similarity Measure
• The similarity measure can be a correlation or a distance.
• Note: The main idea of the agglomerative algorithm is as follows.
• Let's say we have six data points A, B, C, D, E, F.
• Step-1:
Consider each point as a singleton cluster and calculate the distance from each cluster to all the other clusters.
• Step-2:
In the second step, the most similar clusters are merged to form a single cluster. Let's say cluster (B) and cluster (C) are very similar to each other, so we merge them in this step.
• Step-3: Similarly, merge clusters (D) and (E).
• Step-4: We get the clusters [(A), (BC), (DE), (F)].
• Step-5:
We recalculate the proximity according to the algorithm and merge the two nearest clusters, (DE) and (F), to form the new clusters [(A), (BC), (DEF)].
• Step-6:
Repeating the same process, the clusters (DEF) and (BC) are comparable and are merged to form a new cluster. We're now left with the clusters [(A), (BCDEF)].
• Step-7:
At last, the two remaining clusters are merged to form a single cluster [(ABCDEF)].
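The walkthrough can be reproduced with SciPy's linkage. The 1-D coordinates below are our own, chosen so that single linkage merges (B,C), then (D,E), then (DE,F), then (BC,DEF), and finally brings in A:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Leaves A, B, C, D, E, F get node ids 0..5 in this order.
X = np.array([[0.0], [10.0], [10.5], [20.0], [20.6], [22.0]])

Z = linkage(X, method='single')
# Each row of Z merges two node ids; merged clusters get ids 6, 7, ...
for step, (i, j, dist, size) in enumerate(Z, start=1):
    print(f"Step {step}: merge node {int(i)} with node {int(j)} "
          f"at distance {dist:.1f} (cluster size {int(size)})")
```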
Nonhierarchical Clustering Methods
• In the sequential threshold method, a cluster center is selected and all objects within a prespecified threshold value from the center are grouped together.
• In the parallel threshold method, several cluster centers are selected and objects within the threshold level are grouped with the nearest center.
• The optimizing partitioning method differs from the two threshold procedures in that objects can later be reassigned to clusters to optimize an overall criterion, such as average within-cluster distance for a given number of clusters.
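As a concrete illustration of the first procedure, here is a minimal sketch of the sequential threshold method (our own toy implementation, assuming Euclidean distance and taking the next unassigned object as each new center):

```python
import numpy as np

def sequential_threshold(X, threshold):
    """Group each object with the first center it lies within `threshold` of."""
    labels = np.full(len(X), -1)     # -1 marks "not yet clustered"
    cluster = 0
    while (labels == -1).any():
        center = X[np.flatnonzero(labels == -1)[0]]   # select a new center
        dist = np.linalg.norm(X - center, axis=1)
        labels[(labels == -1) & (dist <= threshold)] = cluster
        cluster += 1
    return labels

X = np.array([[0.0, 0.0], [0.3, 0.1], [5.0, 5.0], [5.2, 4.9]])
print(sequential_threshold(X, threshold=1.0))   # -> [0 0 1 1]
```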
Classification of Clustering Procedures
[Fig. 20.4: Clustering procedures split into hierarchical (agglomerative, e.g., Ward's Method, and divisive) and nonhierarchical methods.]