Clustering
Example: Single vs. Complete Linkage Clustering
Consider the one-dimensional data points 7, 10, 20, 28, 35.
1. The first two points (7 and 10) are close to each other and should be in the same cluster.
2. Also, the last two points (28 and 35) are close to each other and should be in the same cluster.
Single linkage: in single-link hierarchical clustering, we merge in each step the two clusters whose closest members have the smallest distance. The point 20 therefore joins (28,35), because its distance to 28 (8) is smaller than its distance to 10 (10):
Cluster 1 : (7,10)
Cluster 2 : (20,28,35)
Complete linkage: in complete-link hierarchical clustering, we merge in each step the two clusters whose merger gives the smallest maximum pairwise distance. Here 20 joins (7,10) instead, because its maximum distance to that cluster (13, to the point 7) is smaller than its maximum distance to (28,35) (15, to the point 35):
Cluster 1 : (7,10,20)
Cluster 2 : (28,35)
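Both outcomes can be verified with SciPy's hierarchy module; this is a minimal sketch, and the two-cluster cut with fcluster is a choice made here for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# The five one-dimensional points from the example above.
points = np.array([[7.0], [10.0], [20.0], [28.0], [35.0]])

for method in ("single", "complete"):
    Z = linkage(points, method=method)               # build the hierarchy
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut into two clusters
    print(method, labels)

# single linkage groups (7,10) and (20,28,35);
# complete linkage groups (7,10,20) and (28,35).
```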
Hierarchical Clustering
Hierarchical clustering involves creating clusters that have a predetermined ordering from top to bottom. For example, all files and folders on the hard disk are organized in a hierarchy. There are two types of hierarchical clustering: divisive and agglomerative.
Divisive method
In the divisive or top-down clustering method we assign all of the observations to a single cluster and then partition the cluster into the two least similar clusters using a flat clustering method (e.g., K-Means). Finally, we proceed recursively on each cluster until there is one cluster for each observation. There is evidence that divisive algorithms produce more accurate hierarchies than agglomerative algorithms in some circumstances, but the divisive approach is conceptually more complex.
Agglomerative method
In the agglomerative or bottom-up clustering method we assign each observation to its own cluster. Then we compute the similarity (e.g., distance) between each of the clusters and join the two most similar clusters. Finally, we repeat these two steps until only a single cluster is left. A sketch of the related algorithm is shown below.
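A minimal sketch of this bottom-up procedure, assuming Euclidean distances; the function name and return format are illustrative choices, not a reference implementation:

```python
import numpy as np

def agglomerative(points, linkage="single"):
    """Naive O(n^3) agglomerative clustering; returns the merge history."""
    clusters = [[i] for i in range(len(points))]  # each observation starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest linkage distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dists = [np.linalg.norm(points[i] - points[j])
                         for i in clusters[a] for j in clusters[b]]
                if linkage == "single":
                    d = min(dists)
                elif linkage == "complete":
                    d = max(dists)
                else:  # average linkage
                    d = sum(dists) / len(dists)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a][:], clusters[b][:], round(d, 2)))
        clusters[a] += clusters[b]  # join the two most similar clusters
        del clusters[b]
    return merges

# Example: agglomerative(np.array([[7.0], [10.0], [20.0], [28.0], [35.0]]), "complete")
```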
Before any clustering is performed, it is required to determine the proximity matrix containing the distance between each pair of points using a distance function. Then, the matrix is updated to display the distance between each cluster. The following three methods differ in how the distance between each cluster is measured.
Single Linkage
In single linkage hierarchical clustering, the distance between two clusters is defined as the shortest distance between two points in each cluster. For example, the distance between clusters "r" and "s" is equal to the distance between their two closest points.
Complete Linkage
In complete linkage hierarchical clustering, the distance between two clusters is defined as the longest distance between two points in each cluster. For example, the distance between clusters "r" and "s" is equal to the distance between their two furthest points.
Average Linkage
In average linkage hierarchical clustering, the distance between two clusters is defined as the average distance between each point in one cluster and every point in the other cluster. For example, the distance between clusters "r" and "s" is equal to the average of the distances connecting every point of one cluster to every point of the other.
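In symbols, for clusters r and s and a point-to-point distance d(i, j), the three rules can be written as follows (a standard formulation consistent with the definitions above):

```latex
\begin{aligned}
d_{\mathrm{single}}(r,s)   &= \min_{i \in r,\; j \in s} d(i,j) \\
d_{\mathrm{complete}}(r,s) &= \max_{i \in r,\; j \in s} d(i,j) \\
d_{\mathrm{average}}(r,s)  &= \frac{1}{|r|\,|s|} \sum_{i \in r} \sum_{j \in s} d(i,j)
\end{aligned}
```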
Point   X1   X2
A       10    5
B        1    4
C        5    8
D        9    2
E       12   10
F       15    8
G        7    7
Step 1: Calculate the distances between all data points using the Euclidean distance function. The shortest distance is between points C and G (2.24).

      A      B      C      D      E      F
B   9.06
C   5.83   5.66
D   3.16   8.25   7.21
E   5.39  12.53   7.28   8.54
F   5.83  14.56  10.00   8.49   3.61
G   3.61   6.71   2.24   5.39   5.83   8.06
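The matrix above can be reproduced with SciPy; a minimal sketch, with rounding to two decimals to match the table:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

labels = list("ABCDEFG")
X = np.array([[10, 5], [1, 4], [5, 8], [9, 2],
              [12, 10], [15, 8], [7, 7]], dtype=float)

# Pairwise Euclidean distances, expanded to a full square matrix.
D = squareform(pdist(X))
for name, row in zip(labels, np.round(D, 2)):
    print(name, row)
```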
Step 2: Merge C and G. We use "Average Linkage" to measure the distance between the new "C,G" cluster and the other data points; for example, the distance between "C,G" and A is the average of d(C,A) = 5.83 and d(G,A) = 3.61, i.e. 4.72.

        A      B    C,G      D      E
B     9.06
C,G   4.72   6.10
D     3.16   8.25   6.26
E     5.39  12.53   6.50   8.54
F     5.83  14.56   9.01   8.49   3.61
Step 3: Merge A and D (distance 3.16, the smallest entry in the previous matrix) and recompute the distances.

       A,D      B    C,G      E
B     8.51
C,G   5.32   6.10
E     6.96  12.53   6.50
F     7.11  14.56   9.01   3.61
Step 4: Merge E and F (distance 3.61).

       A,D      B    C,G
B     8.51
C,G   5.32   6.10
E,F   6.80  13.46   7.65
Step 5: Merge "A,D" and "C,G" (distance 5.32).

      A,D,C,G      B
B        6.91
E,F      6.73  13.46
Step 6: Merge "A,D,C,G" and "E,F" (distance 6.73). Only B remains, and it joins the final cluster at distance 9.07.

      A,D,C,G,E,F
B        9.07
Final dendrogram:
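The dendrogram can be recreated with SciPy; a sketch, noting that the merge order and heights computed by SciPy's unweighted average linkage may differ slightly from the hand-rounded values in the steps above:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

labels = list("ABCDEFG")
X = np.array([[10, 5], [1, 4], [5, 8], [9, 2],
              [12, 10], [15, 8], [7, 7]], dtype=float)

Z = linkage(X, method="average")  # average linkage, Euclidean distance
dendrogram(Z, labels=labels)
plt.ylabel("Distance")
plt.show()
```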
What is Hierarchical Clustering?
Hierarchical clustering is a popular method for grouping objects. It
creates groups so that objects within a group are similar to each other
and different from objects in other groups. Clusters are visually
represented in a hierarchical tree called a dendrogram.
A key benefit of hierarchical clustering is that there is no need to pre-specify the number of clusters. Instead, the dendrogram can be cut at the appropriate level to obtain the desired number of clusters.
Applications
There are many real-life applications of hierarchical clustering.
Python Code:
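A minimal sketch using scikit-learn's AgglomerativeClustering on the data from the worked example; the three-cluster cut is an illustrative choice:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[10, 5], [1, 4], [5, 8], [9, 2],
              [12, 10], [15, 8], [7, 7]])

# Cut the average-linkage hierarchy into three clusters.
model = AgglomerativeClustering(n_clusters=3, linkage="average")
print(model.fit_predict(X))  # one cluster label per point A..G
```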
https://www.learndatasci.com/glossary/hierarchical-clustering/
https://towardsdatascience.com/machine-learning-algorithms-part-12-hierarchical-agglomerative-clustering-example-in-python-1e18e0075019
https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/
https://www.w3schools.com/python/python_ml_hierarchial_clustering.asp
https://vitalflux.com/hierarchical-clustering-explained-with-python-example/
Decision tree
https://youtu.be/zNYdkpAcP-g
https://www.w3schools.com/python/python_ml_decision_tree.asp
Video link:
https://youtu.be/v7oLMvcxgFY