Topic 6d - Hierarchical Algorithm
Topic 6d - Hierarchical Algorithm
CLUSTERING: HIERARCHICAL
APPROACH
HIERARCHICAL CLUSTERING
6 5
0.2
4
3 4
0.15 2
5
0.1 2
0.05 1
3 1
0
1 3 2 5 4 6
HIERARCHICAL CLUSTERING
◦ Divisive:
Start with one, all-inclusive cluster
At each step, split a cluster until each cluster contains a point (or there are k clusters)
9 9 9
8 8 8
7 7 7
6 6 6
5 5 5
4 4 4
3 3 3
2 2 2
1 1 1
0 0 0
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10
AGGLOMERATIVE CLUSTERING ALGORITHM
Basic algorithm :
1. Compute the dissimilarity matrix
2. Let each data point be a cluster
3. Repeat
4. Merge the two closest clusters
5. Update the dissimilarity matrix
6. Until only a single cluster remains
p2
p3
p4
p5
.
.
. Dissimilarity Matrix
...
p1 p2 p3 p4 p9 p10 p11 p12
INTERMEDIATE SITUATION
c1 c2 c3 c4 c5 ...
c2
we have some clusters
c3
c4
C3 c5
.
C4 .
Dissimilarity Matrix
.
C1
C2 C5
...
p1 p2 p3 p4 p9 p10 p11 p12
INTERMEDIATE SITUATION
c1 c2 c3 c4 c5 ...
• We want to merge the two closest clusters c1
c4
C3 c5
.
C4 . Dissimilarity Matrix
.
C1
C2 C5
...
p1 p2 p3 p4 p9 p10 p11 p12
AFTER MERGING
C2
U
C1 C5 C3 C4
• The question is “How do we
update the dissimilarity C1 ?
matrix?” C2 U C5 ? ? ? ?
C3 ?
C3 C4 ?
C4 Dissimilarity Matrix
C1
C2 U C5
...
p1 p2 p3 p4 p9 p10 p11 p12
HOW TO DEFINE INTER-CLUSTER SIMILARITY
p1 p2 p3 p4 p5 ...
Similarity? p1
p2
p3
p4
MIN p5
MAX .
.
Group Average
Distance Between Centroids
.
Dissimilarity Matrix
Other methods driven by an objective
function
HOW TO DEFINE INTER-CLUSTER SIMILARITY
p1 p2 p3 p4 p5 ...
p1
p2
p3
p4
MIN p5
MAX .
Group Average .
. Dissimilarity Matrix
Distance Between Centroids
Other methods driven by an objective function
HOW TO DEFINE INTER-CLUSTER SIMILARITY
p1 p2 p3 p4 p5 ...
p1
p2
p3
p4
MIN p5
MAX .
Group Average .
. Dissimilarity Matrix
Distance Between Centroids
Other methods driven by an objective function
HOW TO DEFINE INTER-CLUSTER SIMILARITY
p1 p2 p3 p4 p5 ...
p1
p2
p3
p4
MIN p5
MAX .
Group Average .
. Dissimilarity Matrix
Distance Between Centroids
Other methods driven by an objective function
HOW TO DEFINE INTER-CLUSTER SIMILARITY
p1 p2 p3 p4 p5 ...
p1
p2
p3
p4
MIN p5
MAX .
Group Average .
. Dissimilarity Matrix
Distance Between Centroids
Other methods driven by an objective function
MIN AND MAX
MIN or Single Link • MAX or Complete Linkage
• Similarity of two clusters is based on • Similarity of two clusters is based on
the two most similar (closest) points the two least similar (most distant)
in the different clusters points in the different clusters
• Determined by one pair of points, i.e., by • Determined by all pairs of points in the
one link in the dissimilarity graph. two clusters
• Using MIN OPERATOR (minimum • Using MAX OPERATOR (maximum
value) value)
Example: Example:
dist({3,6},{2,5}) dist({3,6},{2,5})
= min (dist(3,2), dist(6,2), dist(3,5), dist(6,5)) = max (dist(3,2), dist(6,2), dist(3,5), dist(6,5))
= min (0.15,0.25,0.28,0.39) = max (0.15,0.25,0.28,0.39)
= 0.15 = 0.39
CLUSTER SIMILARITY: MIN OR SINGLE LINK
1 2 3 4 5
CLUSTER SIMILARITY: MIN OR SINGLE LINK
Point x coordinate y coordinate
p1 0.40 0.53
p2 0.22 0.38 Numeric
p3 0.35 0.32 Attributes
p4 0.26 0.19
p5 0.08 0.41
p6 0.45 0.30
p1 p2 p3 p4 p5 p6
p1 0.00
p2 0.24 0.00
p3 0.22 0.15 0.00
p4 0.37 0.20 0.15 0.00
p5 0.34 0.14 0.28 0.29 0.00
√ 2
𝑑 ( 𝑝 1 , 𝑝 2 ) = ( 𝑥1 − 𝑥 2 ) + ( 𝑦 1 − 𝑦 2 )
2
p1 p2 p3 p4 p5 p6
Draw the dendrogram
p1 0.00 0.2
p2 0.24 0.00
p3 0.22 0.15 0.00 0.15
0
3 6 2 5 4 1
CLUSTER SIMILARITY: MIN OR SINGLE LINK
0.2
dendrogram
0.1
0.05
0
3 6 2 5 4 1
CLUSTER SIMILARITY: MIN OR SINGLE LINK
O Step 3 – form the next cluster by finding the point p1 P(2,5),(3,6)
P(2,5),(3,6) p4
p4
with the minimum distance to the other points in p1
p1 0.00
0.00
the existing clusters; P(2,5),(3,6) ??? 0.00
0.00
P(2,5),(3,6) 0.22
p4
p4 0.37 0.16
0.37 ??? 0.00
0.00
Distance Single link
dist({{3,6},{2,5}}, 4) = min (dist{(3,6), 4}, dist{(2,5), 4}) Updated matrix
= min (0.16,0.20)
= 0.16
dist({{3,6},{2,5}}, 1) = min (dist{(3,6), 1}, dist{(2,5), 1}) 0.2
= min (0.22,0.24)
= 0.22 0.15
0.1
0.05
Draw the dendogram
0
3 6 2 5 4 1
CLUSTER SIMILARITY: MIN OR SINGLE LINK
O Step 4 – form the next cluster by finding the distance p1 P((2,5),(3,6)),4
to the existing clusters; p1 0.00
P((2,5),(3,6)),4 ???
0.22 0.00
Distance Single link
Updated matrix
dist({{{3,6},{2,5}}, 4},1) = min ((dist({{3,6},{2,5}}, 1), dist(4,1))
= min (0.22,0.37)
= 0.22 0.2
0.15
0.1
Draw the dendogram
0.05
0
3 6 2 5 4 1
P(((2,5),(3,6)),4),1
Final matrix P(((2,5),(3,6)),4),1 0 Terminate
CLUSTER SIMILARITY: MIN OR SINGLE LINK
5
1 0.2
3
0.15
5
2 1 0.1
2 3 6 0.05
0
3 6 2 5 4 1
4
4
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
1 2 3 4 5
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
p1 p2 p3 p4 p5 p6 0.1
p1 0.00
0.05
p2 0.24 0.00
p3 0.22 0.15 0.00 0
3 6 2 5 4 1
Points Distance
dist({3,6},{2,5}) = max (dist(3,2), dist(6,2), dist(3,5), dist(6,5)) p1 P(2,5) P(3,6) p4
= max (0.15,0.25,0.28,0.39) p1 0.00
= 0.39
dist({3,6},{1}) = max (dist(3,1), dist(6,1)) P(2,5) 0.34 0.00
= max (0.22,0.23) P(3,6) 0.23 0.39 0.00
= 0.23
p4 0.37 0.29 0.22 0.00
dist({3,6},{4}) = max (dist(3,4), dist(6,4))
= max (0.16,0.22)
= 0.22 UPDATED MATRIX
dist({2,5},{1}) = max (dist(2,1), dist(5,1))
= max (0.24,0.34)
= 0.34
dist({2,5},{4}) = max (dist(2,4), dist(5,4))
= max (0.20,0.29)
= 0.29
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
O Step 2 – form the next cluster by finding the point with the
minimum distance to the other points in the existing
clusters;
newly formed cluster {(3,6), 4}
Draw dendrogram
p1 P(2,5) P(3,6) p4 0.4
p1 0.00 0.35
0.3
P(2,5) 0.34 0.00 0.25
0.15
p4 0.37 0.29 0.22 0.00 0.1
0.05
0
3 6 4 1 2 5
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
0.35
finding the point with the Draw the dendogram 0.3
0.2
0.1
0
3 6 4 1 2 5
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
O Step 4 – form the next cluster by finding the distance to the other points in the
existing clusters; newly formed cluster {{{2,5}, 1},{{3,6},4}}
P(2,5),1 P(3,6), 4 Points Distance
dist({{3,6},{4}}, = max (dist{(3,6),4}, (2,5)}, dist{(3,6),4}, 1}})
P(2,5),1 0.00 {{2,5},1}) = max (0.39,0.37)
P(3,6),4 0.39 0.00 = 0.39
0.4
UPDATED MATRIX
0.35
0.3 P({{3,6},{4}},
0.25
{{2,5},1})
Draw the dendrogram 0.2 P({{3,6},{4}}, {{2,5},1}) 0
0.15
0.1
0.05
0
3 6 4 1 2 5 Terminate
CLUSTER SIMILARITY: MAX OR COMPLETE LINKAGE
Nested Clusters
Dendrogram
4 1
0.4
2 5 0.35
5 0.3
2 0.25
0.2
3 6
0.15
0.1
3
1
0.05
0
3 6 4 1 2 5
4
HIERARCHICAL CLUSTERING: COMPARISON
Single Link OR MIN Complete Link OR MAX
5
1 4
3 1
2 5
5
2 1 5
2
2 3 6 3 6
3
1
4
4 4
0.4
0.2 0.35
0.3
0.15 0.25
0.2
0.1
0.15
0.1
0.05
0.05
0 0
3 6 2 5 4 1 3 6 4 1 2 5
SUMMARY