Clustering
Waqar Aziz
• What is clustering?
• Why bother with it?
• Types of clustering algorithms
• K-Means
• Hierarchical clustering
Recap
(Géron, 2019)
Types of clustering algorithms
• Centroid-based
• Density-based
• Distribution-based
• Hierarchical clustering
[Figure: K-Means example, k value 15]
What are the strengths?
• Fast
• Scalable
What are the limitations?
• Choosing 𝑘 manually – it’s a hassle!
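To make the "choosing k" limitation concrete, here is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. It is an illustration, not the implementation used in these slides. It borrows the five points A to E from the hierarchical clustering example later in the deck, and the function name `kmeans` and the choice to seed the centroids with the first k points are assumptions made for this example.

```python
import math

# The five 2-D points used later in these slides (x, y).
points = {"A": (1, 1), "B": (1, 0), "C": (0, 2), "D": (2, 4), "E": (3, 5)}


def kmeans(data, k, max_iter=100):
    """Plain K-Means (Lloyd's algorithm) on a list of 2-D tuples."""
    # Assumption: seed the centroids with the first k points.
    centroids = [tuple(map(float, p)) for p in data[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


# k has to be chosen by hand before the algorithm can run.
labels, centroids = kmeans(list(points.values()), k=2)
for name, label in zip(points, labels):
    print(name, "-> cluster", label)
```

Note that `k=2` is passed in by the caller: the algorithm itself gives no hint about whether 2, 3 or 15 is the right value.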
[Figure: points A to E plotted on axes x = 0 to 4, y = 0 to 5; annotation: "Increasing similarity"]
[Figure: points A to E plotted on axes x = 0 to 4, y = 0 to 5; annotation: "Branches"]
What are the 2 main approaches to hierarchical clustering?
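1) Agglomerative: merge upwards, A B C D E → AB, DE → ABC → ABCDE
2) Divisive: split downwards, ABCDE → ABC, DE → AB → A B C D E
[Figure: points A to E with the cluster tree AB, DE, ABC, ABCDE drawn in both directions]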
Which clusters should be combined, or split?
• Hierarchical clustering is proximity-based
• Default is complete-linkage: the distance between two clusters is the largest distance between any pair of their points
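The linkage rule can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a library API: the helper names `pair_distances`, `complete_linkage` and `single_linkage` are made up for this example, Euclidean distance is assumed as the proximity measure, and single linkage is included only as a contrast (it is not covered in these slides). The points are the ones from the worked example in this deck.

```python
import math

# Four of the five points from the worked example in these slides.
A, B, D, E = (1, 1), (1, 0), (2, 4), (3, 5)


def pair_distances(c1, c2):
    """All Euclidean distances between a point in c1 and a point in c2."""
    return [math.dist(p, q) for p in c1 for q in c2]


def complete_linkage(c1, c2):
    """Cluster distance = distance between the two FARTHEST members."""
    return max(pair_distances(c1, c2))


def single_linkage(c1, c2):
    """Contrast only: cluster distance = distance between the two CLOSEST members."""
    return min(pair_distances(c1, c2))


ab, de = [A, B], [D, E]
print(round(complete_linkage(ab, de), 1))  # 5.4 (B to E)
print(round(single_linkage(ab, de), 1))    # 3.2 (A to D)
```

The same pair of clusters gets a different distance depending on the linkage rule, which is why the choice of linkage changes which clusters are merged next.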
1) Load in dataset
Data point   x: sepal length (cm)   y: petal length (cm)
A            1                      1
B            1                      0
C            0                      2
D            2                      4
E            3                      5
[Figure: scatter plot of points A to E; x axis 0 to 3.5, y axis 0 to 6]
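Before any merging can happen, the pairwise distances between the five data points are needed. The sketch below computes them with the standard library only. It assumes Euclidean distance, and the name `distance_matrix` is a hypothetical helper introduced for this example.

```python
import math

# Data from the slide: name -> (sepal length in cm, petal length in cm).
points = {"A": (1, 1), "B": (1, 0), "C": (0, 2), "D": (2, 4), "E": (3, 5)}


def distance_matrix(pts):
    """Lower-triangle Euclidean distances, keyed by (row name, column name)."""
    names = list(pts)
    return {
        (a, b): math.dist(pts[a], pts[b])
        for i, a in enumerate(names)
        for b in names[:i]
    }


dist = distance_matrix(points)

# Print the lower triangle as a small table, one decimal place.
names = list(points)
print("   " + "".join(f"{n:>6}" for n in names))
for i, a in enumerate(names):
    row = [f"{dist[(a, b)]:6.1f}" for b in names[:i]] + [f"{0:6.1f}"]
    print(f"{a:<3}" + "".join(row))
```

The smallest entry in this table is the distance between A and B, so those two points form the first cluster.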
Step by step…
A B C D E
After merging A+B and D+E:
      AB    DE    C
AB    0
DE    5.4   0
C     2.2   4.2   0

After merging AB+C:
      ABC   DE
ABC   0
DE    5.4   0
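The step-by-step merging above can be reproduced with a short script. This is a minimal sketch, assuming Euclidean distance and complete linkage. The function names `complete_linkage` and `agglomerate` are hypothetical, and clusters are represented simply as strings of point names.

```python
import math
from itertools import combinations

points = {"A": (1, 1), "B": (1, 0), "C": (0, 2), "D": (2, 4), "E": (3, 5)}


def complete_linkage(c1, c2):
    """Largest point-to-point distance between two clusters (strings of names)."""
    return max(math.dist(points[p], points[q]) for p in c1 for q in c2)


def agglomerate(names):
    """Merge the two closest clusters repeatedly until one cluster remains."""
    clusters = list(names)  # start: every point is its own cluster
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest complete-linkage distance.
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: complete_linkage(*pair),
        )
        d = complete_linkage(c1, c2)
        merged = "".join(sorted(c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        merges.append((merged, round(d, 1)))
    return merges


merges = agglomerate(points)
for name, d in merges:
    print(name, d)
# AB 1.0
# DE 1.4
# ABC 2.2
# ABCDE 5.4
```

The merge distances 2.2 and 5.4 match the entries in the distance tables on the slide.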
K=?
What are the limitations?