Data Mining - Lecture 9
Clustering: Partitioning, Hierarchical, and Density-Based Methods
By
Dr. Nora Shoaip
Damanhour University
Faculty of Computers & Information Sciences
Department of Information Systems
2023 - 2024
Outline
The Basics
o What is Cluster Analysis?
o Requirements for Cluster Analysis
o Overview of methods
Partitioning Methods
o K-Means
Hierarchical Methods
o Agglomerative vs. Divisive
o Distance Measures
Density-Based Methods
o DBSCAN
What is Cluster Analysis?
Cluster analysis groups data objects so that objects within the same cluster are highly similar to one another and dissimilar to objects in other clusters.
Overview of Cluster Analysis Methods
o Partitioning
o Hierarchical
o Density-based
o Grid-based
Method Characteristics
Partitioning methods:
o Find mutually exclusive clusters of spherical shape
o Distance-based
o May use mean or medoid to represent cluster center
o Effective for small- to medium-size data sets
Partitioning Methods: K-Means
K-means represents each cluster by the mean of its members: every object is assigned to the cluster with the nearest center, the cluster means are then recomputed, and the two steps repeat until the assignments no longer change.
Exercise: Cluster the eight points in the table below using k-means. (15 points)
Assume that k = 3 and that initially the points are assigned to clusters as follows: C1 = {x1, x2, x3}, C2 = {x4, x5, x6}, C3 = {x7, x8}.
• Apply the k-means algorithm until convergence (i.e., until the clusters do not change), using the Manhattan distance.
(Hint: the Manhattan distance is d(i, j) = |xi1 - xj1| + |xi2 - xj2| + ... + |xin - xjn|. For example, d(x1, x2) = |2 - 2| + |10 - 5| = 5.)
Make sure you clearly identify the final clustering and show your steps.

     A1  A2
x1    2  10
x2    2   5
x3    8   4
x4    5   8
x5    7   5
x6    6   4
x7    1   2
x8    4   9
Partitioning Methods: K-Means (solution)
[Slide figure: table of Manhattan distances from each point x1-x8 to the current cluster centers; at every iteration each point is reassigned to its nearest center.]
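As a companion to the exercise, here is a minimal Python sketch of the computation (the coordinates and initial assignment come from the exercise; the code itself is only an illustrative implementation, not the lecture's). Each pass reassigns every point to the nearest center under the Manhattan distance, recomputes the cluster means, and stops once no assignment changes.

```python
# Minimal k-means sketch for the exercise above: Manhattan distance,
# mean centers, the given initial assignment, loop until convergence.
points = {"x1": (2, 10), "x2": (2, 5), "x3": (8, 4), "x4": (5, 8),
          "x5": (7, 5), "x6": (6, 4), "x7": (1, 2), "x8": (4, 9)}
clusters = [["x1", "x2", "x3"], ["x4", "x5", "x6"], ["x7", "x8"]]

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def mean(cluster):
    xs = [points[n][0] for n in cluster]
    ys = [points[n][1] for n in cluster]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

while True:
    centers = [mean(c) for c in clusters]
    new = [[] for _ in centers]
    for name, p in points.items():
        # Assign each point to the cluster whose center is nearest.
        nearest = min(range(len(centers)), key=lambda i: manhattan(p, centers[i]))
        new[nearest].append(name)
    if new == clusters:        # convergence: no assignment changed
        break
    clusters = new

print(clusters)                # final clustering
```

With these inputs the loop should converge after a few iterations to C1 = {x1, x4, x8}, C2 = {x3, x5, x6}, C3 = {x2, x7}, which is the final clustering the exercise asks you to identify.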
Partitioning Methods: K-Means
Factors to consider:
o Selection of k
o Selection of initial centroids
o Calculation of dissimilarity
o Calculation of cluster means
When it fails:
o Clusters with very different sizes and clusters with concave shapes (see the sketch below)
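As a quick illustration of the concave-shape failure, the sketch below runs k-means on scikit-learn's two interleaving half-moons and scores the result. The dataset and parameter choices are illustrative, not from the lecture, and scikit-learn is assumed to be installed.

```python
# Illustrative only: k-means on concave (half-moon) clusters.
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving, concave clusters with known ground-truth labels.
X, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)
y_pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# A score near 1 would mean the found clusters match the true moons.
print("adjusted Rand index:", adjusted_rand_score(y_true, y_pred))
```

K-means typically lands far below 1 here, because assigning each point to the nearest mean can only carve the plane into convex, roughly spherical regions; it cannot follow the concave moon shapes.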
Hierarchical Methods
Agglomerative versus Divisive Clustering
Hierarchical clustering groups data objects into a hierarchy or "tree" of clusters.
Agglomerative: bottom-up (merge) decomposition
o Start with each object in its own cluster
o Merge the two closest clusters into a bigger cluster
o Iteratively merge until a termination condition holds or a single cluster is formed
Divisive: top-down (split) decomposition
o Start with all objects in one big cluster
o Divide it into subclusters
o Recursively divide subclusters into even smaller subclusters
o Terminate when each object is in its own cluster or the objects within each cluster are similar "enough"
Hierarchical Methods: Agglomerative Clustering (AGNES)
[Figure] Step 0 → Step 1: starting from singleton clusters a, b, c, d, e, the two objects at minimum distance, a and b, are merged into cluster {a, b}.
Hierarchical Methods: Agglomerative Clustering (AGNES)
[Figure] Step 1 → Step 2: d and e are merged into cluster {d, e}.
Measure the distance between each of c, d, e and the individual elements in cluster {a, b}, and choose the pair with minimum distance (single linkage).
Hierarchical Methods: Agglomerative Clustering (AGNES)
[Figure] Step 2 → Step 3: c is merged with {d, e} into cluster {c, d, e}.
Measure the distance between c and the individual elements in clusters {a, b} and {d, e}, as well as the distances between pairs across {a, b} and {d, e}, and choose the pair with minimum distance (single linkage).
Hierarchical Methods: Agglomerative Clustering (AGNES)
[Figure] Step 3 → Step 4: clusters {a, b} and {c, d, e} are merged into the single cluster {a, b, c, d, e}.
Hierarchical Methods: Agglomerative Clustering
[Figure] Dendrogram of the five objects: from Step 0 to Step 4, a and b merge into {a, b}, d and e into {d, e}, c joins {d, e} to form {c, d, e}, and finally {a, b} and {c, d, e} merge into {a, b, c, d, e}. Read in the opposite direction (Step 4 → Step 0), the same tree describes a divisive, top-down split. A code sketch of the merging procedure follows.
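To make the merging procedure concrete, here is a minimal single-linkage AGNES sketch in Python. The a-e example above comes with no coordinates, so the sketch reuses the eight points from the k-means exercise together with the Manhattan distance; both choices are illustrative assumptions, not part of the AGNES slides.

```python
from itertools import combinations

# Points reused from the k-means exercise; any data and metric would do.
points = {"x1": (2, 10), "x2": (2, 5), "x3": (8, 4), "x4": (5, 8),
          "x5": (7, 5), "x6": (6, 4), "x7": (1, 2), "x8": (4, 9)}

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def single_link(c1, c2):
    # Single linkage: cluster distance = minimum pairwise point distance.
    return min(manhattan(points[a], points[b]) for a in c1 for b in c2)

# Step 0: every object starts in its own cluster.
clusters = [frozenset([name]) for name in points]
step = 0
while len(clusters) > 1:
    # Find and merge the closest pair of clusters under single linkage.
    c1, c2 = min(combinations(clusters, 2), key=lambda pair: single_link(*pair))
    clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    step += 1
    print(f"Step {step}: merged {sorted(c1)} and {sorted(c2)}")
```

Each printed step corresponds to one level of the dendrogram; the loop runs until a single cluster remains, which is exactly the bottom-up direction described above.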