DSML-ML09. Unsupervised Learning
Machine Learning
Unsupervised Learning
(Clustering)
Athanasios (Thanos) Voulodimos
Assistant Professor
Intra-cluster distances are minimized; inter-cluster distances are maximized.
Applications of Cluster Analysis
• Understanding
• Example: discovered clusters of stocks with similar price movements, mapped to industry groups:
• {Applied-Matl-DOWN, Bay-Network-DOWN, 3-COM-DOWN, ...}
• {Louisiana-Land-UP, Phillips-Petro-UP, Unocal-UP, Schlumberger-UP} → Oil-UP
• Summarization
• Reduce the size of large data sets
[Figure: Clustering precipitation in Australia]
Early applications of cluster analysis
• John Snow, London 1854
Notion of a Cluster can be Ambiguous
• Hierarchical clustering
• A set of nested clusters organized as a hierarchical tree
Partitional Clustering
[Figure: points p1-p4 shown as a traditional hierarchical clustering and as the corresponding traditional dendrogram]
Clustering algorithms covered:
• K-means
• Hierarchical clustering
• DBSCAN
K-MEANS
K-means Clustering
• Partitional clustering approach
• Each cluster is associated with a centroid
(center point)
• Each point is assigned to the cluster with the
closest centroid
• Number of clusters, K, must be specified
• The objective is to minimize the sum of squared
distances (SSE) of the points to their respective
centroids
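A minimal NumPy sketch of this procedure (illustrative only, not the lecture's code; the data matrix X, the number of clusters k, and the iteration cap are placeholder names), including the SSE objective from the last bullet:

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    # Lloyd's algorithm: alternate between assigning points to the nearest
    # centroid and recomputing each centroid as the mean of its cluster.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    for _ in range(n_iters):
        # Assignment step: index of the closest centroid for every point
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of the points assigned to it
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids no longer move, so the algorithm has converged
        centroids = new_centroids
    sse = ((X - centroids[labels]) ** 2).sum()  # sum of squared distances to assigned centroids
    return labels, centroids, sse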
K-means Clustering
[Figure: K-means on a 2-D data set: original points and the cluster assignments over successive iterations]
Importance of Choosing Initial Centroids
[Figure: K-means over iterations 1-5 for a particular choice of initial centroids]
Dealing with Initialization
• Do multiple runs and select the clustering with the
smallest error
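One way to realize the "multiple runs" idea is to rerun the kmeans sketch above from different random seeds and keep the solution with the smallest SSE; scikit-learn's KMeans does essentially the same thing via its n_init parameter. A minimal sketch:

def kmeans_best_of(X, k, n_runs=10):
    # Run K-means from several random initializations and keep the
    # clustering with the smallest SSE (error).
    best = None
    for seed in range(n_runs):
        labels, centroids, sse = kmeans(X, k, seed=seed)
        if best is None or sse < best[2]:
            best = (labels, centroids, sse)
    return best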
• Divisive:
• Start with one, all-inclusive cluster
• At each step, split a cluster until each cluster contains a point (or there
are k clusters)
[Figure: example dendrogram over points 1, 3, 2, 5, 4, 6]
Strengths of Hierarchical Clustering
• Do not have to assume any particular number of
clusters
• Any desired number of clusters can be obtained by
‘cutting’ the dendrogram at the proper level (see the sketch below)
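A sketch of "cutting" the dendrogram, assuming SciPy is available (X is a placeholder data matrix): the hierarchy is built once and any desired number of clusters can then be extracted from it.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)          # placeholder 2-D data set

Z = linkage(X, method="single")    # build the full hierarchy once (single link / MIN)

labels_3 = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 clusters
labels_5 = fcluster(Z, t=5, criterion="maxclust")  # cut the same tree into 5 clusters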
Intermediate Situation
• After some merging steps, we have some clusters
[Figure: current clusters C1-C5 and their proximity matrix]
Intermediate Situation
• We want to merge the two closest clusters (C2 and C5) and
update the proximity matrix.
[Figure: clusters C1-C5 and their proximity matrix, with C2 and C5 selected for merging]
After Merging
• The question is “How do we update the proximity matrix?”
[Figure: proximity matrix after the merge, with the row and column for the new cluster C2 ∪ C5 marked “?”]
How to Define Inter-Cluster Similarity
[Figure: two clusters of points p1-p5 and their proximity matrix; how is the similarity between the two clusters defined?]
• MIN
• MAX
• Group Average
• Distance Between Centroids
• Other methods driven by an objective function
  • Ward’s Method uses squared error
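For illustration, the inter-cluster proximity definitions listed above correspond directly to SciPy's linkage methods (a sketch, reusing the placeholder data matrix X from the earlier example):

from scipy.cluster.hierarchy import linkage

Z_min  = linkage(X, method="single")    # MIN (single link)
Z_max  = linkage(X, method="complete")  # MAX (complete link)
Z_avg  = linkage(X, method="average")   # Group Average
Z_cen  = linkage(X, method="centroid")  # Distance Between Centroids
Z_ward = linkage(X, method="ward")      # Ward's Method (squared error)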
Hierarchical Clustering: MIN
Distance matrix for points 1-6:
     1    2    3    4    5    6
1    0  .24  .22  .37  .34  .23
2  .24    0  .15  .20  .14  .25
3  .22  .15    0  .15  .28  .11
4  .37  .20  .15    0  .29  .22
5  .34  .14  .28  .29    0  .39
6  .23  .25  .11  .22  .39    0
[Figure: nested single-link (MIN) clusters of points 1-6 and the corresponding dendrogram]
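The single-link (MIN) hierarchy for this distance matrix can be reproduced with SciPy (an illustrative sketch, not the lecture's code); squareform converts the square matrix into the condensed form that linkage expects:

import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage

D = np.array([
    [0.00, 0.24, 0.22, 0.37, 0.34, 0.23],
    [0.24, 0.00, 0.15, 0.20, 0.14, 0.25],
    [0.22, 0.15, 0.00, 0.15, 0.28, 0.11],
    [0.37, 0.20, 0.15, 0.00, 0.29, 0.22],
    [0.34, 0.14, 0.28, 0.29, 0.00, 0.39],
    [0.23, 0.25, 0.11, 0.22, 0.39, 0.00],
])  # distance matrix for points 1..6 from the slide

Z = linkage(squareform(D), method="single")  # MIN / single link
print(Z)  # first merge joins points 3 and 6 (0-based rows 2 and 5) at distance 0.11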
Strength of MAX
Hierarchical Clustering: Group Average
• Proximity of two clusters is the average of the pairwise proximities between points in the two clusters:
$$\text{proximity}(\text{Cluster}_i, \text{Cluster}_j) = \frac{\sum_{p_i \in \text{Cluster}_i,\; p_j \in \text{Cluster}_j} \text{proximity}(p_i, p_j)}{|\text{Cluster}_i| \cdot |\text{Cluster}_j|}$$
[Figure: group-average dendrogram with leaf order 3 6 4 1 2 5]
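As a small worked illustration of the formula (using the distance matrix D from the single-link sketch above), the group-average proximity between clusters {3, 6} and {2, 5} is the mean of the four cross-cluster distances:

import numpy as np

pairs = [(3, 2), (3, 5), (6, 2), (6, 5)]            # all cross-cluster pairs between {3, 6} and {2, 5}
avg = np.mean([D[i - 1, j - 1] for i, j in pairs])  # point labels are 1-based, D is 0-indexed
print(avg)  # (0.15 + 0.28 + 0.25 + 0.39) / 4 = 0.2675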
Hierarchical Clustering: Problems and Limitations
• High computational complexity in time and space (the proximity matrix alone requires O(N²) space, and a straightforward agglomerative algorithm takes up to O(N³) time for N points)
Density-Based Clustering
• Important Questions:
• How do we measure density?
• What is a dense region?
• DBSCAN:
• Density at point p: number of points within a circle of radius Eps
• Dense Region: A circle of radius Eps that contains at least MinPts points
DBSCAN
• Characterization of points
• A point is a core point if it has more than a specified
number of points (MinPts) within Eps
• These points belong in a dense region and are in the interior of a cluster
• A point is a border point if it is not a core point, but falls within the Eps-neighborhood of a core point
• A point is a noise point if it is neither a core point nor a border point
• Density-connected
• A point p is density-connected to a point q if there is a path of edges from p to q
[Figure: points p and q density-connected through an intermediate point o]
DBSCAN Algorithm
• Label points as core, border and noise
• Eliminate noise points
• For every core point p that has not been assigned
to a cluster
• Create a new cluster with the point p and all the
points that are density-connected to p.
• Assign border points to the cluster of the closest
core point.
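A sketch of the algorithm above using scikit-learn's DBSCAN (the data X and the parameter values are placeholders; eps and min_samples play the roles of Eps and MinPts):

import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.rand(200, 2)                       # placeholder data

db = DBSCAN(eps=0.1, min_samples=4).fit(X)
labels = db.labels_                              # cluster index per point; -1 marks noise

core_mask = np.zeros(len(X), dtype=bool)
core_mask[db.core_sample_indices_] = True        # core points
border_mask = (labels != -1) & ~core_mask        # assigned to a cluster but not core
noise_mask = labels == -1                        # noise points (not assigned to any cluster)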
DBSCAN: Determining Eps and MinPts
• Idea is that for points in a cluster, their kth nearest neighbors
are at roughly the same distance
• Noise points have their kth nearest neighbor at a farther distance
• So, plot sorted distance of every point to its kth nearest
neighbor
• Find the distance d where there is a “knee” in the curve
• Eps = d, MinPts = k
[Figure: sorted distance of each point to its 4th nearest neighbor; the knee suggests Eps ≈ 7-10 for MinPts = 4]
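A sketch of this heuristic with scikit-learn's NearestNeighbors (k and the data X are placeholders, as in the earlier sketches):

import numpy as np
from sklearn.neighbors import NearestNeighbors

k = 4                                            # candidate MinPts
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1 because each point is its own nearest neighbor
dists, _ = nn.kneighbors(X)
k_dist = np.sort(dists[:, k])                    # sorted distance of every point to its kth neighbor

# Plot k_dist and read off the distance at the "knee"; use it as Eps, with MinPts = k.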
When DBSCAN Works Well
[Figure: original points and the clusters found by DBSCAN]
• Resistant to Noise
• Can handle clusters of different shapes and sizes
When DBSCAN Does NOT Work Well
[Figure: original points and DBSCAN results with (MinPts=4, Eps=9.75) and (MinPts=4, Eps=9.92)]
• Varying densities
• High-dimensional data
DBSCAN: Sensitive to Parameters
Slides source: