Clustering: Unsupervised Learning
Unsupervised learning: introduction
Machine Learning
Supervised learning
Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$
Andrew Ng
Unsupervised learning
Training set: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$
Applications of clustering
Input:
- $K$ (number of clusters)
- Training set $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$
$x^{(i)} \in \mathbb{R}^n$ (drop the $x_0 = 1$ convention)
K-means algorithm
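Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \dots, \mu_K \in \mathbb{R}^n$
Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}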
K-means for non-separated clusters
T-shirt sizing
[Figure: scatter plot with axes Height and Weight]
Clustering: Optimization objective
K-means optimization objective
$c^{(i)}$ = index of cluster (1, 2, …, $K$) to which example $x^{(i)}$ is currently assigned
$\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
$\mu_{c^{(i)}}$ = cluster centroid of cluster to which example $x^{(i)}$ has been assigned
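Optimization objective:
$J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \| x^{(i)} - \mu_{c^{(i)}} \|^2$
$\min_{c^{(1)}, \dots, c^{(m)},\, \mu_1, \dots, \mu_K} J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K)$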
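The K-means cost (often called the distortion) is the mean squared distance between each example and the centroid of the cluster it is assigned to. The sketch below computes it with the standard library only. The names `squared_distance` and `distortion` and the toy data are assumptions made for illustration, and clusters are indexed from 0 rather than from 1.

```python
def squared_distance(a, b):
    """Squared Euclidean distance ||a - b||^2 between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def distortion(points, assignments, centroids):
    """Mean squared distance from each example to its assigned centroid.

    assignments[i] plays the role of c^(i) (0-based here), and
    centroids[assignments[i]] plays the role of mu_{c^(i)}.
    """
    m = len(points)
    total = sum(
        squared_distance(x, centroids[c]) for x, c in zip(points, assignments)
    )
    return total / m


# Hypothetical toy data: four 2-D examples and two centroids.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]

good_cost = distortion(points, [0, 0, 1, 1], centroids)  # nearest centroids
bad_cost = distortion(points, [0, 1, 0, 1], centroids)   # two examples misassigned
print(good_cost, bad_cost)
```

Assigning every example to its nearest centroid gives the smaller of the two costs, which is what the cluster assignment step of K-means relies on.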
K-means algorithm
Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}
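A minimal sketch of the two alternating steps, written with the standard library only. Several choices here are assumptions rather than part of the slides: the function names, initializing the centroids from randomly chosen examples, stopping when the assignments no longer change, leaving a centroid in place if its cluster becomes empty, and 0-based cluster indices.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def kmeans(points, k, max_iters=100, seed=0):
    """Run K-means once; returns (assignments, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization: k distinct examples
    assignments = None
    for _ in range(max_iters):
        # Cluster assignment step: nearest centroid for every example.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(x, centroids[j]))
            for x in points
        ]
        if new_assignments == assignments:
            break  # assignments are stable, so the centroids will not move
        assignments = new_assignments
        # Move centroid step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [x for x, c in zip(points, assignments) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = mean_point(members)
    return assignments, centroids


# Hypothetical toy data: two well-separated groups of 2-D examples.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
assignments, centroids = kmeans(points, k=2)
print(assignments)
print(centroids)
```

Which group receives index 0 depends on the random initialization, so only the grouping itself is meaningful, not the label values.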
Clustering: Random initialization
Random initialization
Should have $K < m$
Randomly pick $K$ training examples.
Set $\mu_1, \dots, \mu_K$ equal to these $K$ examples.
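One way to sketch this initialization is to draw K distinct training examples and use them as the starting centroids. The function name `init_centroids`, the explicit check that K is smaller than the number of examples, and the toy data are assumptions made for illustration.

```python
import random


def init_centroids(points, k, rng):
    """Pick k distinct training examples to serve as the initial centroids."""
    m = len(points)
    if not 1 <= k < m:
        raise ValueError(f"need 1 <= K < m, got K={k}, m={m}")
    # sample() draws without replacement, so no example is chosen twice.
    return rng.sample(points, k)


# Hypothetical toy training set of m = 5 two-dimensional examples.
points = [(1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0), (5.0, 5.0)]
centroids = init_centroids(points, k=2, rng=random.Random(42))
print(centroids)
```

Passing a seeded `random.Random` instance keeps a run reproducible; a different seed generally gives different starting centroids.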
Local optima
Depending on the initialization of the cluster centroids, K-means can produce different results: it may converge to a local optimum of the cost function rather than the global one.
Random initialization
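For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get $c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K$.
    Compute cost function (distortion) $J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K)$.
}
Pick the clustering that gave the lowest cost $J$.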
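A common way to reduce the risk of a poor local optimum is to run K-means from many random initializations and keep the run with the lowest cost. The sketch below does this with the standard library only. The helper names, the stopping rule, the handling of empty clusters, and the toy data are assumptions made for illustration.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def run_kmeans(points, k, rng, max_iters=100):
    """One K-means run from a random initialization; returns (assignments, centroids)."""
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        new = [min(range(k), key=lambda j: sqdist(x, centroids[j])) for x in points]
        if new == assignments:
            break
        assignments = new
        for j in range(k):
            members = [x for x, c in zip(points, assignments) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignments, centroids


def distortion(points, assignments, centroids):
    """Mean squared distance from each example to its assigned centroid."""
    return sum(sqdist(x, centroids[c]) for x, c in zip(points, assignments)) / len(points)


def best_of_restarts(points, k, n_restarts=100, seed=0):
    """Repeat K-means n_restarts times and keep the lowest-cost clustering."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        assignments, centroids = run_kmeans(points, k, rng)
        cost = distortion(points, assignments, centroids)
        if best is None or cost < best[0]:
            best = (cost, assignments, centroids)
    return best


# Hypothetical toy data: three well-separated groups of three examples each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (20.0, 0.0), (20.0, 1.0), (21.0, 0.0),
]
best_cost, best_assignments, best_centroids = best_of_restarts(points, k=3)
print(best_cost)
print(best_assignments)
```

Restarts tend to matter most when K is small; with many clusters the first run is often already close to the best one.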
Clustering: Choosing the number of clusters
What is the right value of K?
Choosing the value of K
Elbow method:
[Figure: two plots of the cost function for 1 to 8 clusters]
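The elbow method is usually read as: compute the cost for a range of values of K, plot it, and look for the point where the curve stops dropping sharply. The sketch below only produces the numbers for such a plot. The helper names, the use of 50 restarts per value of K, and the toy data are assumptions made for illustration, and where the elbow lies is still judged by eye.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_cost(points, k, rng, max_iters=100):
    """Run K-means once from a random initialization and return its final cost."""
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        new = [min(range(k), key=lambda j: sqdist(x, centroids[j])) for x in points]
        if new == assignments:
            break
        assignments = new
        for j in range(k):
            members = [x for x, c in zip(points, assignments) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return sum(sqdist(x, centroids[c]) for x, c in zip(points, assignments)) / len(points)


def elbow_costs(points, max_k, n_restarts=50, seed=0):
    """Lowest cost found over n_restarts runs, for each K from 1 to max_k."""
    rng = random.Random(seed)
    return {
        k: min(kmeans_cost(points, k, rng) for _ in range(n_restarts))
        for k in range(1, max_k + 1)
    }


# Hypothetical toy data: three well-separated groups of three examples each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (20.0, 0.0), (20.0, 1.0), (21.0, 0.0),
]
costs = elbow_costs(points, max_k=8)
for k, cost in costs.items():
    print(k, round(cost, 3))
```

On data like this the cost falls steeply up to K = 3 and only slightly afterwards; on real data the curve is often smoother and the elbow may be ambiguous.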
Choosing the value of K
Sometimes, you’re running K-means to get clusters to use for some
later/downstream purpose. Evaluate K-means based on a metric for
how well it performs for that later purpose.
[Figure: two scatter plots with axes Height and Weight]