
Clustering: Unsupervised Learning

This document discusses the k-means clustering algorithm. It begins with an introduction to clustering and unsupervised learning. It then explains the k-means algorithm, which takes as input the number of clusters k and a training set, and starts by randomly initializing the cluster centroids. It then alternates between assigning each example to its closest centroid and moving each centroid to the mean of the examples assigned to it, until convergence. The document discusses challenges such as non-separated clusters and notes that k-means can converge to different local optima depending on the random initialization. It concludes with methods for choosing the number of clusters k, such as the elbow method.

Uploaded by sourabh
Copyright © All Rights Reserved

Clustering: Unsupervised learning introduction

Machine Learning
Supervised learning

Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$
Andrew Ng
Unsupervised learning

Training set: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$ (no labels $y$)
Applications of clustering

- Market segmentation
- Social network analysis
- Organize computing clusters
- Astronomical data analysis

Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)


Clustering: K-means algorithm
K-means algorithm: assumptions

Input:
- $K$ (number of clusters)
- Training set $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$

$x^{(i)} \in \mathbb{R}^n$ (drop the $x_0 = 1$ convention)

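K-means algorithm

Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \dots, \mu_K \in \mathbb{R}^n$

Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}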
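The loop above can be sketched in plain Python as follows. This is a minimal sketch, not the course's reference code, and it rests on a few assumptions: points are lists of floats, cluster indices run from 0 to K-1 instead of 1 to K, "closest" means smallest squared Euclidean distance, the centroids are initialized from K randomly chosen training examples, and a cluster that ends up with no points simply keeps its previous centroid. The names `kmeans`, `squared_distance`, `max_iters` and `seed` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points (sequences of floats)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(X, K, max_iters=100, seed=None):
    """Run K-means on X (a list of n-dimensional points) with K clusters.

    Returns (c, mu): c[i] is the cluster index (0..K-1) assigned to X[i],
    and mu[k] is the centroid of cluster k.
    """
    rng = random.Random(seed)
    # Randomly initialize the K centroids (assumption: K distinct training examples).
    mu = [list(x) for x in rng.sample(X, K)]
    c = None
    for _ in range(max_iters):
        # Cluster assignment step: index of the closest centroid for each example.
        new_c = [min(range(K), key=lambda k: squared_distance(x, mu[k])) for x in X]
        if new_c == c:
            break  # assignments stopped changing, so the algorithm has converged
        c = new_c
        # Move centroid step: each centroid becomes the mean of its assigned points.
        for k in range(K):
            members = [x for x, ci in zip(X, c) if ci == k]
            if members:  # assumption: an empty cluster keeps its old centroid
                mu[k] = [sum(col) / len(members) for col in zip(*members)]
    return c, mu
```

For example, `kmeans([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]], K=2, seed=0)` puts the first three points in one cluster and the last three in the other. Stopping once the assignments repeat is one common convergence test; a fixed iteration cap guards against running forever.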
K-means for non-separated clusters

[Figure: T-shirt sizing example, a scatter plot of weight vs. height]
Clustering: Optimization objective
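K-means optimization objective

$c^{(i)}$ = index of cluster $(1, 2, \dots, K)$ to which example $x^{(i)}$ is currently assigned
$\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
$\mu_{c^{(i)}}$ = cluster centroid of the cluster to which example $x^{(i)}$ has been assigned

Optimization objective:

$$J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \lVert x^{(i)} - \mu_{c^{(i)}} \rVert^2$$

$$\min_{c^{(1)}, \dots, c^{(m)},\, \mu_1, \dots, \mu_K} J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K)$$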
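The objective J and its link to the two K-means steps can be checked numerically. The cluster assignment step minimizes J over the $c^{(i)}$ with the centroids held fixed, and the move centroid step minimizes J over the $\mu_k$ with the assignments held fixed, so J never increases from one step to the next. The sketch below is illustrative only: it uses plain Python lists, 0-based cluster indices, made-up toy data, and the hypothetical names `distortion`, `assign` and `move_centroids`.

```python
def distortion(X, c, mu):
    """K-means cost J: mean of ||x^(i) - mu_{c^(i)}||^2 over the m examples."""
    return sum(
        sum((xj - mj) ** 2 for xj, mj in zip(x, mu[ci])) for x, ci in zip(X, c)
    ) / len(X)


def assign(X, mu):
    """Cluster assignment step: minimizes J over c with mu held fixed."""
    return [
        min(range(len(mu)), key=lambda k: sum((a - b) ** 2 for a, b in zip(x, mu[k])))
        for x in X
    ]


def move_centroids(X, c, mu):
    """Move centroid step: minimizes J over mu with c held fixed."""
    new_mu = []
    for k, old in enumerate(mu):
        members = [x for x, ci in zip(X, c) if ci == k]
        # An empty cluster contributes nothing to J, so its centroid is left unchanged.
        new_mu.append([sum(col) / len(members) for col in zip(*members)] if members else list(old))
    return new_mu


# Alternate the two steps from poor starting centroids and record J after each step.
X = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]]
mu = [[1.0, 1.0], [1.0, 1.5]]
c = assign(X, mu)
costs = [distortion(X, c, mu)]
for _ in range(3):
    mu = move_centroids(X, c, mu)
    costs.append(distortion(X, c, mu))
    c = assign(X, mu)
    costs.append(distortion(X, c, mu))
print(costs)
```

The recorded costs form a non-increasing sequence, which is why K-means always converges, though only to a local optimum of J.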

Clustering: Random initialization
Random initialization
Should have $K < m$

Randomly pick $K$ training examples.

Set $\mu_1, \dots, \mu_K$ equal to these $K$ examples.
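The initialization rule above can be sketched as a small helper. It assumes points are plain lists and enforces the $K < m$ condition by raising an error; the function name `init_centroids` and the `rng` parameter are hypothetical, not part of the slides.

```python
import random


def init_centroids(X, K, rng=random):
    """Pick K distinct training examples at random and use them as the initial centroids."""
    m = len(X)
    if not K < m:
        raise ValueError(f"expected K < m, got K={K}, m={m}")
    # Sample K distinct indices so that no training example is chosen twice.
    return [list(X[i]) for i in rng.sample(range(m), K)]
```

Passing `random.Random(seed)` as `rng` makes the choice reproducible. Sampling indices rather than copying values ensures the K centroids start at K different training examples.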

Local optima

Depending on the initialization of the cluster centroids, K-means can converge to different local optima and so produce different results.

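Random initialization
For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get $c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K$.
    Compute cost function (distortion) $J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K)$
}
Pick clustering that gave lowest cost $J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K)$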
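The restart loop above can be sketched in Python as follows. It is a minimal, self-contained sketch under the same assumptions as a plain-list implementation: 0-based cluster indices, initialization from K random training examples, and empty clusters keeping their old centroid. The names `run_kmeans`, `cost`, `kmeans_restarts`, `n_init` and `seed` are hypothetical; `n_init=100` mirrors the 100 runs on the slide.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def run_kmeans(X, K, rng, max_iters=100):
    """One K-means run from a random initialization; returns (c, mu)."""
    mu = [list(x) for x in rng.sample(X, K)]  # K random training examples
    c = None
    for _ in range(max_iters):
        new_c = [min(range(K), key=lambda k: sqdist(x, mu[k])) for x in X]
        if new_c == c:
            break  # converged
        c = new_c
        for k in range(K):
            members = [x for x, ci in zip(X, c) if ci == k]
            if members:
                mu[k] = [sum(col) / len(members) for col in zip(*members)]
    return c, mu


def cost(X, c, mu):
    """Distortion J: mean squared distance from each example to its centroid."""
    return sum(sqdist(x, mu[ci]) for x, ci in zip(X, c)) / len(X)


def kmeans_restarts(X, K, n_init=100, seed=None):
    """Run K-means n_init times and keep the clustering with the lowest cost J."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        c, mu = run_kmeans(X, K, rng)
        J = cost(X, c, mu)
        if best is None or J < best[0]:
            best = (J, c, mu)
    return best  # (J, c, mu) of the best run
```

Each run is independent, so the only extra cost of restarting is running K-means `n_init` times; keeping the lowest J guards against an unlucky initialization landing in a poor local optimum.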

Clustering: Choosing the number of clusters
What is the right value of K?

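Choosing the value of K
Elbow method: run K-means for a range of values of $K$, plot the cost function $J$ against $K$ (the number of clusters), and pick the value of $K$ where the curve bends like an elbow, beyond which adding clusters reduces $J$ only slowly.

[Figure: two plots of the cost function $J$ against $K$ (no. of clusters), $K$ = 1 to 8]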
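The data for an elbow plot can be produced with a short sketch like the one below. For each K it records the lowest cost J over a few random restarts, following the random-initialization procedure described earlier. The toy one-dimensional data and the names `kmeans_cost`, `elbow_costs`, `n_init` and `seed` are illustrative assumptions; picking the elbow from the printed costs is left to the reader, as on the slide, rather than automated.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_cost(X, K, rng, max_iters=100):
    """One K-means run from K random training examples; returns the final cost J."""
    mu = [list(x) for x in rng.sample(X, K)]
    c = None
    for _ in range(max_iters):
        new_c = [min(range(K), key=lambda k: sqdist(x, mu[k])) for x in X]
        if new_c == c:
            break  # converged
        c = new_c
        for k in range(K):
            members = [x for x, ci in zip(X, c) if ci == k]
            if members:  # an empty cluster keeps its old centroid
                mu[k] = [sum(col) / len(members) for col in zip(*members)]
    return sum(sqdist(x, mu[ci]) for x, ci in zip(X, c)) / len(X)


def elbow_costs(X, Ks, n_init=10, seed=0):
    """Best-of-n_init cost J for each K: the points of an elbow plot."""
    rng = random.Random(seed)
    return {K: min(kmeans_cost(X, K, rng) for _ in range(n_init)) for K in Ks}


# Toy data: three groups of points on a line (nine points, so K < m for K = 1..8).
X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0], [20.0], [21.0], [22.0]]
costs = elbow_costs(X, range(1, 9))
for K, J in costs.items():
    print(K, round(J, 3))
```

Taking the best of several restarts for each K makes the curve less noisy, since a single unlucky initialization could otherwise produce a spurious bump in J.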

Choosing the value of K
Sometimes you run K-means to get clusters for some later (downstream) purpose. In that case, evaluate K-means by a metric of how well it performs for that later purpose.

E.g. T-shirt sizing:

[Figure: two T-shirt sizing panels, each a scatter plot of weight vs. height]
