
Clustering

Summer Term 2019/2020

1 k-means
1. Randomly pick k centroids.

2. Repeat until no example changes cluster in two subsequent iterations:

(a) For each example find the nearest centroid and assign the example to its cluster.
(b) For each cluster, determine the new centroid – the average of all points in the
cluster.
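The two steps above can be sketched in Python as follows. This is a minimal sketch, not a reference solution: the function names and the use of squared Euclidean distance are my own choices, and the initial centroids are passed in explicitly so the run matches the hand exercises below.

```python
def squared_dist(p, q):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids):
    """Run k-means from the given initial centroids.

    Returns (final_centroids, assignment), where assignment[i] is the
    index of the cluster that points[i] ends up in.
    """
    centroids = list(centroids)
    k = len(centroids)
    assignment = None
    while True:
        # (a) Assign every example to the nearest centroid's cluster.
        new_assignment = [
            min(range(k), key=lambda j: squared_dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:  # no example changed cluster: converged
            return centroids, assignment
        assignment = new_assignment
        # (b) Recompute each centroid as the average of the points in its cluster.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
```

Note that the stopping test compares whole assignments, which corresponds directly to "no example changes cluster in two subsequent iterations".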

2 Hierarchical agglomerative clustering


1. Begin with each example in its own cluster.

2. Identify the two nearest clusters according to a chosen metric and join them. To
determine the distance between two clusters, we can, for example, compare their
centroids.

3. Repeat step 2 until all examples are in one cluster.
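The procedure above can be sketched as follows. This is again only a sketch under assumptions of my own: cluster distance is measured between centroids (one of several valid options, as noted in step 2), clusters are named by concatenating the labels of their members, and the list of merges is returned so it can be read off as a dendrogram.

```python
def squared_dist(p, q):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(cluster):
    """Average of all points in the cluster."""
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))

def hac(points, labels):
    """Hierarchical agglomerative clustering with centroid distance.

    Starts with each example in its own cluster and returns the list of
    merges, in order, e.g. [("B", "C"), ("D", "E"), ...].
    """
    clusters = {lab: [p] for lab, p in zip(labels, points)}
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest.
        names = list(clusters)
        best = None
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                d = squared_dist(centroid(clusters[names[i]]),
                                 centroid(clusters[names[j]]))
                if best is None or d < best[0]:
                    best = (d, names[i], names[j])
        _, a, b = best
        # Join the two nearest clusters under a combined label.
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b))
    return merges
```

On the points of Question 3 below, the first two merges are (B, C) and (D, E), since each pair lies at squared distance 2.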

Questions
Question 1.
Given the following data points:

• A(1, 1),

• B(3, 5),

• C(4, 3),

• D(4, 5).

Starting with centroids c1 (2, 4) and c2 (5, 4), cluster them using k-means.

(Empty grid for plotting the points and centroids; axes from 0 to 8.)

Question 2.
Cluster the following data points using k-means, starting with the centroids c1 (0, 1, 0, 1),
c2 (0, 2, 1, 0):

• A(0, 1, 0, 1),

• B(0, 0, 2, 0),

• C(3, 2, -1, 1),

• D(0, 2, 1, 0).

Question 3.
Cluster the following data points using hierarchical agglomerative clustering. Draw the
dendrogram.

• A(0,0),

• B(1,2),

• C(2,1),

• D(5,1),

• E(6,0),

• F(6,3).

(Plot of the points A–F on a grid; axes from 0 to 6.)

Mini-project: k-means
Implement the k-means algorithm. Cluster the iris dataset in iris.data (during clustering,
ignore the decision attribute).
The program should additionally:

• Allow the user to pick k.

• After every iteration: print the sum/average distance of each point from its centroid.
This value should decrease with every iteration.

• At the end: print the members of each cluster.

• Optional: print measures of cluster homogeneity, e.g. the percentage of each iris
class in a cluster, or its entropy.
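As a starting point for the mini-project, the following sketch shows how the dataset could be loaded and how the average point-to-centroid distance could be computed. It assumes the usual UCI layout of iris.data (four numeric attributes followed by the class name, one example per line); adjust the parsing if your copy differs.

```python
import csv

def load_iris(path="iris.data"):
    """Load the iris dataset, returning (points, classes).

    The decision attribute (class name) is kept in a separate list so
    that clustering can ignore it.
    """
    points, classes = [], []
    with open(path) as f:
        for row in csv.reader(f):
            if not row:  # skip blank lines at the end of the file
                continue
            points.append(tuple(float(x) for x in row[:4]))
            classes.append(row[4])
    return points, classes

def squared_dist(p, q):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def average_centroid_distance(points, centroids, assignment):
    """Average distance of each point from its assigned centroid.

    Printed after every iteration, this value should decrease.
    """
    total = sum(squared_dist(p, centroids[a]) ** 0.5
                for p, a in zip(points, assignment))
    return total / len(points)
```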
