0% found this document useful (0 votes)
27 views

QMM: Exercise Sheet 5 - Clustering: Fabien Baeriswyl, J Er Ome Reboulleau, Tom Ruszkiewicz

This document contains instructions for three exercises using clustering methods on different datasets. Exercise 1 involves hierarchical clustering of car data, comparing single, complete, and average linkage approaches. Exercise 2 uses k-means clustering on country data, exploring the number of clusters and effects of rescaling and multiple runs. Exercise 3 has the student cluster US state data and discuss any patterns in the resulting clusters.

Uploaded by

laurine.hoyo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
27 views

QMM: Exercise Sheet 5 - Clustering: Fabien Baeriswyl, J Er Ome Reboulleau, Tom Ruszkiewicz

This document contains instructions for three exercises using clustering methods on different datasets. Exercise 1 involves hierarchical clustering of car data, comparing single, complete, and average linkage approaches. Exercise 2 uses k-means clustering on country data, exploring the number of clusters and effects of rescaling and multiple runs. Exercise 3 has the student cluster US state data and discuss any patterns in the resulting clusters.

Uploaded by

laurine.hoyo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 1

QMM: Exercise Sheet 5 - Clustering

Fabien Baeriswyl, Jérôme Reboulleau, Tom Ruszkiewicz

Exercise 1. For this exercise, use the Cars.csv datafile. This dataset consists of 32 cars from different
manufacturers, for which we record information about their prices, fuel consumption, and power notably. We want
to cluster the data. In this exercise do not rescale the dataset unless explicitly asked to do so.
a) First, provide a quick exploratory analysis of the data. Use visuals.
b) Create a function that computes the Euclidean distance between two vectors (of same size, but this size can
be anything). Use it to find the distance between the Ford Mondeo LX and the Ford Galaxy LX (the first two
observations of the dataset).
c) Compute the distance matrix between all the 32 observations.
d) Run a single-linkage hierarchical clustering on the data. Plot the result. Discuss any specific feature.
e) Run a complete-linkage hierarchical clustering on the data. Plot the result. Discuss any specific feature.
f) Run an average-linkage hierarchical clustering on the data. Plot the result. Discuss any specific feature.
Comparing it to the two previous models, which one is best? Briefly explain.
g) Under the complete-linkage and the average-linkage cluster models, using 4 clusters, provide the repartition of
the observations in these 4 clusters.
h) Rescale your dataset. Then, run the average-linkage model again and discuss any change.

Exercise 2. For this exercise, use the Country.csv datafile. This dataset consists of 109 countries for which we
have indicators about their gdp, literacy and urban population among others.
a) Rescale the data.
b) Use a visual approach to determine how many clusters you should use in a k-means partitioning method.
c) Run k-means clustering without setting the nstart argument to anything. Observe the changes while running
this function over and over again. Comment briefly.
d) Run k-means clustering setting nstart to 25. Plot the result and discuss briefly.
e) Run k-means clustering using 3 clusters. Discuss on the changes.
f) Plot the gdp versus the literacy and emphasise the clusters from your 3-cluster model from point e).

Exercise 3. For this exercise, use the South.csv datafile. This consists of 16 US states and their characteristics
in 1965, among which their mean temperature and precipitation, income, share of african americans and GOP vote
share (for Grand Old Party, formally the Republican Party) for various elections. Run clustering methods of your
choice to cluster the states using the tools from the previous exercises. Use the resulting clusters to discuss some
cluster particularities.

You might also like