
Clustering

Course: Artificial Intelligence Fundamentals

Instructor: Marco Bonzanini


Machine Learning Tasks

• Discrete data, supervised: Classification (predict a label)

• Discrete data, unsupervised: Clustering (group similar items)

• Continuous data, supervised: Regression (predict a quantity)

• Continuous data, unsupervised: Dimensionality Reduction (reduce n. of variables)
Agenda
• Introduction to Clustering

• Clustering Algorithms

• Centroid-based

• Optional: Connectivity-based

• Optional: Density-based

• Evaluation
Introduction to Clustering
Clustering

• Place similar items in the same group

• (Place dissimilar items in different groups)

• How to define “similarity”?

• Clustering cannot give a comprehensive description of an object
Clustering Applications

• Customer Segmentation

• Fraud Detection

• Social Network Analysis

• Search engines (navigation, indexing, …)

• … Your use case?


Clustering vs Classification

• Classification:
- supervised
- requires a set of labelled training samples

• Clustering:
- unsupervised
- learns without labels
Classification Example

Training: items with labels
A new, unseen item: ?
Prediction: assign the item to a class
Clustering Example

Training: no labels
Prediction: group similar items
More Definitions

• “Learn from raw data”

• “Find structure in data”

• “Unsupervised classification”
Flat vs Hierarchical
• Flat approach:

• There’s a number of clusters, and the relation between clusters is undetermined

• Often start with a random partial partitioning

• Refine it iteratively (e.g. K-Means)

• Measurement: error minimisation


Flat vs Hierarchical
• Hierarchical approach:

• Bottom-up, agglomerative

• Top-down, divisive

• A hierarchy of clusters (i.e. tree structure)

• Measurement: similarity of instances


Hard vs Soft

• Cluster assignment

• Hard clustering: each item belongs to one and only one cluster (more common)

• Soft clustering: items can belong to more than one cluster (e.g. a pair of sneakers can be in “sports” and “shoes”)
Common Issues
• Item representation (e.g. vector)

• Need a notion of distance / similarity

• Ideal: semantic similarity

• Practical: Euclidean distance, cosine similarity (see the formulas after this list)

• How many clusters?

• Fixed a priori? Data-driven?
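In symbols (notation added here for clarity, not from the slides), the two practical measures for vectors x and y are:

$$d_{\text{euclidean}}(x, y) = \sqrt{\sum_i (x_i - y_i)^2}
\qquad
\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$$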


Clustering Algorithms
Different Approaches

• Centroid-based clustering:
K-Means

• Connectivity-based clustering:
Hierarchical Agglomerative

• Density-based clustering:
DBSCAN
Centroid-based Clustering

• Clusters are represented by their centroid

• Centroid: central vector, centre of gravity, arithmetic mean

• Centroid: not necessarily a member of the cluster

• Based on distance between items and centroids

• Most common algorithm: K-Means


K-Means Overview
• Input:
- Set of items
- Desired n. of clusters K

• Output:
- A partition of the input set into K clusters

• Assumption:
- Input items are real-valued vectors
- Notion of distance / similarity
K-Means Algorithm

1. Initialise K centroids randomly

2. For each point: assign to closest centroid

3. Update centroids

4. Repeat 2-3 until convergence
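A minimal NumPy sketch of these four steps (the function name, the initialisation strategy and the convergence test are illustrative choices, not prescribed by the slides):

```python
# Minimal K-Means sketch, assuming X is an (n_samples, n_features) array of real-valued vectors.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialise K centroids randomly (here: K distinct input points)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2. Assign each point to the closest centroid (squared Euclidean distance)
        distances = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = distances.argmin(axis=1)
        # 3. Update each centroid as the mean of its cluster
        #    (empty-cluster handling is omitted for brevity)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4. Repeat 2-3 until convergence (centroids stop moving)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```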


K-Means Example (K = 2)

1. Centroids: random init
2. Assign items to closest centroid
3. Update centroids
4. Repeat: assign items to centroids, update centroids (several iterations)
Convergence!
Assignment Step

• Assign to “closest” centroid

• Closest = least squared Euclidean distance


Update Step

• New centroid is the mean of the cluster
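In symbols (notation added here, not from the slides): with centroids μ_1, …, μ_K, and C_k the set of points currently assigned to centroid k, the two steps are

$$\text{assignment:}\quad c(x) = \arg\min_{k}\ \lVert x - \mu_k \rVert^2$$

$$\text{update:}\quad \mu_k = \frac{1}{|C_k|} \sum_{x \in C_k} x$$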


K-Means Discussion

• Pros: intuitive, quite good in practice

• Cons: requires to know (or find out) K

• Elbow method to find out K


Elbow Method

• Intrinsic metric: within-cluster Sum of Squared Error

• a.k.a. Distortion

• If K increases, the distortion decreases

• The “elbow” of the distortion vs. K curve suggests a good value for K
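A short scikit-learn sketch of the elbow method (the dataset and the range of K values are illustrative):

```python
# Elbow method: plot the within-cluster SSE (distortion) for a range of K values.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

ks = range(1, 11)
distortions = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(ks, distortions, marker="o")
plt.xlabel("K")
plt.ylabel("Distortion (within-cluster SSE)")
plt.show()  # look for the "elbow" where the curve flattens
```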
Notebook Intermezzo:
Clustering - KMeans
Connectivity-based Clustering

• a.k.a. Hierarchical Clustering

• It needs a notion of pairwise dissimilarity between groups (clusters), called linkage

• Top-down (divisive): start with all items in one group, then divide them to maximise within-group similarity

• Bottom-up (agglomerative): start with items in individual groups, then aggregate the most similar ones (until one cluster containing all objects is formed)
Agglomerative Clustering Example

Steps (merging seven points, 1-7, shown in a 2-D scatter plot):

{1}, {2}, {3}, {4}, {5}, {6}, {7}
{1, 2}, {3}, {4}, {5}, {6}, {7}
{1, 2}, {3, 4}, {5}, {6}, {7}
{1, 2}, {3, 4, 5}, {6}, {7}
{1, 2, 6}, {3, 4, 5}, {7}
{1, 2, 6, 7}, {3, 4, 5}
{1, 2, 3, 4, 5, 6, 7}
Dendrogram

• Used to illustrate the output of hierarchical clustering

• The algorithm builds a tree-based hierarchical taxonomy
Dendrogram Example

[Dendrogram over the seven points above, with leaves ordered 1, 2, 6, 7, 5, 3, 4]
Dendrogram Example on the Iris dataset
From Dendrogram to Clusters

• Cutting the dendrogram horizontally partitions the data points into clusters (see the sketch after this list)

• Choice of distance

• Choice of number of clusters
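A short SciPy sketch of this idea (the data, the linkage choice and the cut height are illustrative):

```python
# Build a hierarchy, draw its dendrogram, and cut it horizontally to get flat clusters.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))                       # 20 random 2-D points

Z = linkage(X, method="single")                    # agglomerative merges (single linkage)
dendrogram(Z)                                      # tree-based view of the merges
plt.show()

labels = fcluster(Z, t=1.0, criterion="distance")  # horizontal cut at height 1.0
```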


Linkage
• The notion of dissimilarity is described with a distance
function d(G, H), with G and H groups of nodes (cluster
assignments at any level)

• Single linkage: smallest dissimilarity between two points in opposite groups, i.e. nearest neighbour interpretation

• Complete linkage: largest dissimilarity between two points in opposite groups, i.e. furthest neighbour interpretation

• Average linkage: average dissimilarity between all points in opposite groups
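Using the slide’s d(G, H) notation, and writing d(x, y) for the pointwise distance, the three linkages can be written as:

$$d_{\text{single}}(G, H) = \min_{x \in G,\ y \in H} d(x, y)$$

$$d_{\text{complete}}(G, H) = \max_{x \in G,\ y \in H} d(x, y)$$

$$d_{\text{average}}(G, H) = \frac{1}{|G|\,|H|} \sum_{x \in G} \sum_{y \in H} d(x, y)$$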
Single Linkage Distance

Cutting the dendrogram at height 0.7:
• For each point x there is a point y in its cluster where d(x, y) ≤ 0.7

Complete Linkage Distance

Cutting the dendrogram at height 0.7:
• For each point x, all points y in its cluster satisfy d(x, y) ≤ 0.7

Average Linkage Distance

• Cut interpretation: there isn’t a good one!
Linkage Issues
• Single linkage suffers from chaining.
Only one pair of points needs to be close in order to
merge two groups, i.e. clusters can be spread out and
not very compact

• Complete linkage suffers from crowding.
Score based on worst-case dissimilarity between pairs of points, i.e. clusters are compact but not far apart

• Average linkage strikes a balance, but doesn’t have a clear interpretation
More on Linkage
• Centroid linkage (new centroid is avg of all group items)

• Median linkage: like Centroid, but new centroid calculated as avg of the two old centroids

• Ward linkage (Ward’s variance minimisation)
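A brief scikit-learn sketch comparing linkage choices (dataset and parameters are illustrative; scikit-learn’s AgglomerativeClustering supports single, complete, average and ward linkage, not centroid or median):

```python
# Agglomerative clustering with different linkage criteria.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

for link in ("single", "complete", "average", "ward"):
    labels = AgglomerativeClustering(n_clusters=3, linkage=link).fit_predict(X)
    print(link, labels[:10])  # first few cluster assignments per linkage
```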


Hierarchical Clustering
Discussion

• Pros: repeatability (why?), no prior knowledge of K is required (can choose cut-off threshold, or number of clusters)

• Cons: complexity (why?)

• No silver bullet
Notebook Intermezzo:
Clustering - Hierarchical
Density-based Clustering
• Clusters are regions in the data space with higher
density, separated by lower density regions

• Objects in sparse areas are considered noise or borders between clusters

• A cluster is defined as a maximal set of density-connected points

• Shape of clusters: arbitrary

• Most common algorithm: DBSCAN


DBSCAN Overview

• Density-Based Spatial Clustering of Applications with Noise

• Given an input set of points, it groups together points that are closely packed together, i.e. with many neighbour points.
DBSCAN Concepts
• ε-Neighbourhood
N(p) = {q | d(p, q) ≤ ε }

• High density points (core points):
p is a core point if N(p) contains at least minPts objects

• Density-reachable points:
q is directly reachable from p if it’s in N(p) and p is a core point;
q is reachable from p if p is a core point and there’s a path of directly reachable core points between them
DBSCAN Concepts

Example with points 1-8 and minPts = 2:
- 1 is reachable from 6
- 3 and 5 are directly reachable from 4
- 7 is an outlier
DBSCAN Algorithm

• Find the ε-neighbourhood of all points

• Identify the core points (at least minPts neighbours)

• Find the connected components of core points, ignoring the non-core points

• Assign each non-core point to a nearby cluster if it is in the ε-neighbourhood of one of that cluster’s core points, otherwise assign it to noise
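A brief scikit-learn sketch (eps, min_samples and the dataset are illustrative choices):

```python
# DBSCAN groups densely packed points and labels sparse points as noise (-1).
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)  # eps = ε radius, min_samples = minPts
labels = db.labels_                         # cluster ids; -1 marks noise points
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```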
DBSCAN Example
DBSCAN Discussion

• Pros: no prior knowledge of K is required, can find arbitrarily shaped clusters, robust to outliers (notion of noise)

• Cons: complexity (why?), sensitive to data sets with large differences in densities
Notebook Intermezzo:
Clustering - DBSCAN
Clustering Evaluation
Cluster Quality

• How good is the clustering result?

• What is its interpretation?

• What is the purpose of the clustering task?


Internal Evaluation
• Evaluation based on the data set itself

• No external gold standard

• Idea: good clustering produces clusters with high within-cluster similarity and low between-cluster similarity

• Drawback: is the evaluation biased?

• e.g. Sum of Squared Errors (SSE)
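A short sketch computing the within-cluster SSE for a K-Means result (dataset and K are illustrative; scikit-learn exposes the same value as inertia_):

```python
# Internal evaluation: Sum of Squared Errors (SSE) of each point to its assigned centroid.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

sse = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
print(sse, km.inertia_)  # the two values should (approximately) match
```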


External Evaluation
• Requires externally supplied labels

• Relationship with classification evaluation

• Metrics: Precision, Recall, F-Measure, Jaccard Index, Dice Index, …

• Drawback: are we missing out on knowledge discovery?
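One common way to apply these metrics to clusterings is pair counting: every pair of items is a decision, and it is correct when the clustering and the gold labels agree on whether the two items belong together. A minimal sketch under that interpretation (the helper name is ours, not from the slides; degenerate cases with zero denominators are not handled):

```python
# Pair-counting Precision / Recall / F-measure / Jaccard against gold labels.
from itertools import combinations

def pair_counting_scores(y_true, y_pred):
    tp = fp = fn = 0
    for i, j in combinations(range(len(y_true)), 2):
        same_gold = y_true[i] == y_true[j]
        same_pred = y_pred[i] == y_pred[j]
        tp += same_gold and same_pred      # pair together in both
        fp += same_pred and not same_gold  # together in the clustering only
        fn += same_gold and not same_pred  # together in the gold labels only
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    jaccard = tp / (tp + fp + fn)
    return precision, recall, f_measure, jaccard
```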
Questions?
