
Model-based clustering

Gaussian mixture models

Erik-Jan van Kesteren & Daniel L. Oberski


Last week
• Hierarchical clustering
• K-means clustering
• Assessing cluster solutions
• Stability
• Internal metrics
• External validation
Today
• Model-based clustering
• Maximum likelihood estimation
• EM algorithm
• Multivariate model-based clustering
• Assumptions & restrictions

• Goal: understand, apply, and assess model-based clustering methods
Reading materials
• Mixture models: latent profile and latent class analysis (Oberski, 2016)
http://daob.nl/wp-content/papercite-data/pdf/oberski2016mixturemodels.pdf
• MBCC sections 2.1 and 2.2
Model-based clustering
K-means again
1. Assign examples to K clusters;
2. a. Calculate the K cluster centroids;
   b. Assign examples to the cluster with the closest centroid;
3. If assignments changed, go back to step 2a; else stop.
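A minimal sketch of these steps in R, using base kmeans() (which runs this kind of assignment/update loop internally) on made-up two-dimensional data:

set.seed(45)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),   # made-up data: two groups
           matrix(rnorm(100, mean = 4), ncol = 2))
fit <- kmeans(x, centers = 2, nstart = 10)           # steps 1-3 above
fit$centers                                          # the K cluster centroids
fit$cluster                                          # final cluster assignments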
K-means again
• K-means is based on a rule
• Why this rule and not some other rule?
• What kind of data does the rule work well for?
• In what situations would the rule fail?
• What happens if we want to change the rule?

All of these are difficult to answer by staring at the algorithm.


K-means again
• The k-means algorithm makes clusters that are circular in the space of the data.
• Is this reasonable?
• Maybe x and y covary within
the clusters, in the same way
or even differently?
• Maybe we need ellipses?
Model-based clustering
Steps:
1. Pretend we believe in some statistical model that describes
data as belonging to unobserved (“latent”) groups;
2. Estimate (“train”) this model using the data.

The rule follows from the model!


• Instead of worrying about the algorithm, we worry about the model.
• The questions raised earlier become easier to answer.
Model-based clustering
• Assumptions about the clusters are explicit, not implicit.
• We will look at the most used family of models:

Gaussian mixture models (GMMs)


• Data within each cluster (multivariate) normally distributed.
• Parameters can be either the same or different across groups:
• Volume (size of the clusters in data space);
• Shape (circle or ellipse);
• Orientation (the angle of the ellipse).
Model-based clustering
Another major advantage
• For each observation, get a posterior probability of
belonging to each cluster
• Reflects that cluster membership is uncertain
• Cluster assignment can be done based on the highest
probability cluster for each observation
Model-based clustering
Remember silhouette?
• aᵢ = avg. distance to fellow cluster members (cohesion)
• bᵢ = min. distance to members of a different cluster (separation)

sᵢ = (bᵢ − aᵢ) / max(aᵢ, bᵢ)

(Figure: Introduction to Data Mining)
Model-based clustering
Specific examples of model-based clustering:
• Gaussian mixture models
• Latent profile analysis
• Latent class analysis (categorical observations)
• Latent Dirichlet allocation
Gaussian mixture modelling
Model-based clustering
• Statistical model + assumptions defines a likelihood:

p(data | parameters) = p(y | θ)

• Maximum likelihood estimation: find the parameters θ for which it is most likely to observe this data
• This is how models can be estimated / fit / trained

• NB: the model and its assumptions are debatable!


Model-based clustering
Likelihood (density) for height data:

p(height | θ) = Pr(man) · Normal(μ_man, σ_man) + Pr(woman) · Normal(μ_woman, σ_woman)

Or, in clearer notation:

p(height | θ) = π₁^X · Normal(μ₁, σ₁) + (1 − π₁^X) · Normal(μ₂, σ₂)
Model-based clustering
Gaussian mixture parameters:
• π₁^X determines the relative cluster sizes
• Proportion of observations to be expected in each cluster
• 𝜇1 and 𝜇2 determine the locations of the clusters
• Like centroids in k-means clustering
• 𝜎1 and 𝜎2 determine the volume of the clusters
• how large / spread out the clusters are in data space

Together, these 5 unknown parameters describe our model of how the data is generated.
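As a sketch, this five-parameter model can be written directly in R; the parameter values and heights below are invented for illustration:

pi1    <- 0.5                     # relative cluster size
mu1    <- 1.80; sigma1 <- 0.07    # location and spread of cluster 1, in metres
mu2    <- 1.67; sigma2 <- 0.07    # location and spread of cluster 2

dmix <- function(h) {             # the mixture density p(height | theta)
  pi1 * dnorm(h, mu1, sigma1) + (1 - pi1) * dnorm(h, mu2, sigma2)
}

height <- c(1.62, 1.71, 1.84, 1.78, 1.66)   # invented sample
sum(log(dmix(height)))                      # log-likelihood under these parameters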
Estimation: the EM algorithm
If we know who is a man and who is a woman, it’s easy to find
the maximum likelihood estimates for 𝜇 and 𝜎:
μ̂₁ = (1/N₁) Σ heightᵢ ,   σ̂₁² = (1/(N₁ − 1)) Σ (heightᵢ − μ̂₁)²

(sums over the N₁ observations in cluster 1)

(and the same for μ̂₂ and σ̂₂)

But we don’t know this!


-> Assignments need to be estimated too.
Estimation: the EM algorithm
• Solution: Figure out the posterior probability of being a
man/woman, given the current estimates of the means and
sds
• If we know cluster locations and shapes,
how likely is it that a 1.7m person is
a man or a woman?

π_man^X = 2.20 / 2.86 ≈ 0.77
Estimation: the EM algorithm
• Now we have some class assignments (probabilities);
• So we can go back to the parameters and update them using
our easy rule (M-step)
• Then, we can compute new posterior probabilities (E-step)

Does it remind you of something…?


Estimation: the EM algorithm
Live coding EM
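A compact sketch of what such a live-coded EM loop could look like for the univariate height example (data and starting values invented; the M-step uses the maximum-likelihood variance, dividing by the summed posterior weights):

set.seed(45)
height <- c(rnorm(100, 1.80, 0.07), rnorm(100, 1.67, 0.07))   # invented heights

pi1 <- 0.5; mu <- c(1.60, 1.90); sigma <- c(0.10, 0.10)       # starting values

for (iter in 1:100) {
  # E-step: posterior probability of cluster 1 for every observation
  d1 <- pi1       * dnorm(height, mu[1], sigma[1])
  d2 <- (1 - pi1) * dnorm(height, mu[2], sigma[2])
  p1 <- d1 / (d1 + d2)

  # M-step: weighted versions of the "easy rule" for the means and sds
  pi1      <- mean(p1)
  mu[1]    <- sum(p1 * height) / sum(p1)
  mu[2]    <- sum((1 - p1) * height) / sum(1 - p1)
  sigma[1] <- sqrt(sum(p1 * (height - mu[1])^2) / sum(p1))
  sigma[2] <- sqrt(sum((1 - p1) * (height - mu[2])^2) / sum(1 - p1))
}

c(pi1 = pi1, mu = mu, sigma = sigma)   # estimated mixture parameters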
Break
Multivariate model-based
clustering
Multivariate model-based
clustering
• With 2 observed features:
• mean becomes a vector of 2 means
• standard deviation turns into a 2x2 variance-covariance matrix
determining the shape of the cluster
• So we have multiple within-cluster parameters:
• Two means
• Two variances, one for each observed variable
• A single covariance among the features
• Together, the 11 parameters (1 mixing proportion + 2 clusters × 5 within-cluster parameters) define the likelihood in bivariate space, which from the top looks like ellipses
Multivariate normal distribution

Normal(x; μ, σ) = (1 / (σ √(2π))) · exp(−(x − μ)² / (2σ²))

MVN(x; μ, Σ) = (2π)^(−d/2) · |Σ|^(−1/2) · exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))
Multivariate model-based
clustering
p(y | θ) = π₁^X · MVN(y; μ₁, Σ₁) + (1 − π₁^X) · MVN(y; μ₂, Σ₂)
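A sketch of evaluating this bivariate mixture density in R, assuming the mvtnorm package for dmvnorm(); all parameter values are invented:

library(mvtnorm)                                   # assumed available for dmvnorm()

pi1 <- 0.4
mu1 <- c(0, 0); Sigma1 <- matrix(c(1,  0.5,  0.5, 1), 2, 2)
mu2 <- c(3, 3); Sigma2 <- matrix(c(1, -0.3, -0.3, 2), 2, 2)

dmix2 <- function(y) {                             # p(y | theta); rows of y are observations
  pi1 * dmvnorm(y, mu1, Sigma1) + (1 - pi1) * dmvnorm(y, mu2, Sigma2)
}

dmix2(rbind(c(0, 0), c(3, 2)))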
Estimation: the EM algorithm
Multivariate model-based
clustering
• Cluster shape parameters (the variance-covariance matrix)
can be constrained to be equal across clusters
• Same as k-means
• Can also be different across clusters
• not possible in k-means
• More flexible, complex model
• Think about the bias-variance tradeoff!
TOP SECRET SLIDE
• K-means clustering is a GMM with the following model:
• All prior class proportions are 1/K
• EII model: equal volume, only circles
• All posterior probabilities are either 0 or 1
TOP SECRET SLIDE 2
• GMM has trouble with clusters that are not ellipses
• Secret weapon: merging

Powerful idea:
• Start with Gaussian mixture solution
• Merge “similar” components to create non-Gaussian clusters

NB: we’re distinguishing “components” from “clusters” now


Merging

library(mclust)
out <- Mclust(x)        # fit a Gaussian mixture to the data x
com <- clustCombi(out)  # hierarchically merge similar components
plot(com)               # inspect the merged solutions
Assessing clustering results
Methods to assess whether the obtained clusters are “good”:
• Stability (previous lecture)
• External validity (previous lecture)
• Model fit
Model fit
How well does the model fit to the data?
Log-likelihood
ℓ(θ) = log p(y | θ) = log ∏ₙ p(yₙ | θ) = Σₙ log p(yₙ | θ)   (product/sum over the N observations)
The higher the log-likelihood, the more likely the data (if we
assume this model is correct)
Deviance
−2 ⋅ ℓ(𝜃) (lower deviance is better)
Information criteria
Deviance forms the basis of information criteria, which balance
fit and complexity

Akaike information criterion
AIC = −2ℓ(θ) + 2k
(where k is the number of parameters)

Bayesian information criterion
BIC = −2ℓ(θ) + k · log(n)
(where n is the number of rows in your data)
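A sketch of obtaining these quantities from a fitted mclust model (data x assumed; loglik, df, and n are components of an Mclust fit):

library(mclust)
out <- Mclust(x)                                     # x: assumed numeric data matrix

deviance <- -2 * out$loglik                          # lower is better
aic      <- -2 * out$loglik + 2 * out$df
bic      <- -2 * out$loglik + out$df * log(out$n)    # classic definition (lower = better)
# note: mclust's own out$bic uses the opposite sign (2*loglik - df*log(n)), so larger is better there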
Information criteria
Think: bias and variance tradeoff!
• Variance also has to do with stability

Better fit & lower complexity = better cluster solution

(other assessment methods are also available for model-based clustering)
High-dim!
How to do GMM in high dimensions?
• Same solution as we are used to by now!
• Perform clustering on a dimension-reduced version of the original data (see the sketch after this list)
• Integrate regularization / dimension reduction into your GMM optimization method

• Bouveyron et al. (2007). High-dimensional data clustering. Computational Statistics & Data Analysis, 52, 502–519
• This is the second solution, akin to “mixtures of probabilistic PCA”
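A sketch of the first solution (high-dimensional data matrix x_highdim assumed): reduce dimensionality with PCA, then fit the GMM on the component scores:

library(mclust)

pca    <- prcomp(x_highdim, scale. = TRUE)   # x_highdim: assumed high-dimensional data
scores <- pca$x[, 1:2]                       # keep the first two principal components
fit    <- Mclust(scores)                     # GMM on the reduced data
summary(fit)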
Model-based clustering in R
Model-based clustering in R
• The mclust package implements multivariate model-based clustering
• Provides an easy interface to fit several parameterizations
• Model comparison with BIC
• Plotting functionality
Model-based clustering in R
• Mclust uses an identifier for each possible parametrization of
the cluster shape: E for equal, V for variable in:
• Volume (size of the clusters in data space)
• Shape (circle or ellipse)
• Orientation (the angle of the ellipse)
• So an EEE model has equal volume, shape and orientation
• A VVV model has variable volume, shape, and orientation
• A VVE model has variable volume and shape but equal
orientation
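A sketch of requesting specific parameterizations via the modelNames argument of Mclust() (data x assumed):

library(mclust)

fit_eee <- Mclust(x, G = 3, modelNames = "EEE")   # equal volume, shape, and orientation
fit_vvv <- Mclust(x, G = 3, modelNames = "VVV")   # all three vary across clusters
summary(fit_vvv)                                  # mixing proportions, means, covariances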
Model-based clustering in R
Model-based clustering in R
(Plot: fitted VVV model with 3 clusters)

• How Mclust optimizes hyperparameters:
  • Fit all the models with up to 9 clusters (or more, your choice!)
  • Compute the BIC of each model
  • Choose the model with the best BIC (lowest by the definition above; mclust itself reports BIC with the opposite sign and picks the largest value)
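A sketch of this default selection procedure (data x assumed); by default Mclust() fits every parameterization for G = 1, …, 9 and keeps the best one according to its BIC:

library(mclust)

out <- Mclust(x, G = 1:9)        # fit all parameterizations for 1 to 9 clusters
summary(out)                     # the selected model and number of clusters
plot(out, what = "BIC")          # compare BIC across models and numbers of clusters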
Practical: perform model-based
clustering
Take-home exercises: 1-11
Questions?
