
Model-based clustering

Gaussian mixture models

Erik-Jan van Kesteren & Daniel L. Oberski


Last week
• Hierarchical clustering
• K-means clustering
• Assessing cluster solutions
• Stability
• Internal metrics
• External validation
Today
• Model-based clustering
• Maximum likelihood estimation
• EM algorithm
• Multivariate model-based clustering
• Assumptions & restrictions

• Goal: understand, apply, and assess model-based clustering methods
Reading materials
• Mixture models: latent profile and latent class analysis (Oberski, 2016)
http://daob.nl/wp-content/papercite-data/pdf/oberski2016mixturemodels.pdf
• MBCC sections 2.1 and 2.2
Model-based clustering
K-means again
1. Assign examples to K clusters;
2. a. Calculate the K cluster centroids;
   b. Assign examples to the cluster with the closest centroid;
3. If assignments changed, go back to step 2a; else stop.
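A minimal sketch of these steps in R, using base kmeans() (which runs this kind of assignment/update loop internally) on made-up two-dimensional data:

set.seed(45)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),   # made-up data: two groups
           matrix(rnorm(100, mean = 4), ncol = 2))
fit <- kmeans(x, centers = 2, nstart = 10)           # steps 1-3 above
fit$centers                                          # the K cluster centroids
fit$cluster                                          # final cluster assignments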
K-means again
• K-means is based on a rule
• Why this rule and not some other rule?
• What kind of data does the rule work well for?
• In what situations would the rule fail?
• What happens if we want to change the rule?

All of these are difficult to answer by staring at the algorithm.


K-means again
• The k-means algorithm makes clusters that are circular in the space of the data.
• Is this reasonable?
• Maybe x and y covary within
the clusters, in the same way
or even differently?
• Maybe we need ellipses?
Model-based clustering
Steps:
1. Pretend we believe in some statistical model that describes
data as belonging to unobserved (“latent”) groups;
2. Estimate (“train”) this model using the data.

The rule follows from the model!


• Instead of worrying about the algorithm, we worry about the model.
• The questions raised earlier become easier to answer.
Model-based clustering
• Assumptions about the clusters are explicit, not implicit.
• We will look at the most used family of models:

Gaussian mixture models (GMMs)


• Data within each cluster (multivariate) normally distributed.
• Parameters can be either the same or different across groups:
• Volume (size of the clusters in data space);
• Shape (circle or ellipse);
• Orientation (the angle of the ellipse).
Model-based clustering
Another major advantage
• For each observation, get a posterior probability of
belonging to each cluster
• Reflects that cluster membership is uncertain
• Cluster assignment can be done based on the highest
probability cluster for each observation
Model-based clustering
Remember silhouette?
• aᵢ = avg. distance to fellow cluster members (cohesion)
• bᵢ = min. distance to members of a different cluster (separation)

sᵢ = (bᵢ − aᵢ) / max(aᵢ, bᵢ)

(Figure: Introduction to Data Mining)
Model-based clustering
Specific examples of model-based clustering:
• Gaussian mixture models
• Latent profile analysis
• Latent class analysis (categorical observations)
• Latent Dirichlet allocation
Gaussian mixture modelling
Model-based clustering
• Statistical model + assumptions defines a likelihood:

p(data | parameters) = p(y | θ)

• Maximum likelihood estimation: find the parameters θ for which it is most likely to observe this data
• This is how models can be estimated / fit / trained

• NB: the model and its assumptions are debatable!


Model-based clustering
Likelihood (density) for height data:

p(height | θ) = Pr(man) · Normal(μ_man, σ_man) + Pr(woman) · Normal(μ_woman, σ_woman)

Or, in clearer notation:

p(height | θ) = π₁^X · Normal(μ₁, σ₁) + (1 − π₁^X) · Normal(μ₂, σ₂)
Model-based clustering
Gaussian mixture parameters:
• π₁^X determines the relative cluster sizes
• Proportion of observations to be expected in each cluster
• 𝜇1 and 𝜇2 determine the locations of the clusters
• Like centroids in k-means clustering
• 𝜎1 and 𝜎2 determine the volume of the clusters
• how large / spread out the clusters are in data space

Together, these 5 unknown parameters describe our model of how the data is generated.
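As a sketch, this five-parameter model can be written directly in R; the parameter values and heights below are invented for illustration:

pi1    <- 0.5                     # relative cluster size
mu1    <- 1.80; sigma1 <- 0.07    # location and spread of cluster 1, in metres
mu2    <- 1.67; sigma2 <- 0.07    # location and spread of cluster 2

dmix <- function(h) {             # the mixture density p(height | theta)
  pi1 * dnorm(h, mu1, sigma1) + (1 - pi1) * dnorm(h, mu2, sigma2)
}

height <- c(1.62, 1.71, 1.84, 1.78, 1.66)   # invented sample
sum(log(dmix(height)))                      # log-likelihood under these parameters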
Estimation: the EM algorithm
If we know who is a man and who is a woman, it’s easy to find
the maximum likelihood estimates for 𝜇 and 𝜎:
μ̂₁ = (1/N₁) Σ heightᵢ ,   σ̂₁² = (1/(N₁ − 1)) Σ (heightᵢ − μ̂₁)²

(sums over the N₁ observations in cluster 1)

(and the same for μ̂₂ and σ̂₂)

But we don’t know this!


-> Assignments need to be estimated too.
Estimation: the EM algorithm
• Solution: Figure out the posterior probability of being a
man/woman, given the current estimates of the means and
sds
• If we know cluster locations and shapes,
how likely is it that a 1.7m person is
a man or a woman?

π_man^X = 2.20 / 2.86 ≈ 0.77
Estimation: the EM algorithm
• Now we have some class assignments (probabilities);
• So we can go back to the parameters and update them using
our easy rule (M-step)
• Then, we can compute new posterior probabilities (E-step)

Does it remind you of something…?


Estimation: the EM algorithm
Live coding EM
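A compact sketch of what such a live-coded EM loop could look like for the univariate height example (data and starting values invented; the M-step uses the maximum-likelihood variance, dividing by the summed posterior weights):

set.seed(45)
height <- c(rnorm(100, 1.80, 0.07), rnorm(100, 1.67, 0.07))   # invented heights

pi1 <- 0.5; mu <- c(1.60, 1.90); sigma <- c(0.10, 0.10)       # starting values

for (iter in 1:100) {
  # E-step: posterior probability of cluster 1 for every observation
  d1 <- pi1       * dnorm(height, mu[1], sigma[1])
  d2 <- (1 - pi1) * dnorm(height, mu[2], sigma[2])
  p1 <- d1 / (d1 + d2)

  # M-step: weighted versions of the "easy rule" for the means and sds
  pi1      <- mean(p1)
  mu[1]    <- sum(p1 * height) / sum(p1)
  mu[2]    <- sum((1 - p1) * height) / sum(1 - p1)
  sigma[1] <- sqrt(sum(p1 * (height - mu[1])^2) / sum(p1))
  sigma[2] <- sqrt(sum((1 - p1) * (height - mu[2])^2) / sum(1 - p1))
}

c(pi1 = pi1, mu = mu, sigma = sigma)   # estimated mixture parameters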
Break
Multivariate model-based
clustering
Multivariate model-based
clustering
• With 2 observed features:
• mean becomes a vector of 2 means
• standard deviation turns into a 2x2 variance-covariance matrix
determining the shape of the cluster
• So we have multiple within-cluster parameters:
• Two means
• Two variances, one for each observed variable
• A single covariance among the features
• Together, the 11 parameters (1 mixing proportion + 2 clusters × 5 within-cluster parameters) define the likelihood in bivariate space, which from the top looks like ellipses
Multivariate normal distribution

Normal(x; μ, σ) = (1 / (σ √(2π))) · exp(−(x − μ)² / (2σ²))

MVN(x; μ, Σ) = (2π)^(−d/2) · |Σ|^(−1/2) · exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))
Multivariate model-based
clustering
p(y | θ) = π₁^X · MVN(y; μ₁, Σ₁) + (1 − π₁^X) · MVN(y; μ₂, Σ₂)
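A sketch of evaluating this bivariate mixture density in R, assuming the mvtnorm package for dmvnorm(); all parameter values are invented:

library(mvtnorm)                                   # assumed available for dmvnorm()

pi1 <- 0.4
mu1 <- c(0, 0); Sigma1 <- matrix(c(1,  0.5,  0.5, 1), 2, 2)
mu2 <- c(3, 3); Sigma2 <- matrix(c(1, -0.3, -0.3, 2), 2, 2)

dmix2 <- function(y) {                             # p(y | theta); rows of y are observations
  pi1 * dmvnorm(y, mu1, Sigma1) + (1 - pi1) * dmvnorm(y, mu2, Sigma2)
}

dmix2(rbind(c(0, 0), c(3, 2)))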
Estimation: the EM algorithm
Multivariate model-based
clustering
• Cluster shape parameters (the variance-covariance matrix)
can be constrained to be equal across clusters
• Same as k-means
• Can also be different across clusters
• not possible in k-means
• More flexible, complex model
• Think about the bias-variance tradeoff!
TOP SECRET SLIDE
• K-means clustering is a GMM with the following model:
• All prior class proportions are 1/K
• EII model: equal volume, only circles
• All posterior probabilities are either 0 or 1
TOP SECRET SLIDE 2
• GMM has trouble with clusters that are not ellipses
• Secret weapon: merging

Powerful idea:
• Start with Gaussian mixture solution
• Merge “similar” components to create non-Gaussian clusters

NB: we’re distinguishing “components” from “clusters” now


Merging

library(mclust)
out <- Mclust(x)        # fit a Gaussian mixture to the data x
com <- clustCombi(out)  # hierarchically merge similar components
plot(com)               # inspect the merged solutions
Assessing clustering results
Methods to assess whether the obtained clusters are “good”:
• Stability (previous lecture)
• External validity (previous lecture)
• Model fit
Model fit
How well does the model fit to the data?
Log-likelihood
ℓ(θ) = log p(y | θ) = log ∏ₙ p(yₙ | θ) = Σₙ log p(yₙ | θ)   (product/sum over the N observations)
The higher the log-likelihood, the more likely the data (if we
assume this model is correct)
Deviance
−2 ⋅ ℓ(𝜃) (lower deviance is better)
Information criteria
Deviance forms the basis of information criteria, which balance
fit and complexity

Akaike information criterion
AIC = −2ℓ(θ) + 2k
(where k is the number of parameters)

Bayesian information criterion
BIC = −2ℓ(θ) + k · log(n)
(where n is the number of rows in your data)
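A sketch of obtaining these quantities from a fitted mclust model (data x assumed; loglik, df, and n are components of an Mclust fit):

library(mclust)
out <- Mclust(x)                                     # x: assumed numeric data matrix

deviance <- -2 * out$loglik                          # lower is better
aic      <- -2 * out$loglik + 2 * out$df
bic      <- -2 * out$loglik + out$df * log(out$n)    # classic definition (lower = better)
# note: mclust's own out$bic uses the opposite sign (2*loglik - df*log(n)), so larger is better there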
Information criteria
Think: bias and variance tradeoff!
• Variance also has to do with stability

Better fit & lower complexity = better cluster solution

(other assessment methods are also available for model-based clustering)
High-dim!
How to do GMM in high dimensions?
• Same solution as we are used to by now!
• Perform clustering on a dimension-reduced version of the original data (see the sketch after this list)
• Integrate regularization / dimension reduction into your GMM optimization method

• Bouveyron et al. (2007). High-dimensional data clustering. Computational Statistics & Data Analysis, 52, 502–519
• This is the second solution, akin to “mixtures of probabilistic PCA”
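A sketch of the first solution (high-dimensional data matrix x_highdim assumed): reduce dimensionality with PCA, then fit the GMM on the component scores:

library(mclust)

pca    <- prcomp(x_highdim, scale. = TRUE)   # x_highdim: assumed high-dimensional data
scores <- pca$x[, 1:2]                       # keep the first two principal components
fit    <- Mclust(scores)                     # GMM on the reduced data
summary(fit)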
Model-based clustering in R
Model-based clustering in R
• The mclust package implements multivariate model-based clustering
• Provides an easy interface to fit several parameterizations
• Model comparison with BIC
• Plotting functionality
Model-based clustering in R
• Mclust uses an identifier for each possible parametrization of
the cluster shape: E for equal, V for variable in:
• Volume (size of the clusters in data space)
• Shape (circle or ellipse)
• Orientation (the angle of the ellipse)
• So an EEE model has equal volume, shape and orientation
• A VVV model has variable volume, shape, and orientation
• A VVE model has variable volume and shape but equal
orientation
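A sketch of requesting specific parameterizations via the modelNames argument of Mclust() (data x assumed):

library(mclust)

fit_eee <- Mclust(x, G = 3, modelNames = "EEE")   # equal volume, shape, and orientation
fit_vvv <- Mclust(x, G = 3, modelNames = "VVV")   # all three vary across clusters
summary(fit_vvv)                                  # mixing proportions, means, covariances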
Model-based clustering in R
Model-based clustering in R
(Plot: fitted VVV model with 3 clusters)

• How Mclust optimizes hyperparameters:
  • Fit all the models with up to 9 clusters (or more, your choice!)
  • Compute the BIC of each model
  • Choose the model with the best BIC (lowest by the definition above; mclust itself reports BIC with the opposite sign and picks the largest value)
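A sketch of this default selection procedure (data x assumed); by default Mclust() fits every parameterization for G = 1, …, 9 and keeps the best one according to its BIC:

library(mclust)

out <- Mclust(x, G = 1:9)        # fit all parameterizations for 1 to 9 clusters
summary(out)                     # the selected model and number of clusters
plot(out, what = "BIC")          # compare BIC across models and numbers of clusters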
Practical: perform model-based
clustering
Take-home exercises: 1-11
Questions?
