401 Week 7 Part 2: EM Algorithm
Department of Statistics
Basic Ideas
The EM algorithm is used in situations where you have both missing data and unknown
parameters.
Example from ritvikmath: https://www.youtube.com/watch?v=xy96ArOpntA
Let’s say you are told the values (1, 2, x) are drawn from a known distribution: Normal(mean
= 1, sd = 1)
What is your best guess for the missing value x?
The answer is easy: best guess is the mean of the distribution = 1.
Let’s say you are given the values (1, 2, 0) and are told they are drawn from a distribution:
Normal(mean = µ, sd = 1)
What is your best guess for µ?
This time, the mean of the distribution is unknown.
Based on the data, your best guess (the maximum likelihood estimate) of the mean of the
distribution is the mean of your sample = (0+1+2)/3 = 1.
Let’s say you have some data as well as a missing value (1, 2, x). You know they are drawn
from a distribution with an unknown mean: Normal(mean = µ, sd = 1)
What is your best guess for the missing value x and the unknown parameter µ?
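The EM answer is to alternate between the two easy problems above. A minimal sketch in R (the starting value and iteration count here are arbitrary choices of mine):

```r
# Toy EM for the data (1, 2, x) with model Normal(mean = mu, sd = 1):
# alternate between imputing the missing value (E-step) and
# re-estimating mu by maximum likelihood (M-step).
mu <- 0                  # arbitrary starting guess for mu
for (iter in 1:50) {
  x  <- mu               # E-step: best guess for x is the current mean
  mu <- (1 + 2 + x) / 3  # M-step: MLE of mu is the sample mean
}
mu                       # converges to 1.5
```

Each update is a contraction (mu becomes 1 + mu/3), so the iterates converge to the fixed point mu = 1.5 regardless of the starting value.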
We apply the EM algorithm to Gaussian Mixtures. The data consists of points that are
generated from several Gaussian distributions.
In this case, the missing data is to which distribution a point belongs.
The parameters (mean and variance) of the Gaussian distributions are also unknown.
We will use the EM algorithm iteratively to figure out both the unknown parameters and the
unknown cluster assignments.
[Figure: scatter plot of the data, X[, 2] vs X[, 1]]
Copyright Miles Chen. For personal use only. Do not distribute.
The mixture consists of three components
plot(X[, 1], X[, 2], col = X[, 3], cex = 0.5, asp = 1,
     main = "Plot with cluster labels")
Plot with cluster labels. Usually, these are unknown to us.
[Figure: scatter plot with points colored by cluster label]
Some points could be “assigned” to more than one cluster
[Figure: scatter plot showing points that lie between clusters]
Rather than strictly assigning a point to one cluster as in K-means clustering, we use a Bayes
classifier to get a probabilistic assignment, what we might call a membership weight. Each
point now has a vector of probabilities (that add to 1).
Once we have the membership weights, we calculate the parameters of each cluster
distribution. We use a weighted mean and a weighted variance-covariance matrix based on the
membership weights of the points in the clusters. This maximizes the likelihood of the values
“assigned” to the cluster.
We iterate back and forth between calculating membership weights (E-step) and recalculating
the parameters of the cluster distributions (M-step).
It could be useful to think of the EM algorithm as a blend between k-means clustering and a
Bayes classifier.
The membership weight $w_{ik}$ of point $x_i$ in cluster $k$ comes from Bayes’ rule:

$$w_{ik} = \frac{\alpha_k \, p_k(x_i \mid \mu_k, \Sigma_k)}{\sum_{m=1}^{K} \alpha_m \, p_m(x_i \mid \mu_m, \Sigma_m)}$$

Note that the denominator (the marginal) is equal to the sum of the numerators across all
possible clusters.
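A sketch of this E-step in R (the function and argument names here are my own; this assumes X is an N × 2 data matrix and the mvtnorm package is installed):

```r
library(mvtnorm)  # for dmvnorm()

# E-step: given current mixing proportions alpha (length K), a K x 2 matrix
# of cluster means mu, and a list of K covariance matrices sigmas, return
# the N x K matrix of membership weights.
e_step <- function(X, alpha, mu, sigmas) {
  K <- length(alpha)
  # numerators: alpha_k times the density of each point under cluster k
  num <- sapply(1:K, function(k) {
    alpha[k] * dmvnorm(X, mean = mu[k, ], sigma = sigmas[[k]])
  })
  num / rowSums(num)  # divide by the marginal so each row sums to 1
}
```

Because the denominator is the row sum of the numerators, each row of the result is automatically a probability vector.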
$\alpha_k$ is the prior probability (from the previous iteration) that a point belongs to cluster k. It is
equal to the number of “points assigned” to cluster k divided by the total number of points:
$$\alpha_k = \frac{N_k}{N}$$
The “number of points assigned” to cluster k, however, is not an integer. Rather it is the sum
of the membership weights. That is:
$$N_k = \sum_{i=1}^{N} w_{ik}$$
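In code, with w as the N × K matrix of membership weights (the weights below are toy numbers of my own):

```r
# Toy membership weights for N = 3 points and K = 2 clusters; rows sum to 1.
w <- matrix(c(0.9, 0.1,
              0.2, 0.8,
              0.5, 0.5), nrow = 3, byrow = TRUE)

Nk    <- colSums(w)    # "effective" number of points per cluster (not integers)
alpha <- Nk / nrow(w)  # updated mixing proportions alpha_k = Nk / N
```

Note that the alpha values sum to 1 because every row of w does.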
The likelihood is the probability density function evaluated for the values in xi given the
current estimates of the cluster’s mean and sigma matrix. The PDF is the multivariate Normal
distribution.
$$p_k(x_i \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{D/2} |\Sigma_k|^{1/2}} \exp\left( -\frac{1}{2} (x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k) \right)$$
Using the mvtnorm package in R, we can easily evaluate the above:
library(mvtnorm)
likelihood_k <- dmvnorm(X_new, mean = xbar_k, sigma = var_k)
Once we have calculated the membership weights of each point for each cluster, we
recalculate the “centroid” of each cluster according to the probabilistic weights of the points
that have been “assigned” to it:
$$\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} w_{ik} \, x_i$$
This is just a weighted mean of all the points.
Points that are highly likely to be in cluster k will have a membership weight wik close to 1
and will contribute more to the calculation of the “mean”, while points that are unlikely to be
in cluster k will have wik close to 0 and will contribute little.
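A sketch of this weighted-mean update in R (toy numbers of my own):

```r
# Three 2-d points; the third is far away but has a tiny membership weight
# in this cluster, so it barely moves the weighted mean.
X  <- matrix(c( 0,  0,
                1,  1,
               10, 10), nrow = 3, byrow = TRUE)
wk <- c(0.9, 0.9, 0.01)        # membership weights w_ik for cluster k

Nk   <- sum(wk)
mu_k <- colSums(wk * X) / Nk   # weighted mean: (1 / N_k) * sum_i w_ik x_i
mu_k                           # about (0.55, 0.55), near the heavy points
```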
We use the same idea to calculate the Σ matrices of each cluster, using the new µk values we
just calculated in the previous step:
$$\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} w_{ik} (x_i - \mu_k)(x_i - \mu_k)^T$$
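The corresponding weighted-covariance update, continuing the same toy numbers (sweep() subtracts the new mean from each row):

```r
X  <- matrix(c( 0,  0,
                1,  1,
               10, 10), nrow = 3, byrow = TRUE)
wk <- c(0.9, 0.9, 0.01)        # membership weights w_ik for cluster k

Nk    <- sum(wk)
mu_k  <- colSums(wk * X) / Nk              # weighted mean from the M-step
diffs <- sweep(X, 2, mu_k)                 # rows are (x_i - mu_k)
Sigma_k <- t(diffs) %*% (wk * diffs) / Nk  # (1/N_k) sum_i w_ik (x_i - mu_k)(x_i - mu_k)^T
```

The matrix form t(diffs) %*% (wk * diffs) accumulates the weighted outer products in one multiplication instead of a loop over points.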
True Groups
[Figure: scatter plot colored by the true group labels]
# use these initial arbitrary values
N <- dim(dat)[1] # number of data points
alpha <- c(0.33,0.33,0.34) # arbitrary starting mixing parameters
mu <- matrix( # starting means
c(0,0,
-9,-9,
9,9),
nrow = 3, byrow=TRUE
)
sig1 <- matrix(c(1,0,0,1), nrow=2) # three arbitrary covariance matrices
sig2 <- matrix(c(1,0,0,1), nrow=2)
sig3 <- matrix(c(1,0,0,1), nrow=2)
I’ve intentionally hidden the rest of my code because you will code up the EM algorithm in
your HW assignment.
[Figure: scatter plot of the data]
Before we continue
[Figure: scatter plot of the data]
EM − One iteration
[Figures: a sequence of scatter plots showing the evolving cluster assignments over successive EM iterations]

EM Final Clusters
[Figure: scatter plot colored by the final EM cluster assignments]
True Groups
[Figure: scatter plot colored by the true group labels, for comparison with the EM result]
Different Starting Locations
[Figure: scatter plot of the data with a different set of starting locations]
EM − One iteration
[Figures: the EM iterations from these new starting locations, one plot per iteration]
Fast forward to the end
Groups identified by EM
[Figure: scatter plot colored by the groups identified by EM]