and the parameter $\phi_j$ gives $p(z^{(i)} = j)$), and $x^{(i)} \mid z^{(i)} = j \sim \mathcal{N}(\mu_j, \Sigma_j)$. We let $k$ denote the number of values that the $z^{(i)}$'s can take on. Thus, our model posits that each $x^{(i)}$ was generated by randomly choosing $z^{(i)}$ from $\{1, \ldots, k\}$, and then $x^{(i)}$ was drawn from one of $k$ Gaussians depending on $z^{(i)}$. This is called the mixture of Gaussians model. Also, note that the $z^{(i)}$'s are latent random variables, meaning that they're hidden/unobserved. This is what will make our estimation problem difficult.
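To make the generative story concrete, here is a minimal sketch in Python/NumPy of exactly this sampling process: draw $z^{(i)}$ from a multinomial over $\{1, \ldots, k\}$ with probabilities $\phi$, then draw $x^{(i)}$ from $\mathcal{N}(\mu_{z^{(i)}}, \Sigma_{z^{(i)}})$. The function name `sample_gmm` and the example parameter values are illustrative, not from the notes.

```python
import numpy as np

def sample_gmm(m, phi, mus, Sigmas, seed=0):
    """Sample m points from a mixture of Gaussians.

    phi    : (k,)      mixing proportions, phi[j] = p(z = j)
    mus    : (k, n)    component means mu_j
    Sigmas : (k, n, n) component covariances Sigma_j
    """
    rng = np.random.default_rng(seed)
    k = phi.shape[0]
    # z^(i) ~ Multinomial(phi): choose which Gaussian generated each point
    z = rng.choice(k, size=m, p=phi)
    # x^(i) | z^(i) = j ~ N(mu_j, Sigma_j)
    x = np.array([rng.multivariate_normal(mus[j], Sigmas[j]) for j in z])
    return x, z

# Illustrative parameters (values made up): k = 2 components in R^2
phi = np.array([0.3, 0.7])
mus = np.array([[0.0, 0.0], [3.0, 3.0]])
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])
x, z = sample_gmm(500, phi, mus, Sigmas)
```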
The parameters of our model are thus $\phi$, $\mu$ and $\Sigma$. To estimate them, we can write down the likelihood of our data:
$$
\begin{aligned}
\ell(\phi, \mu, \Sigma) &= \sum_{i=1}^{m} \log p(x^{(i)}; \phi, \mu, \Sigma) \\
&= \sum_{i=1}^{m} \log \sum_{z^{(i)}=1}^{k} p(x^{(i)} \mid z^{(i)}; \mu, \Sigma)\, p(z^{(i)}; \phi).
\end{aligned}
$$
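As a numerical sanity check on this formula, the log-likelihood can be evaluated by summing $p(x^{(i)} \mid z^{(i)} = j; \mu, \Sigma)\, p(z^{(i)} = j; \phi)$ over the $k$ components inside the log. A sketch, assuming SciPy's `multivariate_normal` for the Gaussian density and the same array shapes as in the sampling sketch above:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(x, phi, mus, Sigmas):
    """ell(phi, mu, Sigma) = sum_i log sum_j p(x^(i) | z^(i)=j; mu, Sigma) p(z^(i)=j; phi)."""
    k = phi.shape[0]
    # per_comp[i, j] = phi_j * N(x^(i); mu_j, Sigma_j)
    per_comp = np.column_stack([
        phi[j] * multivariate_normal.pdf(x, mean=mus[j], cov=Sigmas[j])
        for j in range(k)
    ])
    return np.sum(np.log(per_comp.sum(axis=1)))
```

In practice one would work with log-densities and a log-sum-exp to avoid underflow, but the plain form above mirrors the equation directly.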
However, if we try to maximize this directly, the sum over $z^{(i)}$ sits inside the logarithm, and setting the derivatives of $\ell$ to zero does not yield closed-form solutions for the parameters. Note that if we knew what the $z^{(i)}$'s were, the maximum likelihood problem would have been easy. Specifically, we could then write down the likelihood as
$$
\ell(\phi, \mu, \Sigma) = \sum_{i=1}^{m} \log p(x^{(i)} \mid z^{(i)}; \mu, \Sigma) + \log p(z^{(i)}; \phi).
$$
Maximizing this with respect to $\phi$, $\mu$ and $\Sigma$ gives the parameters:
$$
\phi_j = \frac{1}{m} \sum_{i=1}^{m} 1\{z^{(i)} = j\},
$$
$$
\mu_j = \frac{\sum_{i=1}^{m} 1\{z^{(i)} = j\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{z^{(i)} = j\}},
$$
$$
\Sigma_j = \frac{\sum_{i=1}^{m} 1\{z^{(i)} = j\}\, (x^{(i)} - \mu_j)(x^{(i)} - \mu_j)^T}{\sum_{i=1}^{m} 1\{z^{(i)} = j\}}.
$$
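These three estimates are just per-component empirical averages, so if the $z^{(i)}$'s were observed they could be computed directly. A sketch under the same conventions as above (the helper name `mle_given_labels` is made up for illustration):

```python
import numpy as np

def mle_given_labels(x, z, k):
    """Closed-form ML estimates of phi, mu, Sigma when the labels z^(i) are observed."""
    m, n = x.shape
    phi = np.zeros(k)
    mus = np.zeros((k, n))
    Sigmas = np.zeros((k, n, n))
    for j in range(k):
        mask = (z == j)                         # indicator 1{z^(i) = j}
        phi[j] = mask.mean()                    # fraction of points in component j
        mus[j] = x[mask].mean(axis=0)           # mean of the points assigned to j
        diff = x[mask] - mus[j]
        Sigmas[j] = diff.T @ diff / mask.sum()  # empirical covariance of component j
    return phi, mus, Sigmas
```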
Indeed, we see that if the $z^{(i)}$'s were known, then maximum likelihood estimation becomes nearly identical to what we had when estimating the parameters of the Gaussian discriminant analysis model, except that here the $z^{(i)}$'s are playing the role of the class labels.
However, in our density estimation problem, the $z^{(i)}$'s are not known. What can we do?
The EM algorithm is an iterative algorithm that has two main steps. Applied to our problem, in the E-step, it tries to "guess" the values of the $z^{(i)}$'s. In the M-step, it updates the parameters of our model based on our guesses. Since in the M-step we are pretending that the guesses in the first part were correct, the maximization becomes easy. Here's the algorithm:
Repeat until convergence:

(E-step) For each $i, j$, set
$$
w_j^{(i)} := p(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma).
$$

(M-step) Update the parameters:
$$
\phi_j := \frac{1}{m} \sum_{i=1}^{m} w_j^{(i)},
$$
$$
\mu_j := \frac{\sum_{i=1}^{m} w_j^{(i)} x^{(i)}}{\sum_{i=1}^{m} w_j^{(i)}},
$$
$$
\Sigma_j := \frac{\sum_{i=1}^{m} w_j^{(i)} (x^{(i)} - \mu_j)(x^{(i)} - \mu_j)^T}{\sum_{i=1}^{m} w_j^{(i)}}.
$$
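Putting the two steps together, one full EM iteration can be sketched as follows (Python/NumPy with SciPy for the Gaussian density; `w[i, j]` holds $w_j^{(i)}$). This is an illustrative sketch of the updates above, not a reference implementation from the notes:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(x, phi, mus, Sigmas):
    """One EM iteration for the mixture of Gaussians."""
    m, n = x.shape
    k = phi.shape[0]

    # E-step: w[i, j] = p(z^(i) = j | x^(i); phi, mu, Sigma), by Bayes' rule
    w = np.column_stack([
        phi[j] * multivariate_normal.pdf(x, mean=mus[j], cov=Sigmas[j])
        for j in range(k)
    ])
    w /= w.sum(axis=1, keepdims=True)

    # M-step: weighted versions of the closed-form estimates given known labels
    Nj = w.sum(axis=0)                      # effective number of points per component
    phi_new = Nj / m
    mus_new = (w.T @ x) / Nj[:, None]
    Sigmas_new = np.zeros((k, n, n))
    for j in range(k):
        diff = x - mus_new[j]
        Sigmas_new[j] = (w[:, j, None] * diff).T @ diff / Nj[j]
    return phi_new, mus_new, Sigmas_new
```

Iterating `em_step` until the log-likelihood from the earlier sketch stops increasing gives the usual EM fit; as with any EM procedure, it converges to a local rather than necessarily global maximum.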