Expectation Maximization - Georgia Tech - Machine Learning - English

The document discusses expectation maximization (EM), an algorithm that alternates between two steps: expectation (E-step) and maximization (M-step). In the E-step, it computes the probability that each data point belongs to each cluster. In the M-step, it recomputes the mean of each cluster as a weighted average based on the probabilities from the E-step. EM is similar to k-means clustering, but it soft-assigns data points to multiple clusters based on probabilities rather than hard-assigning each point to a single cluster. If the probabilities in EM were restricted to 0/1, it would be equivalent to k-means clustering.


So, this is going to lead us to the concept of expectation maximization. So, expectation maximization is actually, at an algorithmic level, surprisingly similar to k-means. What we're going to do is tick-tock back and forth between two different probabilistic calculations. So, you see that? I kind of drew it like the other one.

>> Mm hm. The names of the two phases are expectation and maximization. Sort of, you know, our name is our algorithm.

>> I like that.

>> So, what we're going to do is, we're going to move back and forth between a soft clustering, and computing the means from that soft clustering. So the soft clustering goes like this. This probabilistic indicator variable, Z_ij, represents the likelihood that data element i comes from cluster j. And so, the way we're going to do that, since we're in the maximum likelihood setting, is to use Bayes' rule and say, well, that's going to be proportional to the probability that data element i was produced by cluster j. And then we have a normalization factor. Normally, we'd also have the prior in there. So why is the prior gone, Charles?

>> Well, because you said it was the maximum likelihood scenario.

>> Yeah, right. We talked about how that just meant that it was uniform, and that allowed us to just leave that component out. It's not going to have any impact on the normalization.

>> Right.

>> So that's what the Z step is: if we had the clusters, if we knew where the means were, then we could compute how likely it is that the data would come from the means, and that's just this calculation here. So that's computing the expectation, defining the Z variables from the μ's, the centers. We're going to pass that information, that clustering information Z, over to the maximization step. What the maximization step is going to say is, okay, well if that's the clustering, we can compute the means from those clusters. All we have to do is just take the average variable value, right? So the average of the x_i's within each cluster j, weighted by the likelihood that each one came from cluster j, and then again we have to normalize. If you think of this as being a 0/1 indicator variable, then really it is just the average of the things we assign to that cluster. But here we actually are kind of soft assigning, so we could have half of one of the data points in there, and it only counts half towards the average, and we could have a tenth in another place, and a whole value in another place, and so we're just doing this weighted average of the data points.
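To make the two steps described above concrete, here is a minimal sketch in Python, assuming a mixture of k spherical, unit-variance Gaussians with uniform priors, i.e. the maximum likelihood setting from the lecture. The names X, mus, z, e_step, m_step, and em are illustrative, not part of the lecture.

import numpy as np

def e_step(X, mus):
    # Expectation: soft-assign each point x_i to each cluster j.
    # z[i, j] is proportional to P(x_i | mu_j); each row is then normalized.
    d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)   # squared distances to each center
    # Unnormalized Gaussian likelihoods; shifting by the row minimum only avoids
    # underflow and cancels in the normalization (uniform prior drops out too).
    lik = np.exp(-0.5 * (d2 - d2.min(axis=1, keepdims=True)))
    return lik / lik.sum(axis=1, keepdims=True)

def m_step(X, z):
    # Maximization: each mean is the z-weighted average of the data points.
    return (z.T @ X) / z.sum(axis=0)[:, None]

def em(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    mus = X[rng.choice(len(X), size=k, replace=False)]  # start from k random data points
    for _ in range(iters):
        z = e_step(X, mus)    # E: soft clustering from the current means
        mus = m_step(X, z)    # M: new means from the soft clustering
    return mus, z

Given data X of shape (n, d), em(X, k) returns the k cluster centers and the (n, k) array of soft assignments.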

>> So, can I ask you a question, Michael?

>> Yeah, shoot.

>> So, this makes sense to me, and I even get that for the Gaussian case, the Z_ij variables will always be non-zero in the end, because there's always some probability they come from some Gaussian, since Gaussians have infinite extent. So this all makes sense to me. Is there a way to take exactly this algorithm and turn it into k-means? I'm staring at it, and it feels like if all your probabilities were ones and zeroes, you would end up with exactly k-means. I think.
I think.
>> I dunno, I never really thought about that. Let's think about that for a moment. It's certainly the case that if all the Z variables were 0 or 1, then the maximization step would just give the means, which is what k-means does.

>> Mm-hm. Then, what would happen? We send these means back, and what we do in k-means is we say each data point belongs to its closest center.

>> Mm-hm.

>> Which is very similar, actually, to what this does, except that here we then make it proportional. So I guess it would be exactly that if we made these clustering assignments and pushed them to 0 or 1 depending on which was the most likely cluster. Right, so if you made it so that the probability of belonging to a cluster still depends upon all the clusters, but you always got a 1 or a 0, basically like a hidden argmax kind of a thing, or a hidden max or something, then you would end up with exactly k-means.

>> I think you're right.

>> Huh.

>> Yeah, I never thought about that.

>> Okay.
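Here is a sketch of the 0/1, "hidden argmax" variant being discussed, under the same assumptions as the earlier sketch: pushing each row of the assignments to a one-hot vector for its most likely cluster turns the E step into the nearest-center assignment of k-means, and the M step then reduces to a plain per-cluster average. The function name hard_e_step is illustrative.

import numpy as np

def hard_e_step(X, mus):
    # Hard assignment: 1 for the closest (most likely) center, 0 for all the rest.
    d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)
    z = np.zeros_like(d2)
    z[np.arange(len(X)), d2.argmin(axis=1)] = 1.0
    return z

Swapping hard_e_step in for e_step in the earlier loop gives exactly the k-means (Lloyd's) iteration, with the usual caveat that a cluster left with no points would need special handling in the M step.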

>> So it really does end up being an awful lot like the k-means algorithm, which is improving in the error metric, this squared error metric. This is actually going to be improving in a probabilistic metric, right: the data is going to be more and more likely over time.

>> That makes sense.
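The "probabilistic metric" mentioned here can be checked directly. Under the same unit-variance Gaussian, uniform-prior assumptions as the sketches above, the log-likelihood of the data should be non-decreasing across EM iterations, the way the squared error of k-means is non-increasing. The helper name log_likelihood is illustrative.

import numpy as np

def log_likelihood(X, mus):
    # Log-likelihood of the data (up to an additive constant) under unit-variance
    # Gaussians centered at mus, mixed with uniform weights; log-sum-exp for stability.
    d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)
    m = -0.5 * d2
    mmax = m.max(axis=1, keepdims=True)
    return (mmax[:, 0] + np.log(np.exp(m - mmax).mean(axis=1))).sum()

Calling this after each E/M pass of the earlier loop should produce a monotonically non-decreasing sequence of values.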
