CS-12
AIML CZG565
Unsupervised Learning
•The content of these modules and the context under each topic were planned by the course owner Dr. Sugata, with grateful acknowledgement to the many others who made their course materials freely available online
•We hereby acknowledge all the contributors for their material and inputs
•We have provided source information wherever necessary
•Students are requested to refer to the textbook for the detailed content of the presentation deck shared over Canvas
•We have reduced the slides from Canvas and modified the content flow to suit the requirements of the course and for ease of class presentation
External: CS109 and CS229 Stanford lecture notes, Dr. Andrew Ng, and many others who made their course materials freely available online
Course Plan
M5 Decision Tree
M8 Bayesian Learning
M9 Ensemble Learning
This module – Unsupervised Learning:
• Mixture Models
• Expectation Maximization (EM) Algorithm
• K-Means Clustering
Input:
Structured data with features such as colour, shape, size, etc.
Output:
Groups/clusters of objects
K-Means Algorithm
• Works iteratively to find {μk} and {rnk} such that the distortion measure J is minimized
Two phases, repeated for all xt ∊ X until the assignments stop changing:
1. Assignment (E-step): assign / re-assign each data point to the cluster whose center is at minimum distance
2. Update (M-step): re-compute each cluster mean from the points currently assigned to it
A minimal from-scratch sketch of these two phases is given below.
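The following NumPy sketch illustrates the two phases; it is an assumed illustrative implementation (the initialisation scheme, iteration cap and data X are not fixed by the slides):

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Naive K-means: alternate point assignment and mean re-computation."""
    rng = np.random.default_rng(seed)
    # Initialise centers by picking k distinct points at random (an assumption;
    # the slides do not specify an initialisation scheme).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Phase 1 (assignment): assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Phase 2 (update): recompute each center as the mean of its points.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers have stabilised, so assignments no longer change
        centers = new_centers
    return centers, labels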
Candidate   Weight   Glucose level
1           72       185
2           56       170
3           60       168
4           68       179
5           72       182
6           77       188
7           70       180
8           84       183
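As an illustration (not part of the original worked example), the candidate table above can be clustered with scikit-learn's KMeans; the choice of k = 2 here is an assumption for demonstration:

import numpy as np
from sklearn.cluster import KMeans

# Candidate data from the table above: [weight, glucose level]
X = np.array([
    [72, 185], [56, 170], [60, 168], [68, 179],
    [72, 182], [77, 188], [70, 180], [84, 183],
])

# k=2 is an illustrative assumption, not a value fixed by the slides.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # final cluster means
print(km.labels_)           # hard assignment of each candidate
print(km.inertia_)          # SSE (within-cluster sum of squares)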
• If you plot k against the SSE, you will see that the error decreases as k gets larger: when the number of clusters increases, the clusters become smaller, so the distortion within them is also smaller. The idea of the elbow method is to choose the k at which the SSE stops decreasing sharply (the "elbow" of the curve); a sketch of the plot is shown below.
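A minimal sketch of the elbow plot using scikit-learn's inertia_ attribute (the within-cluster SSE); the range of k values and the data X are assumptions:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def elbow_plot(X, k_max=10):
    """Plot within-cluster SSE (inertia) against k to look for the elbow."""
    ks = range(1, k_max + 1)
    sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
           for k in ks]
    plt.plot(ks, sse, marker="o")
    plt.xlabel("k (number of clusters)")
    plt.ylabel("SSE / distortion")
    plt.show()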
(Naive) K-Means for detecting outliers
• Let
  • dist(x, μk) be the distance of a point x, assigned to cluster k, to its center μk
  • Lμk be the average distance of all the points assigned to cluster k from its center
• A point x can then be flagged as an outlier when dist(x, μk) is much larger than Lμk (see the sketch below).
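A sketch of this naive rule in Python; the number of clusters k and the threshold factor are assumptions, since the slides do not fix a cutoff:

import numpy as np
from sklearn.cluster import KMeans

def kmeans_outliers(X, k=3, factor=2.0):
    """Flag points whose distance to their cluster center greatly exceeds
    the average distance within that cluster (naive K-means outlier rule)."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    centers = km.cluster_centers_[km.labels_]
    dist = np.linalg.norm(X - centers, axis=1)        # dist(x, mu_k) per point
    avg = np.array([dist[km.labels_ == j].mean()      # L_mu_k per cluster
                    for j in range(k)])[km.labels_]
    # factor is an assumed threshold multiplier.
    return dist > factor * avg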
K-Means – Hard Clustering
[Figure: intensity axis (0–255, tick at 190) illustrating hard assignment of intensity values to clusters]
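As an illustrative sketch (assumed here, not taken from the deck), hard clustering of 8-bit intensity values with a 1-D K-means replaces each value by the centre of the single cluster it is assigned to:

import numpy as np
from sklearn.cluster import KMeans

def quantize_intensities(img, k=3):
    """Hard-cluster pixel intensities: every pixel belongs to exactly one
    cluster and is replaced by that cluster's centre intensity."""
    pixels = img.reshape(-1, 1).astype(float)   # img: 2-D grayscale array (0-255)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    centers = km.cluster_centers_.ravel()
    return centers[km.labels_].reshape(img.shape)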
1-of-K Coding Mechanism
• For each data point xn, we introduce a set of binary indicator variables rnk ∈ {0, 1} such that rnk = 1 if xn is assigned to cluster k, and rnk = 0 otherwise (exactly one of rn1, …, rnK equals 1 for each n).
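With this notation, the distortion measure J mentioned earlier takes the standard K-means form (stated here for completeness):

J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}\, \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2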
Mixture Models
[Figures: scatter plot with axes x1 and x2; distribution of the Length of Geyser Eruptions (in Mins)]
• A mixture model assumes that a set of observed objects is a mixture of instances from multiple probabilistic clusters, and conceptually each observed object is generated independently.
K = number of mixture components (clusters)
πk = mixture weights
Unknown variables: y; in clustering, y = 1, …, K indexes the cluster
Parameters: θ
In a GMM: θ = (π, μ, Σ), where
  π : {π1, . . . , πK}
  μ : {μ1, . . . , μK}
  Σ : {Σ1, . . . , ΣK}
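Under these parameters, the mixture density and the log likelihood that the EM procedure below maximizes take the usual GMM form (written out here for reference):

p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \,\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad
\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma})
= \sum_{n=1}^{N} \ln \left( \sum_{k=1}^{K} \pi_k \,\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \right)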
Goal:
In a GMM: maximize the log likelihood ln p(X | π, μ, Σ) with respect to the parameters π, μ, Σ.
E-step:
Evaluate the responsibilities (posterior probabilities) γ(znk) using the current parameter values.
M-step:
• Maximize the expectation of the complete-data log-likelihood, computed with respect to the responsibilities found in the E-step. The result of the maximization is a new parameter vector μnew, Σnew and πnew.
• Keep γ(znk) fixed, and apply MLE to maximize ln p(X | π, μ, Σ) with respect to μk, Σk and πk, obtaining μnew, Σnew and πnew (the resulting update equations are written out below).
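For reference, the standard EM update equations for a GMM, written in the notation above, are:

\gamma(z_{nk}) = \frac{\pi_k \,\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \,\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)},
\qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk})

\boldsymbol{\mu}_k^{new} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, \mathbf{x}_n,
\quad
\boldsymbol{\Sigma}_k^{new} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, (\mathbf{x}_n - \boldsymbol{\mu}_k^{new})(\mathbf{x}_n - \boldsymbol{\mu}_k^{new})^{T},
\quad
\pi_k^{new} = \frac{N_k}{N}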
To Estimate: π, μ, Σ
1. Initialize π, μ, Σ and evaluate the initial value of the log likelihood.
2. E-step: evaluate the responsibilities γ(znk) using the current parameters.
3. M-step: re-estimate the parameters from the current responsibilities, giving πnew, μnew, Σnew.
4. Evaluate the log likelihood with the new parameters.
Repeat steps 2–4 until convergence of the log likelihood (or of the parameters).
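The following is a compact NumPy/SciPy sketch of this loop, offered as an illustration rather than the deck's worked example; the initialisation scheme, the small covariance jitter and the tolerance are assumptions:

import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, K, n_iters=100, tol=1e-6, seed=0):
    """EM for a Gaussian mixture: alternate E-step and M-step until the
    log likelihood stops improving."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Simple initialisation (assumed): random means, data covariance, equal weights.
    mu = X[rng.choice(n, size=K, replace=False)]
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iters):
        # E-step: responsibilities gamma(z_nk).
        dens = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=sigma[k])
            for k in range(K)
        ])
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mu, Sigma, pi from the responsibilities.
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
        pi = Nk / n
        # Convergence check on the log likelihood.
        ll = np.log(dens.sum(axis=1)).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, mu, sigma, gamma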
Log likelihood = ln(sum for Doc1) + ln(sum for Doc2) + ln(sum for Doc3) + …
= ln(0.010) + ln(0.083) + ln(0.006)
≈ -12.16
This value is computed at every iteration; it increases from one iteration to the next and eventually stops changing appreciably. That plateau in the log likelihood is one of the stopping (convergence) criteria of the algorithm.
Weight for cluster 1 (new): π1(new) = N1 / N = 2.0059 / 3 ≈ 0.67
(N1 is the sum of the responsibilities for cluster 1; N = 3 is the number of documents)
GMM - EM Algorithm
2nd Iteration continues
• Standardize the data if required
Note:
The shape of the clusters a GMM can learn is constrained by the covariance_type parameter passed when the model is created:
• "spherical": all clusters must be spherical, but they can have different diameters (i.e., different variances).
• "diag": clusters can take on any ellipsoidal shape of any size, but the ellipsoid's axes must be parallel to the coordinate axes (i.e., the covariance matrices must be diagonal).
• "tied": all clusters must have the same ellipsoidal shape, size and orientation (i.e., all clusters share the same covariance matrix).
• "full" (the scikit-learn default): each cluster may take on any shape, size and orientation, with its own unconstrained covariance matrix.
from sklearn.mixture import GaussianMixture

# Fit a 3-component Gaussian mixture to the data X;
# n_init=10 runs EM from 10 different initialisations and keeps the best fit.
gm = GaussianMixture(n_components=3, n_init=10)
gm.fit(X)

gm.weights_       # estimated mixture weights pi_k
gm.means_         # estimated component means mu_k
gm.covariances_   # estimated covariance matrices Sigma_k
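Once fitted, the same estimator also exposes the cluster assignments (a usage note; these attributes and methods exist on scikit-learn's GaussianMixture):

gm.converged_          # whether EM converged
gm.predict(X)          # hard assignment: most probable component for each point
gm.predict_proba(X)    # soft assignment: responsibilities gamma(z_nk)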
• https://www.youtube.com/watch?v=TG6Bh-NFhA0
• https://www.youtube.com/watch?v=qMTuMa86NzU