
UNIT-III

K-means clustering, Mixtures of Gaussians, Alternative view of EM, Factor analysis, PCA, Choosing the number of latent dimensions, Clustering - measuring dissimilarity, Evaluating the output of clustering methods, Hierarchical clustering.
INTRODUCTION

• Introducing additional latent variables allows relatively complex marginal distributions over the observed variables to be expressed in terms of more tractable joint distributions over the expanded space of observed and latent variables.
• A general technique for finding maximum-likelihood estimates in such latent variable models is the Expectation-Maximization (EM) algorithm.
K-MEANS CLUSTERING

Robbins-Monro procedure:
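The derivation on this slide was not captured in the text. As an illustration only, here is a minimal Python sketch of batch K-means together with a Robbins-Monro-style sequential update of a single centre; the function names (kmeans, sequential_update) and the learning rate eta are hypothetical, not taken from the slides.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal batch K-means: alternate hard assignments and mean updates."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)].astype(float)  # initial centres
    z = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # assignment step: nearest centre under squared Euclidean distance
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        z = d2.argmin(axis=1)
        # update step: each centre becomes the mean of the points assigned to it
        for k in range(K):
            if np.any(z == k):
                mu[k] = X[z == k].mean(axis=0)
    return mu, z

def sequential_update(mu_k, x_n, eta):
    """Robbins-Monro-style sequential update of one centre for one data point."""
    return mu_k + eta * (x_n - mu_k)
```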
K-MEANS CLUSTERING
Image Segmentation and Compression
• The goal of segmentation is to partition an image into regions, each of which has a reasonably homogeneous visual appearance or corresponds to objects or parts of objects.
• Each pixel in an image is a point in a 3-dimensional space comprising the intensities of the red, blue, and green channels, and the segmentation algorithm simply treats each pixel in the image as a separate data point.
• We illustrate the result of running K-means to convergence, for any particular value of K, by redrawing the image with each pixel vector replaced by the {R, G, B} intensity triplet of the centre µk to which that pixel has been assigned. Results for various values of K are shown in the figure below.

• [Figure] Two examples of applying the K-means clustering algorithm to image segmentation, showing the initial images together with their K-means segmentations obtained using various values of K.
• This also illustrates the use of vector quantization for data compression, in which smaller values of K give higher compression at the expense of poorer image quality.
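As a concrete illustration of vector quantization for image segmentation and compression, the following hedged sketch clusters the pixels' RGB values and replaces each pixel with its assigned centre; it assumes img is an H x W x 3 NumPy array and reuses the kmeans sketch given above.

```python
import numpy as np

def segment_image(img, K):
    """Quantize an image by replacing every pixel with its cluster centre."""
    H, W, _ = img.shape
    pixels = img.reshape(-1, 3).astype(float)   # each pixel as an (R, G, B) data point
    centres, labels = kmeans(pixels, K)          # kmeans as sketched earlier
    quantized = centres[labels]                  # codebook look-up (vector quantization)
    return quantized.reshape(H, W, 3)
```

Smaller values of K correspond to a smaller codebook and hence higher compression, at the cost of image quality.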
Mixtures of Gaussians
Latent variables
• The basic idea of a graphical model is to represent dependence between two variables by adding an edge between them in the graph.
• An alternative approach is to assume that the observed variables are correlated because they arise from a hidden common "cause".
• Models with hidden variables are also known as latent variable models or LVMs.
• Such models are harder to fit than models with no latent variables, but they can have significant advantages, for two main reasons: they often have fewer parameters than models that directly represent correlation in the visible space, and the hidden variables can serve as a bottleneck that computes a compressed representation of the data.
Mixtures of Gaussians
Maximum Likelihood
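The equations on these slides were not captured in the text. For reference, the standard Gaussian mixture density and the corresponding log likelihood that maximum-likelihood fitting targets are:

```latex
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad 0 \le \pi_k \le 1, \quad \sum_{k=1}^{K} \pi_k = 1
```

```latex
\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma})
= \sum_{n=1}^{N} \ln \Big\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \Big\}
```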
Mixtures of Gaussians
EM for Gaussian Mixtures
EM for Gaussian Mixtures Summary
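The summary equations on this slide were not captured in the text. Below is a minimal, hedged Python sketch of EM for a Gaussian mixture (E-step responsibilities, M-step updates of the mixing weights, means, and covariances), assuming full covariance matrices and ignoring numerical safeguards such as log-space computation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=50, seed=0):
    """Hedged sketch of EM for a K-component Gaussian mixture."""
    N, D = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                        # mixing coefficients
    mu = X[rng.choice(N, K, replace=False)].astype(float)  # initial means
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    for _ in range(n_iters):
        # E-step: responsibilities r[n, k] = p(z_n = k | x_n)
        r = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                      for k in range(K)], axis=1)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities
        Nk = r.sum(axis=0)
        pi = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    return pi, mu, Sigma, r
```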
An Alternative View of EM
• In practice, the complete data set {X, Z} cannot be observed; only the incomplete data X are available.
• Our state of knowledge of the values of the latent variables in Z is given only by the posterior distribution p(Z|X, θ).
• Each cycle of EM increases the incomplete-data log likelihood (unless it is already at a local maximum).
The same idea applies to data sets with missing values:
▪ The distribution of the observed values is obtained by taking the joint distribution of all the variables and then marginalizing over the missing ones.
▪ EM can then be used to maximize the corresponding likelihood function.
▪ This is a valid procedure only if the data values are missing at random, i.e. the mechanism causing values to be missing does not depend on the unobserved values.
▪ E.g., it would not be valid if a sensor fails to return a value whenever the quantity it is measuring exceeds some threshold.
Relation to K-means
▪ Comparing the K-means algorithm with the EM algorithm for Gaussian mixtures shows that there is a close similarity.
▪ The K-means algorithm performs a hard assignment of data points to clusters, in which each data point is associated uniquely with one cluster, whereas the EM algorithm makes a soft assignment based on the posterior probabilities.
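One standard way to make the relation precise (as in Bishop) is to consider a mixture of Gaussians with shared covariance εI; the responsibilities then harden into the K-means assignments as ε → 0:

```latex
\gamma(z_{nk}) =
\frac{\pi_k \exp\{-\lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2 / 2\epsilon\}}
     {\sum_{j} \pi_j \exp\{-\lVert \mathbf{x}_n - \boldsymbol{\mu}_j \rVert^2 / 2\epsilon\}}
\;\xrightarrow[\epsilon \to 0]{}\;
r_{nk} =
\begin{cases}
1 & \text{if } k = \arg\min_j \lVert \mathbf{x}_n - \boldsymbol{\mu}_j \rVert^2 \\
0 & \text{otherwise}
\end{cases}
```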
An Alternative View of EM
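The general formulation on these slides was not captured in the text. For reference, the general EM algorithm for a model with joint distribution p(X, Z | θ) alternates the following two steps until convergence:

```latex
\text{E step:}\quad \text{evaluate } p(\mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta}^{\text{old}})
```

```latex
\text{M step:}\quad \boldsymbol{\theta}^{\text{new}} = \arg\max_{\boldsymbol{\theta}} \; \mathcal{Q}(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}}),
\qquad
\mathcal{Q}(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}}) = \sum_{\mathbf{Z}} p(\mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta}^{\text{old}}) \, \ln p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\theta})
```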
Factor analysis
• Problem: mixture models use a single discrete latent variable with a one-hot (one-of-K) encoding, so each observation is explained by exactly one of K prototypes, which limits their representational power.
Factor analysis
FA is a low rank parameterization of an MVN

• Factor Analysis (FA) specifies a joint density model over x using fewer parameters.
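The model equations were not captured above. For reference, the standard FA generative model and the marginal it induces, with a D x L factor loading matrix W and a diagonal covariance Ψ, are:

```latex
p(\mathbf{z}) = \mathcal{N}(\mathbf{z} \mid \boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0), \qquad
p(\mathbf{x} \mid \mathbf{z}) = \mathcal{N}(\mathbf{x} \mid \mathbf{W}\mathbf{z} + \boldsymbol{\mu}, \boldsymbol{\Psi})
```

```latex
p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \mathbf{W}\boldsymbol{\mu}_0 + \boldsymbol{\mu},\; \boldsymbol{\Psi} + \mathbf{W}\boldsymbol{\Sigma}_0\mathbf{W}^{\mathsf{T}})
```

With the usual convention µ0 = 0 and Σ0 = I, the marginal reduces to p(x) = N(x | µ, WWᵀ + Ψ), a low-rank parameterization of the MVN that uses O(LD) parameters instead of O(D²).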
Factor analysis
Inference of the latent factors
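The computation on this slide was not captured in the text. For reference, with the convention µ0 = 0 and Σ0 = I, the posterior over the latent factors is Gaussian:

```latex
p(\mathbf{z} \mid \mathbf{x}) = \mathcal{N}(\mathbf{z} \mid \mathbf{m}, \boldsymbol{\Sigma}), \qquad
\boldsymbol{\Sigma} = (\mathbf{I} + \mathbf{W}^{\mathsf{T}}\boldsymbol{\Psi}^{-1}\mathbf{W})^{-1}, \qquad
\mathbf{m} = \boldsymbol{\Sigma}\,\mathbf{W}^{\mathsf{T}}\boldsymbol{\Psi}^{-1}(\mathbf{x} - \boldsymbol{\mu})
```

The posterior mean m is often called the vector of latent scores for x.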

Unidentifiability
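The details on these slides were not captured in the text. The key point is that the factor loadings are only identifiable up to an orthogonal rotation, since for any orthogonal matrix R:

```latex
\tilde{\mathbf{W}} = \mathbf{W}\mathbf{R}, \quad \mathbf{R}\mathbf{R}^{\mathsf{T}} = \mathbf{I}
\;\;\Longrightarrow\;\;
\tilde{\mathbf{W}}\tilde{\mathbf{W}}^{\mathsf{T}} + \boldsymbol{\Psi} = \mathbf{W}\mathbf{W}^{\mathsf{T}} + \boldsymbol{\Psi}
```

so W and WR define exactly the same marginal distribution over x and the same likelihood.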
Principal components analysis (PCA)
Classical PCA: statement of the theorem
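The statement itself was not captured above. Paraphrasing the classical PCA theorem (as given in Murphy): among all orthonormal bases W with L columns and scores z_n, the average reconstruction error

```latex
J(\mathbf{W}, \mathbf{Z}) = \frac{1}{N} \sum_{n=1}^{N} \lVert \mathbf{x}_n - \mathbf{W}\mathbf{z}_n \rVert^2
```

is minimized by setting W to V_L, the L eigenvectors of the empirical covariance matrix with the largest eigenvalues, with optimal scores z_n = Wᵀx_n for centred data.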
Singular value decomposition (SVD)
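As an illustration of the SVD route to PCA, here is a minimal NumPy sketch (not from the slides) that centres the data, takes the economy SVD, and projects onto the top-L right singular vectors:

```python
import numpy as np

def pca_svd(X, L):
    """PCA via SVD: project centred data onto the top-L right singular vectors."""
    X_centred = X - X.mean(axis=0)
    # economy SVD: X_centred = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    W = Vt[:L].T                              # principal directions (D x L)
    Z = X_centred @ W                         # low-dimensional scores (N x L)
    explained_var = (S ** 2) / (len(X) - 1)   # eigenvalues of the covariance matrix
    return W, Z, explained_var
```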
Choosing the number of latent dimensions
Model selection for FA/PPCA
Model selection for PCA
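The model-selection criteria discussed on these slides (e.g. likelihood-based criteria for FA/PPCA) were not captured in the text. As a simple illustration for plain PCA, a common heuristic is to keep the smallest number of components that explains a chosen fraction of the variance; a sketch reusing the hypothetical pca_svd helper above:

```python
import numpy as np

def choose_num_components(X, threshold=0.95):
    """Smallest L whose cumulative explained-variance ratio exceeds the threshold."""
    _, _, ev = pca_svd(X, L=min(X.shape))
    ratio = np.cumsum(ev) / ev.sum()
    return int(np.searchsorted(ratio, threshold) + 1)
```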
Clustering
Clustering is a grouping of data objects such that the objects within a group are similar (or near) to one another and dissimilar (or far) from the objects in other groups. It belongs to the branch of unsupervised learning, which aims at gaining insights from unlabeled data points; that is, unlike supervised learning, there is no target variable.
Clustering
The clusters formed need not be circular in shape; clusters can have arbitrary shapes, and many algorithms work well at detecting arbitrarily shaped clusters.

Clustering aims at forming groups of homogeneous data points from a heterogeneous dataset. It evaluates similarity based on a metric such as Euclidean distance, cosine similarity, or Manhattan distance, and then groups the points with the highest similarity scores together.
Clustering
Types of clustering
• Partitional - each object belongs in exactly one cluster
• Hierarchical - a set of nested clusters organized in a tree

A distinction among different types of clusterings is whether the set of clusters is nested or unnested.
A partitional clustering is simply a division of the set of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset.
A hierarchical clustering is a set of nested clusters that are organized as a tree.
Similarity vs Dissimilarity measures
The similarity or dissimilarity between two objects is generally based on the differences in their corresponding attribute values. In a clustering algorithm, the similarity or dissimilarity between objects is usually measured by a distance function.
Similarity and Dissimilarity Measures by Data Type
For nominal variables, these measures are binary, indicating whether two values are equal or not.
For ordinal variables, the measure is the difference between two values normalized by the maximum distance. For the other variable types, it is simply a distance function.
Measuring (dis)similarity
Some common attribute dissimilarity functions are as follows:
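The list itself was not captured above. As a sketch, the per-attribute dissimilarity functions commonly listed in this context include the squared difference, the absolute (city block) difference, and, for categorical attributes, a 0/1 mismatch indicator whose sum over attributes is the Hamming distance:

```python
# Hedged examples of common per-attribute dissimilarity functions.
def squared_distance(x, y):        # for real-valued attributes
    return (x - y) ** 2

def city_block_distance(x, y):     # l1 / absolute difference, more robust to outliers
    return abs(x - y)

def nominal_mismatch(x, y):        # for categorical attributes: 0 if equal, 1 otherwise
    return 0 if x == y else 1

def hamming_distance(x_vec, y_vec):  # sum of mismatches over categorical attributes
    return sum(nominal_mismatch(a, b) for a, b in zip(x_vec, y_vec))
```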
Evaluating the output of clustering methods
Purity

• Purity ranges between 0 (bad) and 1 (good). However, we can trivially achieve a purity of 1 by putting each object into its own cluster, so this measure does not penalize the number of clusters.

[Figure] Three clusters with labeled objects inside, used to illustrate the purity computation.
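The defining equation was not captured above. Following the standard definition, let n_ij be the number of objects in cluster i that belong to class j and N_i = Σ_j n_ij; then

```latex
p_{ij} \triangleq \frac{n_{ij}}{N_i}, \qquad
p_i \triangleq \max_j p_{ij}, \qquad
\text{purity} \triangleq \sum_i \frac{N_i}{N}\, p_i
```

In the usual version of the three-cluster example, the majority class counts are 5, 4, and 3 out of 6, 6, and 5 objects respectively, giving purity (5 + 4 + 3)/17 ≈ 0.71.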


Rand index
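The formula on this slide was not reproduced. The Rand index counts pairs of objects on which the clustering and a reference partition agree; with TP, TN, FP, FN defined over pairs of points (same/different cluster in the estimated vs. reference clustering):

```latex
R = \frac{TP + TN}{TP + FP + FN + TN}
```

Like accuracy, it lies between 0 and 1 and measures the fraction of pairwise decisions on which the two clusterings agree.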
Mutual information
Hierarchical clustering
Why hierarchical clustering?
Agglomerative clustering
Single link

Complete link

Average link
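The definitions on these slides were not reproduced above. The standard linkage functions, for groups of points G and H with pairwise dissimilarities d_{i,i'}, are:

```latex
d_{\mathrm{SL}}(G, H) = \min_{i \in G,\, i' \in H} d_{i,i'}, \qquad
d_{\mathrm{CL}}(G, H) = \max_{i \in G,\, i' \in H} d_{i,i'}, \qquad
d_{\mathrm{avg}}(G, H) = \frac{1}{n_G n_H} \sum_{i \in G} \sum_{i' \in H} d_{i,i'}
```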
Divisive clustering
• Another method involves building a minimum spanning tree from the dissimilarity graph and
creating new clusters by breaking the link with the largest dissimilarity.
Dissimilarity analysis steps:

4. Continue moving objects from G to H until a stopping criterion is met. At each step, pick the point i* that maximizes the difference between its average dissimilarity to the remaining objects in G and its average dissimilarity to the objects in H:

5. Stop the process when this difference becomes negative.
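Steps 1-3 of the procedure were not captured above, so the following Python sketch reconstructs them under the usual assumptions (start with all objects in one group G and seed the splinter group H with the object that is, on average, most dissimilar to the rest), then applies steps 4-5 as stated:

```python
import numpy as np

def dissimilarity_analysis_split(D):
    """Split one cluster into two by peeling off a 'splinter' group H from G.

    D is an (n x n) symmetric matrix of pairwise dissimilarities.
    """
    n = D.shape[0]
    G = list(range(n))
    # seed H with the object most dissimilar, on average, to the others
    avg_d = D.sum(axis=1) / (n - 1)
    i_star = int(np.argmax(avg_d))
    H = [i_star]
    G.remove(i_star)
    while len(G) > 1:
        # step 4: for each i in G, compare average dissimilarity to G\{i} vs. to H
        gains = []
        for i in G:
            d_G = np.mean([D[i, j] for j in G if j != i])
            d_H = np.mean([D[i, j] for j in H])
            gains.append(d_G - d_H)
        best = int(np.argmax(gains))
        # step 5: stop when the best difference becomes negative
        if gains[best] < 0:
            break
        H.append(G.pop(best))
    return G, H
```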
