K-Means and PCA
RAMAPURAM CAMPUS
K-means Clustering,
Principal Component Analysis and
Singular Value Decomposition
Dr. S. Veena, Associate Professor / CSE
Overview
K-means Clustering
• Introduction to K-means Clustering
• K-means working methodology from first principles
• Optimal number of clusters and cluster evaluation
• The elbow method
• K-means clustering with the iris data example
Principal Component Analysis (PCA)
• Introduction to PCA
• PCA working methodology from first principles
• PCA applied on handwritten digits using scikit-learn
Singular Value Decomposition (SVD)
• Introduction to SVD
• SVD applied on handwritten digits using scikit-learn
K-means Clustering
Introduction to K-Means Clustering
• Clustering is the task of grouping observations in such a way
that members of the same cluster are more similar to each
other and members of different clusters are very different from
each other.
• Examples
– In anti-money laundering measures, suspicious activities and individuals
can be identified using anomaly detection
– In biology, clustering is used to find groups of genes with similar
expression patterns
– In marketing analytics, clustering is used to find segments of similar
customers so that different marketing strategies can be applied to
different customer segments accordingly
K-means working methodology from first principles
The k-means working methodology is illustrated in the following example, in which 12 instances are considered with their X and Y values. The task is to determine the optimal clusters from the data.
The Euclidean distance between two points A (X1, Y1) and B (X2, Y2) is calculated as follows:
dist(A, B) = sqrt((X2 - X1)^2 + (Y2 - Y1)^2)
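As a quick illustration with NumPy (the two points below are placeholders, not the slide's data):
>>> import numpy as np
>>> A = np.array([1.0, 1.5]); B = np.array([4.0, 5.5])   # illustrative points
>>> dist = np.sqrt(np.sum((A - B)**2))                    # Euclidean distance
>>> print(dist)
5.0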
Iteration 1: Distances are computed from each instance to both centroids, and each instance is assigned to the nearer centroid.
Iteration 2: In this iteration, new centroids are calculated from the instances assigned to each cluster. Each new centroid is the simple average of the points assigned to it.
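A minimal sketch of this update step (the points and assignments below are illustrative, not the slide's data):
>>> import numpy as np
>>> points = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 7.0], [6.0, 8.0]])  # illustrative instances
>>> labels = np.array([0, 0, 1, 1])                 # current cluster assignments
>>> new_c0 = points[labels == 0].mean(axis=0)       # new centroid = mean of assigned points
>>> new_c1 = points[labels == 1].mean(axis=0)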
Iteration 3: In this iteration, new assignments are computed based on the Euclidean distance between the instances and the new centroids. If any assignments change, new centroids are calculated again; this repeats until the assignments no longer change or the maximum number of iterations is reached.
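Putting the assignment and update steps together, a minimal from-first-principles K-means loop might look like the following sketch (variable names and initialization are assumptions, not the slides' code):

import numpy as np

def kmeans(X, k, max_iter=300, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]       # random initial centroids
    for _ in range(max_iter):
        # assignment step: each point goes to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: new centroid = mean of the assigned points
        # (assumes no cluster becomes empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):              # stop when centroids stabilize
            break
        centroids = new_centroids
    return labels, centroids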
K-means clustering with the iris data example
# K-means clustering
>>> import numpy as np
The following code separates out the class variable, which is kept only as a dependent variable for coloring the plot; the unsupervised learning algorithm is applied to the x variables alone, without any target variable being present:
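A minimal sketch of that setup, using scikit-learn's built-in iris data (the original may load a CSV instead); three clusters are used, one per species:
>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.cluster import KMeans
>>> iris = load_iris()
>>> x_iris = pd.DataFrame(iris.data, columns=iris.feature_names)   # features only
>>> y_iris = pd.Series(iris.target)                                # class labels, used only for plot colors
>>> k_means_fit = KMeans(n_clusters=3, max_iter=300)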
>>> k_means_fit.fit(x_iris)
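The confusion matrix discussed below can be obtained by cross-tabulating the actual species against the predicted cluster labels; pd.crosstab is one common way (a hedged sketch, not necessarily the slides' exact code):
>>> print(pd.crosstab(y_iris, k_means_fit.labels_,
...                   rownames=['Actual'], colnames=['Predicted']))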
From the preceding confusion matrix, we can see that all the setosa flowers are clustered correctly, whereas 2 out of 50 versicolor and 14 out of 50 virginica flowers are placed in the wrong cluster.
To perform a sensitivity analysis and check how many clusters actually provide a better explanation of the segments (here using the silhouette score as the measure):
>>> from sklearn.metrics import silhouette_score
>>> for k in range(2,10):
...     k_means_fitk = KMeans(n_clusters=k, max_iter=300)
...     k_means_fitk.fit(x_iris)
...     print("For K =", k, "silhouette score =",
...           silhouette_score(x_iris, k_means_fitk.labels_, metric='euclidean'))
We also need to look at the average within-cluster variation and the elbow plot before concluding on the optimal K value.
# Avg. within-cluster sum of squares
>>> K = range(1,10)
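A hedged sketch of how the quantities used in the plotting code below might be computed (the names bss and tss follow that code; the inertia_ attribute is used here for the within-cluster sum of squares):
>>> import matplotlib.pyplot as plt
>>> wcss = []                                    # within-cluster sum of squares for each K
>>> for k in K:
...     model = KMeans(n_clusters=k, max_iter=300).fit(x_iris)
...     wcss.append(model.inertia_)
>>> wcss = np.array(wcss)
>>> tss = ((x_iris - x_iris.mean(axis=0))**2).values.sum()   # total sum of squares
>>> bss = tss - wcss                                          # between-cluster sum of squares
>>> avg_within = wcss / x_iris.shape[0]                       # avg. within-cluster sum of squares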
From the elbow plot, it appears that the slope changes drastically at a value of three, so we can select the optimal k-value as three.
# elbow curve - percentage of variance explained
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> ax.plot(K, bss/tss*100, 'b*-')
>>> plt.grid(True)
>>> plt.xlabel('Number of clusters')
>>> plt.ylabel('Percentage of variance explained')
>>> plt.show()
Principal component analysis - PCA
• The left-hand side of the diagram depicts the top view, front view, and side view of the component.
• On the right-hand side, an isometric view has been drawn, in which a single image is used to visualize how the component looks.
• One can imagine that the left-hand images are the actual variables and the right-hand side is the first principal component, in which most of the variance has been captured.
• Finally, three images have been replaced by a single image by rotating the axis of direction. We replicate the same technique in PCA analysis.
• The actual data is shown in a 2D space, in which the X and Y axes are used to plot the data.
• The principal components are the directions along which the maximum variation of the data is captured.
• Eigenvectors and eigenvalues underpin PCA: the principal components are eigenvectors of the data's covariance matrix. More formally, if A is a linear transformation of a vector space and v is a nonzero vector, then v is an eigenvector of A if Av is a scalar multiple of v.
• The condition can be written as the following equation:
A v = λ v
where λ is the eigenvalue corresponding to the eigenvector v.
• The following example describes how to calculate eigenvectors and eigenvalues from a square matrix and how to interpret them.
• Note that eigenvectors and eigenvalues can be calculated only for square matrices (matrices with the same number of rows and columns).
• The characteristic equation states that the determinant of the difference between the data matrix and the identity matrix multiplied by an eigenvalue is zero, that is, det(A - λI) = 0.
• Both eigenvalues for the preceding matrix are equal to -2. We can substitute each eigenvalue back into the equation (A - λI)v = 0 to solve for the corresponding eigenvector:
• Substituting the value of the eigenvalue into the preceding equation, we obtain the following formula:
• This equation can have many eigenvector solutions; any values that satisfy the equation can be substituted in order to verify it. Here, we have used the vector [1 1] for verification, which does satisfy the equation.
PCA working methodology from first principles
• The PCA working methodology is illustrated using the following sample data, which has two dimensions for each instance or data point. The objective here is to reduce the 2D data to one dimension (also known as the principal component):
The first step, prior to proceeding with any analysis, is to subtract the mean from all the observations, which centers each variable around zero and makes the dimensions more comparable.
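A small illustrative sketch of this centering step (the array is a placeholder, not the slide's sample values):
>>> import numpy as np
>>> data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2]])   # illustrative 2D data
>>> data_centered = data - data.mean(axis=0)     # subtract the per-column mean
>>> print(data_centered.mean(axis=0))            # now approximately zero in each dimension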
Principal components are calculated using two different
techniques:
• Covariance matrix of the data
• Singular value decomposition
A sample covariance calculation for the X and Y variables is shown in the following formula; the entire covariance matrix is a 2 x 2 (square) matrix:
cov(X, Y) = sum_i (Xi - mean(X)) (Yi - mean(Y)) / (n - 1)
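A hedged NumPy sketch of the same computation, continuing with the illustrative centered data from above (np.cov expects variables as rows, hence the transpose):
>>> cov_mat = np.cov(data_centered.T)    # 2 x 2 covariance matrix
>>> print(cov_mat.shape)
(2, 2)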
From the covariance matrix, we can calculate the eigenvectors and eigenvalues:
Python syntax:
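A minimal NumPy equivalent (an assumed sketch, continuing the illustrative example):
>>> eig_vals, eig_vecs = np.linalg.eig(cov_mat)   # eigenvalues and eigenvectors of the covariance matrix
>>> print("Eigenvalues:", eig_vals)
>>> print("Eigenvectors:\n", eig_vecs)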
Once we obtain the eigenvectors and eigenvalues, we can project the data onto the principal components.
The eigenvector with the greatest eigenvalue is the first principal component; it is the one we keep, as we would like to reduce the original 2D data to 1D data.
From the preceding result, we can see the 1D projection of the original 2D data onto the first principal component.
Also, the eigenvalue of 1.5725 indicates that the first principal component explains about 57 percent more variance than an original variable does.
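A hedged sketch of the projection step, continuing the illustrative example (the eigenvector with the largest eigenvalue is selected and the centered data is projected onto it):
>>> top = eig_vals.argmax()                       # index of the largest eigenvalue
>>> first_pc = eig_vecs[:, top]                   # first principal component (unit vector)
>>> projected_1d = data_centered.dot(first_pc)    # 1D projection of the 2D data
>>> print(projected_1d.shape)
(4,)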
PCA applied on handwritten digits using scikit-learn
• Handwritten digits example from the scikit-learn datasets
• The handwritten digits are 0-9, each represented by 64 features (an 8 x 8 matrix of pixel intensities).
• The idea is to represent the original 64-dimensional features in as few dimensions as possible
>>> from sklearn.datasets import load_digits
>>> digits = load_digits()
>>> X = digits.data
>>> y = digits.target
Plot the graph using the plt.show function:
>>> import matplotlib.pyplot as plt
>>> plt.matshow(digits.images[0])
>>> plt.show()
Before performing PCA, it is advisable to scale the input data to eliminate any issues due to the differing scales of the variables.
In the following code, scaling is applied to all the columns separately:
>>> from sklearn.preprocessing import scale
>>> X_scale = scale(X,axis=0)
In the following, we use two principal components, so that the result can be represented on a 2D graph.
>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=2)
>>> reduced_X = pca.fit_transform(X_scale)
>>> zero_x, zero_y = [],[] ; one_x, one_y = [],[]
>>> two_x,two_y = [],[]; three_x, three_y = [],[]
>>> four_x,four_y = [],[]; five_x,five_y = [],[]
>>> six_x,six_y = [],[]; seven_x,seven_y = [],[]
>>> eight_x,eight_y = [],[]; nine_x,nine_y = [],[]
We are appending the relevant principal components to each digit separately so that we can create a scatter plot of all 10 digits:
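A compact hedged sketch of the appending-and-plotting idea for a few of the digits (the full version repeats the pattern for all ten):
>>> for i in range(len(reduced_X)):
...     if y[i] == 0:
...         zero_x.append(reduced_X[i][0]); zero_y.append(reduced_X[i][1])
...     elif y[i] == 1:
...         one_x.append(reduced_X[i][0]); one_y.append(reduced_X[i][1])
...     elif y[i] == 2:
...         two_x.append(reduced_X[i][0]); two_y.append(reduced_X[i][1])
...     # ... and so on for digits 3 through 9
>>> plt.scatter(zero_x, zero_y, c='r', marker='x', label='zero')
>>> plt.scatter(one_x, one_y, c='g', marker='+', label='one')
>>> plt.scatter(two_x, two_y, c='b', marker='s', label='two')
>>> plt.legend(); plt.xlabel('PC 1'); plt.ylabel('PC 2'); plt.show()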
In a 3D plot, digit 2 is at the extreme left and digit 0 is at the lower part of the plot, whereas digit 4 is at the top-right end and digit 6 seems to lie more towards the PC 1 axis.
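A hedged sketch of how such a 3D view could be produced (three components instead of two; not necessarily the slides' exact code):
>>> from mpl_toolkits.mplot3d import Axes3D    # enables the 3D projection
>>> pca_3d = PCA(n_components=3)
>>> reduced_X3 = pca_3d.fit_transform(X_scale)
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111, projection='3d')
>>> ax.scatter(reduced_X3[:, 0], reduced_X3[:, 1], reduced_X3[:, 2], c=y, cmap='tab10', s=10)
>>> ax.set_xlabel('PC 1'); ax.set_ylabel('PC 2'); ax.set_zlabel('PC 3')
>>> plt.show()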
● Choosing the number of principal components to extract is an open-ended question in unsupervised learning, but there are some workarounds to obtain an approximate answer.
● There are two common ways to determine the number of components:
○ Check where the total variance explained starts diminishing only marginally
○ Choose enough components so that the total variance explained is greater than 80 percent
● The following code provides the total variance explained as the number of principal components changes (see the sketch after this list).
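A minimal hedged equivalent of such a check (the component counts printed below are illustrative choices):
>>> pca_full = PCA(n_components=64)                              # digits data has 64 features
>>> pca_full.fit(X_scale)
>>> var_expl = np.cumsum(pca_full.explained_variance_ratio_)     # cumulative variance explained
>>> for n in (2, 5, 10, 20, 30, 40):
...     print(n, "components explain %.1f%% of the variance" % (100 * var_expl[n-1]))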
Singular value decomposition - SVD
• Many implementations of PCA use singular value decomposition to calculate the eigenvectors and eigenvalues.
• SVD is given by the following equation:
M = U Σ V^T
• The columns of U are called the left singular vectors of the data matrix, the columns of V are its right singular vectors, and the diagonal entries of Σ are its singular values.
• The left singular vectors are the eigenvectors of the covariance matrix and the diagonal elements of Σ are the square roots of the eigenvalues of the covariance matrix.
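A small hedged sketch of this decomposition with NumPy (the matrix is illustrative; np.linalg.svd returns U, the singular values, and V^T):
>>> import numpy as np
>>> M = np.array([[3.0, 2.0, 2.0],
...               [2.0, 3.0, -2.0]])                # illustrative 2 x 3 matrix
>>> U, s, Vt = np.linalg.svd(M, full_matrices=False)
>>> print(U.shape, s.shape, Vt.shape)               # (2, 2) (2,) (2, 3)
>>> print(np.allclose(M, U @ np.diag(s) @ Vt))      # True: M = U Σ V^T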
Advantages of SVD:
• SVD can be applied even to rectangular matrices, whereas eigenvalues are defined only for square matrices.
• The equivalents of eigenvalues obtained through the SVD method are called singular values, and the vectors obtained that correspond to eigenvectors are known as singular vectors.
• However, as rectangular matrices are involved, we need left singular vectors and right singular vectors for the row and column dimensions respectively.
• If a matrix A has a matrix of eigenvectors P that is not invertible, then A does not have an eigendecomposition.
• However, if A is an m x n real matrix with m > n, then A can be written using a singular value decomposition.
• Both U and V are orthogonal matrices, which means U^T U = I (here I has m x m dimensions) and V^T V = I (here I has n x n dimensions), where the two identity matrices may have different dimensions.
• Σ is a non-negative diagonal matrix with m x n dimensions.
• First stage: the singular values/eigenvalues are calculated from the eigen-equation of A^T A, since the eigenvalues of A^T A are the squares of the singular values. Once we obtain the singular values/eigenvalues, we substitute them back to determine V, the right singular/eigen vectors.
• Second stage: we substitute V to obtain the left singular vectors U, using the equation U = A V Σ^-1 (applied column by column for the nonzero singular values).
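A hedged NumPy sketch of this two-stage procedure, reusing the illustrative matrix M from above and checking it against np.linalg.svd:
>>> eigvals, V = np.linalg.eigh(M.T @ M)        # eigen-decomposition of A^T A (symmetric)
>>> order = np.argsort(eigvals)[::-1][:2]       # keep the two largest eigenvalues
>>> V = V[:, order]                             # right singular vectors
>>> sing_vals = np.sqrt(eigvals[order])         # singular values = sqrt(eigenvalues of A^T A)
>>> U_manual = M @ V / sing_vals                # left singular vectors: U = A V Σ^-1
>>> print(np.allclose(sing_vals, s))            # singular values match np.linalg.svd
True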
The original matrix of dimension (1797 x 64) has been decomposed into a matrix of left singular vectors (1797 x 15), a diagonal matrix of 15 singular values, and a matrix of right singular vectors (15 x 64).
>>> n_comps = 15
>>> from sklearn.decomposition import TruncatedSVD
>>> svd = TruncatedSVD(n_components=n_comps, n_iter=300, random_state=42)
>>> reduced_X = svd.fit_transform(X)
>>> print("\nTotal Variance explained for %d singular features are %0.3f" % (n_comps, svd.explained_variance_ratio_.sum()))