
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY,

RAMAPURAM CAMPUS

K-means Clustering,
Principal Component Analysis and
Singular Value Decomposition

Dr. S. Veena, Associate Professor / CSE
Overview

K-means Clustering
• Introduction to K-means Clustering
• K-means working methodology from first principles
• Optimal number of clusters and cluster evaluation
• The elbow method
• K-means clustering with the iris data example
Principal Component Analysis (PCA)
• Introduction to PCA
• PCA working methodology from first principles
• PCA applied on handwritten digits using scikit-learn
Singular Value Decomposition (SVD)
• Introduction to SVD
• SVD applied on handwritten digits using scikit-learn
K-means Clustering
Introduction to K-Means Clustering
• Clustering is the task of grouping observations in such a way
that members of the same cluster are more similar to each
other and members of different clusters are very different from
each other.
• Examples
– In anti-money laundering measures, suspicious activities and individuals
can be identified using anomaly detection
– In biology, clustering is used to find groups of genes with similar
expression patterns
– In marketing analytics, clustering is used to find segments of similar
customers so that different marketing strategies can be applied to
different customer segments accordingly

● K-means clustering algorithm
○ an iterative process of moving the centers of the clusters, or centroids, to the mean position of their constituent points
○ reassigning instances to their closest clusters iteratively until the cluster centers no longer change significantly or the maximum number of iterations is reached

• The cost function of k-means is the sum of the squared Euclidean distances (square-norm) between the observations in each cluster and that cluster's centroid, as shown below.
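The cost function equation itself appears on the slide only as an image; a standard statement of the k-means objective (the within-cluster sum of squares) is:

    J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2

where C_k is the set of observations assigned to cluster k and \mu_k is the centroid of that cluster.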

• An intuitive way to understand the cost function: if there is only one cluster (k = 1), the distances of all the observations are measured from that single mean.
• If the number of clusters increases to two (k = 2), two means are calculated and each observation is assigned to cluster 1 or cluster 2 based on proximity.
• Subsequently, distances in the cost function are calculated with the same distance measure, but separately with respect to each cluster's own center.

K-means working methodology from first principles
The k-means working methodology is illustrated in the following example, in which 12 instances are considered with their X and Y values. The task is to determine the optimal clustering of this data.

Iteration 1:
• Assume two initial centers chosen from two of the 12 instances.
• We have chosen instance 1 (X = 7, Y = 8) and instance 8 (X = 1, Y = 4) as the initial centroids.
• For each instance, calculate its Euclidean distance to both centroids and assign it to the nearest cluster center.

The Euclidean distance between two points A (X1, Y1) and B (X2, Y2) is shown as follows:
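The formula appears on the slide only as an image; the standard Euclidean distance it refers to is:

    d(A, B) = \sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}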

Assignment of instances to both centroids

Iteration 2: In this iteration, new centroids are calculated from the instances assigned to each cluster. Each new centroid is the simple average of its assigned points.

Iteration 3: In this iteration, new assignments are calculated based on the Euclidean distance between the instances and the new centroids. If any assignments change, new centroids are calculated again; the process repeats until the assignments no longer change or the maximum number of iterations is reached. A minimal sketch of this assign-and-update loop follows.
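The iteration tables on the preceding slides are images; as a rough companion to them, here is a minimal NumPy sketch of the same assign-and-update loop. Only the two initial centroids, (7, 8) and (1, 4), come from the slide; the remaining points in pts are placeholder values for illustration, not the slide's data.

# Minimal k-means sketch: assign each point to the nearest centroid,
# then recompute each centroid as the mean of its assigned points.
import numpy as np

def kmeans(points, centroids, max_iter=10):
    centroids = centroids.astype(float).copy()
    for _ in range(max_iter):
        # Step 1: Euclidean distance of every point to every centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)              # assign to nearest centroid
        # Step 2: new centroids = simple average of the assigned points
        new_centroids = np.array([points[labels == k].mean(axis=0)
                                  for k in range(len(centroids))])
        if np.allclose(new_centroids, centroids):  # stop when assignments settle
            break
        centroids = new_centroids
    return labels, centroids

# Placeholder data: only (7, 8) and (1, 4) are taken from the slide's example.
pts = np.array([[7, 8], [6, 7], [8, 6], [2, 3], [1, 4], [3, 2],
                [7, 5], [1, 2], [6, 8], [2, 4], [8, 7], [3, 3]], dtype=float)
labels, centers = kmeans(pts, centroids=pts[[0, 4]])
print(labels)
print(centers)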

Optimal number of clusters and cluster evaluation


• Though selecting the number of clusters is more of an art than a science, the optimal number of clusters is chosen at the point where increasing the number of clusters yields little marginal gain in explanatory power.
• In practical applications, the business can usually indicate the approximate number of clusters they are looking for.

The elbow method


• The elbow method is used to determine the optimal number of clusters in k-means clustering.
• The elbow method plots the value of the cost function produced by different values of k.
• As k increases, the average distortion decreases: each cluster has fewer constituent instances, and those instances are closer to their respective centroids.
• However, the improvements in average distortion decline as k increases.
• The value of k at which the improvement in distortion declines the most is called the elbow, at which we should stop dividing the data into further clusters.

Evaluation of clusters with silhouette coefficient


• The silhouette coefficient is a measure of the compactness and separation of the clusters.
• Higher values represent better-quality clusters: the silhouette coefficient is higher for compact clusters that are well separated and lower for overlapping clusters.
• Silhouette coefficient values range from -1 to +1, and the higher the value, the better.
• The silhouette coefficient is calculated per instance. For a set of instances, it is the mean of the individual samples' scores.

a is the mean distance between the instance and the other instances in the same cluster, and
b is the mean distance between the instance and the instances in the next closest cluster.
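The formula appears on the slide only as an image; in terms of a and b above, the standard per-instance silhouette coefficient is:

    s = \frac{b - a}{\max(a, b)}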

K-means clustering with the iris data example

Data set: http://archive.ics.uci.edu/ml/datasets/Iris

The iris data has three types of flowers: setosa, versicolor, and virginica, with their respective measurements of sepal length, sepal width, petal length, and petal width.

The task is to group the flowers based on their measurements.


The KMeans algorithm from scikit-learn is used in the following example:

# K-means clustering
>>> import numpy as np
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> from scipy.spatial.distance import cdist, pdist
>>> from sklearn.cluster import KMeans
>>> from sklearn.metrics import silhouette_score

>>> iris = pd.read_csv("iris.csv")
>>> print (iris.head())


The following code separates the class variable, which is kept only for coloring the plots; the unsupervised learning algorithm is applied to the x variables without any target variable:

>>> x_iris = iris.drop(['class'],axis=1)

>>> y_iris = iris["class"]


Three clusters are used, and the maximum number of iterations is set to 300:

>>> k_means_fit = KMeans(n_clusters=3, max_iter=300)
>>> k_means_fit.fit(x_iris)

>>> print ("\nK-Means Clustering - Confusion Matrix\n\n",
...        pd.crosstab(y_iris, k_means_fit.labels_,
...                    rownames=["Actual"], colnames=["Predicted"]))

>>> print ("\nSilhouette-score: %0.3f" %
...        silhouette_score(x_iris, k_means_fit.labels_, metric='euclidean'))


From the previous confusion matrix, we can see that all the setosa flowers are clustered correctly, whereas 2 out of 50 versicolor and 14 out of 50 virginica flowers are incorrectly classified.

To perform a sensitivity analysis and check which number of clusters actually provides a better explanation of the segments:
>>> for k in range(2,10):
...     k_means_fitk = KMeans(n_clusters=k, max_iter=300)
...     k_means_fitk.fit(x_iris)
...     print ("For K value", k, ", Silhouette-score: %0.3f" %
...            silhouette_score(x_iris, k_means_fitk.labels_, metric='euclidean'))

The silhouette coefficient values in the preceding results show that K = 2 and K = 3 have better scores than all the other values.


We also need to examine the average within-cluster variation and the elbow plot before concluding on the optimal K value.
# Avg. within-cluster sum of squares
>>> K = range(1,10)
>>> KM = [KMeans(n_clusters=k).fit(x_iris) for k in K]
>>> centroids = [k.cluster_centers_ for k in KM]
>>> D_k = [cdist(x_iris, centrds, 'euclidean') for centrds in centroids]
>>> cIdx = [np.argmin(D, axis=1) for D in D_k]
>>> dist = [np.min(D, axis=1) for D in D_k]
>>> avgWithinSS = [sum(d)/x_iris.shape[0] for d in dist]

Dr.S.Veena,Associate Professor/CSE
25
K-means clustering with the iris data example

# Total within-cluster sum of squares
>>> wcss = [sum(d**2) for d in dist]
>>> tss = sum(pdist(x_iris)**2)/x_iris.shape[0]
>>> bss = tss - wcss
# elbow curve - Avg. within-cluster sum of squares
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> ax.plot(K, avgWithinSS, 'b*-')
>>> plt.grid(True)
>>> plt.xlabel('Number of clusters')
>>> plt.ylabel('Average within-cluster sum of squares')

From the elbow plot, it seems that at the value of three the slope changes drastically.
Here, we can select the optimal k-value as three.
# elbow curve - percentage of variance explained
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> ax.plot(K, bss/tss*100, 'b*-')
>>> plt.grid(True)
>>> plt.xlabel('Number of clusters')
>>> plt.ylabel('Percentage of variance explained')
>>> plt.show()
Principal component analysis - PCA

• Principal component analysis (PCA) is a dimensionality reduction technique with many uses.
• PCA reduces the dimensions of a dataset by projecting the data onto a lower-dimensional subspace.
• For example, a 2D dataset could be reduced by projecting the points onto a line; each instance in the dataset would then be represented by a single value rather than a pair of values.
• In a similar way, a 3D dataset could be reduced to two dimensions by projecting the variables onto a plane.


• PCA can easily be explained with the following diagram of a mechanical bracket, drawn in the machine drawing module of a mechanical engineering course.


• The left-hand side of the diagram depicts the top view, front view, and side view of the component.
• On the right-hand side, an isometric view has been drawn, in which one single image is used to visualize how the component looks.
• So, one can imagine that the left-hand images are the actual variables and the right-hand side is the first principal component, in which most of the variance has been captured.
• Finally, three images have been replaced by a single image by rotating the axis of direction. We replicate the same technique in PCA analysis.

• The actual data is shown in a 2D space, in which the X and Y axes are used to plot the data.
• Principal components are the directions in which the maximum variation of the data is captured.


Fitting the principal components:
• The first principal component covers the maximum variance in the data.
• The second principal component is orthogonal to the first principal component; all principal components are orthogonal to each other.


• We can represent the whole data with the first principal component itself.
• It is advantageous to represent the data with fewer dimensions, to save space and also to capture the maximum variance in the data, which can be utilized for supervised learning in the next stage.
• This is the core advantage of computing principal components.
• Eigenvectors are the axes (directions) along which a linear transformation acts simply by stretching/compressing and/or flipping.
• Eigenvalues give the factors by which the compression occurs. Put another way, an eigenvector of a linear transformation is a nonzero vector whose direction does not change when that linear transformation is applied to it.

• More formally, if A is a linear transformation on a vector space and v is a nonzero vector, then v is an eigenvector of A if Av is a scalar multiple of v.
• The condition can be written as the following equation:

    A v = \lambda v

• In the preceding equation, v is an eigenvector, A is a square matrix, and λ is a scalar called an eigenvalue.
• The direction of an eigenvector remains the same after it has been transformed by A; only its magnitude changes, as indicated by the eigenvalue.


• The following example describes how to calculate eigenvectors and eigenvalues from a square matrix and how to interpret them.
• Note that eigenvectors and eigenvalues can be calculated only for square matrices (those with the same number of rows and columns).

• The product of A and any eigenvector of A must equal that eigenvector multiplied by its eigenvalue:

Dr.S.Veena,Associate Professor/CSE
37
Principal component analysis - PCA
• The characteristic equation states that the determinant of the difference between the data matrix and the product of the identity matrix and an eigenvalue is 0; the general form is shown below.
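The slide's specific matrix appears only as an image; in general form, the characteristic equation is:

    \det(A - \lambda I) = 0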

• Both eigenvalues for the preceding matrix are equal to -2. We can substitute the eigenvalues back into the equation to solve for the eigenvectors:


• Substituting the value of the eigenvalue in the preceding equation, we obtain the following formula:

• The preceding equation can be rewritten as a system of equations, as follows:

• This system has multiple eigenvector solutions; we can substitute any values that satisfy the preceding equation to verify it. Here, we have used the vector [1 1] for verification, which satisfies the equation.


• PCA needs unit eigenvectors for its calculations, hence we need to divide the eigenvector by its norm, that is, normalize it. The 2-norm equation is shown as follows:

• The norm of the output vector is calculated as follows:

• The unit eigenvector is shown as follows:
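The formula images are not reproduced on the slide; in general form, for an eigenvector v = (v_1, v_2), the 2-norm and the resulting unit eigenvector are:

    \lVert v \rVert_2 = \sqrt{v_1^2 + v_2^2}, \qquad \hat{v} = \frac{v}{\lVert v \rVert_2}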

PCA working methodology from first principles
• The PCA working methodology is described with the following sample data, which has two dimensions for each instance or data point. The objective here is to reduce the 2D data to one dimension (also known as the principal component):

The first step, prior to proceeding with any analysis, is to subtract the mean from all the observations, which removes the scale factor of the variables and makes them more uniform across dimensions.

Principal components are calculated using two different techniques:
• Covariance matrix of the data
• Singular value decomposition

Covariance is a measure of how much two variables change together; it measures the strength of the correlation between two sets of variables. If the covariance of two variables is zero, we can conclude that there is no linear correlation between the two sets of variables. The formula for covariance is as follows:
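The formula appears on the slide only as an image; the standard sample covariance it refers to is:

    \operatorname{cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})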


A sample covariance calculation is shown for the X and Y variables in the following formulas; together they form the entire 2 x 2 covariance matrix (which is a square matrix).

From the covariance matrix we can calculate the eigenvectors and eigenvalues.

By solving the preceding equation, we obtain the eigenvectors and eigenvalues, as follows:


Python syntax:

>>> import numpy as np
>>> w, v = np.linalg.eig(np.array([[ 0.91335 ,0.75969 ],[0.75969,0.69702]]))
>>> print ("\nEigen Values\n", w)
>>> print ("\nEigen Vectors\n", v)

Once we obtain the eigenvectors and eigenvalues, we can project the data onto the principal components.
The eigenvector with the greatest eigenvalue is the first principal component; since we would like to reduce the original 2D data to 1D data, we project onto it, as sketched below.

From the preceding result, we can see the 1D projection of the original 2D data onto the first principal component.
Also, the eigenvalue of 1.5725 indicates that the principal component explains 57 percent more variance than the original variables.
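The projection itself is shown on the slide only as an image; a minimal sketch of the projection step, using the covariance matrix given above, is below. The data_centered rows are placeholders, not the slide's actual table.

>>> import numpy as np
>>> # Eigen decomposition of the covariance matrix computed above
>>> cov = np.array([[0.91335, 0.75969], [0.75969, 0.69702]])
>>> w, v = np.linalg.eig(cov)
>>> pc1 = v[:, np.argmax(w)]                 # eigenvector with the largest eigenvalue
>>> # 'data_centered' stands for the mean-subtracted observations, shape (n, 2);
>>> # the rows below are placeholder values for illustration only
>>> data_centered = np.array([[0.65, 0.55], [-1.35, -1.15], [0.70, 0.60]])
>>> projection = data_centered.dot(pc1)      # 1D projection onto the first PC
>>> print (projection)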
PCA applied on handwritten digits using scikit-learn
• Handwritten digits example from the scikit-learn datasets.
• The handwritten digits are 0-9, each with 64 features (an 8 x 8 matrix) of pixel intensities.
• The idea is to represent the original 64 dimensions with as few features as possible.

# PCA - Principal Component Analysis
>>> import matplotlib.pyplot as plt
>>> from sklearn.decomposition import PCA
>>> from sklearn.datasets import load_digits

>>> digits = load_digits()
>>> X = digits.data
>>> y = digits.target

>>> print (digits.data[0].reshape(8,8))

Plot the graph using the plt.show function:
>>> plt.matshow(digits.images[0])

>>> plt.show()

Before performing PCA, it is advisable to scale the input data to eliminate any issues due to the different scales of the data.
In the following code, we have applied scaling to all the columns separately:
>>> from sklearn.preprocessing import scale
>>> X_scale = scale(X, axis=0)
In the following, we use two principal components so that we can represent the data on a 2D graph:
>>> pca = PCA(n_components=2)
>>> reduced_X = pca.fit_transform(X_scale)
>>> zero_x, zero_y = [],[] ; one_x, one_y = [],[]
>>> two_x,two_y = [],[]; three_x, three_y = [],[]
>>> four_x,four_y = [],[]; five_x,five_y = [],[]
>>> six_x,six_y = [],[]; seven_x,seven_y = [],[]
>>> eight_x,eight_y = [],[]; nine_x,nine_y = [],[]

We append the relevant principal components to each digit separately so that we can create a scatter plot of all 10 digits; a compact sketch of this step follows.
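The appending and plotting code appears on the following slides only as images; a compact sketch that produces the same scatter plot, using boolean masks on reduced_X and y instead of the per-digit lists initialized above, might look like this:

>>> colors = ['red', 'green', 'blue', 'cyan', 'magenta', 'yellow',
...           'black', 'orange', 'purple', 'brown']
>>> for digit in range(10):
...     # rows of the 2D reduced data belonging to this digit
...     plt.scatter(reduced_X[y == digit, 0], reduced_X[y == digit, 1],
...                 c=colors[digit], label=str(digit), s=10)
>>> plt.legend(); plt.xlabel('PC 1'); plt.ylabel('PC 2')
>>> plt.show()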


• In the following code, we apply three principal components so that we can get a better view of the data in a 3D space.
• The procedure is very similar to that with two principal components, except for creating one extra dimension for each digit (X, Y, and Z); see the sketch after this list.
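The three-component code is likewise shown only as images; a minimal sketch under the same assumptions (scaled data X_scale and targets y from the code above) could be:

>>> from mpl_toolkits.mplot3d import Axes3D   # registers the '3d' projection
>>> pca_3d = PCA(n_components=3)
>>> reduced_X3 = pca_3d.fit_transform(X_scale)
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111, projection='3d')
>>> for digit in range(10):
...     mask = (y == digit)
...     ax.scatter(reduced_X3[mask, 0], reduced_X3[mask, 1], reduced_X3[mask, 2],
...                label=str(digit), s=10)
>>> ax.set_xlabel('PC 1'); ax.set_ylabel('PC 2'); ax.set_zlabel('PC 3')
>>> plt.legend()
>>> plt.show()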

In the 3D plot, digit 2 is at the extreme left and digit 0 is at the lower part of the plot, whereas digit 4 is at the top-right end and digit 6 seems to be more towards the PC 1 axis.

● Choosing the number of principal components to extract is an open-ended question in unsupervised learning, but there are some workarounds to get an approximate view.
● There are two ways we can determine the number of components:
○ Check where the total variance explained starts diminishing marginally
○ Total variance explained greater than 80 percent
● The following code provides the total variance explained as the number of principal components changes (a sketch is given after this list).
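The slide's code is shown only as an image; a sketch of the usual approach, fitting PCA with all 64 components on the scaled data and plotting the cumulative explained variance ratio, is:

>>> import numpy as np
>>> pca_full = PCA(n_components=64).fit(X_scale)
>>> cum_var = np.cumsum(pca_full.explained_variance_ratio_) * 100
>>> plt.plot(range(1, 65), cum_var, 'b*-')
>>> plt.grid(True)
>>> plt.xlabel('Number of principal components')
>>> plt.ylabel('Total variance explained (%)')
>>> plt.show()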

Singular value decomposition - SVD
• Many implementations of PCA use singular value decomposition to calculate eigenvectors and eigenvalues.
• SVD is given by the following equation:

    X = U \Sigma V^{T}

• The columns of U are called the left singular vectors of the data matrix, the columns of V are its right singular vectors, and the diagonal entries of Σ are its singular values.
• The left singular vectors are the eigenvectors of the covariance matrix, and the diagonal elements of Σ are the square roots of the eigenvalues of the covariance matrix.


Advantages of SVD:
• SVD can be applied even to rectangular matrices, whereas eigenvalues are defined only for square matrices.
• The equivalents of eigenvalues obtained through the SVD method are called singular values, and the vectors obtained, equivalent to eigenvectors, are known as singular vectors.
• However, as the matrices are rectangular, the left singular vectors and the right singular vectors have different dimensions.
• If a matrix A has a matrix of eigenvectors P that is not invertible, then A does not have an eigendecomposition. However, if A is an m x n real matrix with m > n, then A can be written using a singular value decomposition.
• Both U and V are orthogonal matrices, which means Uᵀ U = I (I with m x m dimensions) and Vᵀ V = I (here I with n x n dimensions), where the two identity matrices may have different dimensions.
• Σ is a non-negative diagonal matrix with m x n dimensions.


• Computation of singular values and singular vectors proceeds in two stages.

• First stage: the singular values/eigenvalues are calculated from the equation below. Once we obtain the singular values/eigenvalues, we substitute them to determine V, the right singular/eigenvectors.

• Second stage: we substitute to obtain the left singular vectors U, using the equation mentioned below.
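The equations on this slide appear only as images; the standard relations behind the two stages, for an m x n matrix A, are:

    A^{T} A \, v_i = \lambda_i v_i, \qquad \sigma_i = \sqrt{\lambda_i}, \qquad u_i = \frac{1}{\sigma_i} A v_i

where the v_i are the right singular vectors (columns of V), the \sigma_i are the singular values (diagonal of \Sigma), and the u_i are the left singular vectors (columns of U).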


SVD applied on handwritten digits using scikit-learn

SVD can be applied to the same handwritten digits data to perform an apples-to-apples comparison of techniques.
# SVD
>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import load_digits
>>> digits = load_digits()
>>> X = digits.data
>>> y = digits.target


• In the following code, 15 singular vectors with 300 iterations are used.
• Two types of SVD functions are used:
– the randomized_svd function provides the decomposition of the original matrix
– TruncatedSVD can provide the total variance explained ratio

>>> from sklearn.utils.extmath import randomized_svd
>>> U, Sigma, VT = randomized_svd(X, n_components=15, n_iter=300, random_state=42)

>>> import pandas as pd
>>> VT_df = pd.DataFrame(VT)

>>> print ("\nShape of Original Matrix:", X.shape)
>>> print ("\nShape of Left Singular vector:", U.shape)
>>> print ("Shape of Singular value:", Sigma.shape)
>>> print ("Shape of Right Singular vector", VT.shape)

The original matrix of dimension (1797 x 64) has been decomposed into
a left singular vector (1797 x 15), singular value (diagonal matrix of 15),
and right singular vector (15 x 64).

>>> n_comps = 15
>>> from sklearn.decomposition import TruncatedSVD
>>> svd = TruncatedSVD(n_components=n_comps, n_iter=300, random_state=42)
>>> reduced_X = svd.fit_transform(X)
>>> print("\nTotal Variance explained for %d singular features are %0.3f" %
...       (n_comps, svd.explained_variance_ratio_.sum()))


The following code illustrates the change in total variance explained with respect to the change in the number of singular values:
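The slide's code is not reproduced here; a sketch that loops over an increasing number of components and records the total variance explained by TruncatedSVD might look like this:

>>> variance_explained = []
>>> n_comps_range = range(1, 30)
>>> for n in n_comps_range:
...     svd_n = TruncatedSVD(n_components=n, n_iter=300, random_state=42)
...     svd_n.fit(X)
...     variance_explained.append(svd_n.explained_variance_ratio_.sum())
>>> plt.plot(list(n_comps_range), variance_explained, 'b*-')
>>> plt.grid(True)
>>> plt.xlabel('Number of singular values')
>>> plt.ylabel('Total variance explained')
>>> plt.show()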

