K-Means and PCA
RAMAPURAM CAMPUS
K-means Clustering,
Principal Component Analysis and
Singular Value Decomposition
Dr. S. Veena, Associate Professor / CSE
Overview
K-means Clustering
• Introduction to K-means Clustering
• K-means working methodology from first principles
• Optimal number of clusters and cluster evaluation
• The elbow method
• K-means clustering with the iris data example
Principal Component Analysis (PCA)
• Introduction to PCA
• PCA working methodology from first principles
• PCA applied on handwritten digits using scikit-learn
Singular Value Decomposition (SVD)
• Introduction to SVD
• SVD applied on handwritten digits using scikit-learn
K-means Clustering
Introduction to K-Means Clustering
• Clustering is the task of grouping observations in such a way
that members of the same cluster are more similar to each
other and members of different clusters are very different from
each other.
• Examples
– In anti-money laundering measures, suspicious activities and individuals
can be identified using anomaly detection
– In biology, clustering is used to find groups of genes with similar
expression patterns
– In marketing analytics, clustering is used to find segments of similar
customers so that different marketing strategies can be applied to
different customer segments accordingly
K-means working methodology from first principles
The k-means working methodology is illustrated in the following example, in which 12 instances are considered with their X and Y values. The task is to determine the optimal clusters from the data.
The Euclidean distance between two points A (X1, Y1) and B (X2, Y2) is calculated as follows:
dist(A, B) = sqrt((X2 - X1)^2 + (Y2 - Y1)^2)
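As a quick illustration with NumPy (the two points below are placeholders, not the slide's data):
>>> import numpy as np
>>> A = np.array([1.0, 1.5]); B = np.array([4.0, 5.5])   # illustrative points
>>> dist = np.sqrt(np.sum((A - B)**2))                    # Euclidean distance
>>> print(dist)
5.0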
Iteration 1: Distances are computed from each instance to both centroids, and each instance is assigned to the nearer centroid.
Iteration 2: In this iteration, new centroids are calculated from the instances assigned to each cluster. Each new centroid is the simple average of the points assigned to it.
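A minimal sketch of this update step (the points and assignments below are illustrative, not the slide's data):
>>> import numpy as np
>>> points = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 7.0], [6.0, 8.0]])  # illustrative instances
>>> labels = np.array([0, 0, 1, 1])                 # current cluster assignments
>>> new_c0 = points[labels == 0].mean(axis=0)       # new centroid = mean of assigned points
>>> new_c1 = points[labels == 1].mean(axis=0)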
Iteration 3: In this iteration, new assignments are computed based on the Euclidean distance between the instances and the new centroids. If any assignments change, new centroids are calculated again; this repeats until the assignments no longer change or the maximum number of iterations is reached.
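Putting the assignment and update steps together, a minimal from-first-principles K-means loop might look like the following sketch (variable names and initialization are assumptions, not the slides' code):

import numpy as np

def kmeans(X, k, max_iter=300, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]       # random initial centroids
    for _ in range(max_iter):
        # assignment step: each point goes to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: new centroid = mean of the assigned points
        # (assumes no cluster becomes empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):              # stop when centroids stabilize
            break
        centroids = new_centroids
    return labels, centroids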
K-means clustering with the iris data example
# K-means clustering
>>> import numpy as np
The following code separates out the class variable, which is kept only as a dependent variable for coloring the plot; the unsupervised learning algorithm is applied to the x variables alone, without any target variable being present:
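A minimal sketch of that setup, using scikit-learn's built-in iris data (the original may load a CSV instead); three clusters are used, one per species:
>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.cluster import KMeans
>>> iris = load_iris()
>>> x_iris = pd.DataFrame(iris.data, columns=iris.feature_names)   # features only
>>> y_iris = pd.Series(iris.target)                                # class labels, used only for plot colors
>>> k_means_fit = KMeans(n_clusters=3, max_iter=300)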
>>> k_means_fit.fit(x_iris)
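The confusion matrix discussed below can be obtained by cross-tabulating the actual species against the predicted cluster labels; pd.crosstab is one common way (a hedged sketch, not necessarily the slides' exact code):
>>> print(pd.crosstab(y_iris, k_means_fit.labels_,
...                   rownames=['Actual'], colnames=['Predicted']))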
From the preceding confusion matrix, we can see that all the setosa flowers are clustered correctly, whereas 2 out of 50 versicolor and 14 out of 50 virginica flowers are placed in the wrong cluster.
To perform a sensitivity analysis and check how many clusters actually provide a better explanation of the segments (here using the silhouette score as the measure):
>>> from sklearn.metrics import silhouette_score
>>> for k in range(2,10):
...     k_means_fitk = KMeans(n_clusters=k, max_iter=300)
...     k_means_fitk.fit(x_iris)
...     print("For K =", k, "silhouette score =",
...           silhouette_score(x_iris, k_means_fitk.labels_, metric='euclidean'))
We also need to look at the average within-cluster variation and the elbow plot before concluding on the optimal K value.
# Avg. within-cluster sum of squares
>>> K = range(1,10)
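A hedged sketch of how the quantities used in the plotting code below might be computed (the names bss and tss follow that code; the inertia_ attribute is used here for the within-cluster sum of squares):
>>> import matplotlib.pyplot as plt
>>> wcss = []                                    # within-cluster sum of squares for each K
>>> for k in K:
...     model = KMeans(n_clusters=k, max_iter=300).fit(x_iris)
...     wcss.append(model.inertia_)
>>> wcss = np.array(wcss)
>>> tss = ((x_iris - x_iris.mean(axis=0))**2).values.sum()   # total sum of squares
>>> bss = tss - wcss                                          # between-cluster sum of squares
>>> avg_within = wcss / x_iris.shape[0]                       # avg. within-cluster sum of squares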
From the elbow plot, it appears that the slope changes drastically at a value of three, so we can select the optimal k-value as three.
# elbow curve - percentage of variance explained
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> ax.plot(K, bss/tss*100, 'b*-')
>>> plt.grid(True)
>>> plt.xlabel('Number of clusters')
>>> plt.ylabel('Percentage of variance explained')
>>> plt.show()
Principal component analysis - PCA
• The left-hand side of the diagram depicts the top view, front view, and side view of the component.
• On the right-hand side, an isometric view has been drawn, in which a single image is used to visualize how the component looks.
• One can imagine that the left-hand images are the actual variables and the right-hand side is the first principal component, in which most of the variance has been captured.
• Finally, three images have been replaced by a single image by rotating the axis of direction. We replicate the same technique in PCA analysis.
• The actual data is shown in a 2D space, in which the X and Y axes are used to plot the data.
• The principal components are the directions along which the maximum variation of the data is captured.
• Eigenvectors and eigenvalues underpin PCA: the principal components are eigenvectors of the data's covariance matrix. More formally, if A is a linear transformation of a vector space and v is a nonzero vector, then v is an eigenvector of A if Av is a scalar multiple of v.
• The condition can be written as the following equation:
A v = λ v
where λ is the eigenvalue corresponding to the eigenvector v.
• The following example describes how to calculate eigenvectors and eigenvalues from a square matrix and how to interpret them.
• Note that eigenvectors and eigenvalues can be calculated only for square matrices (matrices with the same number of rows and columns).
• The characteristic equation states that the determinant of the difference between the data matrix and the identity matrix multiplied by an eigenvalue is zero, that is, det(A - λI) = 0.
• Both eigenvalues for the preceding matrix are equal to -2. We can substitute each eigenvalue back into the equation (A - λI)v = 0 to solve for the corresponding eigenvector:
• Substituting the value of the eigenvalue into the preceding equation, we obtain the following formula:
• This equation can have many eigenvector solutions; any values that satisfy the equation can be substituted in order to verify it. Here, we have used the vector [1 1] for verification, which does satisfy the equation.
PCA working methodology from first principles
• The PCA working methodology is illustrated using the following sample data, which has two dimensions for each instance or data point. The objective here is to reduce the 2D data to one dimension (also known as the principal component):
The first step, prior to proceeding with any analysis, is to subtract the mean from all the observations, which centers each variable around zero and makes the dimensions more comparable.
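A small illustrative sketch of this centering step (the array is a placeholder, not the slide's sample values):
>>> import numpy as np
>>> data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2]])   # illustrative 2D data
>>> data_centered = data - data.mean(axis=0)     # subtract the per-column mean
>>> print(data_centered.mean(axis=0))            # now approximately zero in each dimension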
Principal components are calculated using two different
techniques:
• Covariance matrix of the data
• Singular value decomposition
A sample covariance calculation for the X and Y variables is shown in the following formula; the entire covariance matrix is a 2 x 2 (square) matrix:
cov(X, Y) = sum_i (Xi - mean(X)) (Yi - mean(Y)) / (n - 1)
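A hedged NumPy sketch of the same computation, continuing with the illustrative centered data from above (np.cov expects variables as rows, hence the transpose):
>>> cov_mat = np.cov(data_centered.T)    # 2 x 2 covariance matrix
>>> print(cov_mat.shape)
(2, 2)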
From the covariance matrix, we can calculate the eigenvectors and eigenvalues:
Python syntax:
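A minimal NumPy equivalent (an assumed sketch, continuing the illustrative example):
>>> eig_vals, eig_vecs = np.linalg.eig(cov_mat)   # eigenvalues and eigenvectors of the covariance matrix
>>> print("Eigenvalues:", eig_vals)
>>> print("Eigenvectors:\n", eig_vecs)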
Once we obtain the eigenvectors and eigenvalues, we can project the data onto the principal components.
The eigenvector with the greatest eigenvalue is the first principal component; it is the one we keep, as we would like to reduce the original 2D data to 1D data.
From the preceding result, we can see the 1D projection of the original 2D data onto the first principal component.
Also, the eigenvalue of 1.5725 indicates that the first principal component explains about 57 percent more variance than an original variable does.
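A hedged sketch of the projection step, continuing the illustrative example (the eigenvector with the largest eigenvalue is selected and the centered data is projected onto it):
>>> top = eig_vals.argmax()                       # index of the largest eigenvalue
>>> first_pc = eig_vecs[:, top]                   # first principal component (unit vector)
>>> projected_1d = data_centered.dot(first_pc)    # 1D projection of the 2D data
>>> print(projected_1d.shape)
(4,)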
PCA applied on handwritten digits using scikit-learn
• Handwritten digits example from the scikit-learn datasets
• The handwritten digits are 0-9, each represented by 64 features (an 8 x 8 matrix of pixel intensities).
• The idea is to represent the original 64-dimensional features in as few dimensions as possible
>>> from sklearn.datasets import load_digits
>>> digits = load_digits()
>>> X = digits.data
>>> y = digits.target
Plot the graph using the plt.show function:
>>> import matplotlib.pyplot as plt
>>> plt.matshow(digits.images[0])
>>> plt.show()
Before performing PCA, it is advisable to scale the input data to eliminate any issues due to the differing scales of the variables.
In the following code, scaling is applied to all the columns separately:
>>> from sklearn.preprocessing import scale
>>> X_scale = scale(X,axis=0)
In the following, we use two principal components, so that the result can be represented on a 2D graph.
>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=2)
>>> reduced_X = pca.fit_transform(X_scale)
>>> zero_x, zero_y = [],[] ; one_x, one_y = [],[]
>>> two_x,two_y = [],[]; three_x, three_y = [],[]
>>> four_x,four_y = [],[]; five_x,five_y = [],[]
>>> six_x,six_y = [],[]; seven_x,seven_y = [],[]
>>> eight_x,eight_y = [],[]; nine_x,nine_y = [],[]
We are appending the relevant principal components to each digit separately so that we can create a scatter plot of all 10 digits:
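A compact hedged sketch of the appending-and-plotting idea for a few of the digits (the full version repeats the pattern for all ten):
>>> for i in range(len(reduced_X)):
...     if y[i] == 0:
...         zero_x.append(reduced_X[i][0]); zero_y.append(reduced_X[i][1])
...     elif y[i] == 1:
...         one_x.append(reduced_X[i][0]); one_y.append(reduced_X[i][1])
...     elif y[i] == 2:
...         two_x.append(reduced_X[i][0]); two_y.append(reduced_X[i][1])
...     # ... and so on for digits 3 through 9
>>> plt.scatter(zero_x, zero_y, c='r', marker='x', label='zero')
>>> plt.scatter(one_x, one_y, c='g', marker='+', label='one')
>>> plt.scatter(two_x, two_y, c='b', marker='s', label='two')
>>> plt.legend(); plt.xlabel('PC 1'); plt.ylabel('PC 2'); plt.show()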
In a 3D plot, digit 2 is at the extreme left and digit 0 is at the lower part of the plot, whereas digit 4 is at the top-right end and digit 6 seems to lie more towards the PC 1 axis.
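A hedged sketch of how such a 3D view could be produced (three components instead of two; not necessarily the slides' exact code):
>>> from mpl_toolkits.mplot3d import Axes3D    # enables the 3D projection
>>> pca_3d = PCA(n_components=3)
>>> reduced_X3 = pca_3d.fit_transform(X_scale)
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111, projection='3d')
>>> ax.scatter(reduced_X3[:, 0], reduced_X3[:, 1], reduced_X3[:, 2], c=y, cmap='tab10', s=10)
>>> ax.set_xlabel('PC 1'); ax.set_ylabel('PC 2'); ax.set_zlabel('PC 3')
>>> plt.show()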
● Choosing the number of principal components to extract is an open-ended question in unsupervised learning, but there are some workarounds to obtain an approximate answer.
● There are two common ways to determine the number of components:
○ Check where the total variance explained starts diminishing only marginally
○ Choose enough components so that the total variance explained is greater than 80 percent
● The following code provides the total variance explained as the number of principal components changes (see the sketch after this list).
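A minimal hedged equivalent of such a check (the component counts printed below are illustrative choices):
>>> pca_full = PCA(n_components=64)                              # digits data has 64 features
>>> pca_full.fit(X_scale)
>>> var_expl = np.cumsum(pca_full.explained_variance_ratio_)     # cumulative variance explained
>>> for n in (2, 5, 10, 20, 30, 40):
...     print(n, "components explain %.1f%% of the variance" % (100 * var_expl[n-1]))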
Singular value decomposition - SVD
• Many implementations of PCA use singular value decomposition to calculate the eigenvectors and eigenvalues.
• SVD is given by the following equation:
M = U Σ V^T
• The columns of U are called the left singular vectors of the data matrix, the columns of V are its right singular vectors, and the diagonal entries of Σ are its singular values.
• The left singular vectors are the eigenvectors of the covariance matrix and the diagonal elements of Σ are the square roots of the eigenvalues of the covariance matrix.
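A small hedged sketch of this decomposition with NumPy (the matrix is illustrative; np.linalg.svd returns U, the singular values, and V^T):
>>> import numpy as np
>>> M = np.array([[3.0, 2.0, 2.0],
...               [2.0, 3.0, -2.0]])                # illustrative 2 x 3 matrix
>>> U, s, Vt = np.linalg.svd(M, full_matrices=False)
>>> print(U.shape, s.shape, Vt.shape)               # (2, 2) (2,) (2, 3)
>>> print(np.allclose(M, U @ np.diag(s) @ Vt))      # True: M = U Σ V^T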
Advantages of SVD:
• SVD can be applied even to rectangular matrices, whereas eigenvalues are defined only for square matrices.
• The equivalents of eigenvalues obtained through the SVD method are called singular values, and the vectors obtained that correspond to eigenvectors are known as singular vectors.
• However, as rectangular matrices are involved, we need left singular vectors and right singular vectors for the row and column dimensions respectively.
• If a matrix A has a matrix of eigenvectors P that is not invertible, then A does not have an eigendecomposition.
• However, if A is an m x n real matrix with m > n, then A can be written using a singular value decomposition.
• Both U and V are orthogonal matrices, which means U^T U = I (here I has m x m dimensions) and V^T V = I (here I has n x n dimensions), where the two identity matrices may have different dimensions.
• Σ is a non-negative diagonal matrix with m x n dimensions.
• First stage: the singular values/eigenvalues are calculated from the eigen-equation of A^T A, since the eigenvalues of A^T A are the squares of the singular values. Once we obtain the singular values/eigenvalues, we substitute them back to determine V, the right singular/eigen vectors.
• Second stage: we substitute V to obtain the left singular vectors U, using the equation U = A V Σ^-1 (applied column by column for the nonzero singular values).
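A hedged NumPy sketch of this two-stage procedure, reusing the illustrative matrix M from above and checking it against np.linalg.svd:
>>> eigvals, V = np.linalg.eigh(M.T @ M)        # eigen-decomposition of A^T A (symmetric)
>>> order = np.argsort(eigvals)[::-1][:2]       # keep the two largest eigenvalues
>>> V = V[:, order]                             # right singular vectors
>>> sing_vals = np.sqrt(eigvals[order])         # singular values = sqrt(eigenvalues of A^T A)
>>> U_manual = M @ V / sing_vals                # left singular vectors: U = A V Σ^-1
>>> print(np.allclose(sing_vals, s))            # singular values match np.linalg.svd
True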
The original matrix of dimension (1797 x 64) has been decomposed into a matrix of left singular vectors (1797 x 15), a diagonal matrix of 15 singular values, and a matrix of right singular vectors (15 x 64).
>>> n_comps = 15
>>> from sklearn.decomposition import TruncatedSVD
>>> svd = TruncatedSVD(n_components=n_comps, n_iter=300, random_state=42)
>>> reduced_X = svd.fit_transform(X)
>>> print("\nTotal Variance explained for %d singular features are %0.3f" % (n_comps, svd.explained_variance_ratio_.sum()))