
Birla Institute of Technology and Science, Pilani

Department of Computer Science & Information Systems


BITS F464 - Machine Learning
I Semester 2020-21
3-Sep-20 Lab Sheet-03 – Principal Component Analysis

Singular-Value Decomposition

The best-known and most widely used matrix decomposition method is the Singular-Value
Decomposition, or SVD. Every matrix has an SVD, which makes it more stable than other
methods such as the eigendecomposition.
The SVD is widely used both in the calculation of other matrix operations, such as the matrix
inverse, and as a data reduction, compression and denoising method in machine learning.

Calculate Singular-Value Decomposition

The SVD can be calculated by calling the svd() function from scipy.linalg.


The function takes a matrix and returns the U, Sigma and V^T elements. The diagonal Sigma
matrix is returned as a vector of singular values, and the V matrix is returned in transposed
form, i.e. V^T.
The example below defines a 3×2 matrix and calculates its Singular-Value Decomposition.

# Singular-value decomposition
from numpy import array
from scipy.linalg import svd
# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# SVD
U, s, VT = svd(A)
print(U)
print(s)
print(VT)

Reconstruct Matrix from SVD

The original matrix can be reconstructed from the U, Sigma, and V^T elements.
The U, s, and V^T elements returned by svd() cannot be multiplied together directly.
The s vector must first be converted into a diagonal matrix using the diag() function. By default,
this function creates a square n x n matrix, while the original matrix is m x n. This causes a
problem because the matrix sizes do not satisfy the rules of matrix multiplication, where the
number of columns in one matrix must match the number of rows in the next. The example
below therefore places the n x n diagonal block inside an m x n matrix of zeros before
reconstructing.
# Reconstruct SVD
from numpy import array
from numpy import diag
from numpy import dot
from numpy import zeros
from scipy.linalg import svd
# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# Singular-value decomposition
U, s, VT = svd(A)
# create m x n Sigma matrix
Sigma = zeros((A.shape[0], A.shape[1]))
# populate Sigma with n x n diagonal matrix
Sigma[:A.shape[1], :A.shape[1]] = diag(s)
# reconstruct matrix
B = U.dot(Sigma.dot(VT))
print(B)
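
As a quick sanity check (not part of the original lab sheet), the reconstruction can be compared against the original matrix; a minimal sketch using numpy's allclose:

# Verify the reconstruction matches A up to floating-point error
from numpy import allclose
print(allclose(A, B))   # expected: True
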
SVD for Pseudoinverse

The pseudoinverse is the generalization of the matrix inverse from square matrices to rectangular
matrices, where the numbers of rows and columns are not equal.
It is also called the Moore-Penrose inverse, after its two independent discoverers, or the
generalized inverse.
# Pseudoinverse via SVD
from numpy import array
from numpy.linalg import svd
from numpy import zeros
from numpy import diag
# define matrix
A = array([
[0.1, 0.2],
[0.3, 0.4],
[0.5, 0.6],
[0.7, 0.8]])
print(A)
# calculate svd
U, s, VT = svd(A)
# reciprocals of s
d = 1.0 / s
# create m x n D matrix
D = zeros(A.shape)
# populate D with n x n diagonal matrix
D[:A.shape[1], :A.shape[1]] = diag(d)
# calculate pseudoinverse
B = VT.T.dot(D.T).dot(U.T)
print(B)
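
For reference, numpy also ships a built-in pseudoinverse. A minimal cross-check sketch (assuming the matrix A and the SVD-based result B from above):

# Cross-check against numpy's built-in pseudoinverse
from numpy.linalg import pinv
from numpy import allclose
print(pinv(A))
print(allclose(B, pinv(A)))   # expected: True
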

Principal Component Analysis


PCA is mathematically defined as an orthogonal linear transformation that transforms the data to
a new coordinate system such that the greatest variance by some projection of the data comes
to lie on the first coordinate (called the first principal component), the second greatest variance
on the second coordinate, and so on.
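
To make this definition concrete, below is a minimal sketch (illustrative only; the small matrix and variable names are our own) that computes principal components directly from the SVD of the mean-centred data, which is essentially what scikit-learn's PCA does internally:

# Illustrative: principal components from the SVD of the centred data
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])
Xc = X - X.mean(axis=0)                      # centre each column
U, s, VT = np.linalg.svd(Xc, full_matrices=False)
components = VT                              # rows are the principal directions
explained_variance = (s ** 2) / (len(X) - 1)
scores = Xc.dot(components.T)                # data in the new coordinate system
print(components)
print(explained_variance)
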

Import packages and download the wine dataset from


“https://archive.ics.uci.edu/ml/datasets/wine”

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

Read in the data and perform basic exploratory analysis


df = pd.read_csv('./Datasets/wine.data.csv')
df.head(10)

Basic statistics
df.iloc[:,1:].describe()

Boxplots by output labels/classes


for c in df.columns[1:]:
    df.boxplot(c, by='Class', figsize=(7,4), fontsize=14)
    plt.title("{}\n".format(c), fontsize=16)
    plt.xlabel("Wine Class", fontsize=16)

It can be seen that some features separate the wine classes fairly clearly. For example,
Alcalinity, Total Phenols, or Flavanoids produce boxplots with well-separated medians, which are
clearly indicative of the wine classes.

Below is an example of class separation using two variables


plt.figure(figsize=(10,6))
plt.scatter(df['OD280/OD315 of diluted wines'], df['Flavanoids'], c=df['Class'], edgecolors='k', alpha=0.75, s=150)
plt.grid(True)
plt.title("Scatter plot of two features showing the \ncorrelation and class separation", fontsize=15)
plt.xlabel("diluted wines", fontsize=15)
plt.ylabel("Flavanoids", fontsize=15)
plt.show()

Are the features independent? Plot the correlation matrix

It can be seen that there is a good amount of correlation between some of the features, i.e. they
are not independent of each other.
def correlation_matrix(df):
    from matplotlib import pyplot as plt
    from matplotlib import cm as cm

    fig = plt.figure(figsize=(16,12))
    ax1 = fig.add_subplot(111)
    cmap = cm.get_cmap('jet', 30)
    cax = ax1.imshow(df.corr(), interpolation="nearest", cmap=cmap)
    ax1.grid(True)
    plt.title('Wine data set features correlation\n', fontsize=15)
    labels = df.columns
    ax1.set_xticklabels(labels, fontsize=9)
    ax1.set_yticklabels(labels, fontsize=9)
    # Add colorbar, make sure to specify tick locations to match desired ticklabels
    fig.colorbar(cax, ticks=[0.1*i for i in range(-11,11)])
    plt.show()

correlation_matrix(df)
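
If a plot is not needed, the same information can be inspected numerically with pandas alone (a small optional sketch):

# Numeric view of the pairwise correlations (excluding the Class column)
corr = df.iloc[:, 1:].corr().round(2)
print(corr)
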

Principal Component Analysis

Data scaling

PCA requires scaling/normalization of the data to work properly

from sklearn.preprocessing import StandardScaler


scaler = StandardScaler()
X = df.drop('Class',axis=1)
y = df['Class']
X = scaler.fit_transform(X)
dfx = pd.DataFrame(data=X,columns=df.columns[1:])
dfx.head(10)
dfx.describe()

PCA class import and analysis


from sklearn.decomposition import PCA
pca = PCA(n_components=None)
dfx_pca = pca.fit(dfx)
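
Before plotting, it can be useful to print the explained variance ratios and their cumulative sum (a small optional sketch; np is the numpy import from above):

# Explained variance per component and its running total
print(dfx_pca.explained_variance_ratio_)
print(np.cumsum(dfx_pca.explained_variance_ratio_))
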

Plot the explained variance ratio


plt.figure(figsize=(10,6))
plt.scatter(x=[i+1 for i in range(len(dfx_pca.explained_variance_ratio_))], y=dfx_pca.explained_variance_ratio_, s=200, alpha=0.75, c='orange', edgecolor='k')
plt.grid(True)
plt.title("Explained variance ratio of the \nfitted principal component vector\n", fontsize=25)
plt.xlabel("Principal components", fontsize=15)
plt.xticks([i+1 for i in range(len(dfx_pca.explained_variance_ratio_))], fontsize=15)
plt.yticks(fontsize=15)
plt.ylabel("Explained variance ratio", fontsize=15)
plt.show()

The above plot shows that the 1st principal component explains about 36% of the total variance
in the data and the 2nd component explains a further 20%. Therefore, the first two components
together explain about 56% of the total variance.
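
scikit-learn can also choose the number of components automatically: passing a float between 0 and 1 as n_components keeps just enough components to explain that fraction of the variance. A minimal sketch, assuming the scaled data frame dfx from above:

# Keep enough components to explain 95% of the total variance
from sklearn.decomposition import PCA
pca95 = PCA(n_components=0.95)
dfx_95 = pca95.fit_transform(dfx)
print(pca95.n_components_)   # number of components actually retained
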
Showing better class separation using principal components

Transform the scaled data set using the fitted PCA object
dfx_trans = pca.transform(dfx)

Put it in a data frame


dfx_trans = pd.DataFrame(data=dfx_trans)
dfx_trans.head(10)

Plot the first two columns of this transformed data set with the color set to original ground truth
class label
plt.figure(figsize=(10,6))
plt.scatter(dfx_trans[0], dfx_trans[1], c=df['Class'], edgecolors='k', alpha=0.75, s=150)
plt.grid(True)
plt.title("Class separation using first two principal components\n", fontsize=20)
plt.xlabel("Principal component-1",fontsize=15)
plt.ylabel("Principal component-2",fontsize=15)
plt.show()

Lab 03 Exercise (submit the code in the given time):

Download any dataset with integer-valued attributes (you can also use the wine dataset), split
the data into training and testing sets, and perform linear regression using any of the methods
introduced in the previous lab. Then compare the prediction accuracy with and without PCA
applied to the training data.
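
One possible workflow is sketched below (illustrative only; X and y stand for the feature matrix and target of whatever dataset you choose, and the regression method and number of components are placeholders to adapt):

# Sketch: compare linear regression with and without PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Without PCA
reg = LinearRegression().fit(X_train, y_train)
print("R^2 without PCA:", reg.score(X_test, y_test))

# With PCA: scale on the training data, then project both splits
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=2).fit(scaler.transform(X_train))
Xtr_p = pca.transform(scaler.transform(X_train))
Xte_p = pca.transform(scaler.transform(X_test))
reg_pca = LinearRegression().fit(Xtr_p, y_train)
print("R^2 with PCA:", reg_pca.score(Xte_p, y_test))
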
