
Name of Student: Ahmed Mobin Ahmed Shaikh

Roll Number: 45    Lab Practical Number: 06

Title of Lab Assignment: Implementation of dimensionality reduction
techniques: Normalization, Transformation, Principal Components Analysis.

DOP: 19/03/24    DOS: 23/03/24

CO Mapped: CO3, CO6

PO Mapped: PO1, PO2, PO3, PO4, PO5, PO6, PO7, PO8, PO9, PO11, PO12, PSO1, PSO2.

Signature:

AIM: Implementation of dimensionality reduction techniques: Normalization, Transformation, Principal Components Analysis.

Theory:
Dimensionality reduction techniques such as feature extraction and selection are crucial in various fields including machine learning, data
analysis, and signal processing. Here's an overview of the theory behind the implementation of these techniques:

Feature Extraction: Feature extraction transforms the original high-dimensional data into a lower-dimensional space by creating new
features that capture the most relevant information. A widely used technique for feature extraction is Principal Component Analysis (PCA).

Principal Component Analysis (PCA):

PCA extracts linear combinations of the original features, called principal components, that capture the maximum variance in the data.

The principal components are computed as the eigenvectors of the covariance matrix of the data.

Implementation involves centering the data, computing the covariance matrix, performing eigenvalue decomposition, and selecting the
top k eigenvectors (see the sketch after this list).
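
A minimal sketch of these steps on synthetic data (an illustrative addition, not part of the original notebook; the notebook below applies the same procedure to the Fish dataset):

import numpy as np

rng = np.random.default_rng(0)
toy_X = rng.normal(size=(100, 4))                 # toy data: 100 samples, 4 features

toy_X_centered = toy_X - toy_X.mean(axis=0)       # 1. centre the data
toy_cov = np.cov(toy_X_centered.T)                # 2. covariance matrix (4 x 4)
toy_vals, toy_vecs = np.linalg.eig(toy_cov)       # 3. eigenvalue decomposition

order = np.argsort(toy_vals)[::-1]                # 4. sort components by decreasing variance
W = toy_vecs[:, order[:2]]                        # keep the top k = 2 eigenvectors
toy_X_reduced = toy_X_centered @ W                # project onto the principal components
print(toy_X_reduced.shape)                        # (100, 2)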

import numpy as np
import pandas as pd
import pprint
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
%matplotlib inline
%precision 3
np.set_printoptions(precision=3)
import pylab as pl

ahmed_df = pd.read_csv("/content/Fish.csv")
#feature_columns = ['priceUSD', 'transactions', 'size', 'sentbyaddress'] # Specify your feature columns
ahmed_df.head(20)

    Category Species  Weight   Height   Width  Length1  Length2  Length3
0          1   Bream   242.0  11.5200  4.0200     23.2     25.4     30.0
1          1   Bream   290.0  12.4800  4.3056     24.0     26.3     31.2
2          1   Bream   340.0  12.3778  4.6961     23.9     26.5     31.1
3          1   Bream   363.0  12.7300  4.4555     26.3     29.0     33.5
4          1   Bream   430.0  12.4440  5.1340     26.5     29.0     34.0
5          1   Bream   450.0  13.6024  4.9274     26.8     29.7     34.7
6          1   Bream   500.0  14.1795  5.2785     26.8     29.7     34.5
7          1   Bream   390.0  12.6700  4.6900     27.6     30.0     35.0
8          1   Bream   450.0  14.0049  4.8438     27.6     30.0     35.1
9          1   Bream   500.0  14.2266  4.9594     28.5     30.7     36.2
10         1   Bream   475.0  14.2628  5.1042     28.4     31.0     36.2
11         1   Bream   500.0  14.3714  4.8146     28.7     31.0     36.2
12         1   Bream   500.0  13.7592  4.3680     29.1     31.5     36.4
13         1   Bream   340.0  13.9129  5.0728     29.5     32.0     37.3
14         1   Bream   600.0  14.9544  5.1708     29.4     32.0     37.2
15         1   Bream   600.0  15.4380  5.5800     29.4     32.0     37.2
16         1   Bream   700.0  14.8604  5.2854     30.4     33.0     38.3
17         1   Bream   700.0  14.9380  5.1975     30.4     33.0     38.5
18         1   Bream   610.0  15.6330  5.1338     30.9     33.5     38.6
19         1   Bream   650.0  14.4738  5.7276     31.0     33.5     38.7


feature_columns = ['Weight', 'Length1', 'Length2', 'Length3'] # Specify your feature columns
ahmed_X = ahmed_df[feature_columns]
ahmed_df.shape

(159, 8)

from sklearn.preprocessing import StandardScaler

# Standardize each feature to zero mean and unit variance
X_std = StandardScaler().fit_transform(ahmed_X)
print(X_std[0:5])
print("The shape of Feature Matrix is -", X_std.shape)

[[-0.438 -0.306 -0.282 -0.106]
 [-0.304 -0.226 -0.198 -0.002]
 [-0.163 -0.236 -0.179 -0.011]
 [-0.099  0.005  0.055  0.196]
 [ 0.089  0.025  0.055  0.24 ]]
The shape of Feature Matrix is - (159, 4)
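
Standardization (as above) is one way to normalize the features. Since the aim also lists normalization/transformation, a minimal sketch of an alternative, min-max scaling to the [0, 1] range, is shown here (an illustrative addition, not part of the original notebook):

from sklearn.preprocessing import MinMaxScaler

# Min-max normalization: rescale each feature to the [0, 1] range
# (illustrative alternative to StandardScaler; not used in the rest of the notebook)
X_minmax = MinMaxScaler().fit_transform(ahmed_X)
print(X_minmax[0:5])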

# Covariance matrix of the standardized features (np.cov expects variables as rows, hence the transpose)
ahmed_X_covariance_matrix = np.cov(X_std.T)
ahmed_X_covariance_matrix

array([[1.006, 0.922, 0.924, 0.929],
       [0.922, 1.006, 1.006, 0.998],
       [0.924, 1.006, 1.006, 1.   ],
       [0.929, 0.998, 1.   , 1.006]])

eig_vals, eig_vecs = np.linalg.eig(ahmed_X_covariance_matrix)

print('Eigenvectors \n%s' % eig_vecs)
print('\nEigenvalues \n%s' % eig_vals)

Eigenvectors
[[-0.485 -0.873 -0.047 -0.005]
[-0.505 0.309 -0.482 -0.646]
[-0.505 0.292 -0.296 0.756]
[-0.505 0.237 0.823 -0.106]]

Eigenvalues
[3.897e+00 1.188e-01 8.993e-03 3.174e-04]
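
As a quick sanity check (an illustrative addition, not part of the original notebook), each eigenpair should satisfy C v = lambda v for the covariance matrix C:

# Verify the eigen decomposition: C @ v should equal lam * v for every eigenpair
for lam, v in zip(eig_vals, eig_vecs.T):
    assert np.allclose(ahmed_X_covariance_matrix @ v, lam * v)
print("Eigen decomposition verified")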

Make a list of (eigenvalue, eigenvector) tuples:

ahmed_eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:, i]) for i in range(len(eig_vals))]

Sort the (eigenvalue, eigenvector) tuples from high to low

ahmed_eig_pairs.sort(key=lambda x: x[0], reverse=True)

Visually confirm that the list is correctly sorted by decreasing eigenvalues

print('Eigenvalues in descending order:')
for i in ahmed_eig_pairs:
    print(i[0])

Eigenvalues in descending order:
3.8971801711761636
0.11882554021972752
0.008993300121559085
0.0003174441787529088

tot = sum(eig_vals)
var_exp = [(i / tot) * 100 for i in sorted(eig_vals, reverse=True)]
cum_var_exp = np.cumsum(var_exp)
print("Variance captured by each component is \n", var_exp)
print(40 * '-')
print("Cumulative variance captured as we travel each component \n", cum_var_exp)

Variance captured by each component is
 [96.81674010154617, 2.9519552444523494, 0.22341846213936087, 0.007886191862100564]
----------------------------------------
Cumulative variance captured as we travel each component
[ 96.817 99.769 99.992 100. ]
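
To visualize how much variance the leading components capture, a small scree-style plot could be added (an illustrative addition, not part of the original notebook):

# Plot individual and cumulative explained variance for the four components
plt.figure(figsize=(6, 4))
plt.bar(range(1, len(var_exp) + 1), var_exp, label='Individual')
plt.step(range(1, len(cum_var_exp) + 1), cum_var_exp, where='mid', label='Cumulative')
plt.xlabel('Principal component')
plt.ylabel('Explained variance (%)')
plt.legend()
plt.show()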

print ("All Eigen Values along with Eigen Vectors")
pprint.pprint(ahmed_eig_pairs)
print(40 * '-')
matrix_w = np.hstack((ahmed_eig_pairs[0][1].reshape(4,1),ahmed_eig_pairs[1][1].reshape(4,1)))
print ('Matrix W:\n', matrix_w)

All Eigen Values along with Eigen Vectors
[(3.8971801711761636, array([-0.485, -0.505, -0.505, -0.505])),
(0.11882554021972752, array([-0.873, 0.309, 0.292, 0.237])),
(0.008993300121559085, array([-0.047, -0.482, -0.296, 0.823])),
(0.0003174441787529088, array([-0.005, -0.646, 0.756, -0.106]))]
----------------------------------------
Matrix W:
[[-0.485 -0.873]
[-0.505 0.309]
[-0.505 0.292]
[-0.505 0.237]]

# Project the standardized data onto the two principal components
ahmed_Y = X_std.dot(matrix_w)
print(ahmed_Y[0:5])

[[ 0.563 0.18 ]
[ 0.362 0.137]
[ 0.294 0.015]
[-0.081 0.151]
[-0.204 0.003]]

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assuming df contains features and target column
X = ahmed_df.drop('Species', axis=1)  # Features
y = ahmed_df['Species']  # Target

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Perform PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Plot the data in the space of the first two principal components
plt.figure(figsize=(8, 6))
targets = y.unique()
colors = ['r', 'g', 'b']  # Adjust based on the number of unique target values (the Fish dataset has 7 species)
for target, color in zip(targets, colors):
    indicesToKeep = y == target
    plt.scatter(X_pca[indicesToKeep, 0], X_pca[indicesToKeep, 1], c=color, label=target)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title='Target')
plt.title('PCA of Custom Dataset')
plt.show()
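
As a cross-check (an illustrative addition, not part of the original notebook), the fitted sklearn PCA exposes the explained variance ratio, which should broadly agree with the manual computation above (exact values differ because this PCA uses all numeric feature columns, not only the four selected earlier):

# Fraction of total variance captured by each of the two retained components
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.cumsum())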

