45B AIML Practical 06
Theory:
Dimensionality reduction techniques such as feature extraction and feature selection are crucial in many fields, including machine learning, data
analysis, and signal processing. Here's an overview of the theory behind the implementation of these techniques:
Feature Extraction: Feature extraction transforms the original high-dimensional data into a lower-dimensional space by creating new
features that capture the most relevant information. A common technique for feature extraction is Principal Component Analysis (PCA).
PCA extracts linear combinations of the original features, called principal components, that capture the maximum variance in the data.
The principal components are the eigenvectors of the covariance matrix of the data, ranked by their eigenvalues.
Implementation involves centering (standardizing) the data, computing the covariance matrix, performing an eigenvalue decomposition,
and projecting the data onto the top k eigenvectors.
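The steps above can be sketched end-to-end on synthetic data (the random array here is only an illustrative stand-in, not the Fish dataset used below):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4))  # 100 samples, 4 features

# 1. Center the data (subtract the per-feature mean)
centred = data - data.mean(axis=0)

# 2. Covariance matrix of the centered data (rows of .T are features)
cov = np.cov(centred.T)

# 3. Eigenvalue decomposition (eigh suits symmetric matrices like covariances)
vals, vecs = np.linalg.eigh(cov)

# 4. Keep the top-k eigenvectors (eigh returns eigenvalues in ascending order)
k = 2
top_k = vecs[:, np.argsort(vals)[::-1][:k]]

# 5. Project the centered data onto the top-k components
projected = centred @ top_k
print(projected.shape)  # (100, 2)
```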
import numpy as np
import pandas as pd
import pprint
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler  # used to standardize the features below
from sklearn.decomposition import PCA             # used for the scikit-learn PCA at the end
%matplotlib inline
%precision 3
np.set_printoptions(precision=3)
ahmed_df = pd.read_csv("/content/Fish.csv")
#feature_columns = ['priceUSD', 'transactions', 'size', 'sentbyaddress'] # Specify your feature columns
ahmed_df.head(20)
https://colab.research.google.com/drive/1gJ2K4xsmVl5LyYrPZmxSPTwDKBVvuABn#scrollTo=Htv5XQQTbrGf&printMode=true 1/4
4/2/24, 10:06 PM 45_AIML_Practical_06.ipynb - Colaboratory
feature_columns = ['Weight', 'Length1', 'Length2', 'Length3'] # Specify your feature columns
ahmed_X = ahmed_df[feature_columns]
ahmed_df.shape
(159, 8)
# Standardize the features to zero mean and unit variance before computing the covariance matrix
from sklearn.preprocessing import StandardScaler
X_std = StandardScaler().fit_transform(ahmed_X)
ahmed_X_covariance_matrix = np.cov(X_std.T)
ahmed_X_covariance_matrix
# Eigendecomposition of the covariance matrix
eig_vals, eig_vecs = np.linalg.eig(ahmed_X_covariance_matrix)
print("Eigenvectors \n", eig_vecs)
print("Eigenvalues \n", eig_vals)
Eigenvectors
[[-0.485 -0.873 -0.047 -0.005]
[-0.505 0.309 -0.482 -0.646]
[-0.505 0.292 -0.296 0.756]
[-0.505 0.237 0.823 -0.106]]
Eigenvalues
[3.897e+00 1.188e-01 8.993e-03 3.174e-04]
tot = sum(eig_vals)
var_exp = [(i / tot)*100 for i in sorted(eig_vals, reverse=True)]
cum_var_exp = np.cumsum(var_exp)
print("Variance captured by each component is \n", var_exp)
print(40 * '-')
print("Cumulative variance captured as we travel each component \n", cum_var_exp)
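As a sanity check, percentages computed this way must sum to 100. Plugging in the eigenvalues printed above (rounded to the precision shown, so treat the figures as illustrative):

```python
import numpy as np

# Eigenvalues as printed above (rounded to the precision shown)
eig_vals = np.array([3.897e+00, 1.188e-01, 8.993e-03, 3.174e-04])

tot = eig_vals.sum()
var_exp = [(i / tot) * 100 for i in sorted(eig_vals, reverse=True)]
cum_var_exp = np.cumsum(var_exp)

# The first component alone captures roughly 96.8% of the variance,
# and the cumulative total necessarily reaches 100%.
print(cum_var_exp)
```

This is why keeping only the first two components, as the notebook does next, loses very little information.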
# Pair each eigenvalue with its eigenvector and sort by eigenvalue, descending
ahmed_eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:, i]) for i in range(len(eig_vals))]
ahmed_eig_pairs.sort(key=lambda x: x[0], reverse=True)
print("All Eigen Values along with Eigen Vectors")
pprint.pprint(ahmed_eig_pairs)
print(40 * '-')
matrix_w = np.hstack((ahmed_eig_pairs[0][1].reshape(4,1),ahmed_eig_pairs[1][1].reshape(4,1)))
print ('Matrix W:\n', matrix_w)
ahmed_Y = X_std.dot(matrix_w)
print (ahmed_Y[0:5])
[[ 0.563 0.18 ]
[ 0.362 0.137]
[ 0.294 0.015]
[-0.081 0.151]
[-0.204 0.003]]
# Perform PCA with scikit-learn on the same standardized data for comparison
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)
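On standardized data, scikit-learn's PCA and the manual eigendecomposition route should agree on the explained-variance ratios. A minimal check on synthetic data (the random matrix is an illustrative stand-in, not the Fish features):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X_std = StandardScaler().fit_transform(X)

# Manual route: eigenvalues of the covariance matrix, sorted descending
vals = np.linalg.eigh(np.cov(X_std.T))[0]
manual_ratio = np.sort(vals)[::-1] / vals.sum()

# scikit-learn route: keep all 4 components so the ratios are comparable
pca = PCA(n_components=4).fit(X_std)
print(np.allclose(manual_ratio, pca.explained_variance_ratio_))  # True
```

The agreement holds because scikit-learn's `explained_variance_` is computed as squared singular values divided by n − 1, which equals the eigenvalues of the sample covariance matrix.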