
Principal Component Analysis

• Principal component analysis (PCA) is a popular dimensionality-reduction method. It reduces the dimensionality of large data sets by transforming a large set of variables into a smaller one that still contains most of the information in the original set.

• It is primarily used to transform a high-dimensional dataset into a lower-dimensional representation while preserving the most important information.

• Smaller data sets are easier to explore and visualize, and they make analysis much easier and faster for machine learning algorithms because there are no extraneous variables to process.
Goals of PCA:
1. Identify patterns in the data.
2. Detect the correlation between variables.
3. Reduce the dimension of a d-dimensional dataset by projecting it onto a k-dimensional subspace, where k < d.
4. Find the list of principal axes.

The main goal of PCA is to identify the directions (principal components) along
which the data varies the most.

These principal components are orthogonal to each other, meaning they are
uncorrelated.

The first principal component captures the largest amount of variation in the
data, followed by the second principal component, and so on.
• PCA is an unsupervised pre-processing task that is carried out before
applying any ML algorithm.

• PCA is based on an "orthogonal linear transformation", a mathematical technique that projects the attributes of a data set onto a new coordinate system.

• The attribute that describes the most variance is called the first principal component and is placed at the first coordinate.

• Similarly, the attribute that ranks second in describing variance is called the second principal component, and so on. In short, the complete dataset can be expressed in terms of its principal components.

• Usually, more than 90% of the variance is explained by two or three principal components.

• Principal component analysis, or PCA, thus converts data from a high-dimensional space to a low-dimensional space by selecting the most important attributes that capture maximum information about the dataset.
Objectives of PCA
1. The new features are distinct, i.e., the covariance between the new features (in the case of PCA, the principal components) is 0.

2. The principal components are generated in order of the variability in the data that they capture. Hence, the first principal component should capture the maximum variability, the second one the next highest variability, and so on.

3. The sum of the variances of the new features (the principal components) should be equal to the sum of the variances of the original features.
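
As a quick sanity check of objective 3, a minimal sketch in Python (using NumPy and scikit-learn on randomly generated, purely illustrative data) compares the total variance before and after a full PCA transformation:

import numpy as np
from sklearn.decomposition import PCA

# Randomly generated data: 100 samples, 4 features (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

# Keep all 4 components so that no variance is discarded
pca = PCA(n_components=4)
X_pca = pca.fit_transform(X)

print(X.var(axis=0, ddof=1).sum())      # total variance of the original features
print(X_pca.var(axis=0, ddof=1).sum())  # total variance of the principal components (equal, up to rounding)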
Working of PCA
Step 1: Standardize the dataset.

Step 2: Calculate the covariance matrix for the features in the dataset.

Step 3: Calculate the eigenvalues and eigenvectors for the covariance matrix.

Step 4: Sort eigenvalues and their corresponding eigenvectors.

Step 5: Pick the top k eigenvalues and form a matrix of their corresponding eigenvectors.

Step 6: Transform the original matrix.


1. Standardization: PCA begins by standardizing the dataset, where each
feature is scaled to have zero mean and unit variance.

This step ensures that all features contribute equally to the analysis and
prevents variables with larger scales from dominating the principal
components.
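
A minimal sketch of this step in NumPy (the small matrix X below is purely illustrative and is not the dataset used in the worked example later):

import numpy as np

# Illustrative raw data: 5 samples, 3 features
X = np.array([[4.0, 2.0, 0.60],
              [4.2, 2.1, 0.59],
              [3.9, 2.0, 0.58],
              [4.3, 2.1, 0.62],
              [4.1, 2.2, 0.63]])

# Standardize: subtract each feature's mean and divide by its (sample) standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)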
2. Covariance matrix: PCA computes the covariance matrix of the standardized dataset. The covariance matrix indicates the relationships and dependencies between pairs of features.

The diagonal elements of the covariance matrix represent the variances of the features.
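
Continuing from the X_std array in the sketch above, the covariance matrix can be computed directly:

# Covariance matrix of the standardized data (columns are treated as features)
cov_mat = np.cov(X_std, rowvar=False)

# Equivalent manual computation, since each column of X_std has zero mean
n = X_std.shape[0]
cov_manual = (X_std.T @ X_std) / (n - 1)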
3. Eigen decomposition: The next step is to perform an eigen
decomposition of the covariance matrix. This involves finding the
eigenvalues and eigenvectors of the covariance matrix.

The eigenvalues represent the variance explained by each principal component, and the eigenvectors represent the directions or axes of the principal components.
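
Continuing the sketch, the eigendecomposition of the covariance matrix can be obtained with NumPy; np.linalg.eigh is suitable here because a covariance matrix is always symmetric:

# Eigenvalues and eigenvectors of the symmetric covariance matrix
# (eigh returns eigenvalues in ascending order; each column of eig_vecs is one eigenvector)
eig_vals, eig_vecs = np.linalg.eigh(cov_mat)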
4. Selection of principal components: The principal components are ranked
based on their corresponding eigenvalues, with the largest eigenvalue
associated with the first principal component, the second largest with the
second principal component, and so on.

By selecting the top k principal components, where k is the desired lower dimensionality, we retain the most significant variation in the data.
5. Projection: Finally, the data is projected onto the selected principal
components to obtain the lower-dimensional representation.

This projection is achieved by taking the dot product between the standardized data and the eigenvectors corresponding to the selected principal components.
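
Putting steps 4 and 5 together, and continuing from eig_vals and eig_vecs above, a minimal sketch of selecting the top k components and projecting the data is:

k = 2  # desired lower dimensionality (illustrative)

# Sort eigenvalues (and their eigenvectors) in descending order
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

# Keep the eigenvectors of the k largest eigenvalues and project the data onto them
W = eig_vecs[:, :k]   # (n_features, k) projection matrix
X_pca = X_std @ W     # (n_samples, k) lower-dimensional representation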
1. Standardize the Dataset

Assume we have the below dataset which has 4 features and a total of 5
training examples.
First, we need to standardize the dataset and for that, we need to calculate the
mean and standard deviation for each feature.
After applying the formula, each feature in the dataset is transformed as shown below:
2. Calculate the covariance matrix for the whole dataset

The formula to calculate each entry of the covariance matrix is:

cov(x, y) = (1 / (n - 1)) * ∑((x[i] - mean(x)) * (y[i] - mean(y)))

Where:

• cov(x, y) represents the covariance between variables x and y.
• n is the number of data points.
• x[i] and y[i] are the values of x and y for each data point.
• mean(x) and mean(y) are the means of variables x and y, respectively.

The covariance matrix for the given dataset is calculated as below.
Since we have standardized the dataset, the mean of each feature is 0 and the standard deviation is 1.

var(f1) = ((-1.0 - 0)² + (0.33 - 0)² + (-1.0 - 0)² + (0.33 - 0)² + (1.33 - 0)²) / 4

var(f1) = 1

cov(f1, f2) = ((-1.0 - 0)*(-0.632456 - 0) + (0.33 - 0)*(1.264911 - 0) + (-1.0 - 0)*(0.632456 - 0) + (0.33 - 0)*(0.000000 - 0) + (1.33 - 0)*(-1.264911 - 0)) / 4

cov(f1, f2) = -0.25298

In a similar way, we can calculate the other covariances, which results in the covariance matrix below.
3. Calculate eigenvalues and eigenvectors.

Let A be a square matrix (in our case the covariance matrix), ν a vector, and λ a scalar that satisfies Aν = λν; then λ is called an eigenvalue associated with the eigenvector ν of A.

Rearranging the above equation:

Aν - λν = 0

(A - λI)ν = 0

Since we already know ν is a non-zero vector, the only way this equation can hold is if

det(A - λI) = 0

Solving this equation for our covariance matrix gives:

λ = 2.51579324, 1.0652885, 0.39388704, 0.02503121
Eigenvectors:
Solving the equation (A - λI)ν = 0 for the vector ν with each λ value:

For λ = 2.51579324, solving the above equation using Cramer's rule, the values of the vector ν are:

v1 = 0.16195986
v2 = -0.52404813
v3 = -0.58589647
v4 = -0.59654663

Going by the same approach, we can calculate the eigenvectors for the other eigenvalues.

We can form a matrix using the eigenvectors.

eigenvectors (4 × 4 matrix)
4. Sort eigenvalues and their corresponding eigenvectors.
Since the eigenvalues are already sorted in descending order in this case, there is no need to sort them again.

5. Pick k eigenvalues and form a matrix of eigenvectors


If we choose the top 2 eigenvectors, the matrix will look like this:

Top 2 eigenvectors(4*2 matrix)


6. Transform the original matrix.

Feature matrix * top k eigenvectors = Transformed Data
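
As a sketch of the shapes involved (the arrays below are placeholders only; the actual standardized matrix and eigenvector matrix are those computed in the previous steps):

import numpy as np

# Placeholder shapes: 5 standardized samples with 4 features,
# and a 4x2 matrix whose columns are the top-2 eigenvectors
X_std = np.zeros((5, 4))
W_top2 = np.zeros((4, 2))

X_transformed = X_std @ W_top2
print(X_transformed.shape)  # (5, 2)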


Python Implementation of PCA

1. Import all the libraries

2. Loading Data

3. Apply PCA

• Standardize the dataset prior to PCA.
• Import PCA from sklearn.decomposition.
• Choose the number of principal components.

Let us set it to 3. After executing this code, we find that the dimensions of x are (569, 3), while the dimensions of the actual data are (569, 30).

Thus, it is clear that with PCA the number of dimensions has been reduced from 30 to 3. If we chose n_components=2, the dimensions would be reduced to 2.
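
The loading and fitting code is not reproduced here; a minimal sketch consistent with the shapes quoted above, assuming scikit-learn's built-in breast cancer dataset, is:

from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load the dataset (569 samples, 30 features) and standardize it
data = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(data.data)

# Reduce to 3 principal components
principal = PCA(n_components=3)
x = principal.fit_transform(X_scaled)

print(data.data.shape)  # (569, 30)
print(x.shape)          # (569, 3)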
4. Check Components

The principal.components_ provide an array in which the number of rows


tells the number of principal components while the number of columns is
equal to the number of features in actual data.

We can easily see that there are three rows as n_components was chosen
to be 3. However, each row has 30 columns as in actual data.
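
Continuing the sketch above, this can be checked directly:

print(principal.components_.shape)  # (3, 30): 3 principal components, 30 original features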
5. Plot the components (Visualization)

Plot the principal components for better data visualization.

Though we took n_components = 3, here we plot both a 2D graph (using the first two principal components) and a 3D graph (using all three principal components).

The colors show the two output classes of the original dataset: benign and malignant. It is clear that the principal components give a clear separation between the two output classes.

For three principal components, we need to plot a 3D graph. x[:, 0] signifies the first principal component; similarly, x[:, 1] and x[:, 2] represent the second and the third principal components.
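
A sketch of such plots with Matplotlib, continuing from the x array and the data.target labels in the sketch above (the exact styling used in the original is not shown in the text):

import matplotlib.pyplot as plt

# 2D scatter of the first two principal components, colored by class (benign / malignant)
plt.figure()
plt.scatter(x[:, 0], x[:, 1], c=data.target)
plt.xlabel('PC1')
plt.ylabel('PC2')

# 3D scatter of all three principal components
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(x[:, 0], x[:, 1], x[:, 2], c=data.target)
ax.set_xlabel('PC1')
ax.set_ylabel('PC2')
ax.set_zlabel('PC3')
plt.show()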
Python implementation

from sklearn.decomposition import PCA


pca = PCA(n_components = 2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
To implement PCA in scikit-learn, it is essential to standardize/normalize the data before applying PCA.

• PCA is imported from sklearn.decomposition. We need to select the required number of principal components.

• Usually, n_components is chosen to be 2 for better visualization, but the right choice depends on the data.

• The data is passed through the fit and transform methods.

• The values of the principal components can be checked using components_.


To Visualize PCA

https://setosa.io/ev/principal-component-analysis/

Here, pca.components_ gives the principal axes in feature space, representing the directions of maximum variance in the data.
