Principal Component Analysis (PCA)
What is PCA?
Dimensionality Reduction
Why PCA?
Applications of PCA
Dimensionality reduction refers to the techniques that reduce the number of input variables in a dataset.
Why DR?
• Fewer dimensions for a given dataset mean less computation or training time
• Redundancy is removed by removing similar entries from the dataset
• Data compression (reduces storage space)
• It helps to find the most significant features and skip the rest
• Leads to better human interpretation
WHY PCA?
IMPORTANT TERMINOLOGIES
• Covariance
• Eigenvalues
• Eigenvectors
• Principal Component
IMPORTANT TERMINOLOGIES (VARIANCE)
• The variance is a measure of how far the data points scatter around the mean
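To make the idea of scatter concrete, here is a minimal sketch in plain Python. It assumes the population form of the variance (dividing by the number of points); the function name `variance` and the two small lists are illustrative only and do not come from the slides.

```python
def variance(values):
    """Population variance: the mean squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

# Two made-up lists with the same mean (5) but different scatter.
tight = [4, 5, 5, 6]
spread = [0, 2, 8, 10]

print(variance(tight))   # 0.5
print(variance(spread))  # 17.0
```

Both lists share the same mean, so the difference in the result comes purely from how far the points lie from it.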
Step 1: Standardization
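The main aim of this step is to standardize the range of the attributes so that each one of them lies within similar boundaries:
• Z = (x - μ) / σ
• σ = √[ Σ(x - μ)² / N ]
Here μ is the mean of the attribute, σ is its standard deviation, and N is the number of data points.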
STANDARDIZATION
Dataset:
Consider a small dataset with two variables, X and Y, represented by the following data points:
• X: [2, 3, 5, 7, 10]
• Y: [4, 5, 7, 8, 11]
For variable X:
• Mean (μX) = (2 + 3 + 5 + 7 + 10) / 5 = 5.4
• Standard Deviation (σX) = √[Σ(Xi - μX)² / N] = √[(11.56 + 5.76 + 0.16 + 2.56 + 21.16) / 5] = √8.24 ≈ 2.87
For variable Y:
• Mean (μY) = (4 + 5 + 7 + 8 + 11) / 5 = 7
• Standard Deviation (σY) = √[Σ(Yi - μY)² / N] = √[(9 + 4 + 0 + 1 + 16) / 5] = √6 ≈ 2.45
• Standardized X: [-1.18, -0.84, -0.14, 0.56, 1.60]
• Standardized Y: [-1.22, -0.82, 0.00, 0.41, 1.63]
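As a quick check of the standardization step, the sketch below applies Z = (x - μ) / σ to the X and Y values of the example dataset in plain Python. It assumes the population form of the standard deviation (dividing by N), as in the Step 1 formula; the helper name `standardize` is illustrative and not part of any library.

```python
import math

def standardize(values):
    """Return z-scores using the population standard deviation (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

X = [2, 3, 5, 7, 10]
Y = [4, 5, 7, 8, 11]

z_x = standardize(X)
z_y = standardize(Y)

# Rounded to two decimals for display.
print([round(z, 2) for z in z_x])
print([round(z, 2) for z in z_y])
```

After standardization each variable has mean 0 and unit variance, so X and Y become directly comparable. If the sample form (dividing by n - 1) is preferred instead, only the divisor inside `standardize` changes, and the standardized values shrink slightly in magnitude.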
Covariance Matrix Computation
The covariance matrix expresses how each pair of attributes in a multidimensional dataset varies together. For two variables X and Y it is:
[ Cov(X, X)  Cov(X, Y) ]
[ Cov(Y, X)  Cov(Y, Y) ]
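• Using the formula for covariance (dividing by N) on the standardized X and Y:
[ 1.000  0.996 ]
[ 0.996  1.000 ]
The diagonal entries are 1 because each standardized variable has unit variance; the off-diagonal entry is the correlation between X and Y.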
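The covariance matrix for the example dataset can be reproduced with a short plain-Python sketch. It assumes the 1/N convention for both the standard deviation and the covariance; the helper names `standardize`, `covariance` and `covariance_matrix` are illustrative, not library functions.

```python
import math

def standardize(values):
    """Z-scores using the population standard deviation (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def covariance(a, b):
    """Population covariance of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / n

def covariance_matrix(columns):
    """Entry (i, j) is the covariance between column i and column j."""
    return [[covariance(a, b) for b in columns] for a in columns]

X = [2, 3, 5, 7, 10]
Y = [4, 5, 7, 8, 11]

matrix = covariance_matrix([standardize(X), standardize(Y)])
for row in matrix:
    print([round(c, 3) for c in row])
```

Because the inputs are standardized, the diagonal entries come out as 1 and the off-diagonal entry equals the correlation between X and Y. The matrix is symmetric, since Cov(X, Y) = Cov(Y, X).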
Important Terminologies (Covariance)
• It describes the relationship between a pair of random variables, where a change in one variable is associated with a change in the other.
• It can take any value between -infinity and +infinity, where a negative value represents a negative relationship and a positive value represents a positive relationship.
• It measures the linear relationship between variables.
• It gives the direction of the relationship between variables.
The formula for the covariance (Cov) between two random variables X and Y,
each with N data points, is as follows:
Cov(X, Y) = (1/N) * Σ (from i=1 to N) [(Xi - X̄) * (Yi - Ȳ)]
Where:
• Cov(X, Y) is the covariance between X and Y.
• N is the number of data points.
• Xi and Yi represent individual data points for X and Y, respectively.
• X̄ and Ȳ are the means of X and Y, respectively.
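The formula translates almost line for line into code. The sketch below is a plain-Python illustration using the 1/N form given above; the function name `covariance` and the reversed list `Y_reversed` are introduced here only to show how the sign reflects the direction of the relationship.

```python
def covariance(xs, ys):
    """Cov(X, Y) = (1/N) * sum((Xi - mean_x) * (Yi - mean_y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

X = [2, 3, 5, 7, 10]
Y = [4, 5, 7, 8, 11]
Y_reversed = Y[::-1]   # Y falls as X rises

print(covariance(X, Y))           # positive: X and Y rise together
print(covariance(X, Y_reversed))  # negative: one rises while the other falls
print(covariance(X, X))           # covariance of X with itself is its variance
```

The magnitude depends on the units of X and Y, which is one reason the data are standardized before the covariance matrix is built.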
COMPUTE EIGENVALUES AND EIGENVECTORS OF COVARIANCE MATRIX TO IDENTIFY PRINCIPAL COMPONENTS
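For a two-variable dataset the eigenvalues and eigenvectors of the covariance matrix can be written in closed form, which makes this step easy to follow. The sketch below is one way to do it in plain Python; the function name `eigen_symmetric_2x2`, the matrix `cov` and the sample `point` are hypothetical values chosen for easy checking, not figures from the slides.

```python
import math

def eigen_symmetric_2x2(m):
    """Eigenvalues (largest first) and unit eigenvectors of a symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    centre = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    eigenvalues = [centre + radius, centre - radius]
    if b == 0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        eigenvectors = [(1.0, 0.0), (0.0, 1.0)] if a >= d else [(0.0, 1.0), (1.0, 0.0)]
    else:
        eigenvectors = []
        for lam in eigenvalues:
            vx, vy = b, lam - a          # solves (a - lam) * vx + b * vy = 0
            norm = math.hypot(vx, vy)
            eigenvectors.append((vx / norm, vy / norm))
    return eigenvalues, eigenvectors

# Illustrative covariance matrix (hypothetical values).
cov = [[2.0, 1.0],
       [1.0, 2.0]]

eigenvalues, eigenvectors = eigen_symmetric_2x2(cov)

# Share of the total variance captured by each principal component.
explained = [lam / sum(eigenvalues) for lam in eigenvalues]

# Project one centred data point onto the first principal component.
point = (1.0, 3.0)
pc1 = eigenvectors[0]
score = point[0] * pc1[0] + point[1] * pc1[1]

print(eigenvalues)
print(eigenvectors)
print(explained)
print(score)
```

The eigenvector with the largest eigenvalue is the first principal component: the direction along which the data vary most. Keeping only the components with the largest eigenvalues is what reduces the dimensionality. For more than two variables a numerical routine such as `numpy.linalg.eigh` is the usual choice for symmetric matrices.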
Applications of PCA
• Netflix Movie Recommendations
• Grocery Shopping
• Fitness Trackers
• Car Shopping
• Real Estate
• Manufacturing and Quality Control
• Renewable Energy
• Sports Analytics
• Smart Cities
Advantages of PCA
Prevents Overfitting
Speeds Up Other Machine Learning Algorithms
Improves Visualization
Dimensionality Reduction
Noise Reduction
LIMITATIONS OF PCA
Linearity Assumption
Loss of Interpretability
Loss of Information
Sensitivity to Scaling
Orthogonal Components