
Chapter 12: Dimension Reduction
Introduction to Data Science

The Need for Dimension Reduction


• High Dimensionality: Refers to datasets with a large number of
predictors (e.g., 100 predictors describe a 100-dimensional space).
Managing high-dimensional data can be challenging.

Why Reduce Dimension?


1. Multicollinearity:
• Definition: When predictors are highly correlated, leading to unstable regression models.
• Issue: Correlated predictors can cause difficulties in estimating accurate model coefficients.
2. Double-Counting:
• Example: When correlated predictors like height and weight are both used to estimate age,
it overemphasizes the physical aspect, effectively double-counting.
• Solution: Dimension reduction can help eliminate redundant predictors.
3. Curse of Dimensionality:
• Explanation: As the number of dimensions (predictors) increases, the volume of the
predictor space grows exponentially, making data sparse.
• Impact: Even large datasets can struggle with high-dimensional spaces, making it difficult
to draw meaningful conclusions.

4. Violation of Parsimony:
• Principle: Parsimony emphasizes simplicity; models should use as few predictors as
necessary for accurate interpretation.
• Issue: Too many predictors complicate models, violating this principle.
5. Overfitting:
• Problem: Models with many predictors may perform well on training data but poorly on
new data, as they are too specific to the training set.
• Benefit of Dimension Reduction: Reduces overfitting by simplifying the model.
6. Missing the Bigger Picture:
• Example: Variables like savings account balance, checking account balance, and 401(k)
balance might be better grouped under a single component (e.g., assets).
• Dimension Reduction Goal: Helps reveal underlying relationships among variables by
grouping them.

Variance Inflation Factor (VIF) and Multicollinearity

Detecting Multicollinearity:
• Even if the correlations among predictors are not checked before running a
regression, the regression output can still warn of multicollinearity through the
Variance Inflation Factors (VIFs).

What is VIF?
• The Variance Inflation Factor (VIF) measures how much the variance of a
regression coefficient is inflated due to multicollinearity.
• The VIF for the i-th predictor is given by:

  VIFᵢ = 1 / (1 − R²ᵢ)

• R²ᵢ: Represents the R² value obtained by regressing the predictor xᵢ on all other predictor
variables.
• Interpretation: A high R²ᵢ indicates that xᵢ is highly correlated with the other predictors,
leading to a high VIFᵢ.

Interpreting VIF Values:


• VIF ≥ 5: Indicates moderate multicollinearity.
• VIF ≥ 10: Indicates severe multicollinearity.

Example:
• A VIF of 5 corresponds to R²ᵢ = 0.80, meaning that 80% of the variance in xᵢ is
explained by the other predictors.
• A VIF of 10 corresponds to R²ᵢ = 0.90, indicating even more severe multicollinearity.
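This check is easy to run in software. Below is a minimal Python sketch on a small simulated dataset; the predictor names height, weight, and income are hypothetical (chosen to mirror the earlier double-counting example), and statsmodels is used to compute VIFᵢ = 1 / (1 − R²ᵢ) for each predictor so the thresholds above can be applied.

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Hypothetical predictors: weight is constructed to be strongly correlated with height
rng = np.random.default_rng(0)
height = rng.normal(170, 10, 500)
weight = 0.9 * height + rng.normal(0, 2, 500)
income = rng.normal(50, 12, 500)
predictors = pd.DataFrame({"height": height, "weight": weight, "income": income})

# Each VIF comes from an auxiliary regression of one predictor on all the others
X = add_constant(predictors)
vifs = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
).drop("const")

print(vifs)  # expect VIFs well above 10 for height and weight, near 1 for income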

• High VIF values suggest that the model may struggle to determine the
true effect of individual predictors due to their interdependence.
o Solution: Address multicollinearity by removing or combining correlated
predictors.

Principal Components Analysis


What to Do When Multicollinearity is Detected?
• Solution: Apply Principal Components Analysis (PCA) to address
multicollinearity.
What is PCA?
• Purpose: PCA transforms a set of correlated predictors into a smaller set of
uncorrelated linear combinations, called components.
• Dimension Reduction: Instead of working with m correlated predictors, PCA
allows you to reduce the data to k < m components that capture most of the
original variability.

How PCA Works


1. Correlation Structure
• PCA accounts for the correlation structure among predictor variables, simplifying
the data.
2. Components
• Each component is an uncorrelated linear combination of the original predictors.
• The first component captures the most variability, followed by the second, which is
uncorrelated with the first, and so on.
3. Replacing Predictors:
• The original m variables can be replaced with the k components, reducing the
dataset's dimensionality.
• This results in a new dataset of n records on k components instead of n records on
m predictors.

Key Points About PCA


• Focus on Predictors Only: PCA operates only on the predictor
variables and does not consider the target variable.
• Standardization Required: The predictors should be standardized or
normalized before applying PCA.
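Putting these points together, here is a minimal scikit-learn sketch on a hypothetical dataset (the account-balance names savings, checking, and retirement echo the earlier "assets" example and are assumptions, not the chapter's data). The predictors are standardized first, the target variable is never used, and m = 3 correlated predictors are replaced by k = 2 components.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical correlated predictors: n records on m = 3 variables
rng = np.random.default_rng(0)
n = 500
savings = rng.normal(10_000, 3_000, n)
checking = 0.6 * savings + rng.normal(0, 1_500, n)
retirement = 1.5 * savings + rng.normal(0, 3_000, n)
X = np.column_stack([savings, checking, retirement])

# Standardize the predictors before applying PCA
X_std = StandardScaler().fit_transform(X)

# Replace the m = 3 predictors with k = 2 principal components
pca = PCA(n_components=2)
components = pca.fit_transform(X_std)

print(components.shape)               # (500, 2): n records on k components
print(pca.explained_variance_ratio_)  # share of the original variability each component captures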

Characteristics of Principal Components
1. First Component: Accounts for the most variability in the dataset.
2. Second Component: Accounts for the second-most variability and
is uncorrelated with the first.
3. Subsequent Components: Continue capturing variability, each being
uncorrelated with the preceding components.
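These properties can be verified directly. The standalone sketch below (again on hypothetical simulated predictors) keeps all components: the explained-variance ratios decrease from the first component onward, and the correlation matrix of the component scores has off-diagonal entries that are essentially zero.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical correlated predictors
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.3 * rng.normal(size=n)
x3 = rng.normal(size=n)
X_std = StandardScaler().fit_transform(np.column_stack([x1, x2, x3]))

pca = PCA()                  # keep all components for inspection
scores = pca.fit_transform(X_std)

print(pca.explained_variance_ratio_)               # decreasing: the first component captures the most
print(np.cumsum(pca.explained_variance_ratio_))    # cumulative share, useful for choosing k
print(np.corrcoef(scores, rowvar=False).round(6))  # off-diagonal ≈ 0: components are uncorrelated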

Benefits of PCA
• Reduces Dimensionality: Helps to simplify complex datasets.
• Removes Multicollinearity: By creating uncorrelated components,
PCA addresses issues caused by correlated predictors.