Chapter 12 - Dimension Reduction
4. Violation of Parsimony:
• Principle: Parsimony emphasizes simplicity; a model should use no more predictors than
are needed for accurate prediction and clear interpretation.
• Issue: Including too many predictors makes a model needlessly complex, violating this
principle.
5. Overfitting:
• Problem: Models with many predictors may perform well on training data but poorly on
new data, as they are too specific to the training set.
• Benefit of Dimension Reduction: Reduces overfitting by simplifying the model.
6. Missing the Bigger Picture:
• Example: Variables like savings account balance, checking account balance, and 401(k)
balance might be better grouped under a single component (e.g., assets).
• Dimension Reduction Goal: Helps reveal underlying relationships among variables by
grouping them.
What is VIF?
• The Variance Inflation Factor (VIF) measures how much the variance of a
regression coefficient is inflated due to multicollinearity.
• The VIF for the i-th predictor is given by:

  VIFᵢ = 1 / (1 − R²ᵢ)
• R²ᵢ: The R² value obtained by regressing the i-th predictor on all other predictor
variables.
• Interpretation: A high R²ᵢ indicates that the i-th predictor is highly correlated with the
other predictors, leading to a high VIF.
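A minimal sketch of computing VIFs with statsmodels' variance_inflation_factor; the data
and the column names (savings, checking, retirement) are hypothetical illustrations:

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(0)
savings = rng.normal(10_000, 2_000, 200)
X = pd.DataFrame({
    "savings": savings,
    "checking": 0.5 * savings + rng.normal(0, 500, 200),  # correlated with savings
    "retirement": rng.normal(50_000, 10_000, 200),        # roughly independent
})

# statsmodels expects an explicit intercept column in the design matrix
X_design = add_constant(X)
for i, name in enumerate(X.columns, start=1):  # index 0 is the constant
    print(f"VIF({name}) = {variance_inflation_factor(X_design.values, i):.2f}")

The correlated pair (savings, checking) should show noticeably higher VIFs than the
independent retirement column.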
Example:
• A VIF of 5 corresponds to R²ᵢ = 0.80, since 1 / (1 − 0.80) = 5; that is, 80% of the
variance in Xᵢ is explained by the other predictors.
• A VIF of 10 corresponds to R²ᵢ = 0.90, indicating even more severe multicollinearity.
• High VIF values suggest that the model may struggle to determine the
true effect of individual predictors due to their interdependence.
o Solution: Address multicollinearity by removing or combining correlated
predictors; a sketch of the removal approach follows.
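A hedged sketch of one removal strategy: iteratively drop the predictor with the highest
VIF until all remaining VIFs fall below a cutoff. The cutoff of 10 is a common rule of
thumb, not a universal standard:

import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

def drop_high_vif(X: pd.DataFrame, cutoff: float = 10.0) -> pd.DataFrame:
    # Repeatedly remove the worst offender until every VIF is below the cutoff
    X = X.copy()
    while len(X.columns) > 1:
        design = add_constant(X)
        vifs = {name: variance_inflation_factor(design.values, i)
                for i, name in enumerate(X.columns, start=1)}
        worst = max(vifs, key=vifs.get)
        if vifs[worst] < cutoff:
            break
        X = X.drop(columns=worst)
    return X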
Characteristics of Principal Components
1. First Component: Accounts for the most variability in the dataset.
2. Second Component: Accounts for the second-most variability and
is uncorrelated with the first.
3. Subsequent Components: Continue capturing the remaining variability, each
uncorrelated with all preceding components (see the sketch below).
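These properties can be checked directly. A minimal sketch using scikit-learn, with
synthetic data (three correlated columns plus two noise columns) chosen purely for
illustration:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
base = rng.normal(size=(300, 1))
# three strongly correlated columns plus two independent noise columns
X = np.hstack([base + 0.1 * rng.normal(size=(300, 1)) for _ in range(3)]
              + [rng.normal(size=(300, 2))])

pca = PCA()
scores = pca.fit_transform(StandardScaler().fit_transform(X))

print(pca.explained_variance_ratio_)  # variance shares, in decreasing order
print(np.round(np.corrcoef(scores, rowvar=False), 2))  # off-diagonals ≈ 0

The first component should absorb most of the shared variability, and the correlation
matrix of the component scores should be close to the identity.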
Benefits of PCA
• Reduces Dimensionality: Helps to simplify complex datasets.
• Removes Multicollinearity: By creating uncorrelated components,
PCA addresses issues caused by correlated predictors, as the sketch below illustrates.
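A short sketch of both benefits in practice, assuming an illustrative 90% variance
target (the 0.90 threshold is a judgment call, not a fixed rule):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
base = rng.normal(size=(300, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(300, 1)) for _ in range(3)]
              + [rng.normal(size=(300, 2))])

# keep just enough components to explain 90% of the variance
pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.90))
X_reduced = pipeline.fit_transform(X)
print(X.shape, "->", X_reduced.shape)  # fewer, mutually uncorrelated columns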