How to Use R prcomp Results for Prediction?
Last Updated: 20 Jun, 2024
Principal Component Analysis (PCA) is a powerful technique used for dimensionality reduction. The prcomp function in R is commonly used to perform PCA. Once you have obtained the principal components, you may want to use them to make predictions on new data. This article provides a step-by-step guide to using prcomp results for prediction in the R programming language.
What is prcomp?
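The prcomp function in R performs PCA and returns an object of class "prcomp", a list with the following components:
- sdev: The standard deviations of the principal components, i.e. the square roots of the eigenvalues of the covariance matrix (or of the correlation matrix when scale. = TRUE).
- rotation: The matrix of variable loadings, whose columns are the eigenvectors.
- center: The centering used, i.e. the column means of the training data (or FALSE if center = FALSE).
- scale: The scaling applied to each variable, i.e. the column standard deviations of the training data (or FALSE if scale. = FALSE).
- x: The data coordinates in the principal component space (the scores), returned when retx = TRUE, which is the default.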
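As a quick sanity check, the sketch below fits prcomp on a small simulated matrix and confirms what the center, scale, rotation and sdev components hold when center = TRUE and scale. = TRUE. The data and the names X and fit are assumptions made for illustration only.

```r
# Simulated training matrix: 50 rows, 4 variables (illustrative data only)
set.seed(1)
X <- matrix(rnorm(200), ncol = 4)

fit <- prcomp(X, center = TRUE, scale. = TRUE)

# center and scale hold the training column means and standard deviations
print(all.equal(fit$center, colMeans(X)))
print(all.equal(fit$scale, apply(X, 2, sd)))

# The columns of rotation are orthonormal eigenvectors: t(V) %*% V is the identity
print(all.equal(crossprod(fit$rotation), diag(4), check.attributes = FALSE))

# sdev^2 are the eigenvalues of the correlation matrix of X (because scale. = TRUE)
print(all.equal(fit$sdev^2, eigen(cor(X))$values))
```

Each check should print TRUE; small floating-point differences are absorbed by the tolerance of all.equal().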
Steps to Use prcomp Results for Prediction
- Perform PCA using prcomp: Fit the PCA model on the training data.
- Transform the Training Data: Use the principal components to transform the training data.
- Fit a Predictive Model: Use the transformed training data to fit a predictive model.
- Transform New Data: Apply the PCA transformation to new data.
- Predict on New Data: Use the fitted model to make predictions on the transformed new data.
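The five steps above amount to principal components regression, and they can be bundled into a pair of small functions. This is a minimal sketch assuming numeric predictor matrices whose column names match between training and new data; fit_pcr and predict_pcr are hypothetical helper names, not functions from base R or any package. The projection of new rows relies on the predict() method for prcomp objects from the stats package, which reuses the training center, scale and rotation.

```r
# Hypothetical helper: fit PCA on the training predictors, then regress y
# on the first k principal component scores
fit_pcr <- function(X, y, k) {
  pca <- prcomp(X, center = TRUE, scale. = TRUE)
  # Training scores for the first k components (columns PC1..PCk)
  scores <- as.data.frame(pca$x[, seq_len(k), drop = FALSE])
  # Regress the response on those named score columns
  model <- lm(y ~ ., data = data.frame(y = y, scores))
  list(pca = pca, model = model, k = k)
}

# Hypothetical helper: project new rows with the training PCA, then predict
predict_pcr <- function(object, newX) {
  new_scores <- predict(object$pca, newdata = newX)[, seq_len(object$k), drop = FALSE]
  predict(object$model, newdata = as.data.frame(new_scores))
}

# Usage on simulated data (illustrative only)
set.seed(123)
vars <- paste0("Var", 1:5)
X <- matrix(rnorm(100 * 5), ncol = 5, dimnames = list(NULL, vars))
y <- rnorm(100)
X_new <- matrix(rnorm(10 * 5), ncol = 5, dimnames = list(NULL, vars))

pcr <- fit_pcr(X, y, k = 2)
preds <- predict_pcr(pcr, X_new)
print(preds)
```

Keeping the fitted prcomp object together with the regression model makes it hard to accidentally recompute the PCA on the new data, which would rotate the new rows into a different coordinate system.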
The following sections walk through each of these steps in R.
Step 1: Perform PCA using prcomp
First, let's create a sample dataset and perform PCA using the prcomp function.
R
# prcomp comes from the stats package, which R attaches by default;
# loading it explicitly is harmless
library(stats)
# Generate sample data
set.seed(123)
n <- 100
p <- 5
data <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(data) <- paste0("Var", 1:p)
# Perform PCA
pca_result <- prcomp(data, center = TRUE, scale. = TRUE)
summary(pca_result)
Output:
Importance of components:
                          PC1    PC2    PC3    PC4    PC5
Standard deviation     1.1077 1.0610 1.0191 0.9468 0.8440
Proportion of Variance 0.2454 0.2251 0.2077 0.1793 0.1425
Cumulative Proportion  0.2454 0.4705 0.6783 0.8575 1.0000
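The Cumulative Proportion row is a common guide for deciding how many components to keep. The sketch below computes it directly from sdev and picks the smallest number of components that reaches a threshold. The 80% cut-off is an arbitrary assumption; other rules, such as inspecting a scree plot or comparing cross-validated prediction error, are equally valid. With independent simulated variables, as in Step 1, variance tends to be spread fairly evenly across the components, so a threshold like this may keep most of them.

```r
# Same kind of simulated data as in Step 1 (illustrative only)
set.seed(123)
X <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
fit <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component, and its running total
prop_var <- fit$sdev^2 / sum(fit$sdev^2)
cum_var <- cumsum(prop_var)

# Smallest number of components reaching the (assumed) 80% threshold
threshold <- 0.8
k <- which(cum_var >= threshold)[1]
print(round(cum_var, 4))
print(k)
```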
Step 2: Transform the Training Data
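prcomp has already transformed the training data: the principal component scores (the centered and scaled training data projected onto the principal components) are stored in the x element of the result, because retx = TRUE by default.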
R
# Extract the principal components (scores)
pca_scores <- pca_result$x
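To see what pca_result$x contains, the sketch below rebuilds the scores on its own simulated data by standardizing the data and multiplying by the rotation matrix, then checks that each score column has the standard deviation reported in sdev. The names X, fit and first_two are illustrative.

```r
# Illustrative data: 50 rows, 4 variables
set.seed(1)
X <- matrix(rnorm(200), ncol = 4)
fit <- prcomp(X, center = TRUE, scale. = TRUE)

# The scores are the standardized data projected onto the rotation matrix
manual_scores <- scale(X) %*% fit$rotation
print(all.equal(unname(manual_scores), unname(fit$x)))

# Each score column has standard deviation equal to the matching entry of sdev
print(all.equal(unname(apply(fit$x, 2, sd)), fit$sdev))

# Keep only the first two components for modelling
first_two <- fit$x[, 1:2]
print(dim(first_two))
```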
Step 3: Fit a Predictive Model
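Fit a predictive model using the transformed training data. For simplicity, we'll use linear regression on the first two principal components. Because the response below is generated as random noise, the model is not expected to find a real relationship; the point is the workflow.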
R
# Generate a response variable (random noise, for illustration only)
response <- rnorm(n)
# Store the response and the first two principal component scores in a data frame
# so that the model refers to the components by name (PC1, PC2)
train_df <- data.frame(response = response, pca_scores[, 1:2])
# Fit a linear regression model using the first two principal components
model <- lm(response ~ PC1 + PC2, data = train_df)
summary(model)
Output:
Call:
lm(formula = response ~ PC1 + PC2, data = train_df)
Residuals:
     Min       1Q   Median       3Q      Max
-2.70692 -0.54540  0.01939  0.60011  2.76839
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.04230    0.09386  -0.451    0.653
PC1         -0.12064    0.08516  -1.417    0.160
PC2         -0.01179    0.08891  -0.133    0.895
Residual standard error: 0.9386 on 97 degrees of freedom
Multiple R-squared:  0.02044,  Adjusted R-squared:  0.0002459
F-statistic: 1.012 on 2 and 97 DF,  p-value: 0.3672
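Why fit the model on a data frame with named columns? predict() looks up the predictors named in the model formula inside newdata. If the formula indexes a matrix directly, as in pca_scores[, 1], no column of newdata has that name, so R evaluates the expression in the formula's environment instead and returns the training fitted values (with a warning when the row counts differ). The small simulated comparison below shows the difference; all names in it are illustrative.

```r
set.seed(42)
# Illustrative training scores and response
scores <- matrix(rnorm(40), ncol = 2, dimnames = list(NULL, c("PC1", "PC2")))
y <- rnorm(20)

# Pitfall: the formula refers to matrix columns by position
bad_model <- lm(y ~ scores[, 1] + scores[, 2])
# Safer: the formula refers to named data frame columns
good_model <- lm(y ~ PC1 + PC2, data = data.frame(y = y, scores))

# Five new observations in principal component space
new_scores <- data.frame(PC1 = rnorm(5), PC2 = rnorm(5))

# newdata is effectively ignored: one value per training row comes back
bad_pred <- suppressWarnings(predict(bad_model, newdata = new_scores))
# One prediction per new row, as intended
good_pred <- predict(good_model, newdata = new_scores)

length(bad_pred)   # 20
length(good_pred)  # 5
```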
Step 4: Transform New Data
Apply the PCA transformation to new data to obtain the principal component scores for the new data.
R
# Generate new sample data
new_data <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(new_data) <- paste0("Var", 1:p)
# Center and scale the new data using the means and standard deviations from the training data
new_data_scaled <- scale(new_data, center = pca_result$center, scale = pca_result$scale)
# Compute the principal component scores for the new data
new_pca_scores <- as.matrix(new_data_scaled) %*% pca_result$rotation
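The manual centering, scaling and rotation above is what the predict() method for prcomp objects (in the stats package) does internally. The sketch below, on separate simulated data with illustrative names, checks that the two routes agree. One practical advantage of the built-in method is that it selects the columns of newdata by name, matching the row names of rotation, so new data with reordered columns is handled correctly as long as the training matrix had column names.

```r
set.seed(7)
vars <- c("Var1", "Var2", "Var3")
train <- matrix(rnorm(300), ncol = 3, dimnames = list(NULL, vars))
new_obs <- matrix(rnorm(30), ncol = 3, dimnames = list(NULL, vars))

fit <- prcomp(train, center = TRUE, scale. = TRUE)

# Manual projection with the training center, scale and rotation
manual <- scale(new_obs, center = fit$center, scale = fit$scale) %*% fit$rotation
# Built-in projection via the predict method for prcomp objects
builtin <- predict(fit, newdata = new_obs)
print(all.equal(unname(manual), unname(builtin)))

# Columns are matched by name, so a different column order gives the same scores
shuffled <- predict(fit, newdata = new_obs[, c("Var3", "Var1", "Var2")])
print(all.equal(builtin, shuffled))
```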
Step 5: Predict on New Data
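Use the fitted model to make predictions on the transformed new data. The columns of newdata must carry the names of the predictor variables used in the model formula, because predict() looks the predictors up by name.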
R
# Predict the response for each row of the new data from its first two component scores
predictions <- predict(model, newdata = data.frame(PC1 = new_pca_scores[, 1],
                                                   PC2 = new_pca_scores[, 2]))
head(predictions)
head() prints the first six predicted values as a named numeric vector; the exact numbers depend on the simulated data.
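Because every stage is linear, the whole pipeline (standardize, rotate, keep two components, regress) amounts to a single linear model in the standardized original variables. The sketch below, on simulated data with illustrative names, folds the rotation into one weight vector and checks that it reproduces the predictions of the two-stage approach. Looking at these weights may help when interpreting which original variables drive a prediction.

```r
set.seed(3)
vars <- paste0("Var", 1:4)
X <- matrix(rnorm(60 * 4), ncol = 4, dimnames = list(NULL, vars))
y <- rnorm(60)
X_new <- matrix(rnorm(5 * 4), ncol = 4, dimnames = list(NULL, vars))

pca <- prcomp(X, center = TRUE, scale. = TRUE)
train_df <- data.frame(y = y, pca$x[, 1:2])
model <- lm(y ~ PC1 + PC2, data = train_df)

# Route 1: project the new data, then call predict() on the regression
new_scores <- as.data.frame(predict(pca, newdata = X_new)[, 1:2])
pred_pipeline <- predict(model, newdata = new_scores)

# Route 2: fold the rotation into one weight per standardized original variable
gamma <- coef(model)[c("PC1", "PC2")]
w <- pca$rotation[, 1:2] %*% gamma
X_new_std <- scale(X_new, center = pca$center, scale = pca$scale)
pred_direct <- coef(model)[["(Intercept)"]] + drop(X_new_std %*% w)

print(all.equal(unname(pred_pipeline), unname(pred_direct)))
```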
Conclusion
Using prcomp results for prediction involves projecting both the training data and any new data onto the principal components learned from the training data, using the training center, scale and rotation. This reduces the number of predictors and can improve a predictive model when a few components capture most of the variation. Keep in mind, however, that PCA ranks components by how much variance they explain in the predictors, not by how well they predict the response. By following the steps outlined in this guide, you can use PCA results to make predictions in R.