
How to Use R prcomp Results for Prediction?

Last Updated : 20 Jun, 2024

Principal Component Analysis (PCA) is a widely used technique for dimensionality reduction, and the prcomp function is the standard way to perform it in R. Once you have obtained the principal components, you may want to use them to make predictions on new data. This article provides a step-by-step guide on how to use prcomp results for prediction in R.

What is prcomp?

The prcomp function in R performs PCA and returns an object containing several components:

  • sdev: The standard deviations of the principal components.
  • rotation: The matrix of variable loadings (eigenvectors).
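  • center: The variable means subtracted during centering (FALSE if no centering was done).
  • scale: The scaling applied to each variable, which is its standard deviation when scale. = TRUE (FALSE if no scaling was done).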
  • x: The data coordinates in the principal component space (scores).
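
To make these components concrete, the following sketch fits prcomp on the built-in USArrests data set and inspects each element. The choice of data set is an assumption made for illustration; any numeric data would do.

```r
# Fit PCA on a built-in numeric data set (USArrests: 50 states x 4 variables)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

names(pca)          # element names: sdev, rotation, center, scale, x
length(pca$sdev)    # one standard deviation per principal component
dim(pca$rotation)   # variables x components (loadings)
pca$center          # column means used for centering
pca$scale           # column standard deviations used for scaling
dim(pca$x)          # observations x components (scores)
```

The center and scale vectors are what make prediction possible later: they record exactly how the training data was standardized.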

Steps to Use prcomp Results for Prediction

  • Perform PCA using prcomp: Fit the PCA model on the training data.
  • Transform the Training Data: Use the principal components to transform the training data.
  • Fit a Predictive Model: Use the transformed training data to fit a predictive model.
  • Transform New Data: Apply the PCA transformation to new data.
  • Predict on New Data: Use the fitted model to make predictions on the transformed new data.

The sections below walk through each of these steps in R.

Step 1: Perform PCA using prcomp

First, let's create a sample dataset and perform PCA using the prcomp function.

R
# prcomp() is part of the stats package, which R attaches by default;
# the explicit call is shown only for clarity
library(stats)

# Generate sample data
set.seed(123)
n <- 100
p <- 5
data <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(data) <- paste0("Var", 1:p)

# Perform PCA
pca_result <- prcomp(data, center = TRUE, scale. = TRUE)
summary(pca_result)

Output:

Importance of components:
                          PC1    PC2    PC3    PC4    PC5
Standard deviation     1.1077 1.0610 1.0191 0.9468 0.8440
Proportion of Variance 0.2454 0.2251 0.2077 0.1793 0.1425
Cumulative Proportion  0.2454 0.4705 0.6783 0.8575 1.0000
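
The proportions in this summary can be recomputed from sdev, which is handy for choosing how many components to keep. The sketch below is one possible rule, assuming a hypothetical 80% cumulative-variance threshold; the threshold and the variable names are illustrative, not part of prcomp.

```r
# Recreate the sample data and PCA fit from this step
set.seed(123)
data <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
pca_result <- prcomp(data, center = TRUE, scale. = TRUE)

# Variance explained by each component is its squared standard deviation
var_explained <- pca_result$sdev^2 / sum(pca_result$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of components whose cumulative share reaches the threshold
threshold <- 0.80   # hypothetical cut-off chosen for illustration
k <- which(cum_var >= threshold)[1]
k
```

With the proportions printed above, an 80% threshold would keep four components, which is expected for independent random variables where no direction dominates.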

Step 2: Transform the Training Data

prcomp has already projected the training data onto the principal components. The scores are stored in the x element of the result, with one row per observation and one column per component.

R
# Extract the principal components (scores)
pca_scores <- pca_result$x
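
To see what the scores are, they can be rebuilt by hand: standardize the data with the stored center and scale, then multiply by the rotation matrix. The following check is a small sketch that recreates the sample data so it runs on its own.

```r
# Recreate the sample data and PCA fit
set.seed(123)
data <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
colnames(data) <- paste0("Var", 1:5)
pca_result <- prcomp(data, center = TRUE, scale. = TRUE)

# Recompute the scores manually: standardize, then rotate
data_scaled <- scale(data, center = pca_result$center, scale = pca_result$scale)
manual_scores <- data_scaled %*% pca_result$rotation

# Compare with the scores stored by prcomp (equal up to floating-point tolerance)
all.equal(manual_scores, pca_result$x, check.attributes = FALSE)
```

The same two operations are what get applied to new data in Step 4.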

Step 3: Fit a Predictive Model

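Fit a predictive model using the transformed training data. For simplicity, we'll use linear regression as an example. The scores go into a data frame with named columns (PC1 and PC2) so that predict() can later match the columns of the new data by name. A formula written as response ~ pca_scores[, 1] + pca_scores[, 2] fits the same model, but predict() would not find those terms in newdata and would return the fitted values for the training data instead.

R
# Generate a response variable
response <- rnorm(n)

# Collect the response and the first two principal components in a data frame
train_df <- data.frame(response = response,
                       PC1 = pca_scores[, 1],
                       PC2 = pca_scores[, 2])

# Fit a linear regression model using the first two principal components
model <- lm(response ~ PC1 + PC2, data = train_df)
summary(model)

Output:

Call:
lm(formula = response ~ PC1 + PC2, data = train_df)

Residuals:
     Min       1Q   Median       3Q      Max
-2.70692 -0.54540  0.01939  0.60011  2.76839

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.04230    0.09386  -0.451    0.653
PC1         -0.12064    0.08516  -1.417    0.160
PC2         -0.01179    0.08891  -0.133    0.895

Residual standard error: 0.9386 on 97 degrees of freedom
Multiple R-squared:  0.02044,    Adjusted R-squared:  0.0002459
F-statistic: 1.012 on 2 and 97 DF,  p-value: 0.3672

The response here is random noise that is unrelated to the predictors, so the near-zero R-squared is expected. The example only illustrates the workflow.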
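
One pitfall worth checking when a model is fit on PCA scores is how the formula refers to them. The sketch below compares two fits of the same regression: one whose formula indexes a matrix in the workspace and one whose formula uses named data frame columns. All object names (scores, fit_indexed, fit_named) are hypothetical and chosen for this illustration.

```r
set.seed(123)
n <- 100
train <- matrix(rnorm(n * 5), nrow = n, dimnames = list(NULL, paste0("Var", 1:5)))
pca <- prcomp(train, center = TRUE, scale. = TRUE)
scores <- pca$x
y <- rnorm(n)

# Scores for a fresh sample, stored under the names PC1 and PC2
new_obs <- matrix(rnorm(n * 5), nrow = n, dimnames = list(NULL, paste0("Var", 1:5)))
new_scores <- predict(pca, newdata = new_obs)
new_df <- data.frame(PC1 = new_scores[, 1], PC2 = new_scores[, 2])

# Formula indexes the workspace matrix `scores`
fit_indexed <- lm(y ~ scores[, 1] + scores[, 2])
# Formula uses named columns of a data frame
fit_named <- lm(y ~ PC1 + PC2,
                data = data.frame(y = y, PC1 = scores[, 1], PC2 = scores[, 2]))

pred_indexed <- predict(fit_indexed, newdata = new_df)
pred_named <- predict(fit_named, newdata = new_df)

# TRUE: the indexed fit ignored new_df and returned its training fitted values
isTRUE(all.equal(unname(pred_indexed), unname(fitted(fit_indexed))))
# FALSE: the named fit produced genuinely new predictions
isTRUE(all.equal(unname(pred_named), unname(fitted(fit_named))))
```

Because both samples have the same number of rows here, R gives no warning in the indexed case, which makes the mistake easy to miss.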

Step 4: Transform New Data

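Apply the PCA transformation learned from the training data to the new data. The new data must be centered and scaled with the training means and standard deviations and then multiplied by the training rotation matrix. Running prcomp again on the new data would produce a different set of components that the fitted model knows nothing about.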

R
# Generate new sample data
new_data <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(new_data) <- paste0("Var", 1:p)

# Center and scale the new data using the means and standard deviations from the training data
new_data_scaled <- scale(new_data, center = pca_result$center, scale = pca_result$scale)

# Compute the principal component scores for the new data
new_pca_scores <- as.matrix(new_data_scaled) %*% pca_result$rotation
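
The stats package also provides a predict method for prcomp objects that performs the same centering, scaling, and rotation in one call. The sketch below recreates the sample data and checks that the two approaches agree.

```r
# Recreate the training data, the PCA fit, and the new data
set.seed(123)
n <- 100
p <- 5
data <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(data) <- paste0("Var", 1:p)
pca_result <- prcomp(data, center = TRUE, scale. = TRUE)

new_data <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(new_data) <- paste0("Var", 1:p)

# Manual projection, as in the step above
manual_scores <- scale(new_data, center = pca_result$center,
                       scale = pca_result$scale) %*% pca_result$rotation

# Built-in projection using the predict method for prcomp objects
builtin_scores <- predict(pca_result, newdata = new_data)

# Both give the same scores (up to floating-point tolerance)
all.equal(manual_scores, builtin_scores, check.attributes = FALSE)
```

When the training data has column names, predict matches the columns of newdata by those names, so the new data needs the same variable names as the training data.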

Step 5: Predict on New Data

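Use the fitted model to make predictions on the transformed new data. predict() matches predictors by name, so the columns of newdata must be named exactly as the predictors appear in the model formula.

R
# One row per new observation, with columns named after the model's predictors
new_scores_df <- data.frame(PC1 = new_pca_scores[, 1],
                            PC2 = new_pca_scores[, 2])
predictions <- predict(model, newdata = new_scores_df)
head(predictions)

head() prints the first six predicted values, one for each of the first six rows of new_data. The exact numbers depend on the simulated data.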
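
The five steps can also be put together on real data with a held-out test set. The following is a minimal sketch, assuming the built-in mtcars data, a hypothetical 24/8 train/test split, and two retained components. None of these choices come from the example above.

```r
set.seed(42)
predictors <- setdiff(names(mtcars), "mpg")

# Hypothetical split: 24 rows for training, the remaining 8 for testing
train_idx <- sample(nrow(mtcars), size = 24)
train <- mtcars[train_idx, ]
test <- mtcars[-train_idx, ]

# Steps 1 and 2: fit PCA on the training predictors only and keep the scores
pca <- prcomp(train[, predictors], center = TRUE, scale. = TRUE)
k <- 2   # number of components kept, chosen for illustration

# Step 3: regress mpg on the first k scores (columns are named PC1, PC2)
train_df <- data.frame(mpg = train$mpg, pca$x[, 1:k, drop = FALSE])
model <- lm(mpg ~ ., data = train_df)

# Step 4: project the test predictors with the training PCA
test_scores <- predict(pca, newdata = test[, predictors])
test_df <- data.frame(test_scores[, 1:k, drop = FALSE])

# Step 5: predict and measure the error on the held-out rows
pred <- predict(model, newdata = test_df)
rmse <- sqrt(mean((test$mpg - pred)^2))
rmse
```

Fitting the PCA on the training rows only keeps information from the test set out of the model, so the error estimate is not overly optimistic.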

Conclusion

Using prcomp results for prediction means projecting both the training data and the new data onto the principal components obtained from the training data, reusing the stored center, scale, and rotation. This reduces dimensionality and can improve the performance of predictive models by concentrating on the components that explain the most variance. By following the steps outlined in this guide, you can use PCA results for making predictions in R.

