How to Use R prcomp Results for Prediction?
Last Updated: 20 Jun, 2024
Principal Component Analysis (PCA) is a dimensionality-reduction technique that re-expresses a set of variables as uncorrelated components ordered by the variance they explain. The prcomp function in R is commonly used to perform PCA. Once you have obtained the principal components, you may want to use them to make predictions about new data. This article provides a step-by-step guide on how to use prcomp results for prediction in the R programming language.
What is prcomp?
The prcomp function in R performs PCA and returns an object containing several components:
- sdev: The standard deviations of the principal components.
- rotation: The matrix of variable loadings (eigenvectors).
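- center: The variable means that were subtracted before the analysis (or FALSE if no centering was done).
- scale: The scaling factors applied to the variables, which are their standard deviations when scale. = TRUE (or FALSE if no scaling was done).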
- x: The data coordinates in the principal component space (scores).
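To make these components concrete, the following is a minimal sketch on simulated data. The object names (X, pca, manual_scores) are illustrative assumptions and are not part of the article's example. It checks that center and scale hold the column means and standard deviations, and that the scores in x are the standardized data multiplied by rotation.

```r
# Minimal sketch: inspect the pieces of a prcomp object on simulated data.
# X, pca and manual_scores are illustrative names, not from the article.
set.seed(1)
X <- matrix(rnorm(50 * 3), nrow = 50,
            dimnames = list(NULL, c("a", "b", "c")))
pca <- prcomp(X, center = TRUE, scale. = TRUE)

names(pca)         # the five components listed above
dim(pca$rotation)  # one column of loadings per principal component
dim(pca$x)         # one row of scores per observation

# center and scale store the column means and standard deviations of X
print(all.equal(pca$center, colMeans(X)))
print(all.equal(pca$scale, apply(X, 2, sd)))

# The scores are the standardized data multiplied by the rotation matrix
manual_scores <- scale(X, center = pca$center, scale = pca$scale) %*% pca$rotation
print(all.equal(unname(manual_scores), unname(pca$x)))
```

The last check is the identity that the prediction steps rely on: any data set standardized with the stored center and scale can be projected with the stored rotation.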
Steps to Use prcomp Results for Prediction
- Perform PCA using prcomp: Fit the PCA model on the training data.
- Transform the Training Data: Use the principal components to transform the training data.
- Fit a Predictive Model: Use the transformed training data to fit a predictive model.
- Transform New Data: Apply the PCA transformation to new data.
- Predict on New Data: Use the fitted model to make predictions on the transformed new data.
The sections below walk through each of these steps in R.
Step 1: Perform PCA using prcomp
First, let's create a sample dataset and perform PCA using the prcomp function.
R
# prcomp() is in the stats package, which R attaches by default,
# so this call is optional and shown only for clarity
library(stats)
# Generate sample data
set.seed(123)
n <- 100
p <- 5
data <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(data) <- paste0("Var", 1:p)
# Perform PCA
pca_result <- prcomp(data, center = TRUE, scale. = TRUE)
summary(pca_result)
Output:
Importance of components:
                          PC1    PC2    PC3    PC4    PC5
Standard deviation     1.1077 1.0610 1.0191 0.9468 0.8440
Proportion of Variance 0.2454 0.2251 0.2077 0.1793 0.1425
Cumulative Proportion  0.2454 0.4705 0.6783 0.8575 1.0000
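The proportions in this summary can be recomputed from sdev, which is useful when choosing how many components to keep. The sketch below is one possible approach. The 80% threshold and the names var_explained, cum_var, threshold and k are assumptions made for illustration, not a rule from the article.

```r
# Sketch: recompute the variance proportions from sdev and pick the
# smallest number of components that reaches a chosen threshold.
# The 0.80 threshold is an arbitrary, illustrative choice.
set.seed(123)
X <- matrix(rnorm(100 * 5), nrow = 100,
            dimnames = list(NULL, paste0("Var", 1:5)))
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Variance of each component is the square of its standard deviation
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

threshold <- 0.80
k <- which(cum_var >= threshold)[1]  # first component reaching the threshold

print(round(var_explained, 4))
print(round(cum_var, 4))
print(k)
```

In the summary shown above, the cumulative proportion is 0.6783 at PC3 and 0.8575 at PC4, so an 80% threshold would retain four components.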
Step 2: Transform the Training Data
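prcomp has already projected the training data onto the principal components and stored the result in the x element, so the scores can be extracted directly. Each row is an observation and each column is a principal component.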
R
# Extract the principal components (scores)
pca_scores <- pca_result$x
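A few properties of the scores are worth checking before modelling on them. The sketch below uses its own simulated data with illustrative names (X, pca, scores). It confirms that calling predict() on the training data reproduces the stored scores, that the score columns are centered and mutually uncorrelated, and that their standard deviations equal sdev.

```r
# Sketch: sanity checks on the training scores (names are illustrative).
set.seed(123)
X <- matrix(rnorm(100 * 5), nrow = 100,
            dimnames = list(NULL, paste0("Var", 1:5)))
pca <- prcomp(X, center = TRUE, scale. = TRUE)
scores <- pca$x

# predict() applied to the training data reproduces the stored scores
same_scores <- all.equal(unname(predict(pca, newdata = X)), unname(scores))
print(same_scores)

# Score columns have mean zero and are mutually uncorrelated
print(round(colMeans(scores), 10))
print(round(cor(scores), 10))

# The standard deviation of each score column equals the matching sdev entry
same_sd <- all.equal(unname(apply(scores, 2, sd)), pca$sdev)
print(same_sd)
```

Because the components are uncorrelated, adding or dropping a component in a linear regression on the scores does not change the coefficient estimates of the other components.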
Step 3: Fit a Predictive Model
Fit a predictive model using the transformed training data. For simplicity, we'll use linear regression as an example.
R
# Generate a response variable (pure noise here, for illustration only)
response <- rnorm(n)
# Put the response and the first two principal components in a data frame
# with named columns, so that predict() can later match newdata by name
train_df <- data.frame(response = response,
                       PC1 = pca_scores[, 1],
                       PC2 = pca_scores[, 2])
# Fit a linear regression model using the first two principal components
model <- lm(response ~ PC1 + PC2, data = train_df)
summary(model)
Output:
Call:
lm(formula = response ~ PC1 + PC2, data = train_df)
Residuals:
     Min       1Q   Median       3Q      Max
-2.70692 -0.54540  0.01939  0.60011  2.76839
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.04230    0.09386  -0.451    0.653
PC1         -0.12064    0.08516  -1.417    0.160
PC2         -0.01179    0.08891  -0.133    0.895
Residual standard error: 0.9386 on 97 degrees of freedom
Multiple R-squared: 0.02044, Adjusted R-squared: 0.0002459
F-statistic: 1.012 on 2 and 97 DF, p-value: 0.3672
Step 4: Transform New Data
Apply the PCA transformation to new data to obtain the principal component scores for the new data.
R
# Generate new sample data
new_data <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(new_data) <- paste0("Var", 1:p)
# Center and scale the new data using the means and standard deviations from the training data
new_data_scaled <- scale(new_data, center = pca_result$center, scale = pca_result$scale)
# Compute the principal component scores for the new data
new_pca_scores <- as.matrix(new_data_scaled) %*% pca_result$rotation
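The manual centering, scaling and rotation shown here is what the predict() method for prcomp objects does internally, so the two approaches can be used interchangeably. The sketch below uses its own simulated data, and the names train, new_obs, manual, builtin and shuffled are illustrative. It also shows that predict() matches columns by name when the training matrix had column names, so the column order of the new data does not matter.

```r
# Sketch: predict() on a prcomp object versus the manual projection.
set.seed(123)
vars <- paste0("Var", 1:5)
train   <- matrix(rnorm(100 * 5), nrow = 100, dimnames = list(NULL, vars))
new_obs <- matrix(rnorm(10 * 5),  nrow = 10,  dimnames = list(NULL, vars))

pca <- prcomp(train, center = TRUE, scale. = TRUE)

# Manual projection using the training means, SDs and loadings
manual <- scale(new_obs, center = pca$center, scale = pca$scale) %*% pca$rotation

# Built-in equivalent
builtin <- predict(pca, newdata = new_obs)
same_projection <- all.equal(manual, builtin)
print(same_projection)

# Columns are matched by name, so a different column order gives the same scores
shuffled <- new_obs[, c("Var3", "Var1", "Var5", "Var2", "Var4")]
same_after_shuffle <- all.equal(predict(pca, newdata = shuffled), builtin)
print(same_after_shuffle)
```

Note that the manual version multiplies columns positionally, so it gives correct scores only when the new data has its columns in the same order as the training data.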
Step 5: Predict on New Data
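Use the fitted model to make predictions on the transformed new data. The column names in newdata must match the predictor names in the model formula. If they do not, predict() looks the predictors up in the environment where the model was fitted and can silently return predictions for the training data instead.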
R
predictions <- predict(model, newdata = data.frame(PC1 = new_pca_scores[, 1],
PC2 = new_pca_scores[, 2]))
head(predictions)
Output:
1 2 3 4 5 6
0.10301954 -0.06358980 -0.24969282 -0.02915302 -0.01309235 -0.35260156
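A common pitfall at this step is a mismatch between the predictor names in the model formula and the column names in newdata. The sketch below uses simulated scores, and all names in it are illustrative. It contrasts a model whose formula indexes a matrix directly with a model fitted on a data frame with named columns, and checks the second against a manual calculation from the coefficients.

```r
# Sketch: why predictor names matter when calling predict() on an lm fit.
set.seed(42)
scores     <- matrix(rnorm(40), ncol = 2)  # 20 training rows, 2 components
y          <- rnorm(20)
new_scores <- matrix(rnorm(40), ncol = 2)  # 20 new rows
new_df <- data.frame(PC1 = new_scores[, 1], PC2 = new_scores[, 2])

# Formula built from matrix subsetting: the predictors are not called PC1/PC2,
# so predict() finds `scores` in the calling environment and ignores new_df
fit_bad  <- lm(y ~ scores[, 1] + scores[, 2])
pred_bad <- predict(fit_bad, newdata = new_df)
ignored_newdata <- isTRUE(all.equal(unname(pred_bad), unname(fitted(fit_bad))))
print(ignored_newdata)

# Model fitted on a data frame with named columns: newdata is used as intended
train_df <- data.frame(y = y, PC1 = scores[, 1], PC2 = scores[, 2])
fit_ok   <- lm(y ~ PC1 + PC2, data = train_df)
pred_ok  <- predict(fit_ok, newdata = new_df)

# Cross-check against intercept + b1 * PC1 + b2 * PC2 computed by hand
manual <- unname(drop(cbind(1, new_scores) %*% coef(fit_ok)))
matches_manual <- isTRUE(all.equal(unname(pred_ok), manual))
print(matches_manual)
```

When the new data has a different number of rows than the training data, R issues a warning in the first case. With equal row counts, as here, there is no warning at all, which makes the problem easy to miss.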
Conclusion
Using prcomp results for prediction involves projecting both the training data and the new data onto the principal components obtained from the training data, always using the training means, scaling factors and loadings. This approach reduces dimensionality and removes correlation among the predictors, which can help a predictive model, although the components that explain the most variance are not guaranteed to be the most useful for predicting the response. By following the steps outlined in this guide, you can use PCA results for making predictions in R.
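The whole workflow can be wrapped in two small helpers so that the PCA and the regression model travel together. The following is a sketch under stated assumptions: fit_pcr and predict_pcr are hypothetical names that do not come from any package, and linear regression on the first k components is used only because it matches the example in this article.

```r
# Sketch: bundle PCA and regression into two hypothetical helper functions.

# Fit PCA on the training predictors, then regress y on the first k scores
fit_pcr <- function(X, y, k) {
  pca <- prcomp(X, center = TRUE, scale. = TRUE)
  # Matrix columns keep their names (PC1, PC2, ...) inside the data frame
  train_df <- data.frame(y = y, pca$x[, seq_len(k), drop = FALSE])
  model <- lm(y ~ ., data = train_df)
  list(pca = pca, model = model, k = k)
}

# Project new predictors with the stored PCA, then predict with the stored model
predict_pcr <- function(fit, X_new) {
  new_scores <- predict(fit$pca, newdata = X_new)[, seq_len(fit$k), drop = FALSE]
  predict(fit$model, newdata = as.data.frame(new_scores))
}

# Usage on simulated data (the response is pure noise, for illustration only)
set.seed(123)
vars  <- paste0("Var", 1:5)
X     <- matrix(rnorm(100 * 5), nrow = 100, dimnames = list(NULL, vars))
y     <- rnorm(100)
X_new <- matrix(rnorm(20 * 5), nrow = 20, dimnames = list(NULL, vars))

fit   <- fit_pcr(X, y, k = 2)
preds <- predict_pcr(fit, X_new)
print(head(preds))
```

Keeping the prcomp object alongside the model makes it harder to standardize new data with the wrong means or standard deviations by accident.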