Polynomial Regression in R Programming
Last Updated: 21 Jul, 2021
Polynomial Regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an nth degree polynomial. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y|x). In practice, it adds quadratic or higher-order polynomial terms to the regression equation. Generally, this kind of regression is used with one response variable and one predictor.
Need for Polynomial Regression
- If a linear model is applied to a non-linear data set without any modification, the result is usually very unsatisfactory (see the sketch after this list).
- This leads to an increase in the loss function, a decrease in accuracy and a high error rate.
- Unlike a straight-line model, a polynomial model can follow the curvature of the data and therefore covers more data points.
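As a quick illustration of the first point, the sketch below uses simulated data (not from the article) to compare a straight-line fit with a quadratic fit on a curved data set:
# Illustrative sketch with simulated data: a linear fit vs. a quadratic fit
# on observations that follow a curved trend
set.seed(1)
x <- seq(1, 10, length.out = 100)
y <- 2 + 0.5 * x^2 + rnorm(100, sd = 2)   # the true relationship is non-linear

linear_fit    <- lm(y ~ x)            # straight-line model
quadratic_fit <- lm(y ~ x + I(x^2))   # polynomial (degree 2) model

# The quadratic model leaves a much smaller residual standard error here
sigma(linear_fit)
sigma(quadratic_fit)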
Applications of Polynomial Regression
Generally, polynomial regression is used in the following scenarios:
- Rate of growth of tissues.
- Progression of disease epidemics.
- Distribution of carbon isotopes in lake sediments.
Explanation of Polynomial Regression in R Programming
Polynomial Regression is also known as Polynomial Linear Regression because the model is linear in its coefficients, even though it is non-linear in the variables. To implement polynomial regression in R, the following packages should be installed (a short installation sketch follows the list):
- tidyverse package for better visualization and manipulation.
- caret package for a smoother and easier machine learning workflow.
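A minimal sketch of installing (once) and loading these two packages:
# Install the packages once, if they are not already available
install.packages(c("tidyverse", "caret"))

# Load them for the current session
library(tidyverse)
library(caret)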
After the packages are properly installed, the data needs to be prepared. First, split the data into two sets (a training set and a test set). The data can then be visualized with various plots. Because the split is random, call the set.seed(n) function beforehand so that the pseudo-random numbers, and therefore the split, are reproducible (a brief illustration follows).
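Setting a seed simply fixes the stream of pseudo-random numbers, so the train/test split comes out the same on every run. A minimal sketch (the seed value 123 is just an example):
# The same seed always reproduces the same pseudo-random draws,
# so any random train/test split performed afterwards is reproducible
set.seed(123)
sample(1:10, 3)   # some three indices

set.seed(123)
sample(1:10, 3)   # exactly the same three indices again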
Polynomial regression adds polynomial or quadratic terms to the regression equation, as follows:
medv = b0 + b1*lstat + b2*lstat^2
where
medv is the median house value (the response variable)
lstat is the predictor variable
In R, to create the predictor x^2 one should use the function I(), as follows: I(x^2). This raises x to the power 2. The polynomial regression can then be computed in R as follows:
lm(medv ~ lstat + I(lstat^2), data = train.data)
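With raw = TRUE, the poly() function produces an equivalent model; the two formulations below estimate the same coefficients (a small sketch, assuming the train.data split created in the example that follows):
# Two equivalent ways of writing the quadratic model
# (train.data is the training subset of the Boston data built below)
fit_I    <- lm(medv ~ lstat + I(lstat^2), data = train.data)
fit_poly <- lm(medv ~ poly(lstat, 2, raw = TRUE), data = train.data)

coef(fit_I)     # same estimates,
coef(fit_poly)  # only the term names differ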
The following example uses the Boston data set from the MASS package.
Example:
r
# R program to illustrate
# Polynomial regression

# Importing required libraries
library(tidyverse)
library(caret)
theme_set(theme_classic())

# Load the data
data("Boston", package = "MASS")

# Split the data into training and test set
set.seed(123)
training.samples <- Boston$medv %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data <- Boston[training.samples, ]
test.data <- Boston[-training.samples, ]

# Build the model (5th-degree polynomial in lstat)
model <- lm(medv ~ poly(lstat, 5, raw = TRUE),
            data = train.data)

# Make predictions on the test set
predictions <- model %>% predict(test.data)

# Model performance (RMSE and R-squared on the test set)
modelPerformance <- data.frame(
  RMSE = RMSE(predictions, test.data$medv),
  R2 = R2(predictions, test.data$medv)
)

# Fit and print the simpler quadratic model for comparison
print(lm(medv ~ lstat + I(lstat^2), data = train.data))
print(modelPerformance)
Output:
Call:
lm(formula = medv ~ lstat + I(lstat^2), data = train.data)
Coefficients:
(Intercept) lstat I(lstat^2)
42.5736 -2.2673 0.0412
RMSE R2
1 5.270374 0.6829474
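Once the model has been fitted, predict() can also be used directly on new predictor values. A small sketch (the lstat values below are made up for illustration):
# Predict the median house value for a few hypothetical lstat values
new_points <- data.frame(lstat = c(5, 15, 25))
predict(model, newdata = new_points)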
Graph plotting of Polynomial Regression
In R, the output of a fitted polynomial regression can be plotted with the ggplot() function.
Example:
r
# R program to illustrate
# Graph plotting in
# Polynomial regression

# Importing required libraries
library(tidyverse)
library(caret)
theme_set(theme_classic())

# Load the data
data("Boston", package = "MASS")

# Split the data into training and test set
set.seed(123)
training.samples <- Boston$medv %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data <- Boston[training.samples, ]
test.data <- Boston[-training.samples, ]

# Build the model
model <- lm(medv ~ poly(lstat, 5, raw = TRUE), data = train.data)

# Make predictions
predictions <- model %>% predict(test.data)

# Model performance
data.frame(RMSE = RMSE(predictions, test.data$medv),
           R2 = R2(predictions, test.data$medv))

# Plot the training data with the fitted 5th-degree polynomial curve
ggplot(train.data, aes(lstat, medv)) +
  geom_point() +
  stat_smooth(method = lm, formula = y ~ poly(x, 5, raw = TRUE))
Output: a scatter plot of medv against lstat for the training data, with the fitted 5th-degree polynomial curve and its confidence band overlaid.
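Note that stat_smooth() refits the polynomial inside ggplot(). If one prefers to draw the predictions of the already-fitted model instead, a variant along these lines (an assumed alternative, not shown above) also works:
# Overlay the fitted model's own predictions on the training data
plot_data <- train.data %>%
  mutate(fitted_medv = predict(model, newdata = train.data)) %>%
  arrange(lstat)

ggplot(plot_data, aes(lstat, medv)) +
  geom_point() +
  geom_line(aes(y = fitted_medv), color = "blue")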