Significance Test for Linear Regression in R
Last Updated: 29 Jul, 2024
Linear regression is a statistical method for modeling the relationship between a dependent variable and one or more independent variables. It is frequently used to predict the value of the dependent variable from the values of the independent variables. In R, the lm() function fits a linear regression model. After fitting a model, we usually want to know whether the estimated regression coefficients are statistically significant, and a significance test answers that question. In this tutorial, we will look at how to run a linear regression significance test in R.
What is the significance test for linear regression?
Significance tests for linear regression are used to determine whether the relationship between the dependent variable and one or more independent variables is statistically significant. In other words, they tell us whether the data provide enough evidence that an independent variable is associated with the dependent variable, rather than the apparent relationship being due to chance.
Several tests can be used to assess a linear regression model, but the most common is the t-test. The t-test checks whether each slope coefficient in the model is significantly different from zero. The t-statistic is the estimated coefficient divided by its standard error, and the p-value is computed from a t distribution with the residual degrees of freedom.
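As a rough illustration of what the t-test does, the sketch below recomputes the t-statistics and p-values by hand (each t-statistic is the coefficient estimate divided by its standard error) and places them next to the values that summary() reports. The model (mpg ~ wt on the built-in mtcars data) and the names fit, t_manual, and p_manual are chosen here for illustration only.

```r
# Illustrative model on the built-in mtcars data (an assumption for this sketch)
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficient table as reported by summary()
coef_table <- coef(summary(fit))
estimate <- coef_table[, "Estimate"]
std_error <- coef_table[, "Std. Error"]

# t-statistic: estimate divided by its standard error
t_manual <- estimate / std_error

# Two-sided p-value from the t distribution with residual degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = df.residual(fit), lower.tail = FALSE)

# Manual values side by side with the ones computed by summary()
print(cbind(t_manual,
            t_summary = coef_table[, "t value"],
            p_manual,
            p_summary = coef_table[, "Pr(>|t|)"]))
```

The two pairs of columns should agree, which shows that the p-values in the summary output are two-sided t-test p-values.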
Types of linear regression in R
Linear regression comes in two types, depending on the number of independent variables:
a. Simple linear regression:
Simple linear regression involves only one independent variable. The goal is to find the best-fit line that describes the relationship between the independent and dependent variables. The simple linear regression model has the following form:
y = β0 + β1x + ε
Here, y is the dependent variable, x is the independent variable, β0 denotes the intercept, β1 denotes the slope, and ε denotes the error term.
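For simple linear regression, the least-squares estimates have a well-known closed form: the slope is cov(x, y) / var(x), and the intercept is mean(y) minus the slope times mean(x). The sketch below checks this against lm(). Using the built-in mtcars data with wt as x and mpg as y is an illustrative assumption, as are the variable names.

```r
# Illustrative choice of variables from the built-in mtcars data
x <- mtcars$wt
y <- mtcars$mpg

# Closed-form least-squares estimates for y = b0 + b1 * x
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# The same model fitted with lm()
fit_simple <- lm(mpg ~ wt, data = mtcars)

# Both rows should show the same intercept and slope
print(rbind(closed_form = c(intercept, slope),
            lm_fit = unname(coef(fit_simple))))
```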
b. Multiple linear regression:
Multiple linear regression takes two or more independent variables into account. The goal is to find the plane or hyperplane that best captures the relationship between the independent variables and the dependent variable. The multiple linear regression model has the following form:
y = β0 + β1x1 + β2x2 + ... + βnxn + ε
Here, y is the dependent variable, x1, x2, ..., xn are the independent variables, β0 is the intercept, β1, β2, ..., βn are the coefficients of the independent variables, and ε is the error term.
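With several predictors, the least-squares coefficients can be written in matrix form as the solution of the normal equations (X'X) β = X'y, where X is the design matrix with a leading column of ones for the intercept. The sketch below solves these equations directly and compares the result with lm(). The choice of mtcars with predictors wt and hp is an assumption made for illustration. Note that lm() itself uses a QR decomposition, which is numerically more stable than solving the normal equations.

```r
# Design matrix with an intercept column, plus the response vector
X <- model.matrix(~ wt + hp, data = mtcars)
y <- mtcars$mpg

# Solve the normal equations (X'X) beta = X'y
beta_hat <- drop(solve(crossprod(X), crossprod(X, y)))

# The same model fitted with lm()
fit_multi <- lm(mpg ~ wt + hp, data = mtcars)

# Both rows should show the same three coefficients
print(rbind(normal_equations = beta_hat, lm_fit = coef(fit_multi)))
```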
Significance Test for Linear Regression in R
The summary() function in R can be used to perform the significance test on a fitted linear regression model. It reports detailed information about the model, including, for each predictor variable, the estimated coefficient, its standard error, the t-statistic, and the p-value.
Here's an example:
R
# Load the dataset
data(mtcars)
# Fit the linear regression model
model <- lm(mpg ~ wt + hp, data = mtcars)
# Perform the significance test
summary(model)
In this example, we use the mtcars dataset to predict miles per gallon (mpg) from the car's weight (wt) and horsepower (hp). The lm() function fits the linear regression model, and the summary() function reports the significance tests.
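Rather than reading the p-values off the printed summary, they can also be extracted programmatically. The sketch below pulls the coefficient table out of the summary object, flags coefficients whose p-value falls below a chosen threshold, and computes the p-value of the overall F-test (which checks whether all slope coefficients are zero at once). The 0.05 threshold and the names alpha, significant, and f_p_value are assumptions made for this illustration.

```r
# Same model as in the example above
model <- lm(mpg ~ wt + hp, data = mtcars)
model_summary <- summary(model)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(model_summary)

# Flag coefficients that are significant at the chosen level (assumed: 0.05)
alpha <- 0.05
p_values <- coef_table[, "Pr(>|t|)"]
significant <- p_values < alpha
print(data.frame(p_value = p_values, significant = significant))

# Overall F-test: p-value from the F statistic and its degrees of freedom
f_stat <- model_summary$fstatistic
f_p_value <- unname(pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"],
                       lower.tail = FALSE))
cat("Overall F-test p-value:", f_p_value, "\n")
```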
Example 2:
R
# Load the dataset
data(iris)
# Split the dataset into training and testing sets
set.seed(123)
train_index <- sample(1:nrow(iris), 0.7 * nrow(iris))
train_data <- iris[train_index, ]
test_data <- iris[-train_index, ]
# Fit the linear regression model on the training data
model <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = train_data)
# Perform the significance test
summary(model)
# Make predictions on the testing data
predictions <- predict(model, newdata = test_data)
# Calculate the root mean squared error (RMSE)
RMSE <- sqrt(mean((test_data$Sepal.Length - predictions)^2))
# Print the RMSE
cat("RMSE:", RMSE, "\n")
# Visualize the relationship between Sepal.Length and Sepal.Width
plot(train_data$Sepal.Width, train_data$Sepal.Length, main = "Sepal.Width vs Sepal.Length", xlab = "Sepal.Width", ylab = "Sepal.Length")
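# abline() can only draw a one-predictor fit, so fit Sepal.Length ~ Sepal.Width separately for the line
abline(lm(Sepal.Length ~ Sepal.Width, data = train_data), col = "red")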
Explanation & Output
The first step loads the iris dataset.
The second step randomly splits the data into a training set (70% of the rows) and a testing set (30% of the rows); set.seed(123) makes the split reproducible.
The third step fits a linear regression model on the training data, with Sepal.Length as the response variable and Sepal.Width and Petal.Length as the predictor variables.
The fourth step runs the significance test on the model with summary(). The summary contains the model's coefficient estimates, standard errors, t-values, and p-values.
The fifth step uses the fitted model and predict() to predict Sepal.Length for the testing data.
The sixth step calculates the root mean squared error (RMSE) between the predicted and actual Sepal.Length values and prints it.
The final step plots Sepal.Length against Sepal.Width for the training data and draws a regression line in red.
In this example, we predicted Sepal.Length from the Sepal.Width and Petal.Length variables. We split the iris dataset into training and testing sets, fitted the linear regression model on the training data, and then made predictions on the testing data. To assess the accuracy of the model on the testing data, we computed the RMSE. We also drew a regression line on the scatter plot to visualize the relationship between Sepal.Width and Sepal.Length.
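The RMSE calculation can be wrapped in a small helper so that the same formula is applied to both the training and the testing data. The sketch below is one possible way to do this; the function name rmse and the comparison of training and testing error are illustrative additions, not part of the original example.

```r
# Hypothetical helper: root mean squared error between two numeric vectors
rmse <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))
}

# Recreate the 70/30 split and the model from the example
set.seed(123)
train_index <- sample(seq_len(nrow(iris)), size = round(0.7 * nrow(iris)))
train_data <- iris[train_index, ]
test_data <- iris[-train_index, ]
model <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = train_data)

# Error on the data the model was fitted to and on the held-out data
train_rmse <- rmse(train_data$Sepal.Length, fitted(model))
test_rmse <- rmse(test_data$Sepal.Length, predict(model, newdata = test_data))

cat("Training RMSE:", train_rmse, "\n")
cat("Testing RMSE:", test_rmse, "\n")
```

A testing RMSE that is much larger than the training RMSE would suggest that the model does not generalize well to unseen data.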
Conclusion
In this article, we covered how to run a linear regression significance test in R. We demonstrated the process with the mtcars dataset, modeling mpg as a function of wt and hp, and with the iris dataset. The significance test tells us whether each regression coefficient is statistically significant.