How to Plot the Linear Regression in R
Last Updated: 24 Apr, 2025
In this article, we are going to learn how to plot linear regression in R. But before we can plot it, we first need to understand what linear regression actually is.
What is Linear Regression?
Linear Regression is a supervised learning model that predicts an output from the linear relationship it establishes in the data it is trained on. The aim of the model is to find the linear equation that best fits the relationship between the independent variables (features) and the dependent variable (target). With this basic insight into the concept of linear regression, we can now dive into how to plot it in R.
We should also understand why we chose R for this purpose, since we could just as well use another language such as Python, which also has a number of libraries for machine learning and data analysis.
Equation for Linear Regression
The equation for linear regression is the simple straight-line equation covered in school,
y = bx + a
Here,
a: y-Intercept
b: Slope of the regression line (line of best fit)
x: independent variable or features
y: dependent variable or target
For a dataset, the slope and intercept can be calculated with the following formulas:
Slope = \frac{\text{cov}(x, y)}{\text{var}(x)} (the sample covariance of x and y divided by the sample variance of x)
Intercept = \bar{y} - \text{Slope} \times \bar{x} (where \bar{x} and \bar{y} are the sample means of x and y)
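As a quick illustration, the slope and intercept can be computed by hand with cov(), var() and mean() and compared against R's built-in lm() function. The sketch below uses a small made-up set of values (not the placement data used later), so the numbers are only for demonstration.
R
# Hypothetical toy data, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

# Slope = sample covariance of (x, y) / sample variance of x
slope <- cov(x, y) / var(x)

# Intercept = mean(y) - slope * mean(x)
intercept <- mean(y) - slope * mean(x)

slope
intercept

# The same coefficients reported by the built-in linear model
coef(lm(y ~ x))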
Loss Function
The loss function, often referred to as the cost function, measures the difference between the true value (y) and the value predicted by our machine learning model (ŷ), which is called the error in general terms. It can be calculated using the Mean Squared Error method, a commonly used measure of error in statistics.
Calculation of Mean Squared Error
Mean Squared Error (MSE) is calculated by squaring the difference between the predicted and true values, summing these squared differences over the whole dataset, and dividing by the total number of observations:
Cost Function = Mean Squared Error = \frac{1}{n}\sum_{i=1}^{n}(\widehat{y_{i}}-y_{i})^{2}
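For instance, once a model has been fitted with lm(), the training MSE can be computed from its residuals. The sketch below assumes the model and SalaryData objects created in the example later in this article.
R
# MSE from the fitted model's residuals
mse <- mean(residuals(model)^2)
mse

# Equivalent, written out explicitly from predicted and true values
mean((fitted(model) - SalaryData$package)^2)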
Why R?
R is a popular programming language that was developed specifically for statistical analysis and data visualization. Many of the models we use, including the linear model (regression), are built into base R, so no external libraries need to be installed to fit them. The language also comes with a huge community and some of the best tools and libraries for machine learning, which makes it a first choice for data science and machine learning enthusiasts.
Now, let's start with plotting linear regression.
Plotting Linear Regression in R
The dataset we are using for this is: placement.csv
R
# Reading Dataset
SalaryData <- read.csv("placement.csv")

# Preparing model
model <- lm(package ~ cgpa, data = SalaryData)

# Fetching R-squared, p-value, F-statistic ...
summary(model)

# Opening a PNG graphics device to save the plot
png(file = "placement_stats.png")

# Plotting graph; abline(model) adds the regression line
plot(SalaryData$cgpa, SalaryData$package, col = "blue",
     main = "CGPA and Package regression",
     abline(model), cex = 1.3, pch = 16,
     xlab = "CGPA", ylab = "Package (LPA)")

# Closing the device so the file is written
dev.off()
Output:
Call:
lm(formula = package ~ cgpa, data = SalaryData)
Residuals:
Min 1Q Median 3Q Max
-15.517 -9.794 -4.762 2.376 53.174
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -94.141 47.167 -1.996 0.0613 .
cgpa 17.415 6.704 2.598 0.0182 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 16.56 on 18 degrees of freedom
Multiple R-squared: 0.2726, Adjusted R-squared: 0.2322
F-statistic: 6.747 on 1 and 18 DF, p-value: 0.01819
- Here, first we are importing the dataset (placement.csv), available in CSV (Comma Separated Values) format, using the read.csv function and assigning it to a variable SalaryData.
- Next, we are fitting the built-in linear regression model using lm, passing the formula (package ~ cgpa) and our dataset to the function as arguments.
- Next, we are creating a PNG file to store the plotted graph using the png function, which takes the file name as an argument.
- Finally, we are using the plot function provided by R to plot the graph for the data, and dev.off() to close the graphics device so the file is saved.
Running this script will create a file "placement_stats.png" in the current directory containing the plotted graph for the dataset.
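The fitted model can also be queried directly. As a small follow-up sketch (using the same SalaryData and model objects as above, with 8.5 as a hypothetical CGPA value), coef() returns the intercept and slope of the fitted line, and predict() estimates the package for a new CGPA:
R
# Intercept and slope of the fitted line
coef(model)

# Predicted package for a hypothetical CGPA of 8.5
predict(model, newdata = data.frame(cgpa = 8.5))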
Changing pch
R
# Reading Dataset
SalaryData <- read.csv("placement.csv")

# Preparing model
model <- lm(package ~ cgpa, data = SalaryData)

# Fetching R-squared, p-value, F-statistic ...
summary(model)

# Opening a PNG graphics device to save the plot
png(file = "placement_stats.png")

# Plotting graph
plot(SalaryData$cgpa, SalaryData$package, col = "blue",
     main = "CGPA and Package regression",
     abline(model), cex = 1.3, pch = 17, # pch: 16 -> 17
     xlab = "CGPA", ylab = "Package (LPA)")

# Closing the device so the file is written
dev.off()
Output:
The summary() output is identical to the previous run; only the plotted points change.
Changing the pch value from 16 to 17 changes the shape of the plotting points from solid circles to triangles, while keeping everything else the same.
Changing col
R
# Reading Dataset
SalaryData <- read.csv("placement.csv")

# Preparing model
model <- lm(package ~ cgpa, data = SalaryData)

# Opening a PNG graphics device to save the plot
png(file = "placement_stats.png")

# Plotting graph
plot(SalaryData$cgpa, SalaryData$package, col = "red", # col: blue -> red
     main = "CGPA and Package regression",
     abline(model), cex = 1.3, pch = 16,
     xlab = "CGPA", ylab = "Package (LPA)")

# Closing the device so the file is written
dev.off()
Output:
Changing col from blue to red changes the color of the plotting points from blue (initial) to red (new), while keeping everything else the same.
Additionally, the abline function is used to add the regression line (line of best fit) to the plot.
Removing it will result in a plain scatterplot.
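Passing abline(model) inside the plot() call relies on the expression being evaluated while plot() is drawing. The more conventional pattern is to call abline() as a separate statement after plot(); a sketch of that style, using the same data and model as above, is shown below.
R
# Scatterplot first ...
plot(SalaryData$cgpa, SalaryData$package, col = "blue",
     main = "CGPA and Package regression",
     cex = 1.3, pch = 16,
     xlab = "CGPA", ylab = "Package (LPA)")

# ... then add the regression line as a separate call
abline(model, col = "red", lwd = 2)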
Removing abline parameter
R
# Reading Dataset
SalaryData <- read.csv("placement.csv")

# Preparing model
model <- lm(package ~ cgpa, data = SalaryData)

# Opening a PNG graphics device to save the plot
png(file = "placement_stats.png")

# Plotting graph
plot(SalaryData$cgpa, SalaryData$package, col = "blue",
     main = "CGPA and Package regression",
     cex = 1.3, pch = 16, # Removed abline(model)
     xlab = "CGPA", ylab = "Package (LPA)")

# Closing the device so the file is written
dev.off()
Output:

When the abline(model) argument is removed from the call to plot, the line of best fit disappears from the graph, leaving a scatterplot, while keeping everything else the same.
This can also be plotted by using external libraries like ggplot2.
Plotting using ggplot2
First, we need to install the library using the following command:
install.packages("ggplot2")
- ggplot2 is a popular data visualization package in R programming.
Then we can use it as usual by loading the library in the script.
R
# Importing library
library(ggplot2)

# Reading Dataset
SalaryData <- read.csv("placement.csv")

# Preparing model
model <- lm(package ~ cgpa, data = SalaryData)

# Opening a PNG graphics device to save the plot
png(file = "placement_stats.png")

ggplot(SalaryData, aes(x = cgpa, y = package)) +
  geom_point() +                           # For displaying points
  geom_smooth(method = "lm", se = FALSE) + # For displaying the regression line
  labs(title = "CGPA and Package regression", x = "CGPA", y = "Package (LPA)")

# Closing the device so the file is written
dev.off()
Output:

It works in a similar manner to the previous approach; both the line of best fit and the points are plotted on the graph.
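As a side note, geom_smooth() also accepts se = TRUE (its default), which shades a confidence band around the regression line. A minimal variation of the same plot is sketched below, assuming ggplot2 is loaded and SalaryData has been read in as above.
R
# Same plot, but with the default confidence band around the fitted line
ggplot(SalaryData, aes(x = cgpa, y = package)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "CGPA and Package regression", x = "CGPA", y = "Package (LPA)")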
Without geom_point (plotting points absent)
R
# Importing library
library(ggplot2)

# Reading Dataset
SalaryData <- read.csv("placement.csv")

# Preparing model
model <- lm(package ~ cgpa, data = SalaryData)

# Opening a PNG graphics device to save the plot
png(file = "placement_stats.png")

ggplot(SalaryData, aes(x = cgpa, y = package)) +
  geom_smooth(method = "lm", se = FALSE) + # geom_point() removed
  labs(title = "CGPA and Package regression", x = "CGPA", y = "Package (LPA)")

# Closing the device so the file is written
dev.off()
Output:
When geom_point() is removed, the plotting points are not displayed on the graph, while everything else stays the same.
Without geom_smooth (line absent)
R
# Importing library
library(ggplot2)

# Reading Dataset
SalaryData <- read.csv("placement.csv")

# Preparing model
model <- lm(package ~ cgpa, data = SalaryData)

# Opening a PNG graphics device to save the plot
png(file = "placement_stats.png")

ggplot(SalaryData, aes(x = cgpa, y = package)) +
  geom_point() + # geom_smooth() removed
  labs(title = "CGPA and Package regression", x = "CGPA", y = "Package (LPA)")

# Closing the device so the file is written
dev.off()
Output:
When geom_smooth() is removed, the line of best fit is not displayed on the graph, while everything else stays the same.
Exploring the Relationship Between Hours Studied and Test Scores
R
library(ggplot2)

# Creating a small dataset of study hours and test scores
data <- data.frame(
  Hours_Studied = c(2, 3, 4, 5, 6, 7, 8),
  Test_Score = c(56, 65, 74, 82, 88, 92, 95)
)

# Preparing model
model <- lm(Test_Score ~ Hours_Studied, data = data)
summary(model)

# Opening a PNG graphics device to save the plot
png(file = "score_stats.png")

# Plotting graph; abline(model) adds the regression line
plot(data$Hours_Studied, data$Test_Score, col = "blue",
     main = "Hours Studied vs. Test Score",
     xlab = "Hours Studied", ylab = "Test Score",
     cex = 1.3, pch = 16,
     abline(model))

# Closing the device so the file is written
dev.off()
Output:
Call:
lm(formula = Test_Score ~ Hours_Studied, data = data)
Residuals:
1 2 3 4 5 6 7
-3.03571 -0.64286 1.75000 3.14286 2.53571 -0.07143 -3.67857
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 45.8214 2.9683 15.44 2.07e-05 ***
Hours_Studied 6.6071 0.5512 11.99 7.13e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.917 on 5 degrees of freedom
Multiple R-squared: 0.9664, Adjusted R-squared: 0.9596
F-statistic: 143.7 on 1 and 5 DF, p-value: 7.128e-05
A graph has been plotted with the number of hours studied on the X-axis and the test score on the Y-axis, along with a line of best fit, and saved to score_stats.png.
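With such a strong fit (R-squared of about 0.97), the model can also be used to estimate scores for study times that are not in the data. The sketch below reuses the model object fitted above with a hypothetical value of 5.5 hours.
R
# Predicted test score for a hypothetical 5.5 hours of study
predict(model, newdata = data.frame(Hours_Studied = 5.5))

# Intercept and slope reported by summary(model)
coef(model)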