How to Create a Residual Plot in Python

Last Updated : 31 May, 2025

A residual plot is a graph in which the residuals (the differences between the observed and predicted values) are displayed on the y-axis and the independent variable is displayed on the x-axis. If the points in a residual plot are scattered randomly around the horizontal line at zero, with no visible curve or funnel shape, a linear regression model is appropriate for the data. Let's see how to create a residual plot in Python.
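To make the "randomly scattered" criterion concrete, here is a minimal NumPy-only sketch. The synthetic data, the helper name linear_fit_residuals and the curvature check are illustrative assumptions, not part of the original example. It fits a straight line to one truly linear and one curved data set, then measures how strongly the residuals follow a U-shaped pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)

def linear_fit_residuals(x, y):
    """Fit y = slope * x + intercept by least squares; return actual - predicted."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)

# Synthetic data (assumption): one linear relationship, one quadratic
y_linear = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)
y_curved = 0.5 * x**2 + rng.normal(scale=1.0, size=x.size)

res_linear = linear_fit_residuals(x, y_linear)
res_curved = linear_fit_residuals(x, y_curved)

# A U-shaped residual pattern shows up as correlation with (x - mean)^2
curvature = (x - x.mean()) ** 2
corr_linear = np.corrcoef(res_linear, curvature)[0, 1]
corr_curved = np.corrcoef(res_curved, curvature)[0, 1]

print(f"linear data: correlation with curvature = {corr_linear:.2f}")
print(f"curved data: correlation with curvature = {corr_curved:.2f}")
```

For the linear data the correlation should be close to zero, while for the curved data it should be close to one. That is the numerical counterpart of seeing a clear arc in the residual plot.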

Using seaborn.residplot()

Seaborn's residplot() fits a linear regression of y on x and draws a scatter plot of the residuals, that is, how far the actual values are from the predictions. If the points are randomly spread around the horizontal line at 0, the linear model is likely a reasonable fit. It’s great for quick checks and looks visually nice too.

Python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = pd.read_csv('headbrain3.csv')

sns.residplot(x='Head_size', y='Brain_weight', data=data)
plt.xlabel("Head Size")
plt.ylabel("Residuals")
plt.show()

Output

Residual Plot using seaborn.residplot

Explanation:

  • sns.residplot(...) plots residuals from a linear regression of Brain_weight on Head_size to assess model fit.
  • plt.xlabel(...) / plt.ylabel(...) label the x-axis as "Head Size" and y-axis as "Residuals" for clarity.

Using plot_regress_exog()

plot_regress_exog() from statsmodels is like a full report card for one predictor in your regression model. It draws four plots in a single figure: the observed and fitted values against the predictor, the residuals against the predictor, a partial regression plot and a CCPR plot. It’s very helpful when you want to understand in depth how the model behaves with respect to that input variable. The detailed regression summary, including p-values and R-squared, comes separately from the fitted model's summary() method and helps you judge the model's performance statistically.

CSV Used: headbrain3

Python
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.read_csv('headbrain3.csv')

# Fit an ordinary least squares model: Brain_weight explained by Head_size
lm = ols('Brain_weight ~ Head_size', data=df).fit()
print(lm.summary())  # coefficients, R-squared, p-values

# Draw the four diagnostic plots for Head_size on one figure
fig = plt.figure(figsize=(14, 8))
fig = sm.graphics.plot_regress_exog(lm, 'Head_size', fig=fig)
plt.show()

Output

Explanation:

  • ols(...) fits a linear regression model with Brain_weight as the response and Head_size as the predictor.
  • lm.summary() outputs statistical details like coefficients, R-squared and p-values of the fitted model.
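  • plt.figure(...) & plot_regress_exog(...) set the figure size and draw four diagnostic plots for Head_size: observed and fitted values versus Head_size, residuals versus Head_size, a partial regression plot and a CCPR (component and component-plus-residual) plot.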
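A fitted statsmodels result also exposes the residuals and fitted values directly through its resid and fittedvalues attributes, so the residual panel can be reproduced on its own. Below is a minimal sketch on synthetic data. The column names x and y, the coefficients and the output file name are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.formula.api import ols

# Synthetic linear data (assumption for illustration)
rng = np.random.default_rng(2)
demo = pd.DataFrame({"x": np.linspace(0, 10, 120)})
demo["y"] = 3.0 * demo["x"] + 2.0 + rng.normal(scale=1.5, size=len(demo))

lm_demo = ols("y ~ x", data=demo).fit()

# resid holds actual - fitted for every observation
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(demo["x"], lm_demo.resid)
ax.axhline(0, color="r", linestyle="--")
ax.set_xlabel("x")
ax.set_ylabel("Residuals")
fig.savefig("statsmodels_residuals.png")  # hypothetical file name
```

Working from resid directly is handy when only the residual plot is needed, or when it has to be placed on an existing Matplotlib axis.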

Using Residuals

Here, you manually calculate residuals by subtracting predicted values from actual ones. Then you plot them to see how well the model is performing. This method gives you more control and helps you learn how residuals work under the hood. It’s good if you want to get hands-on and understand what’s happening behind the scenes in regression.

Python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

df = pd.read_csv('headbrain3.csv')
X = df[['Head_size']]
y = df['Brain_weight']

lr = LinearRegression()
lr.fit(X, y)

y_pred = lr.predict(X)
res = y - y_pred

plt.scatter(X['Head_size'], res)  # pass the column, not the one-column DataFrame
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel('Head Size')
plt.ylabel('Residuals')
plt.show()

Output

Using residuals

Explanation:

  • LinearRegression() and lr.fit(X, y) create and train a linear regression model to predict Brain_weight from Head_size.
  • lr.predict(X) and res = y - y_pred predict values and calculate residuals (errors between actual and predicted).

Using sklearn with statsmodels.api.add_constant()

scikit-learn's LinearRegression fits an intercept by default, whereas statsmodels' OLS does not and needs an explicit constant column, which add_constant() provides. When you switch between the two libraries and want both to use the same design matrix, you can add the constant column and fit the sklearn model on it. In that case, pass fit_intercept=False so the intercept is estimated once, as the coefficient of the constant column, instead of being included twice.

Python
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

df = pd.read_csv('headbrain3.csv')
X = df[['Head_size']]
y = df['Brain_weight']

X_c = sm.add_constant(X)  # Adds a 'const' column of ones for the intercept

# The constant column carries the intercept, so disable sklearn's own
lr = LinearRegression(fit_intercept=False)
lr.fit(X_c, y)

y_pred = lr.predict(X_c)
res = y - y_pred

plt.scatter(X['Head_size'], res)
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel("Head Size")
plt.ylabel("Residuals")
plt.show()

Output

Using sklearn with statsmodels.api.add_constant()

Explanation:

  • LinearRegression() & lr.fit(...) initializes and fits a linear regression model to predict Brain_weight using Head_size.
  • y_pred & res computes predicted brain weights and calculates residuals (actual - predicted).
  • plt.scatter(...) & plt.axhline(...) plots residuals vs Head_size with a horizontal reference line at 0 to assess model fit.
