How to Create a Residual Plot in Python
Last Updated: 31 May, 2025
A residual plot is a graph that shows the residuals on the y-axis and the independent variable on the x-axis. If the points in a residual plot are scattered randomly around the horizontal axis, with no visible pattern, a linear regression model is appropriate for the data. Let's see how to create a residual plot in Python.
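To make "randomly distributed" concrete, here is a minimal sketch on synthetic data (not the headbrain3 dataset used later). It fits a straight line with numpy.polyfit to a deliberately curved relationship and shows that the residuals follow a systematic pattern, which is the signal that a linear model is not appropriate. The data and variable names are illustrative assumptions.

```python
import numpy as np

# Synthetic, noise-free data with a curved (quadratic) relationship.
x = np.linspace(-3, 3, 61)
y = x ** 2

# Fit a straight line y = slope * x + intercept by least squares.
slope, intercept = np.polyfit(x, y, deg=1)

# Residual = actual value minus the value predicted by the line.
residuals = y - (slope * x + intercept)

# For this data the residuals are positive at both ends and negative in the
# middle: a U-shaped pattern rather than a random scatter around zero.
print("left end positive:", residuals[0] > 0)
print("middle negative:", residuals[30] < 0)
print("right end positive:", residuals[-1] > 0)
```

A U-shape like this in a residual plot usually suggests trying a non-linear term or a transformation of the predictor.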
Using seaborn.residplot()
Seaborn's residplot() fits a linear regression of y on x and draws a scatter plot of the residuals, that is, how far each actual value lies from its prediction. If the points are spread randomly around the horizontal line at 0, a linear model is likely a reasonable fit. It is a convenient way to run a quick visual check, and the default styling looks clean.
Python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = pd.read_csv('headbrain3.csv')
sns.residplot(x='Head_size', y='Brain_weight', data=data)
plt.xlabel("Head Size")
plt.ylabel("Residuals")
plt.show()
Output
Residual Plot using seaborn.residplot
Explanation:
- sns.residplot(...) plots residuals from a linear regression of Brain_weight on Head_size to assess model fit.
- plt.xlabel(...) / plt.ylabel(...) label the x-axis as "Head Size" and y-axis as "Residuals" for clarity.
Using plot_regress_exog()
This method is like a full report card for your regression model. It gives you four plots in one figure for the chosen predictor: the observed and fitted values, the residuals, a partial regression plot and a CCPR (component plus residual) plot. It is very helpful when you want to understand in depth how well your model is doing. Calling summary() on the fitted model also gives you a detailed regression summary, including p-values and R-squared, which helps you judge the model's performance statistically.
CSV Used: headbrain3
Python
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
df = pd.read_csv('headbrain3.csv')
lm = ols('Brain_weight ~ Head_size', data=df).fit()  # fit the OLS model
print(lm.summary())  # print the regression summary
fig = plt.figure(figsize=(14, 8))  # figure that will hold the four plots
fig = sm.graphics.plot_regress_exog(lm, 'Head_size', fig=fig)
plt.show()
Output


Explanation:
- ols(...) fits a linear regression model with Brain_weight as the response and Head_size as the predictor.
- lm.summary() outputs statistical details like coefficients, R-squared and p-values of the fitted model.
- plt.figure(...) and plot_regress_exog(...) set the figure size and draw the regression diagnostics for Head_size (fitted values, residuals, partial regression plot and CCPR plot).
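The R-squared value reported by lm.summary() is tied directly to the residuals. As a rough illustration, the sketch below computes it by hand as 1 - SS_res / SS_tot on a small made-up dataset. The numbers and the helper name r_squared are assumptions for illustration and are not part of statsmodels.

```python
import numpy as np

def r_squared(y, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y - y_pred) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot

# Made-up data, roughly linear.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares line and its predictions.
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept

print(round(r_squared(y, y_pred), 4))
```

Small residuals relative to the spread of y push the value towards 1, while predicting the mean of y for every point gives 0.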
Using Residuals
Here, you manually calculate residuals by subtracting predicted values from actual ones. Then you plot them to see how well the model is performing. This method gives you more control and helps you learn how residuals work under the hood. It’s good if you want to get hands-on and understand what’s happening behind the scenes in regression.
Python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
df = pd.read_csv('headbrain3.csv')
X = df[['Head_size']]
y = df['Brain_weight']
lr = LinearRegression()
lr.fit(X, y)
y_pred = lr.predict(X)
res = y - y_pred
plt.scatter(X['Head_size'], res)
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel('Head Size')
plt.ylabel('Residuals')
plt.show()
Output
Using residuals
Explanation:
- LinearRegression() and lr.fit(X, y) create and train a linear regression model to predict Brain_weight from Head_size.
- lr.predict(X) and res = y - y_pred predict values and calculate residuals (errors between actual and predicted).
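To see what lr.fit(...) and lr.predict(...) amount to for a single predictor, the sketch below computes the slope and intercept from the closed-form least-squares formulas and then the residuals, using only NumPy. The data is synthetic and the function name simple_ols_residuals is hypothetical; it is meant to mirror the steps above, not to replace scikit-learn.

```python
import numpy as np

def simple_ols_residuals(x, y):
    """Fit y = intercept + slope * x by least squares; return slope, intercept, residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    slope = np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2)
    intercept = y.mean() - slope * x.mean()
    y_pred = intercept + slope * x
    return slope, intercept, y - y_pred

# Synthetic data: a known line plus random noise (seeded for repeatability).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=x.size)

slope, intercept, res = simple_ols_residuals(x, y)

# With an intercept in the model, least-squares residuals sum to (about) zero
# and are orthogonal to the predictor.
print(abs(res.sum()) < 1e-8, abs(np.dot(res, x)) < 1e-6)
```

These two properties hold for any least-squares fit with an intercept, so they say nothing about whether the linear model is appropriate; that is what the shape of the residual plot is for.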
Using sklearn with statsmodels.api.add_constant()
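statsmodels' sm.OLS (the non-formula interface) does not add an intercept on its own, so a constant column is added to the data explicitly with add_constant(). This method builds that column and then fits the model using sklearn. Note that sklearn's LinearRegression already fits an intercept by default (fit_intercept=True), so the extra column is redundant here and should leave the residuals unchanged compared with the previous method. The pattern is still useful when you're switching between sklearn and statsmodels, because both libraries then receive the same input data.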
Python
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
df = pd.read_csv('headbrain3.csv')
X = df[['Head_size']]
y = df['Brain_weight']
X_c = sm.add_constant(X) # Add constant for intercept
lr = LinearRegression()
lr.fit(X_c, y)
y_pred = lr.predict(X_c)
res = y - y_pred
plt.scatter(X['Head_size'], res)
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel("Head Size")
plt.ylabel("Residuals")
plt.show()
Output
Using sklearn with statsmodels.api.add_constant()
Explanation:
- LinearRegression() & lr.fit(...) initializes and fits a linear regression model to predict Brain_weight using Head_size.
- y_pred & res computes predicted brain weights and calculates residuals (actual - predicted).
- plt.scatter(...) & plt.axhline(...) plots residuals vs Head_size with a horizontal reference line at 0 to assess model fit.
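What the constant column does is easiest to see when the solver has no intercept handling of its own. In the sketch below, np.linalg.lstsq plays that role and np.column_stack with a column of ones stands in for sm.add_constant; both substitutions, and the synthetic data, are assumptions made for illustration.

```python
import numpy as np

# Synthetic data with a clearly non-zero intercept.
rng = np.random.default_rng(1)
x = np.linspace(1, 10, 40)
y = 50.0 + 3.0 * x + rng.normal(scale=1.0, size=x.size)

# Without a constant column the fitted line is forced through the origin.
X_no_const = x.reshape(-1, 1)
beta_no_const, *_ = np.linalg.lstsq(X_no_const, y, rcond=None)
res_no_const = y - X_no_const @ beta_no_const

# With a column of ones the first coefficient acts as the intercept.
X_const = np.column_stack([np.ones_like(x), x])
beta_const, *_ = np.linalg.lstsq(X_const, y, rcond=None)
res_const = y - X_const @ beta_const

print("intercept, slope:", beta_const)
print("mean residual without constant:", res_no_const.mean())
print("mean residual with constant:", res_const.mean())
```

Without the constant, the residuals are shifted away from zero and a residual plot would show a tilted band instead of a cloud centred on the reference line.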