How to Calculate Studentized Residuals in Python?
Last Updated: 19 Jan, 2023
A studentized residual is a residual divided by an estimate of its standard deviation. Because this puts every residual on a comparable scale, studentized residuals are a common tool for detecting outliers. A widely used rule of thumb treats any observation whose studentized residual exceeds 3 in absolute value as a potential outlier.
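To make the definition concrete, here is a minimal NumPy sketch that computes studentized residuals by hand for the same Score/Benchmark data used later in this article. It assumes a simple linear regression with an intercept; the leverage h_ii of each observation comes from the diagonal of the hat matrix. It returns both the internally studentized residual (scaled with the full-sample variance estimate) and the externally studentized one (scaled with a variance estimated after leaving observation i out). The function name studentized_residuals is hypothetical and not part of any library.

```python
import numpy as np


def studentized_residuals(x, y):
    """Hypothetical helper: internal and external studentized residuals for y ~ x with an intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])     # design matrix: intercept column + predictor
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares coefficients
    resid = y - X @ beta                           # raw residuals e_i
    n, p = X.shape

    # Leverage h_ii: diagonal of the hat matrix H = X (X'X)^-1 X'
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

    sse = resid @ resid
    s2 = sse / (n - p)                             # full-sample residual variance
    internal = resid / np.sqrt(s2 * (1 - h))

    # Variance with observation i deleted, computed without refitting n models
    s2_deleted = (sse - resid**2 / (1 - h)) / (n - p - 1)
    external = resid / np.sqrt(s2_deleted * (1 - h))
    return internal, external


score = [80, 95, 80, 78, 84, 96, 86, 75, 97, 89]
benchmark = [27, 28, 18, 18, 29, 30, 25, 25, 24, 29]
internal, external = studentized_residuals(benchmark, score)
print(np.round(external, 6))
```

The external values are the ones to compare with the student_resid column produced by statsmodels below, since outlier_test() reports externally studentized residuals.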
The following Python libraries should already be installed on your system: pandas, NumPy, statsmodels and Matplotlib.
You can install them by running the command below in a terminal.
pip3 install pandas numpy statsmodels matplotlib
Steps to calculate studentized residuals in Python
Step 1: Import the libraries.
Import the libraries installed above into the program.
Python3
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
Step 2: Create a data frame.
First, create a data frame holding the data. The pandas package makes this straightforward; the snippet is given below:
Python3
dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84,
                                    96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30,
                                        25, 25, 24, 29]})
Step 3: Build a simple linear regression model.
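Now build a simple linear regression model on the created dataset. The statsmodels package provides the formula-based ols() function (in statsmodels.formula.api) for fitting ordinary least squares models; calling .fit() on the result estimates the coefficients.
Syntax:
statsmodels.formula.api.ols(formula, data)
Parameters:
- formula : a string such as 'y ~ x', where y is the dependent (response) variable and x is the independent (predictor) variable
- data : the DataFrame that contains the columns named in the formula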
Example:
Python3
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()
Step 4: Compute the studentized residuals.
To produce a DataFrame containing the studentized residual of each observation in the dataset, call the outlier_test() method on the fitted model.
Syntax:
simple_regression_model.outlier_test()
The returned DataFrame has one row per observation. The residuals it reports are externally studentized: each residual is scaled by a standard deviation estimated with that observation left out.
Python3
stud_res = simple_regression_model.outlier_test()
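Once the table is available, the rule of thumb from the introduction can be applied by filtering rows. The sketch below, which relies on the default column names outlier_test() uses (student_resid, unadj_p, bonf(p)), keeps observations whose absolute studentized residual exceeds 3, and separately those whose Bonferroni-adjusted p-value is below 0.05 (a conventional but arbitrary level chosen here for illustration).

```python
import pandas as pd
from statsmodels.formula.api import ols

dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84, 96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30, 25, 25, 24, 29]})
model = ols('Score ~ Benchmark', data=dataframe).fit()
stud_res = model.outlier_test()

# Rule of thumb: |studentized residual| > 3 marks a potential outlier
rule_of_thumb = stud_res[stud_res['student_resid'].abs() > 3]

# Multiple-testing view: Bonferroni-adjusted p-value below 0.05
bonferroni = stud_res[stud_res['bonf(p)'] < 0.05]

print(rule_of_thumb)
print(bonferroni)
```

For this small dataset neither filter selects any rows, so no observation would be flagged as an outlier.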
Below is the complete implementation.
Python3
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84,
                                    96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30,
                                        25, 25, 24, 29]})
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()
result = simple_regression_model.outlier_test()
print(result)
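As a cross-check, the same residuals are available from the model's influence object, which also exposes the internally studentized variant (scaled with the full-sample variance instead of the leave-one-out variance). The sketch below confirms that resid_studentized_external matches the student_resid column.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84, 96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30, 25, 25, 24, 29]})
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()
result = simple_regression_model.outlier_test()

influence = simple_regression_model.get_influence()
external = influence.resid_studentized_external  # what outlier_test() reports
internal = influence.resid_studentized_internal  # uses the full-sample variance estimate

print(np.allclose(external, result['student_resid']))
print(pd.DataFrame({'internal': internal, 'external': external}))
```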
Output:

The output is a data frame that contains:
- The studentized residual
- The unadjusted p-value of the studentized residual
- The Bonferroni-corrected p-value of the studentized residual
We can see that the studentized residual for the first observation in the dataset is -1.121201, the studentized residual for the second observation is 0.954871, and so on.
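The two p-value columns can be reproduced from the studentized residuals. The sketch below assumes the externally studentized residual follows a t distribution with one fewer degree of freedom than the model's residual degrees of freedom (one observation is deleted), and applies the Bonferroni correction by multiplying each p-value by the number of observations and capping at 1. It uses SciPy, which is installed as a dependency of statsmodels.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.formula.api import ols

dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84, 96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30, 25, 25, 24, 29]})
model = ols('Score ~ Benchmark', data=dataframe).fit()
result = model.outlier_test()

t = result['student_resid'].to_numpy()
df = model.df_resid - 1                        # one more degree of freedom lost by deleting obs i
unadj_p = 2 * stats.t.sf(np.abs(t), df)        # two-sided p-value
bonf_p = np.minimum(unadj_p * len(t), 1.0)     # Bonferroni: scale by number of tests, cap at 1

print(np.allclose(unadj_p, result['unadj_p']), np.allclose(bonf_p, result['bonf(p)']))
```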
Visualization:
Now let us visualize the studentized residuals. With Matplotlib we can plot the predictor variable values against the corresponding studentized residuals.
Example:
Python3
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84,
                                    96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30,
                                        25, 25, 24, 29]})
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()
result = simple_regression_model.outlier_test()
# Put the predictor (Benchmark) on the x-axis and the studentized residuals on the y-axis
x = dataframe['Benchmark']
y = result['student_resid']
plt.scatter(x, y)
plt.axhline(y=0, color='black', linestyle='--')
plt.xlabel('Benchmark')
plt.ylabel('Studentized Residuals')
plt.savefig("Plot.png")
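To make the rule of thumb visible on the chart, the sketch below adds dotted reference lines at plus and minus 3 alongside the zero line, so any point outside the band stands out. It uses Matplotlib's object-oriented API and the non-interactive Agg backend so it also runs without a display; the output filename plot_with_cutoffs.png is hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to file without a display
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.formula.api import ols

dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84, 96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30, 25, 25, 24, 29]})
result = ols('Score ~ Benchmark', data=dataframe).fit().outlier_test()

fig, ax = plt.subplots()
ax.scatter(dataframe['Benchmark'], result['student_resid'])
ax.axhline(0, color='black', linestyle='--')
for level in (-3, 3):
    ax.axhline(level, color='red', linestyle=':')  # rule-of-thumb outlier cutoffs
ax.set_xlabel('Benchmark')
ax.set_ylabel('Studentized Residuals')
fig.savefig('plot_with_cutoffs.png')  # hypothetical output filename
```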