Implementing Generalized Least Squares (GLS) in Python
Last Updated: 17 Jul, 2024
Generalized Least Squares (GLS) is a statistical technique used to estimate the unknown parameters in a linear regression model. It extends the Ordinary Least Squares (OLS) method by addressing situations where the assumptions of OLS are violated, specifically when there is heteroscedasticity or autocorrelation in the error terms. GLS is crucial in providing more accurate and reliable estimates in such scenarios. This article provides a detailed exploration of the GLS method, its underlying principles, and its implementation in Python.
What is Generalized Least Squares (GLS)?
Generalized Least Squares (GLS) is an extension of the Ordinary Least Squares (OLS) regression method used to estimate the unknown parameters in a linear regression model. GLS is particularly useful when the assumptions of OLS, such as homoscedasticity (constant variance of errors) and absence of autocorrelation (independence of errors), are violated. By addressing heteroscedasticity and autocorrelation in the error terms, GLS provides more accurate and reliable parameter estimates.
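Formally, if the error terms have covariance matrix \Omega (known, or estimated from the data), the GLS estimator of the coefficient vector is

\hat{\beta}_{GLS} = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} y

When \Omega is (proportional to) the identity matrix, this reduces to the ordinary least squares estimator.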
Why Use GLS?
- In real-world data analysis, the assumptions of OLS are often violated. When this happens, the OLS coefficient estimates remain unbiased but are no longer efficient, and the usual standard errors become unreliable.
- GLS corrects for these issues by transforming (whitening) the data so that the transformed errors satisfy the OLS assumptions, restoring the Best Linear Unbiased Estimator (BLUE) property. A minimal sketch of this transformation is given after this list.
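The transformation is straightforward to sketch by hand. The helper below is only an illustration (not how you would normally call statsmodels): it whitens X and y with the Cholesky factor of an assumed error covariance matrix and then solves an ordinary least squares problem on the whitened data, which is essentially what sm.GLS does internally. The function name gls_by_whitening and the argument omega are hypothetical names used only for this sketch.
Python
import numpy as np

def gls_by_whitening(X, y, omega):
    # omega is an assumed (n x n) error covariance matrix; in practice it
    # must be estimated, as shown in the examples later in this article
    L = np.linalg.cholesky(omega)        # omega = L @ L.T
    X_w = np.linalg.solve(L, X)          # whitened predictors
    y_w = np.linalg.solve(L, y)          # whitened response
    # OLS on the whitened data yields the GLS coefficient estimates
    beta_hat, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    return beta_hat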
Implementing GLS in Python
Python offers several libraries for implementing GLS, with statsmodels being one of the most popular. statsmodels provides comprehensive tools for statistical modeling, including GLS.
To install statsmodels, you can use pip:
pip install statsmodels
Example 1: Implementing GLS with statsmodels
Let's walk through an example on simulated data with autocorrelated errors, so that the true coefficients are known and we can see how GLS handles the violated OLS assumptions.
1. Generating the Data
Python
import numpy as np
import statsmodels.api as sm

# Simulate predictors and a response with autocorrelated (AR(1)-style) errors
np.random.seed(0)
n = 100
X = np.random.rand(n, 2)
X = sm.add_constant(X)  # Adds a constant (intercept) term to the predictors

# Generate autocorrelated errors
errors = np.random.randn(n)
for i in range(1, n):
    errors[i] += 0.5 * errors[i - 1]  # Each error carries over part of the previous one

beta = np.array([1, 0.5, -0.3])  # True coefficients
y = X @ beta + errors
2. Fitting an OLS Model
First, we fit an OLS model to obtain the residuals.
Python
ols_model = sm.OLS(y, X).fit()
print(ols_model.summary())
Output:
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.003
Model: OLS Adj. R-squared: -0.018
Method: Least Squares F-statistic: 0.1307
Date: Wed, 17 Jul 2024 Prob (F-statistic): 0.878
Time: 11:30:25 Log-Likelihood: -147.01
No. Observations: 100 AIC: 300.0
Df Residuals: 97 BIC: 307.8
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 1.0183 0.296 3.435 0.001 0.430 1.607
x1 -0.0074 0.380 -0.019 0.985 -0.762 0.748
x2 -0.1917 0.375 -0.511 0.610 -0.936 0.553
==============================================================================
Omnibus: 3.211 Durbin-Watson: 1.053
Prob(Omnibus): 0.201 Jarque-Bera (JB): 2.550
Skew: 0.339 Prob(JB): 0.279
Kurtosis: 3.389 Cond. No. 5.58
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
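Before correcting for it, it is worth confirming that the OLS residuals really are autocorrelated. The Durbin-Watson statistic reported in the summary above (about 1.05, well below 2) already points to positive autocorrelation; the short snippet below computes it directly with statsmodels.stats.stattools.durbin_watson, assuming the ols_model object from the previous step is still in scope.
Python
from statsmodels.stats.stattools import durbin_watson

# Values near 2 indicate no autocorrelation; values well below 2 indicate
# positive autocorrelation in the residuals
dw = durbin_watson(ols_model.resid)
print(f"Durbin-Watson statistic: {dw:.3f}")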
3. Estimating the Covariance Matrix
Here we approximate the error covariance matrix from the OLS residuals: we build a diagonal matrix whose entries are the squared residuals, which treats each observation's error variance as proportional to its squared residual. (Estimating an AR(1) autocorrelation parameter ρ by regressing the residuals on their lagged values is shown in the FGLS example below.)
Python
residuals = ols_model.resid
cov_matrix = np.diag(residuals**2)  # Diagonal covariance matrix built from the squared OLS residuals
4. Fitting the GLS Model
Finally, we fit the GLS model using the constructed covariance matrix.
Python
gls_model = sm.GLS(y, X, sigma=cov_matrix).fit()
print(gls_model.summary())
Output:
GLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.120
Model: GLS Adj. R-squared: 0.102
Method: Least Squares F-statistic: 6.633
Date: Wed, 17 Jul 2024 Prob (F-statistic): 0.00200
Time: 11:30:25 Log-Likelihood: -81.748
No. Observations: 100 AIC: 169.5
Df Residuals: 97 BIC: 177.3
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 1.0210 0.035 29.584 0.000 0.952 1.089
x1 -0.0338 0.054 -0.628 0.531 -0.141 0.073
x2 -0.1543 0.044 -3.482 0.001 -0.242 -0.066
==============================================================================
Omnibus: 866.058 Durbin-Watson: 1.353
Prob(Omnibus): 0.000 Jarque-Bera (JB): 16.264
Skew: 0.200 Prob(JB): 0.000294
Kurtosis: 1.065 Cond. No. 8.08
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
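To see how the GLS fit differs from the initial OLS fit, one can put the coefficient estimates and standard errors side by side. A minimal sketch, assuming the ols_model and gls_model objects defined above and that pandas is installed:
Python
import pandas as pd

# Compare coefficients and standard errors of the two fits
comparison = pd.DataFrame({
    "OLS coef": ols_model.params,
    "OLS std err": ols_model.bse,
    "GLS coef": gls_model.params,
    "GLS std err": gls_model.bse,
})
print(comparison)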
Example 2: Implementing Feasible Generalized Least Squares (FGLS)
In many practical situations, the exact form of the covariance matrix \Omega is unknown. Feasible Generalized Least Squares (FGLS) is an iterative method that estimates \Omega from the data and refines the GLS estimates.
Steps for FGLS:
- Fit an initial OLS model and obtain residuals.
- Estimate the covariance matrix \Omega from the residuals.
- Fit a GLS model using the estimated \Omega.
- Iterate the process to refine the estimates.
Python
import numpy as np
import statsmodels.api as sm
from scipy.linalg import toeplitz

# Step 1: Generate sample data or use a dataset
np.random.seed(0)
n = 100
X = np.random.rand(n, 2)
X = sm.add_constant(X)  # Adds a constant (intercept) term to the predictors

# Generate autocorrelated (AR(1)-style) errors
errors = np.random.randn(n)
for i in range(1, n):
    errors[i] += 0.5 * errors[i - 1]

beta = np.array([1, 0.5, -0.3])
y = X @ beta + errors

# Fit initial OLS model
ols_model = sm.OLS(y, X).fit()
ols_resid = ols_model.resid

# Step 2: Estimate rho by regressing the residuals on their lagged values,
# then build an AR(1) covariance matrix with entries rho**|i - j|
resid_fit = sm.OLS(ols_resid[1:], sm.add_constant(ols_resid[:-1])).fit()
rho = resid_fit.params[1]
sigma = rho**toeplitz(np.arange(len(ols_resid)))

# Step 3: Fit GLS model using the estimated covariance matrix
gls_model = sm.GLS(y, X, sigma=sigma)
gls_results = gls_model.fit()

# Step 4: Iterate to refine the estimates
for _ in range(5):
    ols_resid = gls_results.resid
    resid_fit = sm.OLS(ols_resid[1:], sm.add_constant(ols_resid[:-1])).fit()
    rho = resid_fit.params[1]
    sigma = rho**toeplitz(np.arange(len(ols_resid)))
    gls_model = sm.GLS(y, X, sigma=sigma)
    gls_results = gls_model.fit()

print(gls_results.summary())
Output:
GLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.022
Model: GLS Adj. R-squared: 0.001
Method: Least Squares F-statistic: 1.073
Date: Wed, 17 Jul 2024 Prob (F-statistic): 0.346
Time: 11:35:00 Log-Likelihood: -134.18
No. Observations: 100 AIC: 274.4
Df Residuals: 97 BIC: 282.2
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.9643 0.300 3.212 0.002 0.368 1.560
x1 0.2923 0.313 0.935 0.352 -0.328 0.913
x2 -0.3527 0.340 -1.038 0.302 -1.027 0.322
==============================================================================
Omnibus: 0.332 Durbin-Watson: 1.982
Prob(Omnibus): 0.847 Jarque-Bera (JB): 0.478
Skew: 0.114 Prob(JB): 0.787
Kurtosis: 2.750 Cond. No. 3.05
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
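As an alternative to writing the iteration by hand, statsmodels provides the GLSAR class, which automates the same estimate-rho-and-refit loop for AR(p) error structures. A minimal sketch, reusing the y and X arrays from the example above:
Python
# GLSAR with an AR(1) error process; iterative_fit alternates between
# estimating rho from the residuals and refitting the regression
glsar_model = sm.GLSAR(y, X, rho=1)
glsar_results = glsar_model.iterative_fit(maxiter=6)
print("Estimated rho:", glsar_model.rho)
print(glsar_results.params)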
Evaluating GLS Models
Evaluating the performance of GLS models involves several diagnostic checks:
- Residual diagnostics: inspect the residuals for remaining heteroscedasticity or obvious patterns.
- Durbin-Watson statistic: values close to 2 suggest that the autocorrelation has been adequately handled.
- Model comparison: compare coefficients, standard errors, log-likelihood, and AIC/BIC against the OLS fit.
A short sketch of these checks follows.
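The snippet below is a minimal sketch of such checks, assuming the ols_model and gls_results objects from the examples above are still in scope.
Python
from statsmodels.stats.stattools import durbin_watson

# Durbin-Watson on the whitened GLS residuals: values near 2 indicate that
# little autocorrelation remains once the estimated covariance is accounted for
print("Durbin-Watson (GLS):", durbin_watson(gls_results.wresid))

# Information criteria allow a rough comparison with the initial OLS fit
print("GLS AIC:", gls_results.aic, "BIC:", gls_results.bic)
print("OLS AIC:", ols_model.aic, "BIC:", ols_model.bic)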
Conclusion
Generalized Least Squares (GLS) is a powerful tool for regression analysis, especially when dealing with heteroscedasticity and autocorrelation in the error terms. By transforming the data to meet the OLS assumptions, GLS provides more accurate and reliable parameter estimates.