Implementing Generalized Least Squares (GLS) in Python
Last Updated: 17 Jul, 2024
Generalized Least Squares (GLS) is a statistical technique used to estimate the unknown parameters in a linear regression model. It extends the Ordinary Least Squares (OLS) method by addressing situations where the assumptions of OLS are violated, specifically when there is heteroscedasticity or autocorrelation in the error terms. GLS is crucial in providing more accurate and reliable estimates in such scenarios. This article provides a detailed exploration of the GLS method, its underlying principles, and its implementation in Python.
What is Generalized Least Squares (GLS)?
Generalized Least Squares (GLS) is an extension of the Ordinary Least Squares (OLS) regression method used to estimate the unknown parameters in a linear regression model. GLS is particularly useful when the assumptions of OLS, such as homoscedasticity (constant variance of errors) and absence of autocorrelation (independence of errors), are violated. By addressing heteroscedasticity and autocorrelation in the error terms, GLS provides more accurate and reliable parameter estimates.
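Formally, if the error terms have covariance matrix \Omega (known, or estimated from the data), the GLS estimator of the coefficient vector is

\hat{\beta}_{GLS} = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} y

When \Omega is (proportional to) the identity matrix, this reduces to the ordinary least squares estimator.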
Why Use GLS?
- In real-world data analysis, the assumptions of OLS are often violated. When this happens, the OLS coefficient estimates remain unbiased but are no longer efficient, and the usual standard errors become unreliable.
- GLS corrects for these issues by transforming (whitening) the data so that the transformed errors satisfy the OLS assumptions, restoring the Best Linear Unbiased Estimator (BLUE) property. A minimal sketch of this transformation is given after this list.
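The transformation is straightforward to sketch by hand. The helper below is only an illustration (not how you would normally call statsmodels): it whitens X and y with the Cholesky factor of an assumed error covariance matrix and then solves an ordinary least squares problem on the whitened data, which is essentially what sm.GLS does internally. The function name gls_by_whitening and the argument omega are hypothetical names used only for this sketch.
Python
import numpy as np

def gls_by_whitening(X, y, omega):
    # omega is an assumed (n x n) error covariance matrix; in practice it
    # must be estimated, as shown in the examples later in this article
    L = np.linalg.cholesky(omega)        # omega = L @ L.T
    X_w = np.linalg.solve(L, X)          # whitened predictors
    y_w = np.linalg.solve(L, y)          # whitened response
    # OLS on the whitened data yields the GLS coefficient estimates
    beta_hat, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    return beta_hat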
Implementing GLS in Python
Python offers several libraries for implementing GLS, with statsmodels being one of the most popular. statsmodels provides comprehensive tools for statistical modeling, including GLS.
To install statsmodels, you can use pip:
pip install statsmodels
Example 1: Implementing GLS with statsmodels
Let's walk through an example on simulated data with autocorrelated errors, so that the true coefficients are known and we can see how GLS handles the violated OLS assumptions.
1. Generating the Data
Python
import numpy as np
import statsmodels.api as sm

# Simulate predictors and a response with autocorrelated (AR(1)-style) errors
np.random.seed(0)
n = 100
X = np.random.rand(n, 2)
X = sm.add_constant(X)  # Adds a constant (intercept) term to the predictors

# Generate autocorrelated errors
errors = np.random.randn(n)
for i in range(1, n):
    errors[i] += 0.5 * errors[i - 1]  # Each error carries over part of the previous one

beta = np.array([1, 0.5, -0.3])  # True coefficients
y = X @ beta + errors
2. Fitting an OLS Model
First, we fit an OLS model to obtain the residuals.
Python
ols_model = sm.OLS(y, X).fit()
print(ols_model.summary())
Output:
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.003
Model: OLS Adj. R-squared: -0.018
Method: Least Squares F-statistic: 0.1307
Date: Wed, 17 Jul 2024 Prob (F-statistic): 0.878
Time: 11:30:25 Log-Likelihood: -147.01
No. Observations: 100 AIC: 300.0
Df Residuals: 97 BIC: 307.8
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 1.0183 0.296 3.435 0.001 0.430 1.607
x1 -0.0074 0.380 -0.019 0.985 -0.762 0.748
x2 -0.1917 0.375 -0.511 0.610 -0.936 0.553
==============================================================================
Omnibus: 3.211 Durbin-Watson: 1.053
Prob(Omnibus): 0.201 Jarque-Bera (JB): 2.550
Skew: 0.339 Prob(JB): 0.279
Kurtosis: 3.389 Cond. No. 5.58
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
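Before correcting for it, it is worth confirming that the OLS residuals really are autocorrelated. The Durbin-Watson statistic reported in the summary above (about 1.05, well below 2) already points to positive autocorrelation; the short snippet below computes it directly with statsmodels.stats.stattools.durbin_watson, assuming the ols_model object from the previous step is still in scope.
Python
from statsmodels.stats.stattools import durbin_watson

# Values near 2 indicate no autocorrelation; values well below 2 indicate
# positive autocorrelation in the residuals
dw = durbin_watson(ols_model.resid)
print(f"Durbin-Watson statistic: {dw:.3f}")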
3. Estimating the Covariance Matrix
Here we approximate the error covariance matrix from the OLS residuals: we build a diagonal matrix whose entries are the squared residuals, which treats each observation's error variance as proportional to its squared residual. (Estimating an AR(1) autocorrelation parameter ρ by regressing the residuals on their lagged values is shown in the FGLS example below.)
Python
residuals = ols_model.resid
cov_matrix = np.diag(residuals**2)  # Diagonal covariance matrix built from the squared OLS residuals
4. Fitting the GLS Model
Finally, we fit the GLS model using the constructed covariance matrix.
Python
gls_model = sm.GLS(y, X, sigma=cov_matrix).fit()
print(gls_model.summary())
Output:
GLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.120
Model: GLS Adj. R-squared: 0.102
Method: Least Squares F-statistic: 6.633
Date: Wed, 17 Jul 2024 Prob (F-statistic): 0.00200
Time: 11:30:25 Log-Likelihood: -81.748
No. Observations: 100 AIC: 169.5
Df Residuals: 97 BIC: 177.3
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 1.0210 0.035 29.584 0.000 0.952 1.089
x1 -0.0338 0.054 -0.628 0.531 -0.141 0.073
x2 -0.1543 0.044 -3.482 0.001 -0.242 -0.066
==============================================================================
Omnibus: 866.058 Durbin-Watson: 1.353
Prob(Omnibus): 0.000 Jarque-Bera (JB): 16.264
Skew: 0.200 Prob(JB): 0.000294
Kurtosis: 1.065 Cond. No. 8.08
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
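To see how the GLS fit differs from the initial OLS fit, one can put the coefficient estimates and standard errors side by side. A minimal sketch, assuming the ols_model and gls_model objects defined above and that pandas is installed:
Python
import pandas as pd

# Compare coefficients and standard errors of the two fits
comparison = pd.DataFrame({
    "OLS coef": ols_model.params,
    "OLS std err": ols_model.bse,
    "GLS coef": gls_model.params,
    "GLS std err": gls_model.bse,
})
print(comparison)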
Example 2: Implementing Feasible Generalized Least Squares (FGLS)
In many practical situations, the exact form of the covariance matrix \Omega is unknown. Feasible Generalized Least Squares (FGLS) is an iterative method that estimates \Omega from the data and refines the GLS estimates.
Steps for FGLS:
- Fit an initial OLS model and obtain residuals.
- Estimate the covariance matrix \Omega from the residuals.
- Fit a GLS model using the estimated \Omega.
- Iterate the process to refine the estimates.
Python
import numpy as np
import statsmodels.api as sm
from scipy.linalg import toeplitz

# Step 1: Generate sample data or use a dataset
np.random.seed(0)
n = 100
X = np.random.rand(n, 2)
X = sm.add_constant(X)  # Adds a constant (intercept) term to the predictors

# Generate autocorrelated (AR(1)-style) errors
errors = np.random.randn(n)
for i in range(1, n):
    errors[i] += 0.5 * errors[i - 1]

beta = np.array([1, 0.5, -0.3])
y = X @ beta + errors

# Fit initial OLS model
ols_model = sm.OLS(y, X).fit()
ols_resid = ols_model.resid

# Step 2: Estimate rho by regressing the residuals on their lagged values,
# then build an AR(1) covariance matrix with entries rho**|i - j|
resid_fit = sm.OLS(ols_resid[1:], sm.add_constant(ols_resid[:-1])).fit()
rho = resid_fit.params[1]
sigma = rho**toeplitz(np.arange(len(ols_resid)))

# Step 3: Fit GLS model using the estimated covariance matrix
gls_model = sm.GLS(y, X, sigma=sigma)
gls_results = gls_model.fit()

# Step 4: Iterate to refine the estimates
for _ in range(5):
    ols_resid = gls_results.resid
    resid_fit = sm.OLS(ols_resid[1:], sm.add_constant(ols_resid[:-1])).fit()
    rho = resid_fit.params[1]
    sigma = rho**toeplitz(np.arange(len(ols_resid)))
    gls_model = sm.GLS(y, X, sigma=sigma)
    gls_results = gls_model.fit()

print(gls_results.summary())
Output:
GLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.022
Model: GLS Adj. R-squared: 0.001
Method: Least Squares F-statistic: 1.073
Date: Wed, 17 Jul 2024 Prob (F-statistic): 0.346
Time: 11:35:00 Log-Likelihood: -134.18
No. Observations: 100 AIC: 274.4
Df Residuals: 97 BIC: 282.2
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.9643 0.300 3.212 0.002 0.368 1.560
x1 0.2923 0.313 0.935 0.352 -0.328 0.913
x2 -0.3527 0.340 -1.038 0.302 -1.027 0.322
==============================================================================
Omnibus: 0.332 Durbin-Watson: 1.982
Prob(Omnibus): 0.847 Jarque-Bera (JB): 0.478
Skew: 0.114 Prob(JB): 0.787
Kurtosis: 2.750 Cond. No. 3.05
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
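As an alternative to writing the iteration by hand, statsmodels provides the GLSAR class, which automates the same estimate-rho-and-refit loop for AR(p) error structures. A minimal sketch, reusing the y and X arrays from the example above:
Python
# GLSAR with an AR(1) error process; iterative_fit alternates between
# estimating rho from the residuals and refitting the regression
glsar_model = sm.GLSAR(y, X, rho=1)
glsar_results = glsar_model.iterative_fit(maxiter=6)
print("Estimated rho:", glsar_model.rho)
print(glsar_results.params)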
Evaluating GLS Models
Evaluating the performance of GLS models involves several diagnostic checks:
- Residual diagnostics: inspect the residuals for remaining heteroscedasticity or obvious patterns.
- Durbin-Watson statistic: values close to 2 suggest that the autocorrelation has been adequately handled.
- Model comparison: compare coefficients, standard errors, log-likelihood, and AIC/BIC against the OLS fit.
A short sketch of these checks follows.
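The snippet below is a minimal sketch of such checks, assuming the ols_model and gls_results objects from the examples above are still in scope.
Python
from statsmodels.stats.stattools import durbin_watson

# Durbin-Watson on the whitened GLS residuals: values near 2 indicate that
# little autocorrelation remains once the estimated covariance is accounted for
print("Durbin-Watson (GLS):", durbin_watson(gls_results.wresid))

# Information criteria allow a rough comparison with the initial OLS fit
print("GLS AIC:", gls_results.aic, "BIC:", gls_results.bic)
print("OLS AIC:", ols_model.aic, "BIC:", ols_model.bic)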
Conclusion
Generalized Least Squares (GLS) is a powerful tool for regression analysis, especially when dealing with heteroscedasticity and autocorrelation in the error terms. By transforming the data to meet the OLS assumptions, GLS provides more accurate and reliable parameter estimates.