Generalized additive model in Python
Last Updated :
10 Jun, 2024
Generalized additivemodels Models are a wider and more flexible form of a linear model with nonparametric terms and are simply extensions of generalized linear models. Whereas simple linear models are useful when relationships between two variables are strikingly linear, all of which might not be possible in the real world, generalized additive models are advantageous in that they can simultaneously capture non-linear relationships between two variables. In
Mathematical formula for Generalized Additive Model in Python
The fundamental concept of GAMs lies in the ability of the response variable to be described as an average of components that are smooth functions of the predictors. This is so that each predictor can independently affect the response in a possibly nonlinear manner although the model as a whole remains interpretable because of the form in which the equation is additive.
The aforementioned model can be described with the help of the following formula:
y = \beta_0 + \sum f_i(x_i) + \epsilon, where
β0 is the constant; fi(xi) smooth functions of the predictors; xi and ϵ – the errors.
Why Generalized Additive Models (GAM) are important?
GAMS are important for the following reasons:
- Flexibility in Modeling Non-Linear Relationships:
- GAMs enable the establishment of interactions between predictor variables and the response variable with non-linear associations. This flexibility is achieved through the use of smooth functions like splines, which describe these relationships effectively. This is particularly useful in realistic scenarios involving complex interactions that cannot be captured by simple linear equations.
- Interpretability:
- Despite their flexibility, GAMs maintain the interpretability of additive models derived from linear models. Each predictor is treated as a single variable, allowing for the analysis of its impact while controlling for other predictors. This makes it easier to understand how individual predictors influence the response variable.
- Handling Multivariate Data:
- GAMs excel in situations with numerous potential predictors of various types. This versatility makes them suitable for diverse fields, including natural and social sciences, where relationships between variables can be complex.
- Smoothness Control:
- GAMs incorporate techniques of regularization to control smoothness, reducing the risk of overfitting. By enabling cross-validation, they ensure that the model can predict new, unseen data accurately. Regularization helps manage the degree of model fit concerning smoothness.
- Applications Across Diverse Fields:
- Environmental Science: Describing how environmental factors affect species occurrence.
- Finance: Risk modeling and market trend prediction, especially where non-linear effects are significant.
- Medicine and Biology: Understanding complex biological processes and disease progression.
- Social Sciences: Analyzing socio-economic data with complex, non-linear relationships.
- Robustness:
- GAMs support various response variables, including continuous, binary, and count data. This versatility makes them applicable to a wide range of regression issues.
- User-Friendly Implementation:
- Libraries like
pygam
in Python have made GAMs accessible and easy to deploy. This allows data scientists and statisticians to leverage GAMs without worrying about high computational costs.
What are the Components of a GAM?
The components of a Generalized Additive Model (GAM) include:
- Response Variable (y):
- This is the dependent variable the model aims to predict or explain. In GAMs, the response variable can be continuous, binary, a count variable, or any other type of response for which a specific link function can be specified.
- Predictors (x_i):
- Predictors, or independent variables, are the variables used to make predictions on the response variable. GAMs allow for potential non-linear relationships between each predictor and the response variable.
- Smooth Functions (f_i(x_i)):
- A key feature of GAMs is the use of smoother functions to model the effect of each predictor on the response variable. These functions enable non-linear associations between independent and dependent variables. Common techniques include:
- Splines: Such as cubic splines, B-splines, and thin plate splines.
- Polynomial Functions: Higher-order polynomials that account for non-linear relationships.
- Loess/Local Regression: Non-parametric methods involving curve fitting to subsets of the data.
- Additive Structure:
- GAMs maintain the additive property of linear models, where the impact of predictors is summed to forecast the response variable.
- Error Term (ϵ):
- The error term represents the residual variation in the response variable not explained by the predictors. It is assumed to follow a specific distribution depending on whether the response variable is continuous or discrete (e.g., Normal distribution for continuous variables, Binomial distribution for discrete variables).
- Link Function:
- In cases involving generalized distributions, a link function g maps the mean of the response variable to the linear predictor. This is particularly relevant for Generalized Linear Models and applies to GAMs as well. The relationship is given by:g(E(y)) = \beta_0 + \sum_{i=1}^{p} f_i(x_i)
- Basis Functions and Penalty Terms:
- Smooth functions in GAMs are constructed from basis functions, simpler functions joined together to form a smooth curve. To prevent overfitting, GAMs incorporate penalty terms that regulate the smoothness of these functions, restricting excessive oscillatory behavior and controlling the number of fluctuations.
- Smoothing Parameters:
- Each smooth function is defined by smoothing parameters that determine the degree of smoothness. These parameters are typically selected through methods like cross-validation, balancing the trade-off between bias and variance.
- Implementation and Estimation:
- In GAMs, model fitting involves estimating the coefficients of the basis functions and the smoothers. This process is often carried out using Penalized Iteratively Reweighted Least Squares (PIRLS), which incorporates penalization to ensure appropriate smoothness.
These components collectively enable GAMs to model complex, non-linear relationships in a flexible and interpretable manner, making them a powerful tool for various regression tasks across multiple fields.
Implementation of Generalized additive model in Python
Step 1: Install the pyGAM
library
First, ensure you have the pyGAM
library installed. You can install it using pip:
pip install pygam
Step 2: Import necessary libraries
Import pyGAM
and other required libraries like numpy
and pandas
for data manipulation:
Python
import numpy as np
import pandas as pd
from pygam import LinearGAM, s
Step 3: Prepare your data
Load and prepare your dataset. For demonstration, let's use a sample dataset:
Python
# Example dataset
np.random.seed(0)
X = np.random.rand(100, 1) * 10 # 100 data points, 1 feature
y = np.sin(X).ravel() + np.random.normal(0, 0.5, 100) # target variable with some noise
Step 4: Define and fit the GAM
Create a GAM model using the LinearGAM
class and specify the smoothing term using the s
function. Fit the model to your data:
Python
# Define the GAM model
gam = LinearGAM(s(0))
# Fit the model to the data
gam.fit(X, y)
Step 5: Evaluate the model
Evaluate the model by checking its performance metrics and visualizing the results:
Python
# Predict on new data
X_pred = np.linspace(0, 10, 500).reshape(-1, 1)
y_pred = gam.predict(X_pred)
# Plot the results
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
plt.scatter(X, y, label='Data', alpha=0.5)
plt.plot(X_pred, y_pred, label='GAM Prediction', color='red')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
Output:

Advantages of GAMs
- Flexibility:
- GAMs can capture complex, non-linear relationships between the dependent and independent variables, making them suitable for a wide range of applications.
- Interpretability:
- Unlike many machine learning models, GAMs provide interpretable results. Each predictor's effect can be visualized individually, which helps in understanding the influence of each variable.
- Additivity:
- The additivity of GAMs simplifies the interpretation. Each term in the model can be examined separately, facilitating easier identification of the contribution of each predictor.
- Customizability:
- Different smoothing techniques (like splines) can be used for different predictors, allowing for customized fitting that can improve model performance.
- Handling Non-Linearity:
- GAMs handle non-linear relationships effectively without the need for explicitly specifying the form of the non-linearity, as required in polynomial regression.
- Reduced Overfitting:
- Smoothing functions help in controlling overfitting by regularizing the fitted functions, especially when dealing with noisy data.
Disadvantages of GAMs
- Complexity:
- The flexibility of GAMs can lead to increased model complexity, making them computationally intensive and sometimes difficult to tune.
- Selection of Smoothing Parameters:
- Choosing the appropriate smoothing parameters or basis functions can be challenging and requires expertise. Improper selection can lead to underfitting or overfitting.
- Scalability:
- GAMs may not scale well with very large datasets or with a large number of predictors due to their computational intensity.
- Additive Assumption:
- The assumption of additivity might be restrictive in some cases where interactions between predictors are important. While GAMs can include interaction terms, they are generally more complex to specify and interpret.
- Interpretation of Smoothing Terms:
- While GAMs are interpretable, the smoothing terms themselves can sometimes be difficult to explain, especially to stakeholders not familiar with the methodology.
- Software and Implementation:
- Implementing GAMs requires specialized statistical software and packages (e.g.,
mgcv
in R), which might not be as widely understood or available as more standard linear or logistic regression models.
Conclusion
Generalized Additive Models provide a powerful and flexible approach to modeling non-linear relationships while maintaining interpretability. However, they require careful consideration in terms of model complexity, selection of smoothing parameters, and the additivity assumption. Proper use of GAMs can lead to robust and insightful models, but they may not always be the best choice for every dataset or research question.
Similar Reads
Generalized Additive Models Using R
A versatile and effective statistical modeling method called a generalized additive model (GAM) expands the scope of linear regression to include non-linear interactions between variables. Generalized additive models (GAMs) are very helpful when analyzing complicated data that displays non-linear pa
7 min read
Generalized Linear Models
Prerequisite: Linear RegressionLogistic Regression Generalized Linear Models (GLMs) are a class of regression models that can be used to model a wide range of relationships between a response variable and one or more predictor variables. Unlike traditional linear regression models, which assume a li
6 min read
External Modules in Python
Python is one of the most popular programming languages because of its vast collection of modules which make the work of developers easy and save time from writing the code for a particular task for their program. Python provides various types of modules which include built-in modules and external m
5 min read
Generalized Linear Models Using R
GLMs (Generalized linear models) are a type of statistical model that is extensively used in the analysis of non-normal data, such as count data or binary data. They enable us to describe the connection between one or more predictor variables and a response variable in a flexible manner.Major compon
5 min read
Built-in Modules in Python
Python is one of the most popular programming languages because of its vast collection of modules which make the work of developers easy and save time from writing the code for a particular task for their program. Python provides various types of modules which include Python built-in modules and ext
9 min read
Pandas AI: The Generative AI Python Library
In the age of AI, many of our tasks have been automated especially after the launch of ChatGPT. One such tool that uses the power of ChatGPT to ease data manipulation task in Python is PandasAI. It leverages the power of ChatGPT to generate Python code and executes it. The output of the generated co
9 min read
Tree-Based Models for Classification in Python
Tree-based models are a cornerstone of machine learning, offering powerful and interpretable methods for both classification and regression tasks. This article will cover the most prominent tree-based models used for classification, including Decision Tree Classifier, Random Forest Classifier, Gradi
8 min read
How to create modules in Python 3 ?
Modules are simply python code having functions, classes, variables. Any python file with .py extension can be referenced as a module. Although there are some modules available through the python standard library which are installed through python installation, Other modules can be installed using t
4 min read
Python Module Index
Python has a vast ecosystem of modules and packages. These modules enable developers to perform a wide range of tasks without taking the headache of creating a custom module for them to perform a particular task. Whether we have to perform data analysis, set up a web server, or automate tasks, there
4 min read
Implementing Generalized Least Squares (GLS) in Python
Generalized Least Squares (GLS) is a statistical technique used to estimate the unknown parameters in a linear regression model. It extends the Ordinary Least Squares (OLS) method by addressing situations where the assumptions of OLS are violated, specifically when there is heteroscedasticity or aut
5 min read