Generalized Linear Models Using R
Last Updated: 02 May, 2025
Generalized linear models (GLMs) are a class of statistical models widely used to analyze non-normal data, such as counts or binary outcomes. They describe the relationship between one or more predictor variables and a response variable in a flexible manner.
Major components of GLMs
- A probability distribution for the response variable
- A linear predictor function of the predictor variables
- A link function that connects the linear predictor to the response variable's mean.
The probability distribution and link function are determined by the type of response variable and the research question at hand. R fits GLMs with the glm() function. Its arguments specify the model formula, which names the response variable and one or more predictor variables, together with the probability distribution and link function to be used.
In Generalized Linear Models (GLMs), the response variable Y is assumed to follow a distribution from the exponential family. The model relates the expected value of Y, denoted \mu, to the predictors X via a link function:
g(\mu) = X\beta
Here, \beta is the vector of model coefficients and g(\cdot) is a specified link function. The variance of Y is given by:
\text{Var}(Y) = \phi V(\mu)
where V(\mu) is the variance function and \phi is a dispersion parameter.
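As a small illustration of these pieces, base R family objects carry the link g, its inverse, and the variance function V as list components (linkfun, linkinv, variance). The means in mu below are arbitrary example values, not data from this article.

```r
# Family objects in base R bundle the link g, its inverse, and V(mu).
fam <- binomial(link = "logit")
mu  <- c(0.1, 0.5, 0.9)              # arbitrary example means (probabilities)

eta     <- fam$linkfun(mu)           # g(mu) = log(mu / (1 - mu))
mu_back <- fam$linkinv(eta)          # inverse link maps eta back to mu
v_binom <- fam$variance(mu)          # binomial: V(mu) = mu * (1 - mu)
v_pois  <- poisson()$variance(mu)    # Poisson:  V(mu) = mu

print(cbind(mu, eta, mu_back, v_binom, v_pois))
```

Swapping binomial() for another family, such as Gamma() or gaussian(), shows how the link and variance function change with the assumed distribution.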
Classical Linear Regression as a Special Case
Classical linear regression, Y = X\beta + \epsilon with \epsilon \sim N(0, \sigma^2), is the special case of a GLM where:
- g(\mu) = \mu (identity link)
- V(\mu) = 1
- \phi = \sigma^2
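This equivalence can be checked numerically. The sketch below uses the built-in mtcars data (the same data set used later in this article) and compares lm() with a Gaussian, identity-link glm(); the object names are illustrative only.

```r
# Gaussian GLM with identity link versus ordinary least squares.
fit_lm  <- lm(mpg ~ hp + wt, data = mtcars)
fit_glm <- glm(mpg ~ hp + wt, data = mtcars,
               family = gaussian(link = "identity"))

# The coefficient estimates should agree.
same_coef <- all.equal(coef(fit_lm), coef(fit_glm))

# The estimated dispersion phi should match the residual variance sigma^2.
sigma2_lm <- summary(fit_lm)$sigma^2
phi_glm   <- summary(fit_glm)$dispersion

print(same_coef)
print(c(sigma2_lm = sigma2_lm, phi_glm = phi_glm))
```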
Estimation
Model parameters \beta are estimated via maximum likelihood. For observations (x_i, y_i), the likelihood is:
L(\beta) = \prod_{i=1}^n f(y_i \mid \mu_i)
where f(\cdot) is the density function of the assumed distribution and \mu_i is the expected value of Y_i given x_i.
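To make the likelihood concrete, the following sketch fits a Poisson GLM to simulated data (the data and the true coefficients 0.5 and 0.3 are made up for illustration), evaluates the log-likelihood by hand, and maximizes it directly with optim(). The hand-written optimizer is only an illustration; glm() itself uses iteratively reweighted least squares to reach the same maximum.

```r
set.seed(1)                                   # simulated data, illustration only
x <- rnorm(100)
y <- rpois(100, lambda = exp(0.5 + 0.3 * x))

fit <- glm(y ~ x, family = poisson(link = "log"))

# Log-likelihood: sum of log f(y_i | mu_i) evaluated at the fitted means.
ll_manual  <- sum(dpois(y, lambda = fitted(fit), log = TRUE))
ll_builtin <- as.numeric(logLik(fit))

# Maximize the same likelihood directly by minimizing its negative.
neg_loglik <- function(beta) {
  mu <- exp(beta[1] + beta[2] * x)            # inverse log link
  -sum(dpois(y, lambda = mu, log = TRUE))
}
opt <- optim(c(0, 0), neg_loglik, method = "BFGS")

print(c(manual = ll_manual, builtin = ll_builtin))
print(rbind(glm = coef(fit), optim = opt$par))
```

The two rows of coefficients should agree up to the numerical tolerance of the optimizer.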
GLM model families
There are several GLM model families, and the right choice depends on the nature of the response variable. Four commonly used families are:
- Binomial: The binomial family is used for binary response variables (i.e., two categories) and assumes a binomial distribution.
R
model <- glm(binary_response_variable ~ predictor_variable1 + predictor_variable2,
family = binomial(link = "logit"), data = data)
- Gaussian: This family is used for continuous response variables and assumes a normal distribution. The link function for this family is typically the identity function.
R
model <- glm(response_variable ~ predictor_variable1 + predictor_variable2,
family = gaussian(link = "identity"), data = data)
- Gamma: The gamma family is used for continuous response variables that are strictly positive and have a skewed distribution.
R
model <- glm(positive_response_variable ~ predictor_variable1 + predictor_variable2,
family = Gamma(link = "inverse"), data = data)
- Quasibinomial: The quasibinomial family is used when a binary or proportion response shows more variance than the binomial distribution predicts (overdispersion). This can happen when there is additional variation that the model does not account for. The coefficient estimates match the binomial fit, but the dispersion parameter is estimated from the data instead of being fixed at 1.
R
model <- glm(response_variable ~ predictor_variable1 + predictor_variable2,
family = quasibinomial(), data = data)
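The templates above use placeholder variable names. As a runnable sketch, the same calls can be tried on the built-in mtcars data, treating am (0/1 transmission type) as a binary response and mpg as a strictly positive response; these variable choices are assumptions made for illustration, not part of the original examples.

```r
# Binomial and quasibinomial fits share coefficients but not the dispersion.
fit_bin  <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)
fit_qbin <- glm(am ~ wt, family = quasibinomial(link = "logit"), data = mtcars)

print(all.equal(coef(fit_bin), coef(fit_qbin)))
print(c(binomial      = summary(fit_bin)$dispersion,    # fixed at 1
        quasibinomial = summary(fit_qbin)$dispersion))  # estimated from the data

# Gamma family for a strictly positive response. Note the capital G:
# lowercase gamma() is the mathematical gamma function, not a family.
fit_gam <- glm(mpg ~ wt, family = Gamma(link = "inverse"), data = mtcars)
print(coef(fit_gam))
```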
Building a Generalized Linear Model
1. Loading the Dataset
We will use the "mtcars" dataset in R to illustrate the use of generalized linear models. This dataset includes data on different car models, including miles per gallon (mpg), horsepower (hp) and weight (wt). The response variable will be "mpg", and the predictor variables will be "hp" and "wt".
R
data(mtcars)
head(mtcars)
Output:
Sample Data
To create a generalized linear model in R, we must first select a suitable probability distribution for the response variable.
- If the response variable is binary (e.g., 0 or 1), we could use the binomial family, which treats each observation as a Bernoulli trial.
- If the response variable is a count (for example, the number of vehicles sold), the Poisson distribution may be used.
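For the count case, a minimal sketch on the same data set might treat carb (number of carburetors) as the response. This choice of variables is an assumption made purely to show the syntax, not a model the article proposes.

```r
# Poisson GLM for a count response; the log link is the default for poisson().
fit_pois <- glm(carb ~ hp, family = poisson(link = "log"), data = mtcars)

# On the log scale, exp(coefficient) is the multiplicative change in the
# expected count for a one-unit increase in the predictor.
rate_ratio <- exp(coef(fit_pois))

print(coef(fit_pois))
print(rate_ratio)
```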
2. Building the model
To create a generalized linear model in R, use the glm() function. We must specify the model formula (the response variable and the predictor variables) as well as the probability distribution family.
R
data(mtcars)
model <- glm(mpg ~ hp + wt, data = mtcars, family = gaussian)
The Gaussian family is used in this example, which assumes that the response variable is normally distributed around its predicted mean.
Why Gaussian family?
The response mpg is continuous, so the Gaussian family with its default identity link is a natural starting point. One benefit is that each coefficient can be read directly as a change in the mean of the response. Additionally, the model is fitted by maximum likelihood estimation, which for this family gives the same estimates as ordinary least squares.
3. Calculate summary of the model
R
summary(model)
Output:
Summary of the model
A one-unit increase in hp predicts a 0.03177 decrease in mpg, while a one-unit increase in wt predicts a 3.87783 decrease in mpg, in each case holding the other predictor fixed.
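The interpretation of the coefficients can be verified with predict(). In the sketch below, the two hypothetical cars in new_cars (the hp and wt values are made up) differ only by one unit of wt, so their predicted mpg should differ by exactly the wt coefficient.

```r
model <- glm(mpg ~ hp + wt, data = mtcars, family = gaussian)

# Coefficient table: estimates, standard errors, t values, p-values.
coef_table <- coef(summary(model))
print(round(coef_table, 5))

# Two hypothetical cars that differ only in weight (by one unit of wt).
new_cars <- data.frame(hp = c(110, 110), wt = c(2.5, 3.5))
pred <- predict(model, newdata = new_cars, type = "response")

print(pred)
print(unname(diff(pred)))   # equals the wt coefficient
```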
4. Visualize the model
After creating a generalized linear model, we must evaluate its fit to the data. This can be accomplished with the help of diagnostic plots such as the residual plot and the Q-Q plot.
R
plot(model, which = 1)
plot(model, which = 2)
Output:
Generalized Linear Models in R
The residual plot displays the residuals (differences between observed and predicted values) plotted against the fitted values (i.e., the predicted values). We want to see a random scatter of residuals around zero, which indicates that the model is capturing the trends in the data.
Generalized Linear Models in R
The Q-Q plot displays the quantiles of the residuals against the quantiles expected if they were normally distributed. The points should follow a straight line, indicating that the residuals are approximately normally distributed.
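The same two diagnostics can also be built by hand, which makes it clearer what each plot shows. This is a sketch using base graphics and deviance residuals; it is not the exact code behind plot(model), which adds a smoother and uses standardized residuals in the Q-Q plot.

```r
model <- glm(mpg ~ hp + wt, data = mtcars, family = gaussian)

res <- residuals(model, type = "deviance")   # for Gaussian: observed - fitted
fit <- fitted(model)

# Residuals versus fitted values: look for a patternless scatter around zero.
plot(fit, res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points near the reference line suggest normal residuals.
qqnorm(res)
qqline(res)

# Numeric companions to the plots.
print(c(deviance = deviance(model), AIC = AIC(model)))
```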