
Chapter 4

Generalized Linear Models

The techniques presented in Chapter 2 and Chapter 3 are limited in two respects.
First, they consider only categorical predictors X, without allowing for numeric
predictors. Second, they accommodate only relatively small numbers of predictors.
To overcome these limitations, we develop an extension of the linear model which
is capable of handling general predictors while modeling categorical and count
responses; this leads to the so-called generalized linear model, or GLM.

4.1 Introduction to GLMs


Consider data consisting of a response vector Y = (Y1 , . . . , Yn ) and a design
matrix
 �
X1
 
X =  ...  .
Xn�
We will model the distribution of [Yi | Xi = xi ] using a distribution from the
exponential dispersion family.

Definition 4.1. A family of densities/mass functions {f(·; θ, φ) : θ ∈ Θ, φ ∈ Φ} is an exponential dispersion family if we can write

    f(y; θ, φ) = exp{ (yθ − b(θ))/(φ/ω) + c(y, φ) }

for known functions b(·), c(·, ·) and known constant ω > 0. The parameter
θ is referred to as the canonical parameter and φ is known as the dispersion
parameter.
Remark 4.1. This definition is given in Section 4.4 of Agresti; we will be using
this one rather than using the less general exponential family definition from
Section 4.1.1 of Agresti.


Example 4.1. Consider Y ∼ Poisson(λ). The mass function of Y is

    f(y; λ) = λ^y e^{−λ} / y! = exp{ y log λ − λ − log y! }.

Taking θ = log λ, b(θ) = e^θ, φ = ω = 1, and c(y, φ) = − log y!, we have

    f(y; λ) = exp{ (yθ − b(θ))/(φ/ω) + c(y, φ) }.
Hence the Poisson(λ) family is an exponential dispersion family.
Example 4.2. Let Y = Z/n where Z ∼ Binomial(n, π). Then Y has mass function

    f(y; n, π) = C(n, ny) π^{ny} (1 − π)^{n(1−y)}
               = exp{ ny log(π/(1 − π)) + n log(1 − π) + log C(n, ny) },

where C(n, ny) denotes the binomial coefficient "n choose ny". Now set θ = log(π/(1 − π)), φ = 1, ω = n, b(θ) = log(1 + e^θ), and c(y, φ) = log C(n, ny). Then

    f(y; n, π) = exp{ (yθ − b(θ))/(φ/ω) + c(y, φ) }.
Hence the possible distributions of Y form an exponential dispersion family.

Exercise 4.1. Show that the following families are exponential dispersion
families for suitable choices of b(·), c(·, ·), θ, φ, ω.

(a) The Normal(µ, σ 2 ) family.


(b) The Gamma(α, β) family.

Definition 4.2. A generalized linear model for Y and X models the re-
sponse Yi with the density/mass function
    f(yi; θ(xi), ωi, φ) = exp{ (yi θ(xi) − b(θ(xi)))/(φ/ωi) + c(yi, φ) },

where

1. f(·; θ, ω, φ) is an exponential dispersion family; this is referred to as the stochastic component of the model.

2. g(µi) = xi′β for a known link function g(·), where µi = E{Yi | Xi = xi}. This is referred to as the systematic component of the model. The term ηi = xi′β is referred to as the linear predictor.

4.1.1 Moments of the Exponential Dispersion Family


Given an exponential dispersion family, the log-likelihood is given by

    log f(y; θ, φ, ω) = (yθ − b(θ))/(φ/ω) + c(y, φ).

Suppose (φ, ω) are known; then the score is given by

    u(θ) = (y − b′(θ))/(φ/ω).

Because the score has mean 0, this gives

    Eθ u(θ) = (Eθ(Y) − b′(θ))/(φ/ω) = 0.

Hence Eθ(Y) = b′(θ). Next, the Fisher information is given by

    Iθ = b′′(θ)/(φ/ω).

Note that, because the second derivative does not depend on Y, this is both the observed and expected Fisher information for θ. Because the Fisher information gives the variance of the score, we have

    (ω²/φ²) Varθ(Y) = ω b′′(θ)/φ.

Thus, Varθ(Y) = (φ/ω) b′′(θ). These facts are summarized below.

Fact 4.1. Suppose that Y ∼ f(y; θ, φ, ω) where f(·; θ, φ, ω) is an exponential dispersion family. Then

1. Eθ(Y) = b′(θ); and

2. Varθ(Y) = (φ/ω) b′′(θ).

4.1.2 The canonical parameter and the canonical link


Fact 4.1 gives an implicit relationship between θ and µ in a generalized linear model. Recall that we assume

    g(µi) = xi′β,

for some parameter β in a GLM. Now, we have just shown that µi = b′(θi); hence,

    g(b′(θi)) = xi′β,

so that

    θi = (b′)⁻¹(g⁻¹(xi′β)).

Now, imagine we set g(µ) = (b′)⁻¹(µ). This leads to the model θi = xi′β, so that the linear predictor ηi = xi′β and the canonical parameter θi coincide. This choice of link function is referred to as the canonical link.

Definition 4.3. In a generalized linear model, the canonical link function is given by g(µ) = (b′)⁻¹(µ), and the resulting mass function for Yi is

    exp{ (yi xi′β − b(xi′β))/(φ/ωi) + c(yi, φ) }.

In some sense, this choice of link function is the most “natural” choice of
link function, and (as we will see) various aspects of GLMs become simplified
when the canonical link is chosen.
Example 4.3. Consider the binomial dispersion family from Example 4.2, in which Y = Z/n where Z ∼ Binomial(n, π). For this model, we have φ = 1, ω = n, and b(θ) = log(1 + e^θ) where θ = log(π/(1 − π)). We have

    b′(θ) = e^θ / (1 + e^θ).

Now, noting that expit(x) = e^x / (1 + e^x) is the inverse function of logit(x) = log(x/(1 − x)), we have

    E(Y) = b′(θ) = π.

Taking a second derivative, we have

    b′′(θ) = e^θ / (1 + e^θ)² = (e^θ / (1 + e^θ)) · (1 − e^θ / (1 + e^θ)) = π(1 − π).

The variance of Y is then

    Var(Y) = (φ/ω) b′′(θ) = π(1 − π)/n.

This illustrates that the formulae we have given for the moments work as intended. The canonical link is given by the inverse of b′(θ) = expit(θ), i.e., g(π) = log(π/(1 − π)). The generalized linear model for binomial proportion data with the link function g(π) = log(π/(1 − π)) is referred to as a logistic regression model.

Exercise 4.2. Show that the Poisson exponential dispersion family described in Example 4.1 has canonical link g(λ) = log λ. Additionally, use the properties of the exponential dispersion family to verify that E(Y) = λ and Var(Y) = λ for the Poisson distribution.
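As an aside, R's family objects store the link and variance functions that appear in these calculations. A minimal sketch using base R's binomial() and poisson() families (nothing here is specific to a fitted model):

fam <- binomial()          # canonical link = logit
fam$linkfun(0.25)          # logit(0.25) = log(1/3)
fam$linkinv(log(1/3))      # expit; recovers 0.25
fam$variance(0.25)         # V(mu) = mu * (1 - mu) = 0.1875
poisson()$variance(3)      # V(mu) = mu for the Poisson family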

4.1.3 GLMs model the variance


For most types of categorical data, there is a relationship between the mean
and variance. For example, Poisson models have E(Y ) = Var(Y ) and binomial
proportion models (Y = Z/n, Z ∼ Binomial(n, π)) have Var(Y ) = E(Y )[1 −
E(Y )]/n.
One reason for preferring generalized linear models is that they respect these
relationships. That is, generalized linear models automatically incorporate het-
eroskedasticity. For example, the Poisson model has

    Var(Yi) = e^{θi}.

Assuming a canonical link, we have Var(Yi | Xi = x) = E(Yi | Xi = x) = e^{x′β}.

4.1.4 The Deviance of a GLM


Given data D = {(Xi, Yi)} which are modeled by some GLM, the log-likelihood is given by

    Σ_{i=1}^n [ ωi(Yi θ(µi) − b(θ(µi)))/φ + c(Yi, φ) ],    (4.1)

where recall that

    θ(µi) = (b′)⁻¹(µi)  and  g(µi) = Xi′β.

For a fixed φ, the model which fits the data as closely as possible is the model which simply takes µi = Yi.

Exercise 4.3. Let µ = (µ1 , . . . , µn ). Show that (4.1) is maximized as a


function of µ when µi = Yi for i = 1, . . . , n.

The log-likelihood of this model is given by

    Σ_{i=1}^n [ ωi(Yi θ(Yi) − b(θ(Yi)))/φ + c(Yi, φ) ].

This model, which fits the data as closely as possible, is referred to as the
saturated model, and is a model which has a separate mean parameter for every
observation.

Definition 4.4. The scaled deviance of a GLM given data D = {(Xi, Yi)} is the likelihood ratio test statistic for testing the model against the saturated model, i.e. it is

    D* = 2 Σ_{i=1}^n ωi [ Yi(θ̃i − θ̂i) − (b(θ̃i) − b(θ̂i)) ] / φ,

where θ̃i = θ(Yi), θ̂i = θ(µ̂i), and µ̂i = g⁻¹(Xi′β̂), where β̂ is the MLE of β.

The deviance of a GLM is

    D = φD* = 2 Σ_{i=1}^n ωi [ Yi(θ̃i − θ̂i) − (b(θ̃i) − b(θ̂i)) ].

Remark 4.2. For Poisson and Binomial GLMs, φ ≡ 1 so that the scaled deviance D* and the (raw) deviance D are equal.
A common use of the scaled deviance D* is as a test statistic for assessing goodness of fit. A sensible way to check goodness of fit is to test your model against the model which fits the data as well as possible; if you cannot reject your model in favor of this larger model, this gives some assurance that the model is not out-of-line with the data.
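To make the deviance concrete, here is a minimal sketch (simulated data; the variable names are illustrative) that computes the Poisson deviance directly from its definition and compares it with the value reported by R:

set.seed(1)
x   <- rnorm(50)
y   <- rpois(50, exp(0.5 + 0.3 * x))
fit <- glm(y ~ x, family = poisson)
mu  <- fitted(fit)
## Poisson deviance: 2 * sum[y * log(y / mu) - (y - mu)], with y * log(y / mu) taken as 0 when y = 0
D_hand <- 2 * sum(ifelse(y > 0, y * log(y / mu), 0) - (y - mu))
c(by_hand = D_hand, from_R = deviance(fit))   # the two values should agree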

Fact 4.2. Consider a GLM for D = {(Xi, Yi)} with a p-dimensional coefficient vector β and m observations Yi. Assume that the model is correct. Then:

1. If the GLM is a Poisson GLM, then D ∼ χ²_{m−p} for fixed m as the true expected counts µi tend to ∞.

2. If the GLM is a Binomial GLM, then D ∼ χ²_{m−p} for fixed m as the true expected counts ni πi tend to ∞.

Remark 4.3. It is not true in general that D or D* will have an asymptotic χ² distribution as the number of observations diverges (m → ∞). This is because the number of parameters in the saturated model increases as the number of observations increases. This excludes a number of important special cases: for example, binary regression will not typically have an asymptotic χ² distribution for D because the expected counts are all bounded by ni = 1 (and hence cannot go to ∞).

4.1.5 Analysis of deviance


The scaled deviance D* can also be used to conduct hypothesis tests of nested models. Suppose we have a GLM whose coefficient vector is partitioned as (β, γ), so that the linear predictor is x′(β′, γ′)′, and we are interested in testing the hypothesis H0 : γ = 0 against the alternative H1 : γ ≠ 0. This can be done using a likelihood ratio test, which can be conveniently expressed in the form

    Λ = D0* − D1* ∼ χ²_d under H0,

where D0* is the scaled deviance of the model under H0, D1* is the scaled deviance of the model under H1, and d = dim γ. An application of analysis of deviance is given in Section 4.3.2. This χ² approximation is valid even when D* is not itself χ²; all we need is that d = dim γ is fixed in the asymptotics.

4.1.6 Why GLMs?


The GLM approach should be contrasted with a "transformation"-based approach. For example, consider Yi ∼ Poisson(λi) and suppose we want to model Yi as a function of predictors Xi. Aside from the fact that λi must be positive, the linear model

    Yi = Xi′β + εi

has a problem in that the εi's will not have the same variance, i.e., there is heteroskedasticity. To resolve this problem, one approach is to transform Yi so that it is close to homoskedastic. For Poisson data, the so-called "variance stabilizing transformation" gives the linear model

    √Yi = Xi′β + εi,

where now the εi's have approximately constant variance.

The transformation approach considers modeling E[T(Yi)], in this case where T(y) = √y. A generalized linear model instead considers transforming the mean; we consider g(E(Yi)). Even if T = g, these approaches are not the same because E(g(Y)) ≠ g(E(Y)) in general.
A primary advantage of a GLM over using a transformation is that we have a model for E(Yi) directly. The transformation approach only gives a model for E(T(Yi)) and makes inference about E(Yi) difficult.
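As a small illustration (simulated data; names are illustrative), the following sketch contrasts the two approaches: the lm() fit models E(√Yi), while the glm() fit models E(Yi) on the log scale.

set.seed(2)
x <- runif(200)
y <- rpois(200, exp(1 + 2 * x))
fit_transform <- lm(sqrt(y) ~ x)               # variance-stabilizing transformation
fit_glm       <- glm(y ~ x, family = poisson)  # models log E(Y) directly
## Squaring the lm() prediction is not an estimate of E(Y); the GLM targets E(Y) directly.
c(lm_squared = unname(predict(fit_transform, data.frame(x = 0.5)))^2,
  glm        = unname(predict(fit_glm, data.frame(x = 0.5), type = "response")),
  truth      = exp(1 + 2 * 0.5))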

4.2 Binomial GLMs


4.2.1 Logistic Regression
We consider the case of binomial proportion data, in which the response is Yi = Zi/n where Zi ∼ Binomial(n, πi). The logistic regression model sets

    logit(πi) = xi′β,  where  logit(π) = log(π/(1 − π))

is referred to as the logit function. As mentioned in Example 4.3, the logit function has inverse logit⁻¹(x) = expit(x), where expit(x) = e^x/(1 + e^x) = (1 + e^{−x})⁻¹. Hence, the model for πi is given by

    πi = exp(xi′β) / (1 + exp(xi′β)).

A plot of the expit function is given below. As we can see, the function has an
“S” shape, and tends to 0, 1 as x → ±∞. A nice feature of the expit function
is that it respects the fact that πi ∈ [0, 1].

[Figure: the expit function plotted over x from −6 to 6, increasing from 0 to 1.]

Example 4.4. On January 28, 1986, the space shuttle Challenger broke apart
just after launch, taking the lives of all seven of the crew. This example is taken
from an article by Dalal et al. (1989), which examined whether the incident
should have been predicted, and hence prevented, on the basis of data from
previous flights.
The cause of the failure was ultimately attributed to the failure of a crucial
shuttle component known as the O-rings; these components had been tested
prior to the launch to see if they could hold up under a variety of temperatures.
For our analysis, we will let Yi = 1 or 0 if the O-ring failed on a given test
shuttle flight and let Xi = (1, Temperaturei ). The temperature on the day of
the Challenger launch was 31◦ Fahrenheit, or roughly 0◦ Celsius.
We let Yi ∼ Bernoulli(πi ) and use a logistic regression model logit(πi ) =
Xi� β. The data is available in the vcd package:

library(vcd)
data(SpaceShuttle)
head(SpaceShuttle)

## FlightNumber Temperature Pressure Fail nFailures Damage


## 1 1 66 50 no 0 0
## 2 2 70 50 yes 1 4
## 3 3 69 50 no 0 0
## 4 4 80 50 <NA> NA NA
## 5 5 68 50 no 0 0
## 6 6 67 50 no 0 0

The model can be fit using the glm function in R as follows.

fit_space <- glm(I(Fail == 'yes') ~ Temperature, data = SpaceShuttle,
                 family = binomial)
summary(fit_space)

##
## Call:
## glm(formula = I(Fail == "yes") ~ Temperature, family = binomial,
## data = SpaceShuttle)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0611 -0.7613 -0.3783 0.4524 2.2175
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 15.0429 7.3786 2.039 0.0415 *
## Temperature -0.2322 0.1082 -2.145 0.0320 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 28.267 on 22 degrees of freedom
## Residual deviance: 20.315 on 21 degrees of freedom
## (1 observation deleted due to missingness)
## AIC: 24.315
##
## Number of Fisher Scoring iterations: 5

As we can see, the coefficient corresponding to the predictor Temperature is


significant; we will delve into the other components of the fit later, but for now
we will just plot the fit.

beta <- coef(fit_space)

## expit() is not part of base R; define it so that the plotting code below runs
expit <- function(x) 1 / (1 + exp(-x))

plot(function(x) expit(beta[1] + x * beta[2]), xlim = c(40, 90),
     xlab = "Temperature", ylab = "Estimated probability of failure")
with(SpaceShuttle, points(Temperature, ifelse(Fail == 'yes', 1, 0)))

[Figure: estimated probability of O-ring failure as a function of temperature (40–90 °F), with the observed failure indicators overlaid.]

We make two observations at this point:


1. The dataset is relatively small, and certainly predicting the outcome at
31◦ based on these results requires a large amount of extrapolation.
2. If we take the model at face-value, a failure was inevitable; the estimated
probability of failure at 31◦ is virtually 1:

expit(beta[1] + 31 * beta[2])

## (Intercept)
## 0.9996088

It seems unlikely that the astronauts would get on the shuttle if they were
aware of this.

4.2.2 Interpreting the coefficients of a logistic regression


A nice feature of the logistic regression model is that the coefficients have relatively simple interpretations. Consider predictor j and a vector x = (x1, . . . , xP). Consider shifting xj by Δ units while holding all other entries fixed. Then

    logit π(xj + Δ) = Σ_{p≠j} xp βp + (xj + Δ)βj = logit π(xj) + Δβj.

The odds ratio corresponding to a change of Δ units in j is then

    Odds(xj + Δ) / Odds(xj) = e^{Δβj}.

This leads to the following interpretation:

Holding all other predictors fixed, a change in xj by Δ has a multiplicative effect of e^{Δβj} on the odds of success.

Example 4.5. For the shuttle data, the estimated regression coefficients are

beta <- coef(fit_space)


print(beta)

## (Intercept) Temperature
## 15.0429016 -0.2321627

A change in the temperature of −10 degrees results in a multiplicative in-


crease in the odds of failure by an estimated factor of

exp(-10 * beta[2])

## Temperature
## 10.19225

That is, the odds of failure increase by a factor of roughly 10 (estimated) for
every decrease of 10 degrees.

4.2.3 Confidence intervals for coefficients


The output of the glm function gives standard errors which we can use to compute confidence intervals of the form

    β̂j ± z_{α/2} se(β̂j).

These are essentially Wald-based intervals and they are generally not preferred.
A better confidence interval can be obtained by inverting a likelihood ratio test.
Consider the null hypothesis H0 : βj = b versus the alternative Ha : βj ≠ b.
This hypothesis can be tested by the following procedure:

1. Fit the model with βj unrestricted and compute the log-likelihood A.

2. Fit the model with βj fixed at b and compute the log-likelihood B.


3. Under H0, the test statistic Λ(b) = −2(B − A) has an asymptotic χ²_1 distribution; reject if Λ(b) > χ²_{1,α}.

Inverting this test, we can form an asymptotic 100(1 − α)% confidence interval for βj as {b : Λ(b) ≤ χ²_{1,α}}. This is referred to as a profile-likelihood confidence interval.
Example 4.6. The profile confidence interval is easy to obtain in R; we compare
this with the Wald interval.

## Profile confidence intervals


confint(fit_space)

## Waiting for profiling to be done...


## 2.5 % 97.5 %
## (Intercept) 3.3305848 34.34215133
## Temperature -0.5154718 -0.06082076

## Computing Wald intervals


var_beta <- vcov(fit_space) ## Get the inverse Fisher information
ci_space <- function(j) beta[j] +
c(-1, 1) * sqrt(var_beta[j,j]) * qnorm(0.975)
rbind(ci_space(1), ci_space(2))

## [,1] [,2]
## [1,] 0.5810523 29.50475096
## [2,] -0.4443022 -0.02002324

The intervals largely overlap but are somewhat different. Again, we generally
prefer the profile confidence intervals. From this, we see that the multiplicative
effect on the odds of a change of −10 degrees in temperature is, with 95%
confidence, in the interval

exp(-10 * confint(fit_space))[2,]

## Waiting for profiling to be done...


## 2.5 % 97.5 %
## 173.246863 1.837136

4.2.4 Other Choices of Link Functions


The linear link function
The choice of link function determines how the success probability π varies with the predictor xi. The simplest choice of link function would be a simple linear link function

    πi = xi′β.

This is adequate in many situations; however, one runs into problems when xi′β > 1 or xi′β < 0. Whenever there is a continuous predictor with an unrestricted range, there will always exist values of x which cause this to happen. As we know that probabilities must lie in [0, 1], this is problematic. Hence, one should think very carefully before using a linear link function for a binomial GLM.
A benefit of the linear link function is that it is extremely easy to interpret.

Latent tolerance link functions


A general class of link functions can be obtained by viewing the response Yi as being determined through a latent tolerance model. Consider a toxicology example, in which individual i can tolerate a dose Ti of some treatment (Yi = 0), but if the dose exceeds Ti then the individual dies (Yi = 1). We model the dosage a subject receives as a function of covariates, ηi = Xi′β. As we do not know the tolerance, we model Ti ∼ F, i.i.d., for some known distribution function F. This results in the following induced model for Yi:

    Pr(Yi = 1) = Pr(Ti ≤ Xi′β) = F(Xi′β).

This gives the GLM F⁻¹(πi) = Xi′β, a binomial GLM with link function g(π) = F⁻¹(π). Thus, given any distribution function F(·), we can define a binomial GLM.

Exercise 4.4. Show that logistic regression corresponds to a latent tolerance model in which Ti is a random variable with density

    f(t) = e^t / (1 + e^t)²,  −∞ < t < ∞.

The distribution of Ti in this case is referred to as the logistic distribution.

Aside from logistic regression, the latent tolerance model has many inter-
esting special cases. The probit model considers Ti ∼ Normal(0, 1), while the
complementary log-log model sets Ti to have an extreme value distribution.
In practice, the main difference between these models lies in how they handle
outliers, which is determined by how fast F (x) → 0, 1 as x → ±∞. When F (x)
corresponds to a light-tailed distribution, such as a normal distribution, we
expect to see very few outliers in Y ; here, an outlier is a point (x, y) where x
is extreme and y takes on the “wrong” value (i.e., y is strongly predicted to be
1, but is 0 instead). Conversely, a heavy-tailed distribution (such as a Cauchy
distribution) does not have problems with outliers.
The logistic regression model has exponential tails, with F(x) ≈ e^x as x → −∞ and F(x) ≈ 1 − e^{−x} as x → ∞. This is heavier than the normal distribution, which has super-exponential tails, but lighter than the t-distribution, which has polynomial tails.

[Figure 4.1: Comparison of the probit, logit, and complementary log-log link functions, after scaling and centering.]
We can also choose F (t) to correspond to an asymmetric distribution. This
is often done in toxicology studies, where a sufficiently high toxicity basically
guarantees death, but very small doses can also be fatal. In this case, the
complementary log-log model F⁻¹(π) = log{− log(1 − π)} is a good choice. A comparison of the complementary log-log, probit, and logistic links is given in Figure 4.1.
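In R these alternative links can be requested through the family argument. A minimal sketch, refitting the shuttle data of Example 4.4 with the probit and complementary log-log links (this assumes fit_space and the SpaceShuttle data from Section 4.2.1 are available):

fit_probit  <- glm(I(Fail == 'yes') ~ Temperature, data = SpaceShuttle,
                   family = binomial(link = "probit"))
fit_cloglog <- glm(I(Fail == 'yes') ~ Temperature, data = SpaceShuttle,
                   family = binomial(link = "cloglog"))
## The coefficients live on different scales, but the fitted probabilities are similar.
cbind(logit = fitted(fit_space), probit = fitted(fit_probit),
      cloglog = fitted(fit_cloglog))[1:5, ]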

4.2.5 Fitting a Binomial GLM

Agresti considers the following data. This data is from an epidemiological survey
to investigate snoring as a risk factor for heart disease. The sample consists of
2484 subjects and is given in the following table.

Heart Disease
Snoring Yes No
Never 24 1355
Occasionally 35 603
Nearly every night 21 192
Every night 30 224

The variable snoring is ordinal; to account for this, we will assign the nu-
merical scores 0, 2, 4, and 5 to the categories. We construct the data in R as:

snoring <- cbind(Yes = c(24, 35, 21, 30),
                 No  = c(1355, 603, 192, 224))
snoring_scores <- c(0, 2, 4, 5)
print(cbind(snoring, snoring_scores))

## Yes No snoring_scores
## [1,] 24 1355 0
## [2,] 35 603 2
## [3,] 21 192 4
## [4,] 30 224 5

We consider the data as consisting of four binomial random variables Zi ∼ Binomial(ni, πi) where, for example, Z1 = 24 and n1 = 24 + 1355. To fit a binomial logistic regression to Yi = Zi/ni, we can use the commands

snoring_fit <- glm(snoring ~ snoring_scores, family = binomial)

Note how the data is input: the response is taken to be a matrix with the
first column consisting of number of successes and the second column consisting
of the number of failures. The term family = binomial specifies that we are
doing logistic regression. We obtain a summary of the model as

summary(snoring_fit)

##
## Call:
## glm(formula = snoring ~ snoring_scores, family = binomial)
##
## Deviance Residuals:
## 1 2 3 4
## -0.8346 1.2521 0.2758 -0.6845
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)

## (Intercept) -3.86625 0.16621 -23.261 < 2e-16 ***


## snoring_scores 0.39734 0.05001 7.945 1.94e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 65.9045 on 3 degrees of freedom
## Residual deviance: 2.8089 on 2 degrees of freedom
## AIC: 27.061
##
## Number of Fisher Scoring iterations: 4

The snoring scores are highly significant. We also note that the deviance is D = 2.8089. The asymptotic χ² approximation is likely to be good here, as the cell counts are large. Comparing this to a χ²_{4−2} distribution, a P-value for the test against the saturated model is 0.2455, indicating that there is little evidence of lack of fit.
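The P-value quoted above can be computed directly from the reported deviance; a minimal sketch:

pchisq(deviance(snoring_fit), df = df.residual(snoring_fit),
       lower.tail = FALSE)   # matches the 0.2455 quoted above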

Exercise 4.5. Interpret the estimated coefficient for snoring scores and
construct a Wald interval for the coefficient.

Next, we get the predictions from the model.

snoring_predictions <- predict(snoring_fit, type = 'response')


print(cbind(
c("Never", "Occasionally", "Nearly Every Night", "Every Night"),
round(snoring_predictions,3)
))

## [,1] [,2]
## 1 "Never" "0.021"
## 2 "Occasionally" "0.044"
## 3 "Nearly Every Night" "0.093"
## 4 "Every Night" "0.132"

Evidently the probability of developing heart disease increases dramatically as


the snoring score increases, from an estimated 2% to an estimated 13%.

4.3 Poisson Loglinear Models


The simplest GLM for count data is the Poisson loglinear model.

Definition 4.5. Given data D = {(Xi, Yi)}, the distribution of [Yi | Xi] follows a Poisson loglinear model if

    Yi ∼ Poisson(exp(x′β))  given Xi = x.

This model was introduced in Exercise 4.2, where we saw that this corresponds to a Poisson GLM using the canonical link g(λ) = log λ; that is, the model is

    log{E(Yi | Xi = x)} = x′β.

4.3.1 Interpreting the coefficients


The coefficients of a Poisson loglinear model, like the coefficients of a binomial logistic regression model, have nice interpretations. Consider shifting xj by Δ units while holding all other entries fixed. Then

    log λ(xj + Δ) = Σ_{p≠j} xp βp + (xj + Δ)βj = log λ(xj) + Δβj.

Thus, holding all other predictors fixed, shifting xj by Δ has the effect of multiplying the mean by e^{Δβj}:

    λ(xj + Δ) = e^{Δβj} λ(xj).

This leads to the following interpretation:

Holding all other predictors fixed, a change in xj by Δ units has a multiplicative effect of e^{Δβj} on the mean of Yi.

4.3.2 Poisson Loglinear Models with an Offset


Many Poisson loglinear models also include a term called an offset. The need
for offset terms is best understood through an example.
Example 4.7. This example, taken from Section 6.3.2 of McCullagh and
Nelder (1989), concerns modeling the rate of reported damage incidents of cer-
tain types of cargo-carrying ships. We consider the following predictors which
make up Xi :

• An intercept term.

• The type of ships (A–E).

• The year of construction, as a categorical variable (60–64, 65–69, 70–74, 75–79).

• The period of operation (60–74, 75–79).

• The number of months of service.



Each of these predictors is categorical, with the exception of months of service.


Letting λ(x) be E{Yi | Xi = x}, we could posit a Poisson loglinear model as

log λ(x) = α + βType + γYear + δPeriod + ζ × Months.

Instead, we make the following observation:


Consider two ships, of the same type, constructed in the same year, and serving in the same period. If the first ship is in service for twice as many months as the second, how many more incidents do we expect it to have on average?
Assuming that incidents arrive according to something like a homogeneous Pois-
son process, the answer is clear: we would expect twice as many accidents. This
suggests the model

log λ(x) = α + log Months + βType + γYear + δPeriod .

The term log Months is called an offset. It essentially corresponds to a predictor


which has a known coefficient equal to 1.
We now fit this Poisson loglinear model with an offset term to the data.
First we print the data:

library(MASS)
data(ships)
head(ships)

## type year period service incidents


## 1 A 60 60 127 0
## 2 A 60 75 63 0
## 3 A 65 60 1095 3
## 4 A 65 75 1095 4
## 5 A 70 60 1512 6
## 6 A 70 75 3353 18

For this data, the rows actually correspond to the number of incidents for
all ships of the same type/year/period, with the offset term service giving the
total number of months those ships were at service. The Poisson loglinear model
can be fit as:

ships$yearbuilt <- as.factor(ships$year)


fit_ships_poisson <- glm(incidents ~ type + yearbuilt + period,
offset = log(service), family = poisson,
data = ships, subset = (service != 0))
summary(fit_ships_poisson)

##
## Call:

## glm(formula = incidents ~ type + yearbuilt + period, family = poisson,


## data = ships, subset = (service != 0), offset = log(service))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6768 -0.8293 -0.4370 0.5058 2.7912
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -7.943769 0.561747 -14.141 < 2e-16 ***
## typeB -0.543344 0.177590 -3.060 0.00222 **
## typeC -0.687402 0.329044 -2.089 0.03670 *
## typeD -0.075961 0.290579 -0.261 0.79377
## typeE 0.325579 0.235879 1.380 0.16750
## yearbuilt65 0.697140 0.149641 4.659 3.18e-06 ***
## yearbuilt70 0.818427 0.169774 4.821 1.43e-06 ***
## yearbuilt75 0.453427 0.233170 1.945 0.05182 .
## period 0.025631 0.007885 3.251 0.00115 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 146.328 on 33 degrees of freedom
## Residual deviance: 38.695 on 25 degrees of freedom
## AIC: 154.56
##
## Number of Fisher Scoring iterations: 5

We can also test for the individual factors using likelihood ratio tests; the anova function in R performs an analysis of deviance in which the various terms are added sequentially to the model.

anova(fit_ships_poisson, test = "LRT")

## Analysis of Deviance Table


##
## Model: poisson, link: log
##
## Response: incidents
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 33 146.328

## type 4 55.439 29 90.889 2.629e-11 ***


## yearbuilt 3 41.534 26 49.355 5.038e-09 ***
## period 1 10.660 25 38.695 0.001095 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It is also possible to test for the effect of each factor assuming all other terms are included in the model. This can be done using the drop1 function.

drop1(fit_ships_poisson, test = "LRT")

## Single term deletions


##
## Model:
## incidents ~ type + yearbuilt + period
## Df Deviance AIC LRT Pr(>Chi)
## <none> 38.695 154.56
## type 4 62.365 170.23 23.670 9.300e-05 ***
## yearbuilt 3 70.103 179.97 31.408 6.975e-07 ***
## period 1 49.355 163.22 10.660 0.001095 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the likelihood ratio tests, we see that there is a large amount of evidence
for all three predictors. The deviance for this example is

D <- deviance(fit_ships_poisson)
p_value <- pchisq(D, df.residual(fit_ships_poisson),
lower.tail = FALSE)

c(D = D, df = df.residual(fit_ships_poisson), p_value = p_value)

## D df p_value
## 38.69505154 25.00000000 0.03951433

There is some evidence of lack of fit for the model. One possibility for
correcting this is to consider interaction terms. For example:

fit_ships_interaction <- glm(incidents ~ type * yearbuilt + period,


offset = log(service), family = poisson,
data = ships, subset = (service != 0 ))
anova(fit_ships_interaction, test = "LRT")

## Analysis of Deviance Table


##
## Model: poisson, link: log

##
## Response: incidents
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 33 146.328
## type 4 55.439 29 90.889 2.629e-11 ***
## yearbuilt 3 41.534 26 49.355 5.038e-09 ***
## period 1 10.660 25 38.695 0.001095 **
## type:yearbuilt 12 24.108 13 14.587 0.019663 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The deviance of 14.587 is now in line with the reference χ²_{13} distribution.

4.4 Dealing with Overdispersion


For count data, overdispersion occurs when Var(Yi) > E(Yi). This situation is very common! For example, it necessarily occurs when Yi ∼ Poisson(ηi) given ηi, and ηi is itself random.

4.4.1 Negative Binomial GLMs


For loglinear GLMs, the root of the problem is that the Poisson distribution
only has one parameter. Alternatively, we might use a negative binomial model.
Recall that a negative binomial describes the following experiment:

Flip a coin with probability of Heads equal to π repeatedly. Then


Y = the number of Tails you flip until you have observed k heads.
The mass function of Y in this case is

    f(y; π, k) = Z(y, k) · π^k (1 − π)^y,  y = 0, 1, 2, . . . ,

where Z(y, k) is the number of Bernoulli sequences of length y + k consisting of exactly y failures and k successes, subject to the constraint that the last trial is a success, i.e.,

    Z(y, k) = C(y + k − 1, y) = (y + k − 1)! / (y!(k − 1)!) = Γ(y + k) / (Γ(k)Γ(y + 1)).

Exercise 4.6. Suppose that Y has a negative binomial distribution with success probability π and k successes. Show that

    E(Y) = k(1 − π)/π,  and  Var(Y) = k(1 − π)/π².

Writing π in terms of E(Y) = µ, show this gives

    π = k/(µ + k)  and  Var(Y) = µ + µ²/k.
Hint: we can write Y as the sum of k independent Geometric random
variables.

Definition 4.6. A loglinear negative binomial GLM for [Yi | Xi = xi] models the mass function of Yi as

    f(yi | k, µi) = Γ(yi + k) / (Γ(k)Γ(yi + 1)) · (k/(µi + k))^k (1 − k/(µi + k))^{yi},

where log µi = xi′β.

The key point of the negative binomial GLM is that it implies overdispersion.
Instead of having the variance equal to µ, it is equal to µ + µ²/k. When k is very
large relative to µ, we will not have much overdispersion; on the other hand, if
k is small relative to µ then we will have a lot of overdispersion.
When k = 1, the negative binomial corresponds to a geometric random
variable. It turns out that the negative binomial distribution also includes the
Poisson distribution as a special case.

Exercise 4.7. Consider the hierarchical model

Y ∼ Poisson(λ) given λ,
λ ∼ Gamma(k, k/µ).

(a) Show that Y has a negative binomial distribution with k trials and
mean µ.
(b) As k → ∞, show that Y converges in distribution to a Poisson(µ).
Note: the gamma is parameterized so that E(λ) = µ and Var(λ) = µ²/k.

Remark 4.4. Exercise 4.7 is important for the following reasons. First, it indicates that we do not need to restrict k to be an integer. Second, it makes it clear why Y is overdispersed relative to a Poisson; as we have mentioned, introducing a latent variable will automatically cause overdispersion. Third,

it gives some intuitive justification for how a negative binomial might arise in
practice, even if there is no obvious “coin flipping” going on. Fourth, it makes it
clear that the Poisson GLM is a special case of a negative binomial GLM, so if
a Poisson GLM is correct we will not lose anything (aside from some estimation
efficiency) by using a negative binomial GLM.
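A minimal simulation sketch of this mixture representation (the values of k and µ below are illustrative): a gamma-mixed Poisson is overdispersed, with variance close to µ + µ²/k.

set.seed(3)
k  <- 2; mu <- 4
lambda <- rgamma(1e5, shape = k, rate = k / mu)  # E(lambda) = mu, Var(lambda) = mu^2 / k
y      <- rpois(1e5, lambda)
c(mean = mean(y), var = var(y), mu_plus_mu2_over_k = mu + mu^2 / k)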

4.4.2 Quasi-likelihood Methods


We briefly introduce the idea of quasi-likelihood here, since it is often used to
deal with overdispersion. Rather than specifying a parametric model for the
response Yi , we have a model with the following components:
(A1) A linear predictor ηi = Xi′β;

(A2) A link function g(·) such that µi = g⁻¹(ηi); and

(A3) A variance function V(·) such that Var(Yi | Xi = x) = (φ/ωi) V(µi).

Remark 4.5. These components should be compared with Fact 4.1. Every GLM satisfies assumptions (A1–A3) with V(µi) = b′′{(b′)⁻¹(µi)}; for example, a Poisson GLM has V(µi) = µi, while a binomial GLM has V(µi) = µi(1 − µi). The quasi-likelihood framework is more general, however, because there is no GLM that corresponds to the choices φ > 1 and V(µ) = µ, or V(µ) = µ(1 − µ).
The reason for the name “quasi-likelihood” will become clearer later, when
we study how they are fit.

Definition 4.7. A quasi-Poisson model for Yi given Xi is a model for Yi


which satisfies assumptions A1 and A2 with variance function V (µ) = µ
and ωi = 1 so that

Var(Yi | Xi = x) = φµi .
The Poisson GLM is a type of quasi-Poisson model when φ ≡ 1, but the
quasi-Poisson model allows overdispersion when φ > 1.

4.4.3 Quasi-Poisson or Negative Binomial?


The negative binomial and quasi-Poisson models give two possible ways of deal-
ing with overdispersion. But which one is most appropriate?
The negative binomial model has a couple of features which make it some-
what undesirable. The form of the variance is µi + µi²/k, i.e., the negative binomial has variance which is quadratic in µi. Many (including Agresti) find it unnatural to have the variance scale with µi², and hence prefer the quasi-Poisson as a default.

4.4.4 Homicide Example


The following data is taken from Table 14.6 in Agresti. The data is from a survey
of 1308 people in which they were asked how many homicide victims they know.
The variables are resp, the number of victims the respondent knows, and race,
the race of the respondent (black or white). The question: to what extent does
race predict how many homicide victims a person knows? The data is given by:

Response Black White


0 119 1070
1 16 60
2 12 14
3 7 4
4 3 0
5 2 0
6 0 1

We load the data into R:

black <- c(119,16,12,7,3,2,0)


white <- c(1070,60,14,4,0,0,1)
resp <- c(rep(0:6,times=black), rep(0:6,times=white))
race <- factor(c(rep("black", sum(black)),
rep("white", sum(white))),
levels = c("white","black"))
victim <- data.frame(resp, race)
head(victim)

## resp race
## 1 0 black
## 2 0 black
## 3 0 black
## 4 0 black
## 5 0 black
## 6 0 black

tail(victim)

## resp race
## 1303 2 white
## 1304 3 white
## 1305 3 white
## 1306 3 white
## 1307 3 white
## 1308 6 white

We consider GLMs of the form

    g(µi) = β0 + β1 I(person i is black).

First, we compute summary statistics for the data.

library(tidyverse)
victim %>% group_by(race) %>%
dplyr::summarise(mean = mean(resp), var = var(resp))

## # A tibble: 2 x 3
## race mean var
## <fct> <dbl> <dbl>
## 1 white 0.0923 0.155
## 2 black 0.522 1.15

We display these results in the following table.

Respondent Race Mean Variance


White 0.09 0.16
Black 0.52 1.15

Exercise 4.8. Explain why the table above suggests that the Poisson GLM
is not appropriate for modeling the homicide data.

Next, we fit the Poisson, negative binomial, and quasi-Poisson models.

fit_poisson <- glm(resp ~ race, family = poisson, data = victim)


fit_negbin <- glm.nb(resp ~ race, data = victim)
fit_quasi <- glm(resp ~ race, data = victim, family = quasipoisson)

k_negbin <- fit_negbin$theta


phi_quasi <- summary(fit_quasi)$dispersion
print(c(k = k_negbin, phi = phi_quasi))

## k phi
## 0.2023119 1.7456940

This gives the estimates k = 0.2 for the negative binomial model and φ =
1.75 for the quasi-Poisson. We now see how the variance estimates for the
models line up with the empirical variance.

victim_summary <- victim %>% group_by(race) %>%


dplyr::summarise(mean = mean(resp), empirical_var = var(resp))
victim_summary <- victim_summary %>%

mutate(poisson_var = mean,
negbin_var = mean + mean^2 / k_negbin,
quasi_var = phi_quasi * mean)
print(victim_summary)

## # A tibble: 2 x 6
## race mean empirical_var poisson_var negbin_var quasi_var
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 white 0.0923 0.155 0.0923 0.134 0.161
## 2 black 0.522 1.15 0.522 1.87 0.911
This gives the following table:

Model                Var, White    Var, Black
Observed             0.16          1.15
Poisson              0.09          0.52
Negative Binomial    0.13          1.87
Quasi-Poisson        0.16          0.91

The quasi-Poisson model does quite well at recovering the observed/empirical variance for each race, while the Poisson does not do very well. The negative binomial is somewhat in-between, giving a slightly lower variance for white respondents and a moderately larger variance for black respondents.

4.5 Technical Details: Likelihood Theory


We now delve into the likelihood equations to gain insight into GLMs and understand better how they are fit. Recall that the log-likelihood of a GLM is

    L = Σ_{i=1}^n [ ωi(Yi θi − b(θi))/φ + c(Yi, φ) ].

Differentiating with respect to βj and applying the chain rule gives

    ∂L/∂βj = Σ_{i=1}^n [ ωi(Yi − b′(θi))/φ ] (∂θi/∂µi)(∂µi/∂ηi)(∂ηi/∂βj),

where recall that µi = b′(θi), g(µi) = ηi, and ηi = Xi′β. We now begin to simplify. First, using the fact that (d/dx) f⁻¹(x) = 1/f′(f⁻¹(x)), we derive the following:

    ∂θi/∂µi = 1 / b′′{(b′)⁻¹(µi)} =: 1/V(µi),
    ∂µi/∂ηi = 1 / g′(µi),
    ∂ηi/∂βj = Xij.

Hence, the score vector is given by

    ∂L/∂β = Σ_{i=1}^n ωi Xi (Yi − µi) / [φ V(µi) g′(µi)].

Summarizing, we have the following.

Fact 4.3. The score of β is

    u(β) = Σ_i ωi Xi (Yi − µi) / [φ V(µi) g′(µi)],

where V(µi) = b′′{(b′)⁻¹(µi)}, and the MLE β̂ of β satisfies u(β̂) = 0.

Remark 4.6. Even though there are no β's on the right-hand side of u(β), note that µi = g⁻¹(Xi′β), so that µi depends implicitly on β.
Next, we derive the Fisher information. We have

    −∂²L/(∂βj ∂βk) = −(1/φ) Σ_i ωi Xij ∂/∂βk [ (Yi − µi)/(V(µi) g′(µi)) ]
                   = −(1/φ) Σ_i ωi Xij ∂/∂µi [ (Yi − µi)/(V(µi) g′(µi)) ] (∂µi/∂ηi)(∂ηi/∂βk)
                   = (1/φ) Σ_i Xij Xik { ωi/[V(µi) g′(µi)²] − [ωi(Yi − µi)/g′(µi)] ∂/∂µi [1/(V(µi) g′(µi))] }.

Hence, the observed Fisher information can be written as

    J = X′ W̃ X / φ,

where W̃ is diagonal with (i, i)th entry

    ωi/[V(µi) g′(µi)²] − [ωi(Yi − µi)/g′(µi)] ∂/∂µi [1/(V(µi) g′(µi))].

The expected Fisher information is then

    I = Eβ(J) = X′ W X / φ,

where W is diagonal with entries ωi/[V(µi) g′(µi)²]; here we have simply used the fact that Eβ(Yi − µi) = 0. We summarize these results as:

Fact 4.4. (a) The observed Fisher information for a GLM of Yi | Xi = x is given by

    J = X′ W̃ X / φ,

where W̃ is diagonal with (i, i)th entry

    w̃ii = ωi/[V(µi) g′(µi)²] − [ωi(Yi − µi)/g′(µi)] ∂/∂µi [1/(V(µi) g′(µi))].

(b) The expected Fisher information is

    I = Eβ(J) = X′ W X / φ,

where W is diagonal with entries

    wii = ωi / [V(µi) g′(µi)²].

(c) The asymptotic covariance of the MLE β̂ of β is φ(X′ W X)⁻¹.

(d) The matrix W can be approximated by Ŵ, which has diagonal entries ωi/[V(µ̂i) g′(µ̂i)²], where µ̂i is the MLE of µi.

Remark 4.7. Note the similarity between the asymptotic covariance of β̂ for the linear model Y = Xβ + ε with i.i.d. mean-0, finite-variance errors εi and that for the GLM. In the former case, the covariance of β̂ is σ²(X′X)⁻¹. The formula φ(X′WX)⁻¹ generalizes this to the case of generalized linear models.
Remark 4.8. Typically, we favor the use of the observed Fisher information J (evaluated at β̂) to approximate the variance of β̂; we do not favor the use of the expected Fisher information I.
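A minimal numerical check of Fact 4.4(c), using the snoring fit from Section 4.2.5 (canonical link, φ = 1, ωi = ni, so that wii = ni π̂i(1 − π̂i)):

X      <- model.matrix(snoring_fit)
n_i    <- rowSums(snoring)               # group sizes, i.e., the weights omega_i
pi_hat <- fitted(snoring_fit)            # estimated success probabilities
W      <- diag(n_i * pi_hat * (1 - pi_hat))
solve(t(X) %*% W %*% X)                  # phi * (X' W X)^{-1} with phi = 1
vcov(snoring_fit)                        # should agree up to numerical error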

Exercise 4.9. Recall that we defined V(µi) = b′′{(b′)⁻¹(µi)} and that we define the canonical link by g(µi) = (b′)⁻¹(µi).

(a) Show that J = I when the canonical link is used. Hint: Show that V(µi) g′(µi) = 1.

(b) Suppose that the canonical link is used, that φ is known, and that the design matrix X is regarded as fixed and known. Show that X′Ỹ is sufficient for β, where Ỹ = (ω1 Y1, . . . , ωn Yn).

4.6 Analysis of Residuals


As in linear models, examination of residuals can give insight into where a
poorly-fitting GLM might be failing. There are several possibilities for residuals

to check.

Definition 4.8. The Pearson residual for observation i is

    ei = (Yi − µ̂i) / √(V(µ̂i)/ωi).

The Pearson residual is relatively intuitive: how far away is Yi from its estimated mean, scaled by an estimate of its standard deviation. Additionally, the Pearson residuals are associated with the generalized X² statistic

    X² = Σ_{i=1}^n ωi(Yi − µ̂i)²/V(µ̂i) = Σ_{i=1}^n ei².

When φ = 1, the generalized X² statistic is associated with the score test comparing the fitted model to the saturated model. We use this observation as motivation to define residuals using the likelihood ratio test comparing the fitted model to the saturated model.

Note: The Pearson residuals do not depend on φ; similarly, the deviance resid-
uals (to be defined below) do not depend on φ. Hence, they can have variance
much larger than 1; keep this in mind when trying to interpret them!
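A minimal sketch computing the Pearson residuals and the generalized X² statistic for the ships fit of Section 4.3.2, alongside its deviance:

e_pearson <- residuals(fit_ships_poisson, type = "pearson")
X2        <- sum(e_pearson^2)            # generalized X^2 statistic
c(X2 = X2, deviance = deviance(fit_ships_poisson),
  df = df.residual(fit_ships_poisson))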

Definition 4.9. Define

    di = 2ωi [ Yi(θ̃i − θ̂i) − {b(θ̃i) − b(θ̂i)} ],

where θ̃i and θ̂i are as in Definition 4.4. Then we define the deviance residual for observation i as

    √di × sign(Yi − µ̂i).

The deviance D is the sum of the squared deviance residuals, giving an analogy with the relationship between Pearson's residuals and the generalized X² statistic. The quantiles of the deviance residuals are reported in the output of the glm function in R. For the ships dataset, we have

summary(fit_ships_poisson)

##
## Call:
## glm(formula = incidents ~ type + yearbuilt + period, family = poisson,
## data = ships, subset = (service != 0), offset = log(service))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max

## -1.6768 -0.8293 -0.4370 0.5058 2.7912


##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -7.943769 0.561747 -14.141 < 2e-16 ***
## typeB -0.543344 0.177590 -3.060 0.00222 **
## typeC -0.687402 0.329044 -2.089 0.03670 *
## typeD -0.075961 0.290579 -0.261 0.79377
## typeE 0.325579 0.235879 1.380 0.16750
## yearbuilt65 0.697140 0.149641 4.659 3.18e-06 ***
## yearbuilt70 0.818427 0.169774 4.821 1.43e-06 ***
## yearbuilt75 0.453427 0.233170 1.945 0.05182 .
## period 0.025631 0.007885 3.251 0.00115 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 146.328 on 33 degrees of freedom
## Residual deviance: 38.695 on 25 degrees of freedom
## AIC: 154.56
##
## Number of Fisher Scoring iterations: 5

The deviance residuals for this example vary between −1.7 and 2.8.
Generally, we prefer to have residuals which are standardized to have (approximate) mean 0 and variance 1. The deviance residuals and Pearson residuals do not have this property. In the case of the Pearson residual, the reason that we do not have approximate variance 1 is that Yi and µ̂i are correlated; the same issue occurs in linear regression models, wherein the standardized residuals (Yi − Ŷi)/σ̂ do not have variance 1. When φ = 1, Agresti shows (see Sections 4.5.6 and 4.5.7) that

    Var(ei) ≈ 1 − ĥii,  where  ĥii = [Ŵ^{1/2} X (X′ Ŵ X)⁻¹ X′ Ŵ^{1/2}]_{ii},    (4.2)

i.e., ĥii is the ith diagonal element of Ŵ^{1/2} X (X′ Ŵ X)⁻¹ X′ Ŵ^{1/2}. When φ ≠ 1, we instead have Var(ei) ≈ φ(1 − ĥii).

Definition 4.10. The standardized Pearson residual of observation i is

    ri = ei / √(φ̂(1 − ĥii)),

where ĥii is as defined in (4.2) and φ̂ is an appropriate estimate of φ (or φ̂ = φ if φ is known).

Example 4.8. This example concerns the 1973 admissions data for departments at the University of California at Berkeley. The key inferential issue lies in assessing whether there is evidence of sex bias in the admissions practices. We consider two predictors: the sex of the student and the department the student applied to.
The data is built into R and can be loaded as follows:

data(UCBAdmissions)
berk_0 <- data.frame(UCBAdmissions)
print(berk_0)

## Admit Gender Dept Freq


## 1 Admitted Male A 512
## 2 Rejected Male A 313
## 3 Admitted Female A 89
## 4 Rejected Female A 19
## 5 Admitted Male B 353
## 6 Rejected Male B 207
## 7 Admitted Female B 17
## 8 Rejected Female B 8
## 9 Admitted Male C 120
## 10 Rejected Male C 205
## 11 Admitted Female C 202
## 12 Rejected Female C 391
## 13 Admitted Male D 138
## 14 Rejected Male D 279
## 15 Admitted Female D 131
## 16 Rejected Female D 244
## 17 Admitted Male E 53
## 18 Rejected Male E 138
## 19 Admitted Female E 94
## 20 Rejected Female E 299
## 21 Admitted Male F 22
## 22 Rejected Male F 351
## 23 Admitted Female F 24
## 24 Rejected Female F 317

We place this data into a form suitable for the glm function (with admit-
ted/rejected counts on the same row) using the tidyverse package:

library(tidyverse)
berk <- berk_0 %>% group_by(Gender, Dept) %>%
dplyr::summarise(Admitted = sum(Freq * (Admit == "Admitted")),

Rejected = sum(Freq * (Admit == "Rejected"))) %>%


as.data.frame()
print(berk)

## Gender Dept Admitted Rejected


## 1 Male A 512 313
## 2 Male B 353 207
## 3 Male C 120 205
## 4 Male D 138 279
## 5 Male E 53 138
## 6 Male F 22 351
## 7 Female A 89 19
## 8 Female B 17 8
## 9 Female C 202 391
## 10 Female D 131 244
## 11 Female E 94 299
## 12 Female F 24 317

We consider a logistic regression model in which the number admitted for gender i in department j is binomial with success probability πij. We first consider the model logit(πij) = α + βi, i.e., the probability of being accepted depends only on gender.

berk_glm_1 <- glm(cbind(Admitted, Rejected) ~ Gender,


family = binomial, data = berk)
summary(berk_glm_1)

##
## Call:
## glm(formula = cbind(Admitted, Rejected) ~ Gender, family = binomial,
## data = berk)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -16.7915 -4.7613 -0.4365 5.1025 11.2022
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.22013 0.03879 -5.675 1.38e-08 ***
## GenderFemale -0.61035 0.06389 -9.553 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 877.06 on 11 degrees of freedom

## Residual deviance: 783.61 on 10 degrees of freedom


## AIC: 856.55
##
## Number of Fisher Scoring iterations: 4

Based on these results, we see that females have an odds of admittance


which is e−0.61 = 0.54 times that of males; moreover, these results are highly
statistically significant. One can imagine making a claim of serious gender bias
on the basis of this analysis.
Consider now the model logit(πij ) = α + βi + γj , so that the probability of
being accepted depends on both gender and department.

berk_glm <- glm(cbind(Admitted, Rejected) ~ Gender + Dept,


family = binomial, data = berk)
summary(berk_glm)

##
## Call:
## glm(formula = cbind(Admitted, Rejected) ~ Gender + Dept, family = binomial,
## data = berk)
##
## Deviance Residuals:
## 1 2 3 4 5 6 7 8
## -1.2487 -0.0560 1.2533 0.0826 1.2205 -0.2076 3.7189 0.2706
## 9 10 11 12
## -0.9243 -0.0858 -0.8509 0.2052
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.58205 0.06899 8.436 <2e-16 ***
## GenderFemale 0.09987 0.08085 1.235 0.217
## DeptB -0.04340 0.10984 -0.395 0.693
## DeptC -1.26260 0.10663 -11.841 <2e-16 ***
## DeptD -1.29461 0.10582 -12.234 <2e-16 ***
## DeptE -1.73931 0.12611 -13.792 <2e-16 ***
## DeptF -3.30648 0.16998 -19.452 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 877.056 on 11 degrees of freedom
## Residual deviance: 20.204 on 5 degrees of freedom
## AIC: 103.14
##
## Number of Fisher Scoring iterations: 4

Interestingly, our results conflict with those of the previous analysis. Females have an odds of admittance which is e^{0.1} = 1.11 times that of males, and the results are not statistically significant. On the other hand, we see that the department an individual applies to is highly important!
This is an example of Simpson’s paradox — the direction of the effect of sex
reverses after we take department into account. Essentially what is happening
is that females apply to departments which are more selective than males. In
particular, examining the raw data, we see that females tend not to apply to
departments A and B, which happen to also have very high acceptance rates.
Hence, the effect of gender is primarily accounted for by the tendency of females
to apply to selective departments.
Next, we observe that the model logit(πij) = α + βi + γj actually does not fit the data particularly well. We have a residual deviance of 20.2 on 5 degrees of freedom, which gives a P-value of 0.001 for the test of the model against the saturated model. To understand why the model does not appear to fit well, we look at the standardized Pearson residuals:

rstandard(berk_glm, type = "pearson")

## 1 2 3 4 5 6
## -4.0272880 -0.2797222 1.8808316 0.1412619 1.6334924 -0.3026439
## 7 8 9 10 11 12
## 4.0272880 0.2797222 -1.8808316 -0.1412619 -1.6334924 0.3026439

We see very large residuals relative to a standard normal distribution for ob-
servation 1 (department A, male) and observation 7 (department A, female).
Examining these observations, we see that department A seems to have an ex-
tremely high acceptance rate for females; 512/825 males are accepted, while
89/108 females are accepted. If we fit the model with department A removed,
we get

berk_no_A <- glm(cbind(Admitted, Rejected) ~ Dept + Gender,


family = binomial,
data = subset(berk, Dept != "A"))
summary(berk_no_A)

##
## Call:
## glm(formula = cbind(Admitted, Rejected) ~ Dept + Gender, family = binomial,
## data = subset(berk, Dept != "A"))
##
## Deviance Residuals:
## 2 3 4 5 6 8 9 10
## -0.1191 0.5239 -0.5164 0.6868 -0.5024 0.5680 -0.3914 0.5440
## 11 12
## -0.4892 0.5158

##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.54418 0.08584 6.340 2.3e-10 ***
## DeptC -1.14008 0.12188 -9.354 < 2e-16 ***
## DeptD -1.19456 0.11984 -9.968 < 2e-16 ***
## DeptE -1.61308 0.13928 -11.581 < 2e-16 ***
## DeptF -3.20527 0.17880 -17.927 < 2e-16 ***
## GenderFemale -0.03069 0.08676 -0.354 0.724
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 539.4581 on 9 degrees of freedom
## Residual deviance: 2.5564 on 4 degrees of freedom
## AIC: 71.791
##
## Number of Fisher Scoring iterations: 3

This model seems to fit the data extremely well, with deviance 2.56 on 4 degrees
of freedom. Considering department A in isolation, we have:

summary(glm(cbind(Admitted, Rejected) ~ Gender,


family = binomial,
data = subset(berk, Dept == "A")))

##
## Call:
## glm(formula = cbind(Admitted, Rejected) ~ Gender, family = binomial,
## data = subset(berk, Dept == "A"))
##
## Deviance Residuals:
## [1] 0 0
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.49212 0.07175 6.859 6.94e-12 ***
## GenderFemale 1.05208 0.26271 4.005 6.21e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1.9054e+01 on 1 degrees of freedom
## Residual deviance: 5.5511e-15 on 0 degrees of freedom

## AIC: 15.706
##
## Number of Fisher Scoring iterations: 3

We draw the following conclusions from this analysis:


1. Within department A, there appears to be substantial evidence that female
applicants have a higher acceptance rate than male applicants.
2. Ignoring department A, there is no evidence of an effect of sex on admission
rate, after controlling for department.
As before, we urge caution in interpreting these results in a causal fashion; we
are not saying that being female causes one to have a higher acceptance rate in
department A, only that there is strong evidence of a statistical association.

4.7 How to Fit a GLM


In general, there is no closed form for the MLE β̂ of a generalized linear model. Recall that the MLE satisfies the score equations

    u(β) = Σ_{i=1}^n ωi Xi (Yi − µi) / [φ V(µi) g′(µi)] = 0.

This equation can be solved numerically using Newton's method. Recall that, to solve the equation g(x) = 0, Newton's method updates

    x ← x − (∇g)⁻¹(x) g(x),

where ∇g denotes the Jacobian matrix of g. Recalling that the Jacobian matrix of −u(β) is the observed Fisher information, we get the following algorithm.

Algorithm 1 Newton's method for computing β̂

1: Initialize β^(0) and set t = 0.
2: Set β^(t+1) = β^(t) + J(β^(t))⁻¹ u(β^(t)).
3: Set t = t + 1.
4: If converged, set β̂ = β^(t); else, return to Step 2.

This algorithm works quite well, provided that J is positive definite. A


second idea is to replace J with I. This leads to the so-called Fisher scoring
algorithm.
Remark 4.9. In R, the Fisher scoring method is the default method for fitting
glms. Fisher scoring tends to be more numerically stable (note that I is the
variance of the score, so it is guaranteed to be positive definite), but can be
slow to converge near the optimum. Fisher scoring is also known as iteratively
reweighted least-squares; see Section 4.6.4 of Agresti for a discussion of this.

Algorithm 2 Fisher scoring method for computing β̂

1: Initialize β^(0) and set t = 0.
2: Set β^(t+1) = β^(t) + I(β^(t))⁻¹ u(β^(t)).
3: Set t = t + 1.
4: If converged, set β̂ = β^(t); else, return to Step 2.

Exercise 4.10. For what class of GLMs is Fisher scoring equivalent to


Newton’s method?
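A minimal sketch of Fisher scoring for the grouped binomial logistic model, applied to the snoring data of Section 4.2.5 (the variable names here are illustrative); the result should agree with coef(snoring_fit):

X    <- cbind(1, snoring_scores)              # design matrix
y    <- snoring[, "Yes"] / rowSums(snoring)   # proportions Y_i = Z_i / n_i
wts  <- rowSums(snoring)                      # weights omega_i = n_i
beta <- c(0, 0)                               # starting value
for (t in 1:25) {
  mu    <- 1 / (1 + exp(-drop(X %*% beta)))   # expit of the linear predictor
  score <- t(X) %*% (wts * (y - mu))          # u(beta); canonical link, phi = 1
  info  <- t(X) %*% (wts * mu * (1 - mu) * X) # expected Fisher information
  beta  <- drop(beta + solve(info, score))
}
beta   # compare with coef(snoring_fit)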

4.8 Estimating the Dispersion Parameter


The Poisson loglinear and binomial logistic regression models work with a fixed/known
dispersion parameter φ. For other GLMs, or for quasi-likelihood GLMs, we do
not know the dispersion parameter φ.

Exercise 4.11. Show that the MLE β̂ does not depend on the value of the dispersion parameter φ.

In principle, φ could be estimated by maximum likelihood, and the previous exercise suggests that we can proceed by first computing β̂ as usual and then optimizing over φ with β fixed at β̂. This is not usually done in practice; instead, we prefer to use the so-called moment estimator

    φ̂ = (1/(n − p)) Σ_{i=1}^n ωi(Yi − µ̂i)² / V(µ̂i).

The statistic (n − p)φ̂/φ can be shown to have an asymptotic χ²_{n−p} distribution under the same conditions under which the deviance is χ²_{n−p}, allowing us to construct confidence intervals for φ if desired. This is also rarely done, the reason being that the dispersion parameter φ is rarely itself of interest, and the intervals constructed may be highly sensitive to the parametric assumptions we are making.
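A minimal sketch computing the moment estimator for the quasi-Poisson fit of Section 4.4.4 (there V(µ) = µ and ωi = 1), compared with the dispersion reported by R:

mu_hat  <- fitted(fit_quasi)
y_obs   <- victim$resp
p       <- length(coef(fit_quasi))
phi_mom <- sum((y_obs - mu_hat)^2 / mu_hat) / (length(y_obs) - p)
c(by_hand = phi_mom, from_R = summary(fit_quasi)$dispersion)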

4.8.1 Why the Dispersion Parameter Matters


In view of Exercise 4.11, one might wonder why we care about the dispersion parameter at all; we usually care about β, and φ does not impact the point estimate of β. For example, the estimates β̂ obtained from a Poisson loglinear model and a quasi-Poisson loglinear model are the same.
The reason we estimate φ is that it impacts the variance estimate of β̂. The Fisher information is given by

    I = X′ W X / φ.

Hence, the variance of β̂ is directly proportional to φ! If we were to use, say, a Poisson loglinear model in the presence of overdispersion, the consequence is that we will underestimate the variance of β̂ and end up with inference which is highly inaccurate; in particular, we expect higher false positive rates and lower coverage of confidence intervals when the model does not account for overdispersion.

4.9 More on Quasi-Likelihood


We now discuss quasi-likelihood methods in more detail. The starting point for these methods is the score function of a GLM, which is (see Fact 4.3)

    u(β) = Σ_{i=1}^n ωi Xi (Yi − µi) / [φ V(µi) g′(µi)].

Quasi-likelihood methods arise from the observation that the solution β̂ to the equation u(β) = 0 is valid under the weaker set of assumptions:

• The mean response is given by E{Yi | Xi = x} = µi = g⁻¹(x′β).

• The variance of the response is given by Var(Yi | Xi = x) = (φ/ωi) V(µi) for some φ, known weight ωi, and function V(µ).

Further, the function u(β) behaves very much like a score function, even when it does not correspond to any genuine GLM score function. For example, when the assumptions above are true, the asymptotic variance of β̂ is given by the inverse of a "pseudo" Fisher information,

    I⁻¹ = φ(X′ W X)⁻¹.

Example 4.9. Suppose that Yi given Xi = x has mean given by logit(µi) = x′β and variance (φ/ni) V(µi) = (φ/ni) µi(1 − µi). This model is referred to as a quasi-binomial model. This model is useful when a binomial logistic model Zi ∼ Binomial(ni, πi) with Yi = Zi/ni might be appropriate, but the data is overdispersed, i.e., there is more variability in the observations than can be accounted for by a binomial distribution. This might occur, for example, if the Zi's are actually Binomial(ni, πi) where the individual πi's are themselves random variables.
As with the quasi-Poisson model, the quasi-binomial model will have the same point estimator β̂, but the variance estimate will be inflated by φ̂. The dispersion parameter in this case can be estimated in the same fashion as for usual GLMs, i.e., with the moment estimator.
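A minimal sketch fitting a quasi-binomial model to the snoring data of Section 4.2.5; the point estimates match the binomial fit, and only the estimated dispersion (and hence the standard errors) can differ:

snoring_quasi <- glm(snoring ~ snoring_scores, family = quasibinomial)
summary(snoring_quasi)$dispersion                  # estimated phi
cbind(binomial      = coef(snoring_fit),
      quasibinomial = coef(snoring_quasi))         # identical point estimates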

4.10 Grouped versus ungrouped deviance


Recall the snoring dataset from Section 4.2.5. There are actually two equivalent
ways of writing this GLM.

(M1) For each level of snoring i, let Zi ∼ Binomial(ni , πi ) be the number of


individuals with heart disease where ni is the number of individuals at
snoring level i and πi is the probability of heart disease. Set Yi = Zi /ni .

(M2) For each of the 2484 individuals surveyed, let Yi ∼ Binomial(1, πi ) where
πi is the probability that an individual at snoring level Xi develops heart
disease.

These two approaches describe the same generative model, but differ in that M1 considers a total of N = 4 observations by grouping together all individuals at the same level of snoring, whereas M2 considers a total of N = 2484 individuals. Fitting these two models will result in the same β̂, the same variance estimator for β̂, and the same inferences in general, as the likelihood function for β is the same.
These approaches differ in one key aspect! They do not result in the same saturated model! Note that the saturated model for M1 has four parameters: one for each snoring level. On the other hand, the saturated model for M2 has 2484 parameters: one for each subject! Not only do the saturated models differ, but the deviance statistic for M1 can be well-approximated by a χ²_{4−p} distribution, while the deviance statistic for M2 does not have an asymptotic χ² distribution, because the number of parameters of the saturated model grows with n.
To see this, first, we examine the deviance statistic with the grouped data
(i.e., using M1)

deviance(snoring_fit)

## [1] 2.808912

df.residual(snoring_fit)

## [1] 2

coef(snoring_fit)

## (Intercept) snoring_scores
## -3.8662481 0.3973366

The model fits quite well. Now, let’s fit the model to ungrouped data (i.e.,
using M2).

counts <- c(24, 1355, 35, 603, 21, 192, 30, 224)
snore_scores <- c(0, 0, 2, 2, 4, 4, 5, 5)
binary_data <- c(rep(1, counts[1]), rep(0, counts[2]),
rep(1, counts[3]), rep(0, counts[4]),
rep(1, counts[5]), rep(0, counts[6]),
rep(1, counts[7]), rep(0, counts[8]))
binary_scores <- c(rep(0, counts[1]), rep(0, counts[2]),
rep(2, counts[3]), rep(2, counts[4]),
rep(4, counts[5]), rep(4, counts[6]),
rep(5, counts[7]), rep(5, counts[8]))
big_snore <- data.frame(disease = binary_data, snore = binary_scores)
tail(big_snore)

## disease snore
## 2479 0 5
## 2480 0 5
## 2481 0 5
## 2482 0 5
## 2483 0 5
## 2484 0 5

The object big_snore has a separate row for each individual, giving the disease status and snoring score. Fitting this model, we see that the MLE is the same but the deviance is different.

big_snore_fit <- glm(disease ~ snore, family = binomial,


data = big_snore)
deviance(big_snore_fit)

## [1] 837.7316

df.residual(big_snore_fit)

## [1] 2482

coef(big_snore_fit)

## (Intercept) snore
## -3.8662481 0.3973366

For the purpose of assessing goodness of fit, we should use the deviance for
M1 rather than M2.
