Models for Polytomous Responses AA 2016-2017

The document discusses polytomous logistic regression, focusing on nominal and ordinal response variables. It explains the baseline-category logit models for nominal responses and introduces various models for analyzing associations in tables with ordered categories. Additionally, it provides examples, including a study on alligator food choices and methods for assessing student perceptions in statistics classes.

Uploaded by

vincenzo.090

Section 9: Polytomous Logistic Regression

Polytomous data

If the response of an individual or item in a study is restricted to one of a fixed set of possible values, we say that the response is polytomous.

The k possible values of Y are called the response categories.

Often the categories can be defined in a qualitative or non-numerical way.

We need to develop satisfactory models that distinguish several types of polytomous response. For instance, if the categories are ordered, there is no compelling reason to treat the extreme categories in the same way as the intermediate ones.

However, if the categories are simply an unstructured collection of labels, there is no reason a priori to select a subset of categories for special treatment.

Nominal response variable: baseline-category logit models

Let Y be a nominal response variable with J categories. Logit models for nominal responses pair each response category with a baseline category. The choice of baseline category is arbitrary.

Given a vector x of explanatory variables, let

$$\pi_j(x) = P(Y = j \mid x), \qquad \sum_{j=1}^{J} \pi_j(x) = 1.$$

If we have n independent observations based on these probabilities, the counts of outcomes falling in each of the J categories follow a multinomial distribution with probabilities $(\pi_1(x), \ldots, \pi_J(x))$.

This model is an extension of the binary logistic regression model. It gives a simultaneous representation of the odds of being in one category relative to being in another category, for all pairs of categories. Once the model specifies logits for a certain J − 1 pairs of categories, the rest are redundant.

If the last category (J) is the baseline, the baseline-category logit model

$$\log\left(\frac{\pi_j(x)}{\pi_J(x)}\right) = \alpha_j + \beta_j' x, \qquad j = 1, \ldots, J - 1$$

describes the effect of x on the J − 1 logits.
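As a minimal sketch of how the J − 1 baseline-category logits determine all J probabilities, the R snippet below inverts the logits for a single predictor. The function name `baseline_probs` and the coefficient values are hypothetical, chosen only for illustration; they are not estimates from any data set in these notes.

```r
# Hypothetical helper: category probabilities from baseline-category logits.
# alpha, beta: vectors of length J - 1 (assumed values, not fitted estimates).
baseline_probs <- function(x, alpha, beta) {
  eta <- alpha + beta * x          # J - 1 logits log(pi_j / pi_J)
  num <- c(exp(eta), 1)            # the baseline category J has logit 0
  num / sum(num)                   # normalize so the probabilities sum to 1
}

alpha <- c(0.5, -1.0)              # illustrative intercepts (J = 3)
beta  <- c(-0.8, 0.3)              # illustrative slopes
p <- baseline_probs(x = 1.5, alpha, beta)
print(p)
print(log(p[1] / p[3]))            # recovers alpha[1] + beta[1] * 1.5
```

Setting the baseline logit to zero is what makes the remaining J − 1 equations sufficient to pin down every probability.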
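Notes

Parameters in the (J − 1) equations determine parameters for logits using all other pairs of response categories. For instance, for an arbitrary pair of categories a and b:

$$\log\left(\frac{\pi_a}{\pi_b}\right) = \log\left(\frac{\pi_a/\pi_J}{\pi_b/\pi_J}\right) = \log\left(\frac{\pi_a}{\pi_J}\right) - \log\left(\frac{\pi_b}{\pi_J}\right) = (\alpha_a + \beta_a x) - (\alpha_b + \beta_b x) = (\alpha_a - \alpha_b) + (\beta_a - \beta_b) x$$
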
Alligator Food Choice Example
The data are taken from a study by the Florida Game and Fresh Water Fish Commission of factors influencing the primary food choice of alligators.
Primary food type has five categories: Fish, Invertebrate, Reptile, Bird and Other.
Explanatory variables are the Lake where alligators were sampled and the Length of the alligator.
food<-factor(c("fish","invert","rep","bird","other"),
levels=c("fish","invert","rep", "bird","other"))
size<-factor(c("<2.3",">2.3"),levels=c(">2.3","<2.3"))
gender<-factor(c("m","f"),levels=c("m","f"))
lake<-factor(c("hancock","oklawaha","trafford","george"),
levels=c("george","hancock", "oklawaha","trafford"))

table.7.1<-expand.grid(food=food,size=size,
gender=gender,lake=lake)

temp<-c(7,1,0,0,5,4,0,0,1,2,16,3,2,2,3,3,0,1,2,3,2,2,0,0,1,
13,7,6,0,0,3,9,1,0,2,0,1,0,1,0,3,7,1,0,1,8,6,6,3,5,2,4,1,1,
4,0,1,0,0,0,13,10,0,2,2,9,0,0,1,2,3,9,1,0,1,8,1,0,0,1)

table.7.1<-structure(.Data=table.7.1[rep(1:nrow(table.7.1),
temp),], row.names=1:219)
We fit several models
library(nnet)
fitS<-multinom(food~lake*size*gender,data=table.7.1)
fit0<-multinom(food~1,data=table.7.1) # null
fit1<-multinom(food~gender,data=table.7.1) # G
fit2<-multinom(food~size,data=table.7.1) # S
fit3<-multinom(food~lake,data=table.7.1) # L
fit4<-multinom(food~size+lake,data=table.7.1) # L+S
fit5<-multinom(food~size+lake+gender,data=table.7.1) #L+S+G
The likelihood ratio test for each model:
deviance(fit1)-deviance(fitS)
deviance(fit2)-deviance(fitS)
deviance(fit3)-deviance(fitS)
deviance(fit4)-deviance(fitS)
deviance(fit5)-deviance(fitS)
deviance(fit0)-deviance(fitS)
Collapsing over gender:

fitS<-multinom(food~lake*size,data=table.7.1) # saturated model
fit0<-multinom(food~1,data=table.7.1) # null
fit1<-multinom(food~size,data=table.7.1) # S
fit2<-multinom(food~lake,data=table.7.1) # L
fit3<-multinom(food~size+lake,data=table.7.1) # L + S

deviance(fit1)-deviance(fitS)
deviance(fit2)-deviance(fitS)
deviance(fit3)-deviance(fitS)
deviance(fit0)-deviance(fitS)
According to the AIC the best model is fit3:
summary(fit3)
In this example the baseline category is the one that crosses “fish”, “> 2.3” and “george”.

Results:

• For small alligators in Lake George, the estimated odds of choosing an invertebrate rather than a fish are exp(1.46) ≈ 4.3 times the estimated odds for large alligators. So the Length of alligators plays an important role in determining their primary food choice.

• The estimated odds of choosing an invertebrate rather than a fish are higher in Lakes Trafford and Oklawaha and lower in Lake Hancock, all compared with Lake George.

Starting from these results we can evaluate all the redundant odds ratios.

For example, we can evaluate the log odds of choosing an “invertebrate” against “other” as:

$$\log\left(\frac{\pi_I}{\pi_O}\right) = \log\left(\frac{\pi_I/\pi_F}{\pi_O/\pi_F}\right) = \log\left(\frac{\pi_I}{\pi_F}\right) - \log\left(\frac{\pi_O}{\pi_F}\right)$$

$$= (-1.55 + 1.46\,\text{Size} - 1.66\,Z_H + 0.94\,Z_O + 1.12\,Z_T) - (-1.90 + 0.33\,\text{Size} + 0.83\,Z_H + 0.01\,Z_O + 1.52\,Z_T)$$

$$= 0.35 + 1.13\,\text{Size} - 2.49\,Z_H + 0.93\,Z_O - 0.40\,Z_T$$
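The subtraction above can be checked mechanically. The following sketch uses the estimates quoted in the text, rounded to two decimals; the vector names `invert` and `other` are introduced here only for illustration.

```r
# Estimates as quoted in the text (baseline category: fish), two decimals.
invert <- c(Intercept = -1.55, Size = 1.46, ZH = -1.66, ZO = 0.94, ZT = 1.12)
other  <- c(Intercept = -1.90, Size = 0.33, ZH =  0.83, ZO = 0.01, ZT = 1.52)

# log(pi_I / pi_O) = log(pi_I / pi_F) - log(pi_O / pi_F)
invert_vs_other <- invert - other
print(round(invert_vs_other, 2))

# Odds ratio, small vs large alligators, for invertebrate against other
print(exp(invert_vs_other["Size"]))
```

Because these inputs are rounded, the last digit of a difference may not match a value computed from unrounded estimates.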

Ordinal response variables: Log-Linear Association models

Many tables are formed by cross-classifying variables with ordered categories. These can be categorical but ordinal, such as Likert scales (for example, Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree), or continuous variables that have been discretized, such as income formed into intervals.

Tables with ordered categories allow for models with different types of association built in, since concepts of direct and inverse relationships make sense.

This permits a parsimonious representation of a lack of independence.

Ordinal response variables: 1. Linear by Linear (Uniform) association

Consider a table in which BOTH rows and columns have ordinal categories, and assume that there exist known scores $\{u_i\}$ (for rows) and $\{v_j\}$ (for columns) that represent that ordering.

These scores could be:

• the actual values of a discrete underlying variable

• a score linked to an underlying continuous variable

• an equispaced representation of a non-numerical, but ordinal scale (such as a Likert scale).

Most typically $u_i = i$ and $v_j = j$.

The LbyL association model is

$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \theta u_i v_j$$

with constraints such as $\lambda_I^X = \lambda_J^Y = 0$.

This can be seen as a special case of the saturated model in which $\lambda_{ij}^{XY} = \theta u_i v_j$.

The uniform association model adds only one parameter θ to the independence model, focusing all possible lack of independence on that one parameter.
Tables with ordered categories: 1. Linear by Linear (Uniform) association

• If θ = 0, independence holds.

• If θ > 0, the model implies that higher expected cell counts occur when $u_i$ and $v_j$ either go up TOGETHER or go down TOGETHER, so there is a direct association.

• If θ < 0, the model implies that higher expected cell counts occur when $u_i$ is high and $v_j$ is low, or vice versa, so there is an inverse association.

The θ parameter has a simple interpretation in terms of odds ratios: the log odds ratio is directly proportional to the product of the distance between the rows and the distance between the columns.

For example, for the 2 × 2 table formed by the cells intersecting rows a and c with columns b and d:

$$\log\left(\frac{\mu_{ab}\,\mu_{cd}}{\mu_{ad}\,\mu_{cb}}\right) = \theta (u_c - u_a)(v_d - v_b)$$

This log odds ratio is stronger as |θ| increases and for pairs of categories that are farther apart. In particular, when $u_i = i$ and $v_j = j$ the local odds ratios for adjacent rows and adjacent columns have the common value $e^{\theta}$.
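To make the "uniform association" property concrete, the sketch below builds expected counts from the linear-by-linear model and checks that every local odds ratio equals exp(θ). All parameter values are assumptions picked for illustration.

```r
# Illustrative (assumed) parameter values for a 3 x 4 table.
theta <- 0.2
lamX  <- c(0.3, -0.1, 0)         # row main effects, last one set to 0
lamY  <- c(0.5, 0.2, -0.4, 0)    # column main effects, last one set to 0
u <- 1:3                         # unit-spaced row scores
v <- 1:4                         # unit-spaced column scores

# Expected counts under log(mu_ij) = lambda + lamX_i + lamY_j + theta * u_i * v_j
log_mu <- 2 + outer(lamX, lamY, "+") + theta * outer(u, v)
mu <- exp(log_mu)

# Local odds ratios for adjacent rows and adjacent columns
I <- nrow(mu); J <- ncol(mu)
local_or <- (mu[-I, -J] * mu[-1, -1]) / (mu[-I, -1] * mu[-1, -J])
print(local_or)                  # every entry equals exp(theta)
print(exp(theta))
```

The main effects cancel in each 2 × 2 odds ratio, which is why only θ and the score distances remain.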
Tables with ordered categories: 2. Row and Column Effects Models

The uniform association model assumes prespecified row and column scores. Sometimes either the rows or the columns (but not both) are not ordinal, so such scores don’t exist for the nominal variable.

Another possibility is that equispaced scores are not appropriate for a set of rows or columns, and it is convenient to estimate appropriate scores based on the observed data (for example, for Likert-scaled rows and columns it might be that “Strongly disagree” is closer to “Disagree” than “Disagree” is to “Neutral”).

Models that can fit tables of this type are the row effects and column effects models.

The row effects model R has the form

$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \tau_i v_j$$

Constraints are needed, such as $\lambda_I^X = \lambda_J^Y = \tau_I = 0$. The $\{\tau_i\}$ are called row effects. This model has (I − 1) more parameters than the independence model.

Independence can be seen as the special case in which $\tau_1 = \tau_2 = \ldots = \tau_I$.

The row effects model treats the columns as ordinal with known scores and the rows as nominal, since the $\tau_i$ can take on any values, subject only to an identifying constraint (for example, summing to zero).

For this class of models, for any pair of rows r < s and columns c < d, the log of the odds ratio formed from the 2 × 2 table of those rows and columns is

$$\log\left(\frac{\mu_{rc}\,\mu_{sd}}{\mu_{rd}\,\mu_{sc}}\right) = (\tau_s - \tau_r)(v_d - v_c)$$

The log odds ratio is proportional to the distance between the columns, with the constant of proportionality being $\tau_s - \tau_r$.
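The odds-ratio identity for the row effects model can be verified numerically. This is a sketch under assumed parameter values; the function name `log_or` is introduced here for illustration.

```r
# Illustrative (assumed) values: 3 nominal rows, 4 ordinal columns.
tau  <- c(0.6, -0.2, 0)          # row effects, with tau_I = 0
v    <- c(1, 2, 4, 5)            # column scores, deliberately not equispaced
lamX <- c(0.1, 0.4, 0)
lamY <- c(-0.3, 0.2, 0.5, 0)

# log(mu_ij) = lambda + lamX_i + lamY_j + tau_i * v_j
log_mu <- 1 + outer(lamX, lamY, "+") + outer(tau, v)

# Log odds ratio for rows r < s and columns c1 < c2
log_or <- function(r, s, c1, c2) {
  log_mu[r, c1] + log_mu[s, c2] - log_mu[r, c2] - log_mu[s, c1]
}
print(log_or(1, 3, 2, 4))
print((tau[3] - tau[1]) * (v[4] - v[2]))   # same value
```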

The column effects model C takes the form

$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \rho_j u_i$$

where the ρ parameters sum to zero.

This model treats the rows as ordinal with known scores and the columns as nominal. Here the quantity $\rho_d - \rho_c$ is a measure of the closeness of columns c and d with respect to the conditional distribution of the rows given the column.

A generalization of the row and column effects models that allows for both row and column effects in the local odds ratio is the row + column effects model (R+C):

$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \tau_i v_j + \rho_j u_i$$

The local log odds ratio for unit-spaced row and column scores is

$$(\tau_{i+1} - \tau_i) + (\rho_{j+1} - \rho_j)$$

incorporating row effects and column effects.

L×L model Example
library(gnm)
library(vcdExtra)
data(Mental) #or in the same way
dati<-expand.grid(mental=c("well","mild",
"moderate","impaired"),ses=1:6)
dati$Freq=c(64,94,58,46,57,94,54,40,57,105,65,60,
72,141,77,94,36,97,54,78,21,71,54,71)
Display the frequency table
Mental.tab <- xtabs(Freq ~ mental+ses, data=Mental)
Fit Independence model
indep <- glm(Freq ~ mental+ses,family = poisson, data = Mental)
deviance(indep) #or
o<-glm(Freq~factor(mental)+factor(ses), family=poisson, data=dati)
deviance(o)

Fit a Linear by Linear Model: use integer scores for rows and cols
Cscore <- as.numeric(Mental$ses)
Rscore <- as.numeric(Mental$mental)
linlin <- glm(Freq ~ mental + ses + Rscore:Cscore,
family = poisson, data = Mental)

Or
linlin2<-glm(formula = Freq ~ factor(mental) + factor(ses) +
as.numeric(mental):as.numeric(ses),
family = poisson, data = dati)

Now compare models
anova(indep,linlin)
AIC(indep,linlin)

Row effects model Example
roweff <- glm(Freq ~ mental + ses + mental:Cscore,
family = poisson, data = Mental)
# or, with the hand-built data
roweff2 <- glm(Freq ~ factor(mental)+factor(ses) + mental:Cscore,
family = poisson, data = dati)

Column effects model Example
coleff <- glm(Freq ~ mental + ses + Rscore:ses,
family = poisson, data = Mental)
# or, with the hand-built data; ses is numeric in dati, so it is wrapped
# in factor() to get one slope per column rather than a single L x L term
coleff2 <- glm(Freq ~ factor(mental)+factor(ses) + Rscore:factor(ses),
family = poisson, data = dati)

Exercise: student perception of statistics class assessment methods

Aim: to study the association between the methods used in class assessment (Structured computer assignments, Open-ended assignments, Article analysis, Annotating output) and the amount students learned (Didn’t learn anything, Learned a little bit, Learned enough to be comfortable with the topic, Learned a great deal).
dati<-expand.grid(response=gl(4,1),
assignments=gl(4,1,labels = c("Structured", "Open","ArtAnaly",
"Annotoutput")))
dati$Freq<-c(0,3,8,3,0,1,7,6,1,6,4,2,0,4,8,2)
# display the frequency table (the column is named "assignments")
(assign.tab <- xtabs(Freq ~response+assignments, data=dati))
chisq.test(assign.tab) #test for independence

In this specific case the LbyL model, the R model and the R+C model cannot be applied, because the method used for assessment is a NOMINAL variable.

The only model that makes sense is a column effects model:
Rscore <- as.numeric(dati$response)
coleff <- glm(Freq ~ as.factor(response) + as.factor(assignments)
+ Rscore:assignments,family = poisson, data = dati)
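One way to judge whether the column effects model improves on independence is a likelihood ratio test between the two Poisson fits. The sketch below rebuilds the exercise data so that it runs on its own; the object names `dat`, `indep` and `lr` are introduced here. The chi-squared approximation should be read with caution because the table is sparse.

```r
# Rebuild the exercise data (same counts as in the text).
dat <- expand.grid(response = gl(4, 1),
                   assignments = gl(4, 1, labels = c("Structured", "Open",
                                                     "ArtAnaly", "Annotoutput")))
dat$Freq   <- c(0, 3, 8, 3, 0, 1, 7, 6, 1, 6, 4, 2, 0, 4, 8, 2)
dat$Rscore <- as.numeric(dat$response)     # integer scores for the ordinal rows

indep  <- glm(Freq ~ response + assignments, family = poisson, data = dat)
coleff <- glm(Freq ~ response + assignments + Rscore:assignments,
              family = poisson, data = dat)

# Likelihood ratio test of independence against the column effects model
lr <- deviance(indep) - deviance(coleff)
df <- df.residual(indep) - df.residual(coleff)
print(c(LR = lr, df = df, p.value = pchisq(lr, df, lower.tail = FALSE)))
```

One of the four `Rscore:assignments` coefficients is reported as NA because it is aliased with the row main effects, so the column effects model adds 3 parameters to independence.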

Ordinal response variables: 1. Cumulative Logit Models

The logits of the first J − 1 cumulative probabilities are:

$$\text{logit}[P(Y \le j \mid x)] = \log\left(\frac{P(Y \le j \mid x)}{1 - P(Y \le j \mid x)}\right) = \log\left(\frac{\pi_1(x) + \pi_2(x) + \ldots + \pi_j(x)}{\pi_{j+1}(x) + \ldots + \pi_J(x)}\right), \qquad j = 1, \ldots, J - 1$$

A model for the j-th cumulative logit looks like an ordinary logit model for a binary response in which categories 1 to j combine to form a single category, and categories j + 1 to J form a second category.

It is possible to consider parsimonious models that describe all the J − 1 cumulative logits in a single model: the Proportional Odds Model.

A Proportional Odds Model assumes the following structure:

$$\text{logit}[P(Y \le j \mid x)] = \alpha_j + \beta^T x, \qquad j = 1, \ldots, J - 1$$

It considers:

• a different intercept for each cumulative logit; these intercepts are increasing in j;

• a parameter β describing the effect of X on the log odds of response in category j or below; it assumes an identical effect of X for all J − 1 cumulative logits.

This means that when this model fits well, it requires a single parameter rather than J − 1 parameters to describe the effect of X.

This class of models is called the Proportional Odds Model because it satisfies

$$\text{logit}[P(Y \le j \mid x_1)] - \text{logit}[P(Y \le j \mid x_2)] = \log\left(\frac{P(Y \le j \mid x_1)/P(Y > j \mid x_1)}{P(Y \le j \mid x_2)/P(Y > j \mid x_2)}\right) = \beta^T (x_1 - x_2)$$

In other words, the cumulative log odds ratio is proportional to the distance between $x_1$ and $x_2$: the odds of giving an answer ≤ j when $X = x_1$ are $\exp[\beta^T (x_1 - x_2)]$ times the odds when $X = x_2$, and this factor is the same for all the logits.

Comments:

• When the model holds with β = 0, X and Y are statistically independent;

• Explanatory variables in cumulative logit models can be continuous, categorical or of both types.

• The ML fitting process uses an iterative algorithm simultaneously for all j.

For simplicity, let’s consider only one predictor:

$$\text{logit}[P(Y \le j)] = \alpha_j + \beta x$$

Then the cumulative probabilities are given by:

$$P(Y \le j) = \frac{\exp(\alpha_j + \beta x)}{1 + \exp(\alpha_j + \beta x)}$$

and since β is constant across j, the curves of cumulative probabilities plotted against x are parallel: they share the same shape and differ only by a horizontal shift.
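The following sketch illustrates the single-predictor proportional odds model numerically: it turns cumulative logits into cumulative and category probabilities and checks that the cumulative odds ratio between two values of x is the same for every j. The intercepts, slope and helper names are assumptions made for illustration.

```r
# Illustrative (assumed) parameters for J = 4 response categories.
alpha <- c(-1, 0.5, 2)           # increasing intercepts alpha_1 < alpha_2 < alpha_3
beta  <- 0.8

# Cumulative probabilities P(Y <= j | x), j = 1, ..., J - 1
cum_prob <- function(x) plogis(alpha + beta * x)

# Category probabilities pi_j(x) as successive differences
cat_prob <- function(x) diff(c(0, cum_prob(x), 1))

x1 <- 2; x2 <- 0.5
odds <- function(p) p / (1 - p)
print(cat_prob(x1))
print(odds(cum_prob(x1)) / odds(cum_prob(x2)))   # same ratio for every j
print(exp(beta * (x1 - x2)))
```

Increasing intercepts guarantee that the successive differences, and hence the category probabilities, are all positive.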

Cheese-Tasting Example (McCullagh and Nelder, 1989)

In this example, subjects were randomly assigned to taste one of four different cheeses. Response categories range from 1 = strong dislike to 9 = excellent taste.

By inspection, we can see that D is the most preferred, followed by A, C and B.

Let’s try to model these data by a proportional-odds cumulative-logit model with three dummy codes to distinguish among the four cheeses.

• How many logit models?
(J − 1) × (K − 1), where J is the number of response categories and K the number of regressors in the model;

• The model will have 8 intercepts (one for each of the logit equations) and 3 slopes, for a total of 11 free parameters.

• By comparison, the saturated model, which fits a separate 9-category multinomial distribution to each of the four cheeses, has 4 × (9 − 1) = 32 free parameters.

• Therefore, the overall goodness-of-fit test will have 32 − 11 = 21 degrees of freedom.

The vglm() function

The VGAM library in R contains the vglm() function, which can fit several models. Possible models include the cumulative logit model (family function cumulative) with proportional odds, partial proportional odds or nonproportional odds, cumulative link models (family function cumulative) with or without common effects for each cutpoint, adjacent-categories logit models (family function acat), and continuation-ratio logit models (family functions cratio and sratio).

The vglm() function needs the response variable specified in its “cbinded” form (that is, in full disjunctive coding).

The syntax of the vglm() function is very similar to that of the standard glm().
An important difference is that the weights argument, unlike in glm(), is not a vector of frequencies but a vector of weights defined a priori.

library(VGAM)
cheese <- read.table("cheese.dat.txt",
col.names=c("Cheese", "Response", "N"))
is.factor(cheese$Response)
cheese$Response<-factor(cheese$Response, ordered=T)
mod.sat<-vglm(Response~Cheese,cumulative,
weights=c(N+0.5),data=cheese)

mod.podds<-vglm(Response~Cheese,cumulative(parallel=TRUE),
weights=c(N+0.5),data=cheese)

summary(mod.sat)
summary(mod.podds)
matplot(t(mod.podds@predictors[seq(1,36,by=9),]),type="l",
ylab="Cumulative logits",main="Proportional odds model")
# Adding a legend would be useful here
matplot(t((exp(mod.podds@predictors)/(1+exp(mod.podds@predictors)))
[seq(1,36,by=9),]),type="l",ylab="Cumulative Probability Curves",
main="Proportional odds model")

In this case, a positive coefficient β means that increasing the value of X tends to lower the response categories (i.e. produce greater dislike).
summary(mod.podds)

Call:
vglm(formula = Response ~ Cheese, family = cumulative(parallel = TRUE),
data = cheese, weights = c(N + 0.5))

Coefficients:
Estimate Std. Error z value
(Intercept):1 -4.84428 0.45697 -10.60089
(Intercept):2 -3.84779 0.37446 -10.27564
(Intercept):3 -2.86231 0.32751 -8.73959
(Intercept):4 -1.91322 0.29232 -6.54497
(Intercept):5 -0.73965 0.25589 -2.89044
(Intercept):6 0.10951 0.24755 0.44237
(Intercept):7 1.44853 0.28180 5.14020
(Intercept):8 2.89229 0.36928 7.83216
CheeseB 2.82260 0.38300 7.36978
CheeseC 1.44005 0.34794 4.13883
CheeseD -1.39122 0.35218 -3.95026

Residual deviance: 817.3119 on 277 degrees of freedom

Log-likelihood: -408.656 on 277 degrees of freedom


The second part of the output gives the coefficient estimates for the three dummy variables. The estimated slope for the first dummy variable, labeled CheeseB, is 2.82260. This indicates that cheese B does not taste as good as cheese A. Looking at all three coefficients, and noting that cheese A is the reference category, so that $\beta_2$ compares cheese C to A and $\beta_3$ compares cheese D to A, we see that the implied ordering of cheeses in terms of quality is D > A > C > B. Furthermore, D is significantly preferred to A (z = −3.95), and A is significantly preferred to C (z = 4.14).

The first part of the output includes the estimated intercepts. The first parameter is the estimated log-odds of falling into category 1 (strong dislike) versus all other categories when all X-variables are zero. Because $X_1 = X_2 = X_3 = 0$ when cheese = A, the estimated log-odds of strong dislike for cheese A are −4.84428, so the estimated odds are exp(−4.84428) ≈ 0.008. From the above output, the first estimated logit equation is

$$\text{logit}[P(Y \le 1)] = \log\left(\frac{P(Y \le 1)}{P(Y > 1)}\right) = -4.84428 + 2.82260 X_1 + 1.44005 X_2 - 1.39122 X_3$$
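As an illustration of how the fitted equation is used, the sketch below plugs the estimates printed in the output above into the proportional odds formula and recovers fitted cumulative and category probabilities for each cheese. The object names are introduced here; nothing is refitted.

```r
# Estimates copied from the summary(mod.podds) output above.
alpha <- c(-4.84428, -3.84779, -2.86231, -1.91322,
           -0.73965,  0.10951,  1.44853,  2.89229)
beta  <- c(A = 0, B = 2.82260, C = 1.44005, D = -1.39122)  # A is the reference

# Rows: cheeses; columns: P(Y <= j), j = 1, ..., 8
cum <- t(sapply(beta, function(b) plogis(alpha + b)))

# Category probabilities P(Y = j), j = 1, ..., 9
probs <- t(apply(cum, 1, function(p) diff(c(0, p, 1))))
colnames(probs) <- 1:9
print(round(probs, 3))
print(exp(alpha[1]))             # estimated odds of "strong dislike" for cheese A
```

Cheese B puts the most mass on the low categories and cheese D on the high ones, in line with the ordering D > A > C > B.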

Ordinal response variables: 2. Adjacent-Category Logits models

Adjacent-Category Logits models can be defined as:

$$\text{logit}[P(Y = j \mid Y = j \text{ or } j+1)] = \log\left(\frac{\pi_j}{\pi_{j+1}}\right), \qquad j = 1, \ldots, J - 1$$

$$\log\left(\frac{\pi_j}{\pi_{j+1}}\right) = \alpha_j + \beta x, \qquad j = 1, \ldots, J - 1$$

with a common effect β.

Also in this case a set of J − 1 logits is defined, and starting from them it is possible to derive the logits for all the $\binom{J}{2}$ pairs of response categories.

An Adjacent-Category Logits model can be seen as a baseline-category logit model in which the baseline changes for each category.

Job Satisfaction Example

The aim of this example is to study the relationship between job satisfaction (Very Dissatisfied, Little Satisfied, Moderately Satisfied, Very Satisfied) and income (< 5,000, 5,000-15,000, 15,000-25,000, > 25,000), stratified by gender (1 = female, 0 = male), for black Americans.

For simplicity, we use job satisfaction scores and income scores 1, 2, 3, 4.

The fitted model is

$$\log(\pi_j / \pi_{j+1}) = \alpha_j + \beta_1 x + \beta_2 g, \qquad j = 1, 2, 3$$

It describes the odds of being very dissatisfied instead of a little satisfied, a little instead of moderately satisfied, and moderately instead of very satisfied. This model is equivalent to the baseline-category logit model with reference category 4, that is,

$$\log(\pi_j / \pi_4) = \alpha_j^* + \beta_1 (4 - j) x + \beta_2 (4 - j) g, \qquad j = 1, 2, 3$$

In order to fit an Adjacent-Category Logit model in R we have to specify the acat family, giving the link function applied to the ratios of adjacent-category probabilities (loge) and parallel=TRUE, a logical indicating whether the terms in the formula are assumed to have equal coefficients across the logits.

table.7.8<-read.table("jobsat.txt", header=TRUE)
table.7.8$jobsatf<-ordered(table.7.8$jobsat,
labels=c("very diss","little sat","mod sat",
"very sat"))

table.7.8a<- data.frame(expand.grid(income=1:4,
gender=c(1,0)),unstack(table.7.8,freq~jobsatf))

library(VGAM)

fit.vglm<-vglm(cbind(very.diss,little.sat,
mod.sat,very.sat)~gender+income,
family= acat(link="loge",parallel=T,reverse=T),
data=table.7.8a)

summary(fit.vglm)


Coefficients:
Estimate Std. Error z value
(Intercept):1 -0.550668 0.67945 -0.81046
(Intercept):2 -0.655007 0.52527 -1.24700
(Intercept):3 2.025934 0.57581 3.51842
gender 0.044694 0.31444 0.14214
income -0.388757 0.15465 -2.51372

Number of linear predictors: 3

Names of linear predictors:


log(P[Y=1]/P[Y=2]), log(P[Y=2]/P[Y=3]), log(P[Y=3]/P[Y=4])

Dispersion Parameter for acat family: 1

Residual deviance: 12.55018 on 19 degrees of freedom

The ML fit gives $\hat\beta_1 = -0.389$ (SE = 0.155) and $\hat\beta_2 = 0.045$ (SE = 0.314). For this parameterization, $\hat\beta_1 < 0$ means the odds of lower job satisfaction decrease as income increases. Given gender, the estimated odds of response in the lower of two adjacent categories are multiplied by exp(−0.389) = 0.68 for each category increase in income. The model describes 24 logits (three for each income × gender combination) with five parameters. Its deviance is $G^2 = 12.6$ with df = 19. This model, with a linear trend for the income effect and no interaction between income and gender, seems adequate.
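The interpretation above can be reproduced from the printed estimates alone. The sketch below computes the multiplicative income effect, the implied baseline-category slopes (4 − j)β, and fitted probabilities for one covariate pattern; the helper `fitted_probs` and the chosen pattern (income score 3, gender 1) are illustrative assumptions.

```r
# Estimates copied from the summary(fit.vglm) output above.
alpha       <- c(-0.550668, -0.655007, 2.025934)
beta_gender <- 0.044694
beta_income <- -0.388757

# Multiplicative effect of a one-category increase in income
# on the odds of the lower of two adjacent categories
print(round(exp(beta_income), 2))

# Implied baseline-category (reference = category 4) slopes for income:
# log(pi_j / pi_4) has slope (4 - j) * beta_income, j = 1, 2, 3
j <- 1:3
print((4 - j) * beta_income)

# Fitted probabilities for a given income score and gender
fitted_probs <- function(income, gender) {
  adj  <- alpha + beta_income * income + beta_gender * gender  # log(pi_j / pi_{j+1})
  base <- rev(cumsum(rev(adj)))                                # log(pi_j / pi_4)
  p <- c(exp(base), 1)
  p / sum(p)
}
print(round(fitted_probs(income = 3, gender = 1), 3))
```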

Ordinal response variables: 3. Continuation-Ratio Logits

Continuation-ratio logits can be defined as

$$\log\left(\frac{\pi_j}{\pi_{j+1} + \pi_{j+2} + \ldots + \pi_J}\right), \qquad j = 1, \ldots, J - 1$$

or also

$$\log\left(\frac{\pi_{j+1}}{\pi_1 + \pi_2 + \ldots + \pi_j}\right), \qquad j = 1, \ldots, J - 1$$

They are useful when the response variable represents a sequential mechanism, such as survival as a function of age.

Let $\omega_j = P(Y = j \mid Y \ge j)$. Given the vector of explanatory variables x,

$$\omega_j(x) = \frac{\pi_j(x)}{\pi_j(x) + \ldots + \pi_J(x)}, \qquad j = 1, \ldots, J - 1$$

and the continuation-ratio logits become ordinary logits, $\log\left[\frac{\omega_j(x)}{1 - \omega_j(x)}\right]$.
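A small numerical sketch may help connect the two forms: it computes the conditional probabilities ω_j from a set of category probabilities, checks that the continuation-ratio logit is the ordinary logit of ω_j, and then recovers the original probabilities. The probability vector is an assumption chosen for illustration.

```r
# Illustrative (assumed) category probabilities for J = 4.
p <- c(0.1, 0.2, 0.3, 0.4)
J <- length(p)

# omega_j = P(Y = j | Y >= j) = pi_j / (pi_j + ... + pi_J), j = 1, ..., J - 1
tail_sum <- rev(cumsum(rev(p)))          # pi_j + ... + pi_J
omega <- (p / tail_sum)[-J]
print(omega)

# The continuation-ratio logit equals the ordinary logit of omega_j
cr_logit <- log(p[-J] / (tail_sum[-J] - p[-J]))
print(cbind(cr_logit, logit_omega = qlogis(omega)))

# Going back: pi_j = omega_j * prod_{k < j} (1 - omega_k)
surv <- cumprod(c(1, 1 - omega))         # P(Y >= j), j = 1, ..., J
p_back <- c(omega, 1) * surv
print(p_back)
```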

Example: Streptococcus and tonsil size

The aim of the study is to investigate the relationship between tonsil size (Not enlarged, Enlarged, Greatly Enlarged) and the presence of Streptococcus (1 = yes, 0 = no). Let x be the indicator variable for the presence of Streptococcus pyogenes; then the continuation-ratio logit model is

$$\log\left(\frac{\pi_1}{\pi_2 + \pi_3}\right) = \alpha_1 + \beta x$$

$$\log\left(\frac{\pi_2}{\pi_3}\right) = \alpha_2 + \beta x$$

where the first equation describes a cumulative odds ratio and the second a local odds ratio, with a common value β estimated for both.

carrier<-c(1,0)
y1<-c(19,497)
y2<-c(29,560)
y3<-c(24,269)
tonsil<-cbind(carrier,y1,y2,y3)
tonsil<-as.data.frame(tonsil)
tonsil$carrier<-as.factor(tonsil$carrier)

library(VGAM)
fit.cratio<-vglm(cbind(y1,y2,y3)~carrier,
family=cratio(reverse=FALSE, parallel=TRUE),
data=tonsil)
summary(fit.cratio)
fitted(fit.cratio)

The goodness of fit indicates that the fitted model is adequate (deviance 0.01, df = 1); $\hat\beta = -0.528$ (SE = 0.197).

For Streptococcus carriers the odds of having “Enlarged” tonsils vs “Greatly Enlarged” are exp(−0.528) = 0.59 times the odds for non-carriers.
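As a rough check on the common-effect assumption, the two sample continuation-ratio odds ratios can be computed directly from the counts used in the example. This sketch uses only base R and does not refit the model; the object names are introduced here.

```r
# Counts from the example: columns are Not enlarged, Enlarged, Greatly enlarged.
counts <- rbind(carrier    = c(19, 29, 24),
                noncarrier = c(497, 560, 269))

# Sample continuation-ratio odds for each group
odds1 <- counts[, 1] / (counts[, 2] + counts[, 3])  # pi_1 / (pi_2 + pi_3)
odds2 <- counts[, 2] / counts[, 3]                  # pi_2 / pi_3

# Sample odds ratios, carriers vs non-carriers
or1 <- odds1["carrier"] / odds1["noncarrier"]
or2 <- odds2["carrier"] / odds2["noncarrier"]
print(c(or1 = unname(or1), or2 = unname(or2)))

# Common odds ratio implied by the reported estimate
print(exp(-0.528))
```

The two sample odds ratios are close to each other and to exp(−0.528), which agrees with the small deviance reported for the common-β model.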
Section 9b: The Bradley-Terry Model

Consider an experiment consisting of $n_{ij}$ judges who compare pairs of items $T_i$, $i = 1, \ldots, M + 1$. They express their preferences between $T_i$ and $T_j$. Let $N = \sum\sum_{i<j} n_{ij}$ be the total number of pairwise comparisons, and assume independence for ratings of the same pair by different judges and for ratings of different pairs by the same judge.

A model describing this experiment was proposed by Bradley and Terry (1952) and Zermelo (1929). Let $\pi_i$ be the worth of item $T_i$,

$$P[T_i > T_j] = p_{i/ij} = \frac{\pi_i}{\pi_i + \pi_j}, \qquad i \ne j,$$

where $T_i > T_j$ means i is preferred over j. Suppose that $\pi_i > 0$. Let $Y_{ij}$ be the number of times that $T_i$ is preferred over $T_j$ in the $n_{ij}$ comparisons of the pair. Then $Y_{ij} \sim \text{Bin}(n_{ij}, p_{i/ij})$.

Maximum likelihood estimation of the parameters $\pi_1, \ldots, \pi_{M+1}$ involves maximizing

$$\prod_{i<j}^{M+1} \binom{n_{ij}}{y_{ij}} \left(\frac{\pi_i}{\pi_i + \pi_j}\right)^{y_{ij}} \left(\frac{\pi_j}{\pi_i + \pi_j}\right)^{n_{ij} - y_{ij}}$$

By default, $\pi_{M+1} \equiv 1$ is used for identifiability; however, this can be changed very easily.

Note that one can define linear predictors $\eta_{ij}$ of the form

$$\text{logit}\left(\frac{\pi_i}{\pi_i + \pi_j}\right) = \log\left(\frac{\pi_i}{\pi_j}\right) = \lambda_i - \lambda_j.$$

The VGAM framework can handle the Bradley-Terry model only for intercept-only models. It has

$$\lambda_j = \eta_j = \log \pi_j = \beta_{(1)j}, \qquad j = 1, \ldots, M.$$

As well as having many applications in the field of preferences, the Bradley-Terry model has many uses in modelling “contests” between teams i and j, where only one of the teams can win in each contest (ties are not allowed under the classical model).

The R package BradleyTerry by D. Firth can fit the Bradley-Terry model; see Firth (2005) for details.

Example: the brat() function in VGAM

Consider the effect of the food-enhancer monosodium glutamate (MSG) on the flavour of apple sauce in the data given in Table 4. Treatments 1, 2 and 3 are increasing amounts of the substance, and Treatment 4 is a control with no MSG. Four independent comparisons were made of each of the six pairs. We apply the VGAM family function brat(), which implements the Bradley-Terry model, to the apple sauce data.
amsg = matrix(c(NA, 3, 3, 3, 1, NA, 3, 4, 1, 1, NA, 0,
1, 0, 4, NA), 4, 4, byrow = TRUE)
dimnames(amsg) = list(winner = as.character(1:4),
loser = as.character(1:4))
fit = vglm(Brat(amsg) ~ 1, brat)
summary(fit)

The first argument has to be specified in Brat form: a matrix of counts, which is considered M by M in dimension when there are ties, and M + 1 by M + 1 when there are no ties. The rows are winners and the columns are losers, e.g., the (2, 1) element is how many times Competitor 2 has beaten Competitor 1. The matrices are best labelled with the competitors’ names.

Coef(fit)

alpha1 alpha2 alpha3
3.3576125 2.4456117 0.3693147

By default, the last competitor is the baseline, so that $\pi_4 \equiv 1$. We have $\hat\pi_1 \approx 3.358$, $\hat\pi_2 \approx 2.446$, $\hat\pi_3 \approx 0.369$; therefore we conclude that Treatment 1 is the most preferred, followed by Treatments 2 and 4, and lastly Treatment 3. It appears that the more MSG, the worse the taste; however, the control treatment tastes second worst. Finally,
InverseBrat(fitted(fit))

1 2 3 4
1 NA 0.5785771 0.9009064 0.7705165
2 0.42142293 NA 0.8688013 0.7097758
3 0.09909362 0.1311987 NA 0.2697077
4 0.22948346 0.2902242 0.7302923 NA

gives the estimated probabilities of Treatment i “beating” Treatment j, $\hat P[i > j]$.
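The printed matrix can be reproduced from the three coefficients alone, reading the Coef(fit) values as worths $\pi_i$ with $\pi_4 = 1$ (an interpretation that matches the printed probabilities). The sketch below uses base R only; `worth` and `p_beat` are names introduced here.

```r
# Worth estimates copied from Coef(fit) above; treatment 4 is the baseline.
worth <- c("1" = 3.3576125, "2" = 2.4456117, "3" = 0.3693147, "4" = 1)

# P[i > j] = pi_i / (pi_i + pi_j)
p_beat <- outer(worth, worth, function(a, b) a / (a + b))
diag(p_beat) <- NA
print(round(p_beat, 4))
```

Each off-diagonal pair sums to one, since P[i > j] + P[j > i] = 1 when ties are excluded.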

The Bradley-Terry Model: the BradleyTerry2 package

In some application contexts there may be “player-specific” explanatory variables available, and it is then natural to consider model simplification of the form

$$\lambda_i = \sum_{r=1}^{p} \beta_r x_{ir} + U_i$$

in which the ability of each player i is related to explanatory variables $x_{i1}, \ldots, x_{ip}$ through a linear predictor with coefficients $\beta_1, \ldots, \beta_p$; the $\{U_i\}$ are independent errors. See, for example, Springall (1973). The difference in the abilities of player i and player j is modelled by

$$\lambda_i - \lambda_j = \sum_{r=1}^{p} \beta_r x_{ir} - \sum_{r=1}^{p} \beta_r x_{jr} + U_i - U_j$$

where $U_i \sim N(0, \sigma^2)$ for all i. The Bradley-Terry model is then a generalized linear mixed model, which the BTm function currently fits using the penalized quasi-likelihood algorithm of Breslow and Clayton (1993).

The BTm function of the BradleyTerry2 package allows such models to be specified in a natural way using the standard S-language model formulae.
The simplest model, with just one predictor, calls for a random effects specification.
Example: the BTm() function in BradleyTerry2

The following comes from page 448 of Agresti (2002), extracted from the larger table of Stigler (1994). The data are counts of citations among four prominent journals of statistics:
> data(citations)
> citations

winner loser Freq


1 Biometrika Biometrika NA
2 Comm Statist Biometrika 33
3 JASA Biometrika 320
4 JRSS-B Biometrika 284
5 Biometrika Comm Statist 730
6 Comm Statist Comm Statist NA
7 JASA Comm Statist 813
8 JRSS-B Comm Statist 276
9 Biometrika JASA 498
10 Comm Statist JASA 68
11 JASA JASA NA
12 JRSS-B JASA 325
13 Biometrika JRSS-B 221
14 Comm Statist JRSS-B 17
15 JASA JRSS-B 142
16 JRSS-B JRSS-B NA

Here ‘winner’ means the cited journal, ‘loser’ the journal in which the citation appears; thus, for example, Biometrika was cited 498 times by papers in JASA during the period under study.

The Bradley-Terry model can now be fitted by using the function BTm from the BradleyTerry2 package. Here we fit the INTERCEPT model and store the result as an object named citeModel:
> library(BradleyTerry2)

Convert frequencies to success/failure data:

> citations.sf <- countsToBinomial(citations)

> names(citations.sf)[1:2] <- c("journal1", "journal2")

> head(citations.sf)
journal1 journal2 win1 win2
1 Biometrika Comm Statist 730 33
2 Biometrika JASA 498 320
3 Biometrika JRSS-B 221 284
4 Comm Statist JASA 68 813
5 Comm Statist JRSS-B 17 276
6 JASA JRSS-B 142 325

Standard Bradley-Terry model fitted to these data

> citeModel <- BTm(cbind(win1, win2), journal1, journal2,


data = citations.sf)
> citeModel

Bradley Terry model fit by glm.fit

Call: BTm(outcome = cbind(win1, win2), player1 = journal1,


player2 = journal2, data = citations.sf)

Coefficients:
..Comm Statist ..JASA ..JRSS-B
-2.9491 -0.4796 0.2690

Degrees of Freedom: 6 Total (i.e. Null); 3 Residual


Null Deviance: 1925
Residual Deviance: 4.293 AIC: 46.39

The coefficients here are maximum likelihood estimates of $\lambda_2, \lambda_3, \lambda_4$, with $\lambda_1$ (the log-ability for Biometrika) set to zero as an identifying convention.

If a different ‘reference’ journal is required, this can be achieved using the optional refcat argument: for example, making use of update to avoid re-specifying the whole model,

> update(citeModel, refcat = "JASA")


Bradley Terry model fit by glm.fit

Call: BTm(outcome = cbind(win1, win2), player1 = journal1,


player2 = journal2, refcat = "JASA", data = citations.sf)

Coefficients [contrasts: ..=contr.treatment ]:


..Biometrika ..Comm Statist ..JRSS-B
0.4796 -2.4695 0.7485

Degrees of Freedom: 6 Total (i.e. Null); 3 Residual


Null Deviance: 1925
Residual Deviance: 4.293 AIC: 46.39

It is the same model in a different parameterization.
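A short sketch of why the two parameterizations are equivalent: changing the reference category subtracts a constant from every log-ability, so all pairwise differences, and therefore all fitted probabilities, are unchanged. The coefficients are copied from the first fit; `lambda`, `lambda_jasa` and `p_cited` are names introduced here.

```r
# Log-abilities from the first fit (reference: Biometrika).
lambda <- c(Biometrika = 0, "Comm Statist" = -2.9491,
            JASA = -0.4796, "JRSS-B" = 0.2690)

# Re-express with JASA as the reference category
lambda_jasa <- lambda - lambda["JASA"]
print(round(lambda_jasa, 4))

# Pairwise probabilities depend only on differences, so they are unchanged
p_cited <- function(l, i, j) plogis(l[[i]] - l[[j]])
print(p_cited(lambda, "Biometrika", "JASA"))
print(p_cited(lambda_jasa, "Biometrika", "JASA"))
```

The shifted values agree with the refcat = "JASA" output up to rounding in the last printed digit.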

BTm(outcome = cbind(win1, win2), player1 = journal1, player2 = journal2,


formula = ~journal, id = "journal", refcat = "JASA", data = citations.sf)

Example by Gioè-Guastella (academic year 2016/2017)

library(VGAM)

data(football)
f<-football[-1]

ff1<-table(subset(f,result==1))
ff2<-table(subset(f,result==-1))
ff3<-ff1+ff2
ff3<-as.data.frame(ff3)
ff4<-matrix(ff3$Freq,29,29, byrow=T)
diag(ff4)<-rep(NA,29)
dimnames(ff4)<-list(ff3$home[1:29],ff3$home[1:29])
ff4<-t(ff4)

fit=vglm(Brat(ff4)~1,brat)

classifica<-sort((fit@coefficients))
