Models for Polytomous Responses AA 2016-2017
Polytomous data
Nominal response variable: baseline-category logit models
Let Y be a nominal response variable with J categories.
Logit models for nominal responses pair each response
category with a baseline category. The choice of
baseline category is arbitrary.
Given a vector x of explanatory variables, let
$$\pi_j(x) = P(Y = j \mid x), \qquad \sum_{j=1}^{J} \pi_j(x) = 1$$
If we have n independent observations based on these
probabilities, the numbers of outcomes of each of the
J types follow a multinomial distribution with probabilities
(π1 (x), . . . , πJ (x)).
This model extends the binary logistic regression model.
It gives a simultaneous representation of the odds of
being in one category relative to being in another
category, for all pairs of categories.
Once the model specifies logits for a certain J − 1 pairs
of categories, the rest are redundant.
If the last category (J) is the baseline, the
baseline-category logit model
$$\log \frac{\pi_j(x)}{\pi_J(x)} = \alpha_j + \beta_j' x, \qquad j = 1, \ldots, J - 1$$
describes the effect of x on the J − 1 logits.
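Notes
For any pair of categories a and b (written here for a
single predictor x), the logit follows from the
baseline-category logits:
$$\log \frac{\pi_a(x)}{\pi_b(x)} = \log \frac{\pi_a(x)}{\pi_J(x)} - \log \frac{\pi_b(x)}{\pi_J(x)} = (\alpha_a + \beta_a x) - (\alpha_b + \beta_b x) = (\alpha_a - \alpha_b) + (\beta_a - \beta_b) x$$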
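As a minimal sketch of the baseline-category logit model and of the note on redundant logits, the R code below turns baseline-category logits into probabilities and checks that the logit for a non-baseline pair equals the difference of two baseline logits. The parameter values and the helper name baseline_probs are assumptions chosen only for illustration; they are not estimates from any data set.

```r
# Hypothetical parameters for J = 3 categories; category 3 is the baseline.
alpha <- c(0.5, -1.0)  # alpha_1, alpha_2 (assumed values)
beta  <- c(1.2,  0.3)  # beta_1, beta_2 (assumed values)

# Category probabilities implied by the baseline-category logits at x.
baseline_probs <- function(x, alpha, beta) {
  eta <- c(alpha + beta * x, 0)  # the baseline category has logit 0
  exp(eta) / sum(exp(eta))
}

x <- 2
p <- baseline_probs(x, alpha, beta)
sum(p)  # the J probabilities sum to 1

# Logit for the pair (1, 2), which does not involve the baseline category.
logit_12 <- log(p[1] / p[2])
logit_12_formula <- (alpha[1] - alpha[2]) + (beta[1] - beta[2]) * x
all.equal(logit_12, logit_12_formula)
```

The same subtraction gives the logit for any other pair of categories, which is why only J − 1 logits need to be specified.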
Alligator Food Choice Example
The data are taken from a study by the Florida Game and
Fresh Water Fish Commission of factors influencing the
primary food choice of alligators.
Primary food type has five categories: Fish, Invertebrate,
Reptile, Bird and Other.
Explanatory variables are the Lake where the alligators
were sampled and the Length of the alligator.
# Factors defining the cross-classification (the first level is the reference)
food<-factor(c("fish","invert","rep","bird","other"),
levels=c("fish","invert","rep", "bird","other"))
size<-factor(c("<2.3",">2.3"),levels=c(">2.3","<2.3"))
gender<-factor(c("m","f"),levels=c("m","f"))
lake<-factor(c("hancock","oklawaha","trafford","george"),
levels=c("george","hancock", "oklawaha","trafford"))
# One row per food x size x gender x lake combination (80 cells)
table.7.1<-expand.grid(food=food,size=size,
gender=gender,lake=lake)
# Observed counts, in the row order produced by expand.grid
temp<-c(7,1,0,0,5,4,0,0,1,2,16,3,2,2,3,3,0,1,2,3,2,2,0,0,1,
13,7,6,0,0,3,9,1,0,2,0,1,0,1,0,3,7,1,0,1,8,6,6,3,5,2,4,1,1,
4,0,1,0,0,0,13,10,0,2,2,9,0,0,1,2,3,9,1,0,1,8,1,0,0,1)
# Expand to one row per alligator (219 rows in total)
table.7.1<-structure(.Data=table.7.1[rep(1:nrow(table.7.1),
temp),], row.names=1:219)
We fit several models
library(nnet)
fitS<-multinom(food~lake*size*gender,data=table.7.1)
fit0<-multinom(food~1,data=table.7.1) # null
fit1<-multinom(food~gender,data=table.7.1) # G
fit2<-multinom(food~size,data=table.7.1) # S
fit3<-multinom(food~lake,data=table.7.1) # L
fit4<-multinom(food~size+lake,data=table.7.1) # L+S
fit5<-multinom(food~size+lake+gender,data=table.7.1) #L+S+G
The likelihood ratio statistic for each model, computed against the saturated model fitS:
deviance(fit1)-deviance(fitS)
deviance(fit2)-deviance(fitS)
deviance(fit3)-deviance(fitS)
deviance(fit4)-deviance(fitS)
deviance(fit5)-deviance(fitS)
deviance(fit0)-deviance(fitS)
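Each difference of deviances is a likelihood-ratio statistic G², which can be compared with a chi-squared distribution whose degrees of freedom equal the difference in the number of parameters. The sketch below shows one way to obtain the p-value in base R. The helper name lr_pvalue and the numeric inputs are hypothetical. With multinom fits the number of parameters is, as far as we know, stored in the edf component of the fitted object.

```r
# P-value of a likelihood-ratio test comparing a reduced model with a fuller one.
# dev_reduced, dev_full: deviances; df: difference in the number of parameters.
lr_pvalue <- function(dev_reduced, dev_full, df) {
  G2 <- dev_reduced - dev_full
  pchisq(G2, df = df, lower.tail = FALSE)
}

# Hypothetical numbers, only to show the call (not results from the alligator data).
p_val <- lr_pvalue(dev_reduced = 120, dev_full = 105, df = 8)
p_val

# With the fits above one might write, for example:
# lr_pvalue(deviance(fit4), deviance(fitS), fitS$edf - fit4$edf)
```

A small p-value indicates that the reduced model fits significantly worse than the saturated one.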
Collapsing over gender:
deviance(fit1)-deviance(fitS)
deviance(fit2)-deviance(fitS)
deviance(fit3)-deviance(fitS)
deviance(fit0)-deviance(fitS)
According to the AIC the best model is fit3:
summary(fit3)
In this example the baseline category is the one that
crosses “fish”, “> 2.3” and “george”.
Results:
Starting from these results we can evaluate all the
redundant odds ratios.
Ordinal response variables: Log-Linear Association
models
Ordinal response variables: 1. Linear by Linear (Uniform)
association
• If θ = 0 independence holds.
Models that can fit tables of this type are the row effects
and column effects models.
The row effects model R has the form
$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \tau_i v_j$$
For this class of models, for any pair of rows r < s and
columns c < d, the log of the odds ratio formed from the
2 × 2 table of those rows and columns is
$$\log \frac{\mu_{rc}\,\mu_{sd}}{\mu_{rd}\,\mu_{sc}} = (\tau_s - \tau_r)(v_d - v_c)$$
The column effects model C takes the form
$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \rho_j u_i$$
A generalization of the row and column effects models
that allows for both row and column effects in the local
odds ratio is the row + column effects model (R+C)
$$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \tau_i v_j + \rho_j u_i$$
The local log odds ratio for unit-spaced row and column
scores is
$$(\tau_{i+1} - \tau_i) + (\rho_{j+1} - \rho_j)$$
incorporating row effects and column effects.
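The local log odds ratio of the R+C model can be checked numerically. The sketch below builds a table of log expected counts from the R+C formula with unit-spaced scores and compares its local log odds ratios with the expression above. All parameter values are assumptions chosen only for illustration.

```r
# Hypothetical R+C parameters for a 3 x 4 table (assumed values).
I <- 3; J <- 4
u <- 1:I                          # unit-spaced row scores
v <- 1:J                          # unit-spaced column scores
lambda  <- 1
lambdaX <- c(0, 0.2, -0.1)        # row main effects
lambdaY <- c(0, 0.5, 0.3, -0.4)   # column main effects
tau     <- c(0, 0.3, 0.9)         # row effects
rho     <- c(0, 0.1, 0.4, 0.5)    # column effects

# log mu_ij = lambda + lambdaX_i + lambdaY_j + tau_i * v_j + rho_j * u_i
logmu <- lambda + outer(lambdaX, lambdaY, "+") + outer(tau, v) + outer(u, rho)

# Local log odds ratios from the adjacent 2 x 2 subtables.
local_lor <- logmu[-I, -J] + logmu[-1, -1] - logmu[-I, -1] - logmu[-1, -J]

# Value given by the formula: (tau_{i+1} - tau_i) + (rho_{j+1} - rho_j).
expected <- outer(diff(tau), diff(rho), "+")
all.equal(local_lor, expected)
```

Setting rho to zero gives the row effects model, and setting tau to zero gives the column effects model.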
L×L model Example
library(gnm)
library(vcdExtra)
data(Mental)  # load the data from vcdExtra, or equivalently enter it by hand:
dati <- expand.grid(mental=c("well","mild",
"moderate","impaired"),ses=1:6)
dati$Freq <- c(64,94,58,46,57,94,54,40,57,105,65,60,
72,141,77,94,36,97,54,78,21,71,54,71)
Display the frequency table
Mental.tab <- xtabs(Freq ~ mental+ses, data=Mental)
Fit Independence model
indep <- glm(Freq ~ mental+ses,family = poisson, data = Mental)
deviance(indep) #or
o<-glm(Freq~factor(mental)+factor(ses), family=poisson, data=dati)
deviance(o)
Fit the linear-by-linear association model, using the row and column indices as scores:
linlin2<-glm(formula = Freq ~ factor(mental) + factor(ses) +
as.numeric(mental):as.numeric(ses),
family = poisson, data = dati)
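The independence and linear-by-linear models are nested, so they can be compared with a likelihood-ratio test on one degree of freedom. The sketch below does this in base R with the hand-entered data. The column names u and v and the object names are assumptions introduced for readability; the model is the same as linlin2.

```r
# Mental health by SES counts, entered by hand (mental varies fastest).
dati <- expand.grid(mental = c("well", "mild", "moderate", "impaired"), ses = 1:6)
dati$Freq <- c(64, 94, 58, 46, 57, 94, 54, 40, 57, 105, 65, 60,
               72, 141, 77, 94, 36, 97, 54, 78, 21, 71, 54, 71)

# Unit-spaced scores: u for the rows (mental), v for the columns (ses).
dati$u <- as.numeric(dati$mental)
dati$v <- dati$ses

indep_fit  <- glm(Freq ~ mental + factor(ses), family = poisson, data = dati)
linlin_fit <- glm(Freq ~ mental + factor(ses) + u:v, family = poisson, data = dati)

# Likelihood-ratio test of independence against the L x L model.
anova(indep_fit, linlin_fit, test = "Chisq")

# With unit-spaced scores, exp(theta) estimates the common local odds ratio.
theta_hat <- coef(linlin_fit)[["u:v"]]
exp(theta_hat)
```

An estimate of θ close to 0 would correspond to independence, in line with the remark that independence holds if θ = 0.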
Row effects model Example
# Column scores for ses (assumed here to be unit-spaced: 1, ..., 6)
Cscore <- as.numeric(Mental$ses)
roweff <- glm(Freq ~ mental + ses + mental:Cscore,
family = poisson, data = Mental)
Column effects model Example
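# Row scores for mental (assumed here to be unit-spaced: 1, ..., 4)
Rscore <- as.numeric(Mental$mental)
coleff <- glm(Freq ~ mental + ses + Rscore:ses,
family = poisson, data = Mental)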
Exercise: student perception of statistics class
assessment methods
Ordinal response variables: 1. Cumulative Logit Models
The logits of the first J − 1 cumulative probabilities are:
$$\operatorname{logit}[P(Y \le j \mid x)] = \log \frac{P(Y \le j \mid x)}{1 - P(Y \le j \mid x)} = \log \frac{\pi_1(x) + \pi_2(x) + \cdots + \pi_j(x)}{\pi_{j+1}(x) + \cdots + \pi_J(x)}, \qquad j = 1, \ldots, J - 1$$
For simplicity, let’s consider only one predictor:
logit[P (Y ≤ j)] = αj + βx
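A minimal sketch of this model in base R: the cumulative probabilities are obtained by inverting the logit, the category probabilities are their successive differences, and the cumulative odds ratio for a one-unit increase in x is the same for every j. The cutpoints, the slope, and the helper names are assumptions chosen only for illustration.

```r
# Hypothetical parameters for J = 4 ordered categories (assumed values).
alpha <- c(-1.0, 0.5, 2.0)  # increasing cutpoints alpha_1 < alpha_2 < alpha_3
beta  <- 0.8                # common slope for all J - 1 cumulative logits

# Cumulative probabilities P(Y <= j | x), j = 1, ..., J - 1.
cum_probs <- function(x) plogis(alpha + beta * x)

# Category probabilities P(Y = j | x) as successive differences.
cat_probs <- function(x) diff(c(0, cum_probs(x), 1))

p0 <- cat_probs(0)
p1 <- cat_probs(1)
rbind(x0 = p0, x1 = p1)

# Cumulative odds ratio for a one-unit increase in x: exp(beta) for every j.
odds <- function(p) p / (1 - p)
odds_ratio <- odds(cum_probs(1)) / odds(cum_probs(0))
odds_ratio
```

With this parameterization a positive β raises every P(Y ≤ j), so larger x shifts the response toward the lower categories.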
Cheese-Tasting Example (McCullagh and Nelder,
1989)
• How many logit models?
(J − 1)(K − 1), where J is the number of response
categories and K the number of regressors in the
model;
The vglm() function
library(VGAM)
cheese <- read.table("cheese.dat.txt",
col.names=c("Cheese", "Response", "N"))
is.factor(cheese$Response)
cheese$Response<-factor(cheese$Response, ordered=TRUE)
mod.sat<-vglm(Response~Cheese,cumulative,
weights=c(N+0.5),data=cheese)
mod.podds<-vglm(Response~Cheese,cumulative(parallel=TRUE),
weights=c(N+0.5),data=cheese)
summary(mod.sat)
summary(mod.podds)
matplot(t(mod.podds@predictors[seq(1,36,by=9),]),type="l",
ylab="Cumulative logits",main="Proportional odds model")
# Adding a legend would surely be useful!
matplot(t((exp(mod.podds@predictors)/(1+exp(mod.podds@predictors)))
[seq(1,36,by=9),]),type="l",ylab="Cumulative Probability Curves",
main="Proportional odds model")
In this case, a positive coefficient β means that
increasing the value of X tends to shift the response
toward the lower categories (i.e. to produce greater dislike).
summary(mod.podds)
Call:
vglm(formula = Response ~ Cheese, family = cumulative(parallel = TRUE),
data = cheese, weights = c(N + 0.5))
Coefficients:
               Estimate Std. Error   z value
(Intercept):1  -4.84428    0.45697 -10.60089
(Intercept):2  -3.84779    0.37446 -10.27564
(Intercept):3  -2.86231    0.32751  -8.73959
(Intercept):4  -1.91322    0.29232  -6.54497
(Intercept):5  -0.73965    0.25589  -2.89044
(Intercept):6   0.10951    0.24755   0.44237
(Intercept):7   1.44853    0.28180   5.14020
(Intercept):8   2.89229    0.36928   7.83216
CheeseB         2.82260    0.38300   7.36978
CheeseC         1.44005    0.34794   4.13883
CheeseD        -1.39122    0.35218  -3.95026
$$\operatorname{logit}[P(Y \le 1)] = \log \frac{P(Y \le 1)}{P(Y > 1)}$$
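The estimates can be turned into cumulative odds ratios and fitted cumulative probabilities with a few lines of base R. This is a sketch that copies the numbers from the summary(mod.podds) output and assumes the parameterization logit[P(Y ≤ j)] = αj + βx with cheese A as the reference. The object names are introduced here for illustration.

```r
# Estimates copied from the summary(mod.podds) output.
alpha <- c(-4.84428, -3.84779, -2.86231, -1.91322,
           -0.73965,  0.10951,  1.44853,  2.89229)
beta  <- c(A = 0, B = 2.82260, C = 1.44005, D = -1.39122)  # A is the reference

# Estimated cumulative odds ratios of each cheese relative to cheese A.
exp(beta)

# Estimated P(Y <= j) for each cheese (rows) and j = 1, ..., 8 (columns).
cum_prob <- plogis(outer(beta, alpha, "+"))
round(cum_prob, 3)

# A smaller beta means smaller P(Y <= j), that is, higher response categories.
ranking <- names(sort(beta))
ranking
```

Under this reading, the first cheese in ranking is the one whose responses tend to fall in the highest categories.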
Ordinal response variables: 2. Adjacent-Category Logit
models
Job Satisfaction Example
To fit an adjacent-category logit model in R we use the
acat family, specifying the link function applied to the
ratios of the adjacent-category probabilities (loge) and
the logical argument parallel, which states whether the
terms in the formula are assumed to have equal
(parallel=TRUE) or unequal coefficients across the logits.
table.7.8<-read.table("jobsat.txt", header=TRUE)
table.7.8$jobsatf<-ordered(table.7.8$jobsat,
labels=c("very diss","little sat","mod sat",
"very sat"))
table.7.8a<- data.frame(expand.grid(income=1:4,
gender=c(1,0)),unstack(table.7.8,freq~jobsatf))
library(VGAM)
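# reverse=TRUE requests the logits log(P[Y=j]/P[Y=j+1]); parallel=TRUE forces
# common gender and income effects across the three logits.
# Note: to our knowledge, recent VGAM versions name this link "loglink".
fit.vglm<-vglm(cbind(very.diss,little.sat,
mod.sat,very.sat)~gender+income,
family= acat(link="loge",parallel=TRUE,reverse=TRUE),
data=table.7.8a)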
summary(fit.vglm)
summary(fit.vglm)
Coefficients:
Estimate Std. Error z value
(Intercept):1 -0.550668 0.67945 -0.81046
(Intercept):2 -0.655007 0.52527 -1.24700
(Intercept):3 2.025934 0.57581 3.51842
gender 0.044694 0.31444 0.14214
income -0.388757 0.15465 -2.51372
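The fitted adjacent-category logits determine the four category probabilities. The sketch below recovers them in base R from the estimates printed above, assuming that with reverse=TRUE the linear predictors are ηj = log(πj / πj+1), as the VGAM documentation describes. The helper name acat_probs is an assumption introduced for illustration.

```r
# Estimates copied from the summary(fit.vglm) output.
intercepts <- c(-0.550668, -0.655007, 2.025934)
b_gender   <- 0.044694
b_income   <- -0.388757

# Category probabilities implied by eta_j = log(pi_j / pi_{j+1}), j = 1, 2, 3.
acat_probs <- function(gender, income) {
  eta <- intercepts + b_gender * gender + b_income * income
  # log(pi_j / pi_4) is the sum eta_j + ... + eta_3; the last category gets 0.
  log_ratio_to_last <- c(rev(cumsum(rev(eta))), 0)
  exp(log_ratio_to_last) / sum(exp(log_ratio_to_last))
}

p_low  <- acat_probs(gender = 1, income = 1)
p_high <- acat_probs(gender = 1, income = 4)
round(rbind(income1 = p_low, income4 = p_high), 3)

# Check: the adjacent log-ratios of the probabilities recover the predictors.
log(p_low[-4] / p_low[-1])
```

Because the income coefficient is negative, each ratio πj / πj+1 decreases as income increases, so higher income shifts probability toward the higher satisfaction categories.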
Ordinal response variables: 3. Continuation-Ratio Logits
Example: Streptococcus and tonsil size
carrier<-c(1,0)
y1<-c(19,497)
y2<-c(29,560)
y3<-c(24,269)
tonsil<-cbind(carrier,y1,y2,y3)
tonsil<-as.data.frame(tonsil)
tonsil$carrier<-as.factor(tonsil$carrier)
library(VGAM)
fit.cratio<-vglm(cbind(y1,y2,y3)~carrier,
family=cratio(reverse=FALSE, parallel=TRUE),
data=tonsil)
summary(fit.cratio)
fitted(fit.cratio)
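Before fitting, it can help to look at the sample continuation-ratio logits. The sketch below computes them in base R from the counts above, assuming the form logit P(Y > j | Y ≥ j), which is how we understand cratio with reverse=FALSE. The helper name cr_logits and the row labels are assumptions introduced for illustration.

```r
# Counts from the example: columns are the three ordered tonsil-size categories.
counts <- rbind(carrier    = c(19, 29, 24),
                noncarrier = c(497, 560, 269))

# Sample logit of P(Y > j | Y >= j), which equals log(#(Y > j) / #(Y = j)).
cr_logits <- function(y) {
  J <- length(y)
  sapply(seq_len(J - 1), function(j) log(sum(y[(j + 1):J]) / y[j]))
}

emp <- t(apply(counts, 1, cr_logits))
colnames(emp) <- c("j=1", "j=2")
emp

# Carrier minus noncarrier difference at each j.
carrier_effect <- emp["carrier", ] - emp["noncarrier", ]
carrier_effect
```

A model with parallel=TRUE assumes that this difference is the same for both values of j, so similar entries in carrier_effect would support that assumption.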
Consider an experiment in which nij judges compare the
pair of items Ti and Tj , for items Ti , i = 1, . . . , M + 1,
and express their preferences between Ti and Tj . Let
$$N = \sum\sum_{i<j} n_{ij}$$
be the total number of pairwise comparisons, and assume
independence for ratings of the same pair by different
judges and for ratings of different pairs by the same
judge.
A model describing this experiment was proposed by
Bradley and Terry (1952) and Zermelo (1929). Let πi
be the worth of item Ti , with πi > 0, and
$$P[T_i > T_j] = p_{i/ij} = \frac{\pi_i}{\pi_i + \pi_j}, \qquad i \ne j,$$
where Ti > Tj means that i is preferred over j. Let Yij be
the number of times that Ti is preferred over Tj in the
nij comparisons of the pair.
Then Yij ∼ Bin(nij , pi/ij ).
Note that one can define linear predictors ηij of the form
$$\operatorname{logit}\left(\frac{\pi_i}{\pi_i + \pi_j}\right) = \log \frac{\pi_i}{\pi_j} = \lambda_i - \lambda_j,$$
where λi = log πi .
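The two equivalent forms of the Bradley-Terry probability can be illustrated in base R. The sketch below computes the matrix of preference probabilities from a set of worths through the logit form. The worth values and item labels are assumptions chosen only for illustration.

```r
# Hypothetical worths pi_i > 0 for four items (assumed values).
worth  <- c(T1 = 4, T2 = 2, T3 = 1, T4 = 1)
lambda <- log(worth)

# P[T_i > T_j] = pi_i / (pi_i + pi_j) = plogis(lambda_i - lambda_j).
pref <- plogis(outer(lambda, lambda, "-"))
diag(pref) <- NA  # an item is never compared with itself
round(pref, 3)

# Entry [i, j] and entry [j, i] are complementary probabilities.
pref["T1", "T2"] + pref["T2", "T1"]
```

Only differences of the λi enter the probabilities, so multiplying all worths by a constant leaves the matrix unchanged. This is why one item is usually taken as the reference.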
Example: the brat() function in VGAM
Coef(fit)
1 2 3 4
1 NA 0.5785771 0.9009064 0.7705165
2 0.42142293 NA 0.8688013 0.7097758
3 0.09909362 0.1311987 NA 0.2697077
4 0.22948346 0.2902242 0.7302923 NA
The Bradley-Terry Model: the BradleyTerry2
package
$$\lambda_i - \lambda_j = \sum_{r=1}^{p} \beta_r x_{ir} - \sum_{r=1}^{p} \beta_r x_{jr} + U_i - U_j$$
The Bradley-Terry model can now be fitted by using
function BTm from the BradleyTerry2 package. Here we
fit the INTERCEPT model and store the result as an
object named citeModel:
> library(BradleyTerry2)
> head(citations.sf)
journal1 journal2 win1 win2
1 Biometrika Comm Statist 730 33
2 Biometrika JASA 498 320
3 Biometrika JRSS-B 221 284
4 Comm Statist JASA 68 813
5 Comm Statist JRSS-B 17 276
6 JASA JRSS-B 142 325
Standard Bradley-Terry model fitted to these data
Coefficients:
..Comm Statist ..JASA ..JRSS-B
-2.9491 -0.4796 0.2690
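Since the standard Bradley-Terry model is a logistic regression on differences of abilities, the same estimates can be approached with glm alone. The sketch below is not the BTm call used above. It rebuilds the six rows shown by head(citations.sf), codes each comparison with +1 for journal1 and −1 for journal2, and drops Biometrika as the reference. The object names are assumptions introduced for illustration.

```r
# The six pairwise comparisons shown by head(citations.sf).
journals <- c("Biometrika", "Comm Statist", "JASA", "JRSS-B")
cit <- data.frame(
  journal1 = c("Biometrika", "Biometrika", "Biometrika",
               "Comm Statist", "Comm Statist", "JASA"),
  journal2 = c("Comm Statist", "JASA", "JRSS-B", "JASA", "JRSS-B", "JRSS-B"),
  win1 = c(730, 498, 221, 68, 17, 142),
  win2 = c(33, 320, 284, 813, 276, 325)
)

# Design matrix: +1 for journal1 and -1 for journal2 in each comparison.
X <- matrix(0, nrow(cit), length(journals), dimnames = list(NULL, journals))
X[cbind(seq_len(nrow(cit)), match(cit$journal1, journals))] <- 1
X[cbind(seq_len(nrow(cit)), match(cit$journal2, journals))] <- -1
X <- X[, -1]  # Biometrika is the reference, with ability fixed at 0

# logit P(journal1 beats journal2) = lambda_journal1 - lambda_journal2
bt_glm <- glm(cbind(win1, win2) ~ 0 + X, family = binomial, data = cit)
abilities <- setNames(coef(bt_glm), journals[-1])
round(abilities, 4)

# Changing the reference journal only shifts all abilities by a constant.
round(c(Biometrika = 0, abilities) - abilities[["JASA"]], 4)
```

The last line expresses the abilities relative to JASA, which is what choosing a different reference category amounts to.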
If a different ‘reference’ journal is required, this can be
achieved using the optional refcat argument: for example,
making use of update to avoid re-specifying the
whole model,
Example by Gioè-Guastella (academic year 2016/2017)
library(VGAM)
data(football)
f<-football[-1]
ff1<-table(subset(f,result==1))
ff2<-table(subset(f,result==-1))
ff3<-ff1+ff2
ff3<-as.data.frame(ff3)
ff4<-matrix(ff3$Freq,29,29, byrow=T)
diag(ff4)<-rep(NA,29)
dimnames(ff4)<-list(ff3$home[1:29],ff3$home[1:29])
ff4<-t(ff4)
fit=vglm(Brat(ff4)~1,brat)
classifica<-sort((fit@coefficients))