5.3) Ordinal logistic regression 2
March 2013
Ordinal logistic regression
Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown.
Ordinal logistic regression
Each logit has its own α_j term but the same coefficient β. That means that the effect of the independent variable is the same for the different logit functions.
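As a sketch of the model form these slides assume (the notation and the sign of the slope term vary across textbooks and software, so treat this as one common convention rather than the SPSS output itself):

\[
\log\!\left[\frac{P(Y \le j)}{P(Y > j)}\right] \;=\; \alpha_j - \beta' x,
\qquad j = 1, \dots, J-1
\]

Here J is the number of response categories; there is one threshold α_j per cumulative split, but a single slope vector β shared by all of the logits.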
The α_j terms, called the threshold values, often aren't of much interest.
Tables required in ordinal logistic regression
They rated judges on the scale: Poor (1), Only fair (2), Good (3), and Excellent (4).
Goodness-of-fit statistics
If your model fits well, the observed and expected cell counts are similar, the value of each statistic (e.g., the chi-square statistic) is small, and the observed significance level (p-value) is large.
You reject the null hypothesis that the model fits if the observed significance level (i.e., p-value) for the goodness-of-fit statistics is small.
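As a reminder (this is the standard form of the statistic, not something taken from these slides), the Pearson goodness-of-fit statistic sums the squared differences between observed and expected cell counts:

\[
X^2 \;=\; \sum_{\text{cells}} \frac{(O - E)^2}{E}
\]

A small X^2 relative to its chi-square reference distribution gives a large p-value, which is what a well-fitting model produces.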
Procedure:
5) Click on other buttons as appropriate.
Procedure:
♣ The following hypothetical data set has a three-level variable called apply (coded 0, 1, 2) that we will use as our response (i.e., outcome, dependent) variable.
Procedure:
♣ We also have three variables that we will use as predictors: pared2 and public2 (both categorical) and gpa (continuous).
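A minimal Python sketch of how a data set like this might be inspected, assuming pandas and a hypothetical file ologit.csv with columns apply, pared2, public2, and gpa (the file name and layout are assumptions, not part of these slides):

import pandas as pd

# Hypothetical file name and column layout (apply, pared2, public2, gpa);
# adjust to wherever your copy of the data actually lives.
df = pd.read_csv("ologit.csv")

# Frequency distribution of the response (compare with the APPLY table later on)
print(df["apply"].value_counts().sort_index())

# Frequency distributions of the two categorical predictors
print(df["pared2"].value_counts().sort_index())
print(df["public2"].value_counts().sort_index())

# Descriptive statistics for the continuous predictor
print(df["gpa"].describe())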
Procedure:
Some Strategies You Might Try
♣ ANOVA: If you use only one continuous predictor, you could "flip" the model around so that, say, gpa was the outcome variable and apply was the predictor variable. Then you could run a one-way ANOVA. This isn't a bad thing to do if you only have one predictor variable (from the logistic model), and it is continuous.
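If you did want to try that flipped one-way ANOVA, a sketch with SciPy might look like the following (same hypothetical file and columns as in the earlier sketch):

import pandas as pd
from scipy import stats

# Same hypothetical file/columns as before
df = pd.read_csv("ologit.csv")

# gpa as the outcome, the three levels of apply as the groups
groups = [grp["gpa"].to_numpy() for _, grp in df.groupby("apply")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")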
Using the Ordinal Logistic Model
APPLY
              Frequency   Percent   Valid Percent   Cumulative Percent
Valid  .00          220      55.0            55.0                 55.0
       1.00         140      35.0            35.0                 90.0
       2.00          40      10.0            10.0                100.0
       Total        400     100.0           100.0

Descriptive Statistics
                       N   Minimum   Maximum     Mean   Std. Deviation
gpa                  400      1.90      4.00   2.9989          .39794
Valid N (listwise)   400
Frequency distribution
pared2
              Frequency   Percent   Valid Percent   Cumulative Percent
Valid  1             63      15.8            15.8                 15.8
       2            337      84.2            84.2                100.0
       Total        400     100.0           100.0

public2
              Frequency   Percent   Valid Percent   Cumulative Percent
Valid  1             57      14.2            14.2                 14.2
       2            343      85.8            85.8                100.0
       Total        400     100.0           100.0
Cross tabulation
[Cross tabulations of apply by pared2 and of apply by public2 (tables not reproduced here)]
Warnings
♣ Regarding the above data, none of the cells is too small or empty (has no cases), so we will run our model.
♣ We checked for empty cells when we did the crosstabs with the response variable by each of the categorical predictor variables, and those tables looked OK, so we will proceed with the analysis.
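A sketch of that empty-cell check in Python, again assuming the hypothetical ologit.csv used earlier:

import pandas as pd

df = pd.read_csv("ologit.csv")  # hypothetical file name, as before

# Crosstab of the response by each categorical predictor; flag empty cells
for predictor in ["pared2", "public2"]:
    table = pd.crosstab(df["apply"], df[predictor])
    print(table)
    if (table == 0).any().any():
        print(f"Warning: empty cell(s) in apply x {predictor}")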
Case Processing Summary
Warnings
There are 357 (54.1%) cells (i.e., dependent variable levels by combinations of predictor variable values) with zero frequencies.

Model              -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only               557.272
Final                        533.091       24.180    3   .000
Link function: Logit.
Before proceeding to examine the individual coefficients, you want to look at an overall test of the null hypothesis that the location coefficients for all of the variables in the model are 0.
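The chi-square in the table above is just the drop in -2 log likelihood from the intercept-only model to the final model, with one degree of freedom per predictor:

\[
\chi^2 \;=\; 557.272 - 533.091 \;\approx\; 24.18,
\qquad df = 3, \quad p < .001
\]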
Goodness-of-Fit
           Chi-Square    df   Sig.
Pearson       400.843   435   .878
Deviance      400.749   435   .879
Link function: Logit.
If your model fits well, the observed and expected cell counts are similar, the value of each statistic is small, and the observed significance level (p-value) is large.
The parameter estimates table contains the estimated coefficients for the model. The estimates labeled Threshold are the α_j's, the intercept-equivalent terms.
The estimates labeled Location are the ones you're interested in. They are the coefficients for the predictor variables.
• For gpa, we would say that for a one-unit increase in gpa, we would expect a 0.62 increase in the expected value of apply on the log odds scale, given that all of the other variables in the model are held constant.
Parameter Estimates
Predictor   Estimate   Std.     Wald    df   Sig.    95% C.I.          Odds ratio   95% C.I. for e^b
variable               error                         lower    upper    (e^b)        lower    upper
Pared2         1.048    .268   15.231    1   <.001    .522    1.574     2.85         1.68     4.83
Public2       -0.059    .289     .041    1    .839   -.624     .507     0.94          .54     1.66
gpa            0.616    .263    5.499    1    .019    .101    1.130     1.85         1.11     3.10
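As a rough cross-check outside SPSS, a proportional odds model like this one could be fit with statsmodels' OrderedModel. This is only a sketch under the same hypothetical file and column names as before; the parameterization (e.g., the sign and labeling of the threshold terms) differs from SPSS PLUM, so only the Location-type estimates are directly comparable.

import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("ologit.csv")  # hypothetical file name, as before

# Proportional odds (cumulative logit) model: apply ~ pared2 + public2 + gpa
model = OrderedModel(df["apply"], df[["pared2", "public2", "gpa"]], distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())

# Exponentiated coefficients, i.e. the proportional odds ratios for the predictors
print(np.exp(result.params[["pared2", "public2", "gpa"]]))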
Parameter Estimates
♣ In the column e^b we see the results presented as proportional odds ratios (the coefficient exponentiated).
♣ We will ignore the values for apply = 0 and apply = 1, as those are the thresholds and are not usually reported in terms of proportional odds ratios.
♣ For a one-unit increase in gpa, the odds of the middle and high categories of apply versus the low category of apply are 1.85 times greater, given that the other variables in the model are held constant (see the worked calculation after this list).
♣ In other words, ordinal logistic regression assumes that the coefficients that describe the relationship between, say, the lowest versus all higher categories of the response variable are the same as those that describe the relationship between the next lowest category and all higher categories, and so on.
♣ The null hypothesis of this chi-square test (the test of parallel lines, shown below) is that there is no difference in the coefficients between models, so we "hope" to get a non-significant result.
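The worked calculation referred to in the gpa bullet above: the proportional odds ratio is simply the estimated coefficient exponentiated.

\[
e^{b_{\mathrm{gpa}}} \;=\; e^{0.616} \;\approx\; 1.85
\]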
Test of Parallel Lines
Model             -2 Log Likelihood   Chi-Square   df   Sig.
Null Hypothesis             533.091
General                     529.077        4.014    3   0.26
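As with the model-fitting table, the chi-square here is a difference of -2 log likelihoods, now comparing the proportional odds (null hypothesis) model with the more general model that allows separate coefficients for each logit:

\[
\chi^2 \;=\; 533.091 - 529.077 \;=\; 4.014,
\qquad df = 3, \quad p = 0.26
\]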
Test of Parallel Lines
The above test indicates that we have not violated the proportional odds assumption.