
Ordinal logistic regression

Prof. Getu Degu

March 2013

1
Ordinal logistic regression
Many variables of interest are ordinal. That is, you can rank
the values, but the real distance between categories is
unknown.

Diseases are graded on scales from least severe to most severe.

Survey respondents choose answers on scales from strongly agree to strongly disagree.

Students are graded on scales from A to F.

2
Ordinal logistic regression
Ordinal regression is used with ordinal dependent
(response) variables, where the independents may be
categorical factors or continuous covariates.

Ordinal regression models are sometimes called cumulative logit models. Ordinal regression typically uses the logit link function.

Ordinal regression with a logit link is also called a proportional odds model, since the parameters of the predictor variables may be converted to odds ratios, as in logistic regression.

3
Ordinal logistic regression

 For ordinal categorical variables, the drawback of the multinomial regression model is that the ordering of the categories is ignored.

4
Each logit has its own αj term but the same coefficient β. That means that the effect of the independent variable is the same for the different logit functions.

 That’s an assumption you have to check; it is also the reason the model is called the proportional odds model.

5
 The j terms, called the threshold values, often
aren’t of much interest.

 Their values do not depend on the values of the


independent variable for a particular case. They
are like the intercept in a linear regression, except
that each logit has its own.

They’re used in the calculations of predicted


values.
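
Putting these pieces together, the model for each cumulative logit can be written out. In SPSS's parameterization the location term enters with a minus sign (sign conventions differ between packages):

ln θj = logit[prob(score ≤ j)] = αj − βx,   j = 1, 2, …, J − 1

Each of the J − 1 logits has its own threshold αj but shares the single slope β; with this sign convention, a positive β shifts cases toward the higher categories.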
6
Ordinal logistic regression
• Ordinal regression requires assuming that the effect of
the independents is the same for each level of the
dependent.

• Thus if an independent variable is age, then the effect on the dependent variable of a 10-year increase in age should be the same whether the difference is between age 20 and age 30 or between age 50 and age 60.

• The "test of parallel lines" tests this critical assumption, which should not be taken for granted.

7
Tables required in ordinal logistic regression

 Frequency of the outcome variable
 Frequency of all categorical independent variables
 Descriptives of continuous independent variables
 Cross tabulation of the outcome variable vs. each independent variable
 Case Processing Summary
 Model Fitting Information
 Parameter Estimates
 Goodness-of-Fit
 Test of Parallel Lines
8
Ordinal logistic regression
 Consider the following example. A random sample of voters in
country X was asked to rate their satisfaction with the criminal
justice system in the given country (X).

 They rated judges on the scale: Poor (1), Only fair (2), Good (3),
and Excellent (4).

 They also indicated whether they or anyone in their family was a crime victim in the last three years.

 Now, you want to model the relationship between their rating and having a crime victim in the household.
9
Defining the Event
 In ordinal logistic regression, the event of interest is observing
a particular score or less. For the rating of judges, you model
the following odds:
θ1 = prob(score of 1) / prob(score greater than 1)

θ2 = prob(score of 1 or 2) / prob(score greater than 2)

θ3 = prob(score of 1, 2, or 3) / prob(score greater than 3)

 The last category doesn’t have an odds associated with it, since the probability of scoring up to and including the last score is 1.
10
Defining the Event
 All of the odds are of the form:

θj = prob(score ≤ j) / prob(score > j)

 You can also write the equation as

θj = prob(score ≤ j) / (1 – prob(score ≤ j)),

since the probability of a score greater than j is 1 – the probability of a score less than or equal to j.
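
As a quick numeric illustration, the θj can be computed directly from the category probabilities. The sketch below uses made-up probabilities (not data from the survey), just to show the arithmetic:

```python
# Cumulative odds for a 4-point rating scale.
# The probabilities are hypothetical, chosen only for illustration.
probs = [0.30, 0.35, 0.25, 0.10]   # P(score = 1), ..., P(score = 4)

cum = 0.0
for j, p in enumerate(probs[:-1], start=1):
    cum += p                       # prob(score <= j)
    theta = cum / (1.0 - cum)      # prob(score <= j) / prob(score > j)
    print(f"theta_{j} = {theta:.3f}")
# There is no theta_4: prob(score <= 4) = 1, so prob(score > 4) = 0.
```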

11
Goodness-of-fit statistics
If your model fits well, the observed and expected cell
counts are similar, the value of each statistic (e.g., the
chi-square statistic) is small, and the observed
significance level (P-value) is large.

You reject the null hypothesis that the model fits if the observed significance level (i.e., the p-value) for the goodness-of-fit statistics is small.

Good models have large observed significance levels (p-values).
12
Procedure:
1) On the menu bar of the SPSS data editor window, click Analyze,
Regression, Ordinal

2) Select one or more variables that you want to analyze by clicking on the variable labels in the Ordinal Regression dialog box.

► Click on the appropriate arrow button to move the dependent variable into the Dependent box.
► Optionally, select one or more variables for the Factor(s) box and the Covariates box.
► To select multiple variables, hold down the Ctrl key and choose the variables you want.
Procedure:
3) Click on the Options button to select the
appropriate iterations. Click on the Continue
button.

4) Click on the Output button to select the appropriate options given under Display (Saved Variables). The option “Test of parallel lines” must be selected. Click on the Continue button.

14
Procedure:
5) Click on other buttons as appropriate.

6) Finally, click on the OK button in the Ordinal Regression dialog box to run the analysis.

The output will be displayed in a new SPSS Viewer window.

15
Procedure:
♣ The following hypothetical data set has a
three level variable called apply (coded 0, 1,
2), that we will use as our response (i.e.,
outcome, dependent) variable.

 College juniors are asked if they are unlikely, somewhat likely, or very likely to apply to graduate school. Hence, our outcome variable has three categories.

16
Procedure:
♣ We also have three variables that we will use as predictors:

► pared, which is a 1/2 variable indicating whether at least one parent has a graduate degree (1 = at least one parent has a graduate degree, 2 = no degree);

► public, which is a 1/2 variable where 2 indicates that the undergraduate institution is a public university and 1 indicates that it is a private university;

► gpa, which is the student's grade point average.

17
Procedure:
Some Strategies You Might Try

♣ OLS regression: This analysis is problematic because the assumptions of OLS are violated when it is used with a non-interval outcome variable.

♣ ANOVA: If you use only one continuous predictor, you could "flip"
the model around so that, say, gpa was the outcome variable and
apply was the predictor variable. Then you could run a one-way
ANOVA. This isn't a bad thing to do if you only have one predictor
variable (from the logistic model), and it is continuous.

♣ Multinomial logistic regression: This is similar to doing ordinal logistic regression, except that it is assumed that there is no order to the categories of the outcome variable (i.e., the categories are nominal). The downside of this approach is that the information contained in the ordering is lost.
18
Using the Ordinal Logistic Model

♣ Before we run our ordinal logistic model, we will see if any cells are empty or extremely small. If any are, we may have difficulty running our model.

♣ This can be assessed by carrying out simple crosstabs.
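
Outside SPSS, the same check takes only a few lines. A minimal sketch in Python with pandas (the tiny DataFrame here is a hypothetical stand-in; in practice you would load the real 400-case dataset):

```python
import pandas as pd

# Hypothetical stand-in data; replace with the real dataset.
df = pd.DataFrame({
    "apply":   [0, 0, 1, 2, 1, 0, 2, 1],
    "pared2":  [2, 1, 2, 1, 2, 2, 2, 1],
    "public2": [2, 2, 1, 2, 2, 1, 2, 2],
})

# One response-by-factor table per categorical predictor; look for
# empty or very small cells before fitting the ordinal model.
for factor in ["pared2", "public2"]:
    print(pd.crosstab(df["apply"], df[factor], margins=True), "\n")
```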

19
Using the Ordinal Logistic Model
APPLY

              Frequency   Percent   Valid Percent   Cumulative Percent
Valid  .00        220       55.0         55.0              55.0
       1.00       140       35.0         35.0              90.0
       2.00        40       10.0         10.0             100.0
       Total      400      100.0        100.0

Descriptive Statistics

                      N   Minimum   Maximum     Mean   Std. Deviation
gpa                 400      1.90      4.00    2.9989          .39794
Valid N (listwise)  400

20
Frequency distribution
pared2
             Frequency   Percent   Valid Percent   Cumulative Percent
Valid  1          63       15.8         15.8              15.8
       2         337       84.2         84.2             100.0
       Total     400      100.0        100.0

public2
             Frequency   Percent   Valid Percent   Cumulative Percent
Valid  1          57       14.2         14.2              14.2
       2         343       85.8         85.8             100.0
       Total     400      100.0        100.0
21
Cross tabulation
              Pared2                     Public2
Apply       1      2   Total          1      2   Total
1          20    200     220         31    189     220
2          30    110     140         16    124     140
3          13     27      40         10     30      40
Total      63    337     400         57    343     400

22
Warnings
♣ Regarding the above data, none of the cells is too small or empty
(has no cases), so we will run our model.

♣ While running our model, we see the following warning:

“There are 357 cells with zero frequencies”

♣ Technically, there are empty cells because of the continuous variable in our model, gpa. However, we are not worried about this warning message.

♣ We checked for empty cells when we did the crosstabs with the
response variable by each of the categorical predictor variables, and
those tables looked OK, so we will proceed with the analysis.
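
For readers working outside SPSS, the same model can be fit in code. The sketch below assumes Python with statsmodels' OrderedModel and a DataFrame df holding the 400 cases coded as in the slides; note that sign conventions for the thresholds differ slightly between packages:

```python
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# `df` is assumed to hold apply (0/1/2), pared2 and public2 (1/2),
# and gpa, coded as described in the slides.
df["apply"] = pd.Categorical(df["apply"], categories=[0, 1, 2], ordered=True)

# Dummy-code the 1/2 factors so that category 2 is the reference,
# matching the SPSS Parameter Estimates table shown later.
X = pd.DataFrame({
    "pared2_1":  (df["pared2"] == 1).astype(int),
    "public2_1": (df["public2"] == 1).astype(int),
    "gpa":       df["gpa"],
})

res = OrderedModel(df["apply"], X, distr="logit").fit(method="bfgs")
print(res.summary())   # location coefficients, thresholds, log-likelihood
```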
23
Case Processing Summary
Warnings

There are 357 (54.1%) cells (i.e., dependent variable levels by combinations of
predictor variable values) with zero frequencies.

Case Processing Summary

                   N     Marginal Percentage
apply    0        220         55.0%
         1        140         35.0%
         2         40         10.0%
pared2   1         63         15.8%
         2        337         84.2%
public2  1         57         14.2%
         2        343         85.8%
Valid             400        100.0%
Missing             0
Total             400
24
Case Processing Summary table
♣ In the Case Processing Summary table, we see the number
and percentage of cases in each level of our response
variable.

♣ These numbers look fine, but we would be concerned if one level had very few cases in it. We also see that all 400 observations in our data set were used in the analysis.

♣ Fewer observations would have been used if any of our variables had missing values.

♣ By default, SPSS does a listwise deletion of cases with missing values.
25
Model Fitting Information table
 Next we see the Model Fitting Information table, which
gives the -2 log likelihood for the intercept-only and final
models. The -2 log likelihood can be used in
comparisons of nested models.

Model Fitting Information

Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only          557.272
Final                   533.091         24.180     3   .000
Link function: Logit.

26
 Before proceeding to examine the individual coefficients, you
want to look at an overall test of the null hypothesis that the
location coefficients for all of the variables in the model are 0.

 You can base this on the change in −2 log-likelihood when the variables are added to a model that contains only the intercept.

 The change in the likelihood function has a chi-square distribution even when there are cells with small observed and predicted counts.
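
With the figures in the Model Fitting Information table, the test works out as: chi-square = 557.272 − 533.091 ≈ 24.180, with df = 3 (one for each location parameter: gpa, pared2, public2) and p < 0.001, so the null hypothesis that all the location coefficients are 0 is rejected.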

27
Goodness-of-Fit

Goodness-of-Fit

            Chi-Square   df    Sig.
Pearson        400.843  435   .878
Deviance       400.749  435   .879
Link function: Logit.

28
 If your model fits well, the observed and expected cell counts
are similar, the value of each statistic is small, and the
observed significance level (p-value) is large.

 Good models have large observed significance levels (p-values).

 The previous slide shows that the goodness-of-fit measures have large observed significance levels (p-values), so it appears that the model fits. Both the Pearson and Deviance statistics show that the given model fits well (the p-values are much greater than 0.05).
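
These p-values can be reproduced from the upper tail of the chi-square distribution; a quick check in Python (assuming scipy is available):

```python
from scipy.stats import chi2

# Upper-tail p-values for the goodness-of-fit statistics above.
print(chi2.sf(400.843, df=435))   # Pearson  -> approx. 0.878
print(chi2.sf(400.749, df=435))   # Deviance -> approx. 0.879
```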
29
Parameter Estimates table
Parameter Estimates

                                                                   95% Confidence Interval
                           Estimate  Std. Error    Wald   df  Sig.   Lower Bound  Upper Bound
Threshold  [apply = .00]     2.203       .784     7.890    1  .005        .666        3.741
           [apply = 1.00]    4.299       .809    28.224    1  .000       2.713        5.885
Location   gpa                .616       .263     5.499    1  .019        .101        1.130
           [pared2=1.00]     1.048       .268    15.231    1  .000        .522        1.574
           [pared2=2.00]     0(a)          .          .    0     .          .            .
           [public2=1.00]    -.059       .289      .041    1  .839       -.624         .507
           [public2=2.00]    0(a)          .          .    0     .          .            .
Link function: Logit.
a. This parameter is set to zero because it is redundant.

30
The previous table contains the estimated coefficients for the model. The estimates labeled Threshold are the αj’s, the intercept-equivalent terms.
The estimates labeled Location are the ones you’re interested in. They are the coefficients for the predictor variables.

 For example, the coefficient for pared2 (coded 1 = yes, 2 = no), the independent variable in the model, is 1.048.
31
Parameter Estimates table
• In the Parameter Estimates table we see the coefficients,
their standard errors, the Wald test and associated p-
values (Sig.), and the 95% confidence interval of the
coefficients.

• Both pared2 and gpa are statistically significant; public2 is not.

• The estimates in the output are given in units of ordered logits, or ordered log odds.

• Therefore, for pared2, we would say that for a one-unit increase in pared2 (i.e., going from no degree to at least one parent having a degree), we expect a 1.05 increase in the log odds of apply, given that all of the other variables in the model are held constant.
32
Parameter Estimates table

• For gpa, we would say that for a one unit increase in gpa, we would
expect a 0.62 increase in the expected value of apply in the log odds
scale, given that all of the other variables in the model are held
constant.

• The thresholds are not used in the interpretation of the results. Some statistical packages call the thresholds "cutpoints" (thresholds and cutpoints are the same thing).

• Other packages, such as SAS, report intercepts, which are the negatives of the thresholds.
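
The proportional odds ratios in the next table are simply the exponentiated location coefficients and confidence limits from the Parameter Estimates table; a quick check in Python:

```python
import numpy as np

# Location estimates and 95% CI bounds from the Parameter Estimates table.
location = {
    "pared2":  (1.048,  0.522, 1.574),
    "public2": (-0.059, -0.624, 0.507),
    "gpa":     (0.616,  0.101, 1.130),
}

for name, (b, lo, hi) in location.items():
    print(f"{name}: OR = {np.exp(b):.2f}, "
          f"95% CI ({np.exp(lo):.2f}, {np.exp(hi):.2f})")
```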

33
Parameter Estimates
Predictor   Estimate   Std.     Wald   df   Sig.     95% C.I.          Odds ratio   95% C.I. for OR
variable               error                         Lower    Upper    (e^b)        Lower   Upper

Pared2       1.048     .268   15.231    1   <.001     .522    1.574     2.85         1.68    4.83
Public2     -0.059     .289     .041    1    .839    -.624     .507     0.94          .54    1.66
gpa          0.616     .263    5.499    1    .019     .101    1.130     1.85         1.11    3.10

34
Parameter Estimates
♣ In the column e^b we see the results presented as proportional odds ratios (the exponentiated coefficients).

♣ We have also calculated the lower and upper 95% confidence intervals. We would interpret the proportional odds ratios pretty much as we would odds ratios from a binary logistic regression.

♣ We will ignore the values for apply = 0 and apply = 1, as those are the thresholds, which are not usually reported in terms of proportional odds ratios.

♣ Parental education (p < 0.001) and grade point average (p = 0.019) are positively and significantly associated with the tendency to apply for graduate school.

♣ There was no statistically significant effect of public2 on apply (p = 0.839).
35
Parameter Estimates
♣ For pared2, we would say that, as we go from no degree to at least one parent having a graduate degree, the odds of high apply versus the combined middle and low categories are 2.85 times greater, given that all of the other variables in the model are held constant.

♣ Likewise, the odds of the combined middle and high categories versus low apply are 2.85 times greater, given that all of the other variables in the model are held constant.

♣ For a one-unit increase in gpa, the odds of the middle and high categories of apply versus the low category of apply are 1.85 times greater, given that the other variables in the model are held constant.

♣ Because of the proportional odds assumption, the same increase, 1.85 times, is found between high apply and the combined categories of middle and low apply.
36
Parameter Estimates (Examining the Coefficients)
 From the observed significance levels, we see that parental education (whether or not at least one parent has a graduate degree) and current GPA are related to applying to graduate school.

 On the other hand, the undergraduate institution (public or private) didn’t show any statistically significant association with applying to graduate school.

 Both parental education and GPA have positive coefficients.

 Students with parental education (i.e., at least one parent has a graduate degree) are more likely to apply to graduate school.

 Students with a high GPA are more likely to apply to graduate school than those with a low GPA.
37
Assumption: test of parallel lines
♣ One of the assumptions underlying ordinal logistic regression is that the
relationship between each pair of outcome groups is the same.

♣ In other words, ordinal logistic regression assumes that the coefficients that
describe the relationship between, say, the lowest versus all higher categories of
the response variable are the same as those that describe the relationship between
the next lowest category and all higher categories, etc.

♣ This is called the proportional odds assumption or the parallel regression assumption.
♣ Because the relationship between all pairs of groups is the same, there is only one
set of coefficients (only one model). If this was not the case, we would need
different models to describe the relationship between each pair of outcome
groups.

♣ We need to test the proportional odds assumption.

♣ The null hypothesis of this chi-square test is that there is no difference in the
coefficients between models, so we "hope" to get a non-significant result.
38
Test of Parallel Lines

Test of Parallel Lines(a)

Model              -2 Log Likelihood   Chi-Square   df   Sig.
Null Hypothesis           533.091
General                   529.077          4.014     3   0.26

The null hypothesis states that the location parameters (slope coefficients) are the same across response categories.

a. Link function: Logit.

39
Test of Parallel Lines
 The above test indicates that we have not violated the
proportional odds assumption.

 If you reject the assumption of parallelism, you should consider using multinomial logistic regression, which estimates separate coefficients for each category.

 Since the observed significance level (see the Test of Parallel Lines table) is large, you don’t have sufficient evidence to reject the parallelism hypothesis.
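
The statistic itself is just the drop in −2 log-likelihood when each response category is allowed its own slopes. With the figures from the Test of Parallel Lines table, a quick check in Python (assuming scipy):

```python
from scipy.stats import chi2

chi_sq = 533.091 - 529.077         # null (common slopes) vs. general model
p = chi2.sf(chi_sq, df=3)
print(f"chi-square = {chi_sq:.3f}, p = {p:.2f}")   # 4.014, p ~ 0.26
```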
40
