Logit
Outline
Introduction and Description
  Why use logistic regression?
  Estimation by maximum likelihood
  Interpreting coefficients
  Hypothesis testing
  Evaluating the performance of the model
Some Potential Problems and Solutions
Writing Up the Results
The Data
EVAC  PETS  MOBLHOME  TENURE  EDUC
0     1     0         16      16
0     1     0         26      12
0     1     1         11      13
1     1     1          1      10
1     0     0          5      12
0     0     0         34      12
0     0     0          3      14
0     1     0          3      16
0     1     0         10      12
0     0     0          2      18
0     0     0          2      12
0     1     0         25      16
1     1     1         20      12
OLS Results
Dependent Variable: EVAC
Variable      B        t-value
(Constant)    0.190    2.121
PETS         -0.137   -5.296
MOBLHOME      0.337    8.963
TENURE       -0.003   -2.973
EDUC          0.003    0.424
FLOYD         0.198    8.147
R2            0.145
F-stat       36.010
Problems:
Predicted Values outside the 0,1 range
Descriptive Statistics: Unstandardized Predicted Value
N       Minimum    Maximum   Mean       Std. Deviation
1070    -.08498    .76027    .2429907   .1632
Valid N (listwise): 1070
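A minimal sketch of how the linear probability model can produce a prediction below 0, using the rounded OLS coefficients from the OLS Results table. The household below is hypothetical (it is not a row from the data), and because the coefficients are rounded the result will not reproduce the SPSS minimum exactly.

```python
# Rounded OLS (linear probability model) coefficients from the OLS Results table.
OLS_COEF = {
    "const": 0.190,
    "PETS": -0.137,
    "MOBLHOME": 0.337,
    "TENURE": -0.003,
    "EDUC": 0.003,
    "FLOYD": 0.198,
}

def lpm_predict(household):
    """Predicted 'probability' of evacuating from the linear probability model."""
    return OLS_COEF["const"] + sum(
        OLS_COEF[name] * value for name, value in household.items()
    )

# Hypothetical household: pet owner, not in a mobile home,
# 34 years of tenure, 12 years of education, FLOYD = 0.
household = {"PETS": 1, "MOBLHOME": 0, "TENURE": 34, "EDUC": 12, "FLOYD": 0}
prediction = lpm_predict(household)
print(round(prediction, 3))  # a small negative number, outside the 0,1 range
```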
Heteroskedasticity
Park Test
Dependent Variable: LNESQ
Variable      B       t-stat
(Constant)   -2.34   -15.99
LNTNSQ       -0.20    -6.19
p is the probability that the event Y occurs, p(Y=1)
p/(1-p) is the odds of the event
ln[p/(1-p)] is the log odds, or "logit"
More: The logistic distribution constrains the estimated probabilities to lie between 0 and 1. The estimated probability is:
p = 1/[1 + exp(-α - βX)]
if you let α + βX = 0, then p = .50
as α + βX gets really big, p approaches 1
as α + βX gets really small (large and negative), p approaches 0
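The three limiting cases can be checked with a short sketch. The values of alpha and beta below are arbitrary illustrative numbers, not estimates from the evacuation data.

```python
import math

def logistic_p(alpha, beta, x):
    """Estimated probability p = 1 / (1 + exp(-(alpha + beta * x)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def logit(p):
    """Log odds ln[p / (1 - p)], the inverse of the logistic function."""
    return math.log(p / (1.0 - p))

# Illustrative values only: alpha = 0, beta = 1.
alpha, beta = 0.0, 1.0
for x in (-10, 0, 10):
    print(x, logistic_p(alpha, beta, x))  # near 0, exactly 0.5, near 1
```

Applying logit() to the output of logistic_p() returns alpha + beta * x, which is why the model is linear in the log odds.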
MLE involves finding the coefficients (α, β) that make the log of the likelihood function (LL < 0) as large as possible.
Or, equivalently, it finds the coefficients that make -2 times the log of the likelihood function (-2LL) as small as possible.
The maximum likelihood estimates solve the following condition:
Σ [Y - p(Y=1)] X = 0, summed over all observations i = 1, ..., n
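A rough sketch of maximum likelihood by Newton-Raphson, under strong simplifying assumptions: it uses only the 13 rows shown on the data slide and TENURE as the single regressor, so the estimates will not match the SPSS output reported later. It illustrates the idea that, at the maximum, the sums of (Y - p) and (Y - p)X are both zero. SPSS's own algorithm may differ in its details.

```python
import math

# The 13 rows from the data slide; TENURE alone is an illustrative choice.
EVAC = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
TENURE = [16, 26, 11, 1, 5, 34, 3, 3, 10, 2, 2, 25, 20]

def prob(a, b, x):
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def log_likelihood(a, b, xs, ys):
    return sum(
        y * math.log(prob(a, b, x)) + (1 - y) * math.log(1 - prob(a, b, x))
        for x, y in zip(xs, ys)
    )

def score(a, b, xs, ys):
    """First derivatives of LL: sum of (y - p) and sum of (y - p) * x."""
    g_a = sum(y - prob(a, b, x) for x, y in zip(xs, ys))
    g_b = sum((y - prob(a, b, x)) * x for x, y in zip(xs, ys))
    return g_a, g_b

def fit_logit(xs, ys, max_iter=50, tol=1e-10):
    """Newton-Raphson for a one-regressor logit; returns (alpha, beta)."""
    a = b = 0.0
    for _ in range(max_iter):
        g_a, g_b = score(a, b, xs, ys)
        # Information matrix entries: sums of p(1-p), p(1-p)x, p(1-p)x^2.
        w = [prob(a, b, x) * (1 - prob(a, b, x)) for x in xs]
        h_aa = sum(w)
        h_ab = sum(wi * x for wi, x in zip(w, xs))
        h_bb = sum(wi * x * x for wi, x in zip(w, xs))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step = inverse information matrix times the score.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if max(abs(step_a), abs(step_b)) < tol:
            break
    return a, b

alpha_hat, beta_hat = fit_logit(TENURE, EVAC)
print("-2LL at the estimates:", -2 * log_likelihood(alpha_hat, beta_hat, TENURE, EVAC))
print("score at the estimates:", score(alpha_hat, beta_hat, TENURE, EVAC))
```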
Interpreting Coefficients
Since:
ln[p/(1-p)] = α + βX + e
The slope coefficient (β) is interpreted as the rate of change in the "log odds" as X changes, which is not very useful.
An interpretation of the logit coefficient which is usually more intuitive is the "odds ratio"
Since:
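[p/(1-p)] = exp(α + βX)
exp(β) is the effect of a one-unit change in X on the odds.
The PETS coefficient is -0.6593, so exp(β) = 0.517 and 1/exp(β) = 1.933: households without pets have odds of evacuating 1.933 times those of households with pets.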
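A small sketch of the odds-ratio arithmetic, using the PETS coefficient (-0.6593) from the logit output. Because that coefficient is negative, exp(B) is below 1 and the figure 1.933 is its reciprocal, which compares households without pets to households with pets.

```python
import math

def odds_ratio(b):
    """Multiplicative change in the odds p/(1-p) for a one-unit increase in X."""
    return math.exp(b)

b_pets = -0.6593  # PETS coefficient from the logit output
print(round(odds_ratio(b_pets), 3))      # odds with pets relative to without
print(round(1 / odds_ratio(b_pets), 3))  # odds without pets relative to with
```

The two printed values are 0.517 and 1.933.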
Hypothesis Testing
An Example:
Variable    B         S.E.     Wald      R         Sig      t-value
PETS       -0.6593    0.2012   10.732   -0.1127    0.0011   -3.28
MOBLHOME    1.5583    0.2874   29.39     0.1996    0.0000    5.42
TENURE     -0.0198    0.008     6.1238  -0.0775    0.0133   -2.48
EDUC        0.0501    0.0468    1.1483   0.0000    0.2839    1.07
Constant   -0.916     0.69      1.7624             0.1843   -1.33
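A sketch of how the Wald, Sig and t-value columns relate to B and S.E., assuming the usual definitions: t-value = B/S.E., Wald = (B/S.E.) squared, and Sig taken from a chi-square distribution with 1 degree of freedom. Small differences from the printed values come from rounding in the printed B and S.E.

```python
import math

# (B, S.E.) pairs copied from the table above.
ESTIMATES = {
    "PETS": (-0.6593, 0.2012),
    "MOBLHOME": (1.5583, 0.2874),
    "TENURE": (-0.0198, 0.008),
    "EDUC": (0.0501, 0.0468),
    "Constant": (-0.916, 0.69),
}

def t_value(b, se):
    return b / se

def wald(b, se):
    return (b / se) ** 2

def wald_sig(w):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(w / 2.0))

for name, (b, se) in ESTIMATES.items():
    w = wald(b, se)
    print(f"{name:9s} t = {t_value(b, se):6.2f}  Wald = {w:7.3f}  Sig = {wald_sig(w):.4f}")
```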
Model Chi-Square
The model likelihood ratio (LR) statistic is LR[i] = -2[LL(α) - LL(α, β)]
{Or, as you are reading SPSS printout: LR[i] = [-2LL (of beginning model)] - [-2LL (of ending model)]}
The LR statistic is distributed chi-square with i degrees of freedom, where i is the number of independent variables.
Use the Model Chi-Square statistic to determine if the overall model is statistically significant.
An Example:
Beginning Block Number 1. Method: Enter
-2 Log Likelihood 687.35714
Variable(s) Entered on Step Number 1..
  PETS      PETS
  MOBLHOME  MOBLHOME
  TENURE    TENURE
  EDUC      EDUC
Estimation terminated at iteration number 3 because Log Likelihood decreased by less than .01 percent.
-2 Log Likelihood 641.842
          Chi-Square   df   Sign.
Model     45.515       4    0.0000
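The Model Chi-Square in this printout can be reproduced from the two -2 Log Likelihood values. The sketch below also computes a p-value; the closed form used for the chi-square tail is valid only for an even number of degrees of freedom, which is an assumption that happens to hold here (df = 4).

```python
import math

def model_chi_square(neg2ll_begin, neg2ll_end):
    """LR = [-2LL of beginning model] - [-2LL of ending model]."""
    return neg2ll_begin - neg2ll_end

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

lr = model_chi_square(687.35714, 641.842)
p_value = chi2_sf_even_df(lr, df=4)
print(round(lr, 3), p_value)  # LR of about 45.515 and a p-value far below .0001
```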
An Example:
             Predicted 0   Predicted 1   % Correct
Observed 0   328           24            93.18%
Observed 1   139           44            24.04%
Overall                                  69.53%
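The percent-correct figures follow directly from the four cell counts. A short sketch, where the nested dictionary layout is just one convenient way to hold the table:

```python
# Classification table counts: TABLE[observed][predicted]
TABLE = {0: {0: 328, 1: 24}, 1: {0: 139, 1: 44}}

def percent_correct(table):
    """Return (percent correct by observed outcome, overall percent correct)."""
    by_row = {obs: 100.0 * row[obs] / sum(row.values()) for obs, row in table.items()}
    total = sum(sum(row.values()) for row in table.values())
    overall = 100.0 * sum(table[k][k] for k in table) / total
    return by_row, overall

by_row, overall = percent_correct(TABLE)
print(round(by_row[0], 2), round(by_row[1], 2), round(overall, 2))
```

Note that the model classifies non-evacuees far better than evacuees, which the overall figure hides.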
Pseudo-R2
One pseudo-R2 statistic is McFadden's R2:
McFadden's R2 = 1 - [LL(α, β)/LL(α)]
{= 1 - [-2LL(α, β)/-2LL(α)] (from SPSS printout)}
where the R2 is a scalar measure which varies between 0 and (somewhat close to) 1, much like the R2 in a linear probability (LP) model.
An Example:
Beginning -2 LL          687.36
Ending -2 LL             641.84
Ending/Beginning         0.9338
McF. R2 = 1 - E./B.      0.066
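A one-function sketch of the McFadden's R2 calculation, using the beginning and ending -2 Log Likelihood values from the SPSS printout shown earlier (687.35714 and 641.842).

```python
def mcfadden_r2(neg2ll_begin, neg2ll_end):
    """McFadden's R2 = 1 - (ending -2LL) / (beginning -2LL)."""
    return 1.0 - neg2ll_end / neg2ll_begin

r2 = mcfadden_r2(687.35714, 641.842)
print(round(r2, 3))
```

The result rounds to .066, the value quoted in the write-up example.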
Omitted Variable Bias
Irrelevant Variable Bias
Functional Form
Multicollinearity
Structural Breaks
Omitted Variable Bias
Omitted variable(s) can result in bias in the coefficient estimates. To test for omitted variables you can conduct a likelihood ratio test:
LR[q] = [-2LL(constrained model, i=k-q)] - [-2LL(unconstrained model, i=k)]
where LR is distributed chi-square with q degrees of freedom, and q is the number (1 or more) of omitted variables.
{This test is conducted automatically by SPSS if you specify "blocks" of independent variables}
An Example:
Variable     B        Wald     Sig
PETS        -0.699    10.968   0.001
MOBLHOME     1.570    29.412   0.000
TENURE      -0.020     5.993   0.014
EDUC         0.049     1.079   0.299
CHILD        0.009     0.011   0.917
WHITE        0.186     0.422   0.516
FEMALE       0.018     0.008   0.928
Constant    -1.049     2.073   0.150
Beginning -2 LL   687.36
Ending -2 LL      641.41
Since the chi-squared value (641.84 - 641.41 = 0.43, with 3 degrees of freedom for CHILD, WHITE and FEMALE) is less than the critical value, the set of coefficients is not statistically significant. The full model is not an improvement over the partial model.
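A sketch of the omitted-variable likelihood ratio test. It takes 641.84 (the ending -2LL of the four-variable model) as the constrained value and 641.41 as the unconstrained value, with q = 3 for CHILD, WHITE and FEMALE. The chi2_sf helper is a hand-rolled stand-in for a statistics library's chi-square tail function and assumes a positive integer df.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1.

    Starts from the df = 1 (erfc) or df = 2 (exp) case and applies the
    recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / gamma(a + 1), h = x / 2.
    """
    h = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(h))
    else:
        a, q = 1.0, math.exp(-h)
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

def lr_test(neg2ll_constrained, neg2ll_unconstrained, q):
    """Return (LR statistic, p-value) for q omitted variables."""
    lr = neg2ll_constrained - neg2ll_unconstrained
    return lr, chi2_sf(lr, q)

lr, p_value = lr_test(641.84, 641.41, q=3)
print(round(lr, 2), round(p_value, 2))  # LR of about 0.43; p-value far above .05
```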
Irrelevant Variable Bias
The inclusion of irrelevant variable(s) can result in poor model fit. You can consult your Wald statistics or conduct a likelihood ratio test.
Functional Form
Errors in functional form can result in biased coefficient estimates and poor model fit. You should try different functional forms by logging the independent variables, adding squared terms, etc. Then consult the Wald statistics and model chi-square statistics to determine which model performs best.
Multicollinearity
The presence of multicollinearity will not lead to biased coefficients, but the standard errors of the coefficients will be inflated. If a variable which you think should be statistically significant is not, consult the correlation coefficients. If two variables have a correlation above a rule-of-thumb threshold (commonly somewhere between .6 and .8), try dropping the less theoretically important of the two.
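A minimal sketch of the correlation check, using TENURE and EDUC from the 13 rows on the data slide. The 0.7 cutoff is an assumed rule of thumb, and 13 observations are far too few to say anything about the full sample.

```python
import math

# TENURE and EDUC columns from the 13 rows on the data slide.
TENURE = [16, 26, 11, 1, 5, 34, 3, 3, 10, 2, 2, 25, 20]
EDUC = [16, 12, 13, 10, 12, 12, 14, 16, 12, 18, 12, 16, 12]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

THRESHOLD = 0.7  # assumed rule-of-thumb cutoff, not a formal test
r = pearson_r(TENURE, EDUC)
print(round(r, 3), abs(r) > THRESHOLD)
```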
Structural Breaks
You may have structural breaks in your data. Pooling the data imposes the restriction that an independent variable has the same effect on the dependent variable for different groups of data when the opposite may be true. You can conduct a likelihood ratio test:
LR[i+1] = -2LL(pooled model) - [-2LL(sample 1) + -2LL(sample 2)]
where samples 1 and 2 are pooled, and i is the number of independent variables.
An Example
Is the evacuation behavior from Hurricanes Dennis and Floyd statistically equivalent?
Variable            Floyd B   Dennis B   Pooled B
PETS                 -0.66     -1.20      -0.79
MOBLHOME              1.56      2.00       1.62
TENURE               -0.02     -0.02      -0.02
EDUC                  0.05     -0.04       0.02
Constant             -0.92     -0.78      -0.97
Beginning -2 LL     687.36    440.87    1186.64
Ending -2 LL        641.84    382.84    1095.26
Model Chi-Square     45.52     58.02      91.37
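LR[5] = 1095.26 - (641.84 + 382.84) = 70.58; the chi-square critical value with 5 degrees of freedom at p = .01 is 15.09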
Since the chi-squared value is greater than the critical value, the two sets of coefficients are statistically different. The pooled model is inappropriate.
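The pooling test can be sketched as a small helper that takes the ending -2LL of the pooled model and of each separately estimated sample. The critical value 15.09 is the standard chi-square table value for 5 degrees of freedom (4 independent variables plus the constant) at p = .01; the function name is made up for illustration.

```python
def pooling_lr(neg2ll_pooled, neg2ll_samples):
    """LR = -2LL(pooled) - sum of -2LL over the separately estimated samples."""
    return neg2ll_pooled - sum(neg2ll_samples)

# Ending -2LL values from the Floyd, Dennis and pooled models.
lr = pooling_lr(1095.26, [641.84, 382.84])
df = 4 + 1  # i + 1, with i = 4 independent variables
CRITICAL_01_DF5 = 15.09  # chi-square table value, 5 df, p = .01
print(round(lr, 2), lr > CRITICAL_01_DF5)
```

The statistic is about 70.58, well above the critical value, so pooling is rejected.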
Writing Up Results
Present descriptive statistics in a table.
Make it clear that the dependent variable is discrete (0, 1) and not continuous and that you will use logistic regression.
Logistic regression is a standard statistical procedure so you don't (necessarily) need to write out the formula for it. You also (usually) don't need to justify that you are using Logit instead of the LP model or Probit (similar to logit but based on the normal distribution [the tails are less fat]).
An Example:
"The dependent variable which measures the willingness to evacuate is EVAC. EVAC is equal to 1 if the respondent evacuated their home during Hurricanes Floyd and Dennis and 0 otherwise. The logistic regression model is used to estimate the factors which influence evacuation behavior."
An Example:
Table 2. Logistic Regression Results
Dependent Variable = EVAC
Variable             B         B/S.E.
PETS                -0.6593    -3.28
MOBLHOME             1.5583     5.42
TENURE              -0.0198    -2.48
EDUC                 0.0501     1.07
Constant            -0.916     -1.33
Model Chi-Squared   45.515
When describing the statistics in the tables, point out the highlights for the reader. What are the statistically significant variables?
"The results from Model 1 indicate that coastal residents behave according to risk theory. The coefficient on the MOBLHOME variable is positive and statistically significant at the p < .01 level (t-value = 5.42). Mobile home residents are 4.75 times more likely to evacuate."
Is the overall model statistically significant?
"The overall model is significant at the .01 level according to the Model chi-square statistic. The model predicts 69.5% of the responses correctly. The McFadden's R2 is .066."
Also
You usually don't need to discuss the magnitude of the coefficients, just the sign (+ or -) and statistical significance.
If your audience is unfamiliar with the extensions (beyond SPSS or SAS printouts) to logistic regression, discuss the calculation of the statistics in an appendix or footnote or provide a citation.
Always state the degrees of freedom for your likelihood-ratio (chi-square) test.
References
http://personal.ecu.edu/whiteheadj/data/logit/
http://personal.ecu.edu/whiteheadj/data/logit/logitpap.htm