0% found this document useful (0 votes)
53 views

2 Dealing With Logistic Regression

1. Logistic regression is used to predict a binary dependent variable from independent variables. It uses a logistic function to model the probabilities of different outcomes. 2. To diagnose logistic regression models, tests are conducted to check for specification errors, goodness of fit, and issues like multicollinearity. Specification error can be tested using linktest in Stata. Goodness of fit is assessed using the Hosmer-Lemeshow test, where a non-significant result indicates good fit. Multicollinearity is tested using the collin command. 3. Ordered logistic regression extends logistic regression to dependent variables with more than two ordered categories. It models the odds of being in a higher category given predictor variables.

Uploaded by

Prabin Ghimire
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
53 views

2 Dealing With Logistic Regression

1. Logistic regression is used to predict a binary dependent variable from independent variables. It uses a logistic function to model the probabilities of different outcomes. 2. To diagnose logistic regression models, tests are conducted to check for specification errors, goodness of fit, and issues like multicollinearity. Specification error can be tested using linktest in Stata. Goodness of fit is assessed using the Hosmer-Lemeshow test, where a non-significant result indicates good fit. Multicollinearity is tested using the collin command. 3. Ordered logistic regression extends logistic regression to dependent variables with more than two ordered categories. It models the odds of being in a higher category given predictor variables.

Uploaded by

Prabin Ghimire
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 4

Logistic Regression Analysis

1. Overall Understanding

Logistic Regression
Definition Logistic regression is a statistical model that in its basic form uses
a logistic function to model a binary dependent variable.
In regression analysis, logistic regression (or logit regression) is
estimating the parameters of a logistic model (a form of
binary regression).
Binary Logistic Regression
When to use Logistic regression is the statistical technique used to predict the
relationship between predictors (our independent variables) and a
predicted variable (the dependent variable) where the dependent
variable is binary (e.g., sex [male vs. female], response [yes vs. no],
score [high vs. low], etc…).
Necessary Logistic Pre-estimation For Specification Error test [perform
Regression Diagnosis  Specification error linktest]
 Goodness of fit For Goodness of fit test [ perform lfit]
 Other Diagnostics For Other Diagnostics Tests [perform
Post-estimation fitstat]
 Multicollinearity For Multicollinearity test [perform
 Heteroscedasticity collin]
For Heteroscedasticity test [perform
hettest]

Possible Regression Interpretation


Performance Coefficient An increase in x increase/decrease the likelihood that
1. Binary Logit y=1(makes that outcome more/less likely).
Model In other words, an increase in x makes the outcome of 1
2. Binary Probit more or less likely.
Model Odds Ratio The odds ratio or relative risk is p/(1-p) and measure the
probability that y=1 relative to the probability that y=0.
Odds ratio is 2 means that the outcome y=1 is twice as
likely as the outcome of y=0.
Odds ratio are estimated with the logistic model and not
possible with probit regression.
Marginal An increase in x increases (decreases) the probability
effect that y=1 by the marginal effect expressed as a percent.
 for dummy independent variables, the marginal
effect is expressed in comparison to the base
category (x=0)
 For continuous independent variables, the
marginal effect is expressed for a one-unit change
in x.

Ordered Logistic Regression 


When to use Ordered Logistic Regression (also called the logit model or cumulative
link model) is a sub-type of logistic regression where the Y-category

1
is ordered. It is used when your dependent variable has: A
meaningful order, and. More than two categories (or levels).

Necessary Logistic Pre-estimation For Specification Error test [perform


Regression Diagnosis  Specification error linktest]
Post-estimation For Multicollinearity test [perform collin]
 Multicollinearity For Heteroscedasticity test [perform
 Heteroscedasticity hettest]

Possible Regression Interpretation


Performance: Coefficient For a one unit increase in X, we would expect a (value
1. Ordered Logit of coefficient, for example 0.62) increase in the log
Model odds of being in a higher level of Y, given that all of the
2. Ordered Probit other variables in the model are held constant. 
Model (In other words, one unit change in x change by y=1
unit in in the log of odds).
Odds Ratio The interpretation of the odds ratio is same as Binary
Logistic Regression.
Marginal The interpretation of the marginal ration is same for all
effect logistic regression.

Tobit Regression Model


When to use In statistics, a tobit model is any of a class of regression models in
which the observed range of the dependent variable is censored in
some way. It is also called censored regression analysis.
Necessary Logistic Pre-estimation For Specification Error test [perform
Regression Diagnosis  Specification error linktest]
Post-estimation For Multicollinearity test [perform collin]
 Multicollinearity For Heteroscedasticity test [perform
 Heteroscedasticity hettest]

Possible Regression Interpretation


Performance: Coefficient The regression coefficient for the tobit model is same as
Tobit Regression Model the OLS regression that means:
One unit change in x leads to the increase/ decrease in
(coefficient level value for the particular variable).
Marginal The interpretation of the marginal ration is same for all
effect logistic regression.

2. Understanding Logistic Regression Diagnosis with Stata Command


Following steps is necessary:
1. Specification Error
The Stata command linktest can be used to detect a specification error, and it is issued after
the logit or logistic command. The idea behind linktest is that if the model is properly
specified, one should not be able to find any additional predictors that are statistically
significant except by chance. After the regression command (in our

2
case, logit or logistic), linktest uses the linear predicted value (_hat) and linear predicted
value squared (_hatsq) as the predictors to rebuild the model. The variable _hat should be a
statistically significant predictor, since it is the predicted value from the model. 
Step 1: run logistic regression (logit DV IVs) [Result appears]
Step 2: linktest [result of hat and hat test appears]
Step 3: Compare your specification error

Decision:
1. Check _hat value, it should be statistically significant
2. Check _hatsq value, it should not be statistically significant
3. If 1 and 2 appears as per our desire, the result confirms, on one
hand, that we have chosen meaningful predictors. 

2. Goodness of fit
We have seen from our previous lessons that Stata’s output of logistic regression contains the
log likelihood chi-square and pseudo R-square for the model.
The log likelihood chi-square is an omnibus test to see if the model as a whole is statistically
significant. 
Another commonly used test of model fit is the Hosmer and Lemeshow’s goodness-of-fit test.
The idea behind the Hosmer and Lemeshow’s goodness-of-fit test is that the predicted
frequency and observed frequency should match closely, and that the more closely they
match, the better the fit. The Hosmer-Lemeshow goodness-of-fit statistic is computed as the
Pearson chi-square from the contingency table of observed frequencies and expected
frequencies. Similar to a test of association of a two-way table, a good fit as measured by
Hosmer and Lemeshow’s test will yield a large p-value. When there are continuous predictors
in the model, there will be many cells defined by the predictor variables, making a very large
contingency table, which would yield significant result more than often. So a common
practice is to combine the patterns formed by the predictor variables into 10 groups and form
a contingency table of 2 by 10.
Step 1: run logistic regression (logit DV IVs) [Result appears]
Step 2: lfit [Result appears]

Decision:
If p-value is greater than 0.05 or 5%, we can say that Hosmer and Lemeshow’s
goodness-of-fit test indicates that our model fits the data well.
3. Other Diagnostics

There are many other measures of model fit, such AIC (Akaike Information Criterion) and
BIC (Bayesian Information Criterion).

Step 1: run logistic regression (logit DV IVs) [Result appears]


Step 2: fitstat [Result appears]

3
Note: Many times, fitstat is used to compare models. Note that fitstat should only be
used to compare nested models.

Decision:
1.The value of fitstat provides many diagnostic results of the dataset that we use.
2.It provides readers meaningful insight about the fitness of dataset.

4. Multicollinearity
Multicollinearity (or collinearity for short) occurs when two or more independent variables in
the model are approximately determined by a linear combination of other independent
variables in the model. Notice that Stata issues a note, informing us that the
variable yr_rnd has been dropped from the model due to collinearity. We cannot assume that
the variable that Stata drops from the model is the "correct" variable to omit from the model;
rather, we need to rely on theory to determine which variable should be omitted.

Step 1: run logistic regression (logit DV IVs) [Result appears]


Step 2: run collin test (collin DV IVs) [Result appears]

Decision:
As a rule of thumb, a tolerance of 0.1 or less (equivalently VIF of 10 or
greater)  is a cause for concern.

5. Influential Observations (Heteroscedasticity)


Step 1: run regression (reg DV IVs) [Result appears]
Step 2: check heteroscedasticity (hettest) [Result appears]
Step 3: if heteroscedasticity appears (i.e. p value is less that 0.05 or5%) use robust
command

For practice: Use Stata do file provided during the session.

Best Wishes

You might also like