Chapter 7. Software Application

The document discusses linear regression analysis using Stata software. It provides examples of using Stata to estimate linear regression models and perform diagnostic tests on the models. Specifically, it estimates a model of factors influencing daily calorie intake using the food_security database. It finds that family size, access to irrigation, and access to off-farm activities negatively impact calorie intake, while income and fertilizer use positively impact it. The document also discusses how to test the linear regression assumptions, including for normality, heteroskedasticity, multicollinearity, and correct model specification. Finally, it provides a numerical example estimating a logit model of factors affecting student academic performance.


Advanced Research Methods and Software

Application

Baro Beyene

OSU
2022

Content

Chapter 8

Stata Software Application on Some Econometric Models
1. Linear Regression Analysis

This section describes the use of Stata to perform regression analysis. Regression analysis involves estimating an equation that best describes the data.

One variable is considered the dependent variable, while the others are considered independent (or explanatory) variables.

11/3/2022 3
Linear Regression Analysis

Stata is capable of many types of regression analysis and the associated statistical tests.

In this section, we touch on only a few of the more common commands and procedures, which are described below.
Linear Regression Analysis

Consider the database food_security to estimate a linear model of the determinants of daily calorie intake (lncalav). Suppose the factors influencing daily calorie intake per capita of households are farming system (farmsy), gender (femal), family size (famsz), land allocated to staples (landst), access to irrigation (irrig), fertilizer quantity used for crop production (frtqt), oxen used for draught power (oxen), (log of) annual gross income in ETB (lninom), and access to off-farm activities (ofarm).
Linear Regression Analysis

Based on this information, work out the following problems:

a. Estimate the OLS model for the determinants of (log) daily calorie intake per adult equivalent of farm households in the study area.
b. Which variables positively/adversely and significantly affect daily calorie intake?
c. How do you interpret the fitness of this OLS model to identify the determinants of calorie intake?
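A minimal Stata sketch for part (a), assuming the food_security dataset is in the working directory and uses the variable names listed above:

```stata
* Load the food security dataset (file name is an assumption)
use food_security, clear

* OLS model of (log) daily calorie intake on its hypothesized determinants
regress lncalav farmsy femal famsz landst irrig frtqt oxen lninom ofarm
```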
Linear Regression Analysis

[The Stata OLS regression output shown on this slide is not reproduced in this text version.]
Linear Regression Analysis
• According to the OLS model outputs, five variables (famsz, irrig, frtqt, lninom, and ofarm) are statistically significant in affecting the daily calorie intake of households.
• Family size, access to irrigation, and access to off-farm activities adversely affect daily calorie intake, while the remaining two variables enhance the calorie intake of households.
• About 19.3% of the variation in daily calorie intake is explained by this OLS model.
Linear Regression Analysis
• However, interpretation of the OLS model outputs is valid if and only if the basic assumptions of the classical linear regression model are satisfied.
• There are many post-estimation tests used to check whether the basic assumptions of the multiple linear regression model hold.
• Tests for heteroscedasticity, omitted variables, and multicollinearity are the most important post-estimation tests and should be reported with the OLS model outputs.
Linear Regression Analysis

Post-estimation tests of the OLS regression (diagnostic tests):

- Multicollinearity
- Heteroscedasticity
- Autocorrelation (time-series data)
- Normality
- Model misspecification
Diagnostic Test/Post Estimation Tests
1. Tests for Normality of Residuals
– kdensity -- produces a kernel density plot with a normal distribution overlaid.
Diagnostic Test/Post Estimation Tests
1. Tests for Normality of Residuals
– pnorm -- graphs a standardized normal probability (P-P) plot.
– qnorm -- plots the quantiles of varname against the quantiles of a normal distribution.
– mvtest normality -- performs the Doornik-Hansen test for multivariate normality on the residuals.
Diagnostic Test/Post Estimation Tests
1. Tests for Normality of Residuals
– sktest -- the skewness-kurtosis test (analogous to the Jarque-Bera test)
H0: the residual distribution is normal
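The normality checks above can be run right after regress with a short session like the following sketch (the residual variable name e is an assumption):

```stata
* Save the OLS residuals under an assumed name, e
predict e, residuals

kdensity e, normal     // kernel density with a normal curve overlaid
pnorm e                // standardized normal probability (P-P) plot
qnorm e                // quantile-quantile plot against the normal
sktest e               // skewness-kurtosis test (H0: normality)
mvtest normality e     // Doornik-Hansen test
```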
Diagnostic Cont…
2. Tests for Heteroscedasticity
– hettest -- performs the Breusch-Pagan / Cook-Weisberg test for heteroscedasticity.
– H0: No heteroscedasticity
Diagnostic Cont…
2. Tests for Heteroscedasticity
– imtest -- computes White's general test for heteroscedasticity.
– H0: No heteroscedasticity
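Both are post-estimation commands, so they are issued immediately after regress; a sketch:

```stata
* After regress:
hettest          // Breusch-Pagan / Cook-Weisberg (H0: constant variance)
imtest, white    // White's general test for heteroscedasticity
```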
Diagnostic Cont…

3. Tests for Multicollinearity

vif -- calculates the variance inflation factor for the independent variables in the linear model.
This test involves regressing each explanatory variable on the other explanatory variables; if an auxiliary R2 is greater than 0.9, there is a problem of multicollinearity between the explanatory variables.
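The rule of thumb above (auxiliary R2 greater than 0.9) corresponds to a VIF greater than 10, since VIF = 1/(1 - R2). A sketch:

```stata
* After regress: variance inflation factors
vif
* VIF > 10 (i.e., auxiliary R-squared > 0.9) signals multicollinearity
```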
Diagnostic Cont…

3. Tests for Multicollinearity

[The vif output shown on this slide is not reproduced in this text version.]
Diagnostic Cont…

4. Tests for Model Specification

– linktest -- performs a link test for model specification.
– H0: No specification problem
– For a correctly specified model, the squared prediction (_hatsq) should not contribute significantly to the test regression.
Diagnostic Cont…

4. Tests for Model Specification

– ovtest -- performs the Ramsey regression specification-error test (RESET) for omitted variables.
– H0: No omitted variables
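Both specification checks follow the fitted model directly; a sketch (ovtest is run first because linktest fits its own auxiliary regression and replaces the stored estimation results):

```stata
* After regress:
ovtest      // Ramsey RESET (H0: no omitted variables)
linktest    // H0: no specification problem (_hatsq insignificant)
```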
2. Linear Probability Model

Linear Probability Model (LPM)

[Figure: the fitted LPM line, with outcomes Y = 1 and Y = 0 plotted against X = income between X = X1 and X = X2.]
Because of the LPM's shortcomings, nonlinear regression models are used instead. These nonlinear regression models include:

A. The Logit Model
B. The Probit Model
(A and B are binary choice models)
C. Multinomial Logit and Probit Models (MNL & MNP)
D. Ordered Logit and Probit Models


Numerical Example on LPM

Using the LPM data from your Stata training folder, regress poverty on family size (fs) and migration, and test for heteroskedasticity, normality, and multicollinearity.
. reg poverty fs migration

      Source |       SS           df       MS      Number of obs   =        20
-------------+----------------------------------   F(2, 17)        =     13.35
       Model |  2.93291409         2  1.46645705   Prob > F        =    0.0003
    Residual |  1.86708591        17  .109828583   R-squared       =    0.6110
-------------+----------------------------------   Adj R-squared   =    0.5653
       Total |         4.8        19  .252631579   Root MSE        =     .3314

     poverty |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          fs |   .0794074    .034043     2.33   0.032      .007583    .1512318
   migration |  -.3628224    .203684    -1.78   0.093     -.792558    .0669132
       _cons |   .2660414   .2456304     1.08   0.294    -.2521936    .7842763

Numerical Example on LPM

Note: the LPM yields negative predicted probabilities for some observations and predicted probabilities greater than one.

Test of normality: mvtest normal e


. mvtest normal e

Test for multivariate normality

Doornik-Hansen chi2(2) = 9.916 Prob>chi2 = 0.0070

Test of heteroskedasticity : hettest


. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity


Ho: Constant variance
Variables: fitted values of poverty

chi2(1) = 2.80
Prob > chi2 = 0.0942
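The multicollinearity check requested in the exercise is not shown in the output above; a sketch (refitting the model first so vif has current estimation results):

```stata
* Multicollinearity check for the LPM regressors
quietly regress poverty fs migration
vif
```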
3. The Logit and Probit Models

Logit and Probit

This requires a nonlinear functional form for the probability. This is possible if we assume that the dependent variable, or the error term (Ui), follows some sort of cumulative distribution function.

The two important nonlinear functions proposed for this are the logistic CDF and the normal CDF:

Pr(Yi = 1 | Xi) = Pi = G(β0 + β1Xi) = G(Zi)
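Written out explicitly (standard textbook forms, not shown on the original slides), the two CDFs are:

```latex
% Logit: logistic CDF
P_i = G(Z_i) = \frac{e^{Z_i}}{1 + e^{Z_i}} = \frac{1}{1 + e^{-Z_i}},
\qquad Z_i = \beta_0 + \beta_1 X_i

% Probit: standard normal CDF
P_i = G(Z_i) = \Phi(Z_i)
     = \int_{-\infty}^{Z_i} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt
```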

Logit and Probit

[Several slides with the derivation of the logit and probit models are not reproduced in this text version.]
[Figure: the logistic distribution function and the cumulative normal distribution function; Pi rises from 0 to 1 as the index increases.]
Numerical Example on Logit and Probit

Suppose that we want to examine the factors affecting students' academic performance in a particular course.

Assume that academic performance is measured by the grades (A, B, C, D, F) scored by the students.

Further assume that data on three independent variables, namely previous CGPA (gpa), PC ownership (pc), and average score in exercises (ase), were collected from 32 students.
Numerical Example on Logit and Probit

A. Logit Interpretation of the Logit Model

. logit grade gpa ase pc

Logistic regression                             Number of obs   =         32
                                                LR chi2(3)      =      15.40
                                                Prob > chi2     =     0.0015
Log likelihood = -12.889633                     Pseudo R2       =     0.3740

       grade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gpa |   2.826113   1.262941     2.24   0.025     .3507938    5.301432
         ase |   .0951577   .1415542     0.67   0.501    -.1822835    .3725988
          pc |   2.378688   1.064564     2.23   0.025       .29218    4.465195
       _cons |  -13.02135   4.931325    -2.64   0.008    -22.68657    -3.35613
Numerical Example on Logit and Probit

B. Odds Ratio Interpretation of the Logit Model

. logit grade gpa ase pc, or

Logistic regression                             Number of obs   =         32
                                                LR chi2(3)      =      15.40
                                                Prob > chi2     =     0.0015
Log likelihood = -12.889633                     Pseudo R2       =     0.3740

       grade | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gpa |   16.87972   21.31809     2.24   0.025     1.420194    200.6239
         ase |   1.099832   .1556859     0.67   0.501     .8333651    1.451502
          pc |   10.79073   11.48743     2.23   0.025     1.339344    86.93802
       _cons |   2.21e-06   .0000109    -2.64   0.008     1.40e-10      .03487
Numerical Example on Logit and Probit

C. Probability Interpretation of the Logit Model

. mfx

Marginal effects after logit
      y  = Pr(grade) (predict)
         = .25282025

    variable |     dy/dx   Std. Err.     z    P>|z|  [   95% C.I.   ]        X
-------------+----------------------------------------------------------------
         gpa |  .5338589     .23704    2.25   0.024   .069273  .998445  3.11719
         ase |  .0179755     .02624    0.69   0.493  -.033448  .069399  21.9375
          pc*|  .4564984     .18105    2.52   0.012    .10164  .811357    .4375

(*) dy/dx is for discrete change of dummy variable from 0 to 1
Numerical Example on Logit and Probit

D. Probit Estimation

. probit grade gpa ase pc

Probit regression                               Number of obs   =         32
                                                LR chi2(3)      =      15.55
                                                Prob > chi2     =     0.0014
Log likelihood = -12.818803                     Pseudo R2       =     0.3775

       grade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gpa |    1.62581   .6938825     2.34   0.019     .2658255    2.985795
         ase |   .0517289   .0838903     0.62   0.537    -.1126929    .2161508
          pc |   1.426332   .5950379     2.40   0.017     .2600795    2.592585
       _cons |   -7.45232   2.542472    -2.93   0.003    -12.43547   -2.469166
Numerical Example on Logit and Probit

E. Probability Interpretation of the Probit Model

. mfx

Marginal effects after probit
      y  = Pr(grade) (predict)
         = .26580809

    variable |     dy/dx   Std. Err.     z    P>|z|  [   95% C.I.   ]        X
-------------+----------------------------------------------------------------
         gpa |  .5333471     .23246    2.29   0.022   .077726  .988968  3.11719
         ase |  .0169697     .02712    0.63   0.531  -.036184  .070123  21.9375
          pc*|   .464426     .17028    2.73   0.006   .130682   .79817    .4375

(*) dy/dx is for discrete change of dummy variable from 0 to 1
Numerical Example on Logit and Probit

Logit: as GPA increases by one point, the log of the odds of scoring grade A increases by about 2.83, and the effect is statistically significant.

Odds ratio: as GPA increases by one point, the odds of getting an A (versus the other grades B, C, D, F) are multiplied by about 16.88.

Marginal effect: the logit and probit models give similar results. As GPA increases by one point, the probability that the student gets grade A increases by about 53 percentage points.
Thank You!
