
The Probit Model

Alexander Spermann
University of Freiburg
SoSe 2009
Course outline

1. Notation and statistical foundations


2. Introduction to the Probit model
3. Application
4. Coefficients and marginal effects
5. Goodness-of-fit
6. Hypothesis tests

Notation and statistical foundations
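1. Scalar form:
$$y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i \quad \text{(Gujarati)}$$
$$y_t = \beta_0 + \beta_1 x_{t1} + \dots + \beta_k x_{tk} + u_t \quad \text{(Wooldridge)}$$
2. Matrix form:
$$Y = X\beta + \varepsilon$$
$$Y = x'\beta + \varepsilon$$
$$Y = X\hat{\beta} + \hat{u}$$
$$y_i = x_i'\beta + \varepsilon_i$$

The product $x_i'\beta$ written out for three regressors, in the two indexing conventions:
$$x_i'\beta = \begin{pmatrix} 1 & x_1 & x_2 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix} \qquad x_i'\beta = \begin{pmatrix} 1 & x_2 & x_3 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}$$
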
Notation and statistical foundations – Vectors

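 Column vector ($n \times 1$):
$$a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$
 Transposed (row vector, $1 \times n$):
$$a' = \begin{bmatrix} a_1 & a_2 & \dots & a_n \end{bmatrix}$$
 Inner product:
$$a'b = \begin{bmatrix} a_1 & a_2 & \dots & a_n \end{bmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} = \sum_{i=1}^{n} a_i b_i$$
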
Notation and statistical foundations – density function

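 PDF: probability density function f(x)
 Example: Normal distribution:
$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
 Example: Standard normal distribution: N(0,1), µ = 0, σ = 1
$$\phi(x) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{x^2}{2}}$$
[Figure: standard normal density, centred at µ = 0.]
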
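Notation and statistical foundations – distributions

 Standard logistic distribution:
$$f(x) = \frac{e^x}{(1 + e^x)^2}, \quad \mu = 0, \quad \sigma^2 = \frac{\pi^2}{3}$$
 Exponential distribution:
$$f(x) = \begin{cases} \frac{1}{\theta} e^{-x/\theta}, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad \theta > 0, \quad \mu = \theta, \quad \sigma^2 = \theta^2$$
 Poisson distribution:
$$f(x) = \frac{e^{-\theta}\theta^x}{x!}, \quad \mu = \theta, \quad \sigma^2 = \theta$$
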
Notation and statistical foundations – CDF

 CDF: cumulative distribution function F(x)
 Example: Standard normal distribution:
$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \, e^{-\frac{x^2}{2}} \, dx$$
 The cdf is the integral of the pdf.

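The statement that the cdf is the integral of the pdf can be checked numerically. The following is a minimal Python sketch using only the standard library; the function names `norm_pdf`, `norm_cdf` and `integrate_pdf` are illustrative, and the lower integration limit of -10 standing in for -∞ is an assumption of the sketch.

```python
import math

def norm_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def integrate_pdf(z, lower=-10.0, steps=20000):
    """Trapezoidal approximation of the integral of phi from `lower` to z."""
    h = (z - lower) / steps
    total = 0.5 * (norm_pdf(lower) + norm_pdf(z))
    for j in range(1, steps):
        total += norm_pdf(lower + j * h)
    return total * h

# Compare the closed-form cdf with the numerically integrated pdf
for z in (-1.0, 0.0, 1.96):
    print(z, round(norm_cdf(z), 4), round(integrate_pdf(z), 4))
```

The two columns should agree to the printed precision, which is all the sketch is meant to show.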
Notation and statistical foundations – logarithms

 Rule I: $y = xz \;\Rightarrow\; \log y = \log x + \log z$
 Rule II: $y = x^n \;\Rightarrow\; \log y = n \log x$
 Rule III: $y = a x^b \;\Rightarrow\; \log y = \log a + b \log x$

Introduction to the Probit model – binary variables

 Why not use OLS instead?
[Figure: observations of a binary variable y, which takes only the values 0 and 1, plotted against x, together with a fitted linear OLS line.]
 Nonlinear estimation, for example by maximum likelihood.

Introduction to the Probit model – latent variables

 Latent variable: Unobservable variable y* which can take all values in (-∞, +∞).
 Example: y* = Utility(Labour income) - Utility(Non-labour income)
 Underlying latent model:
$$y_i^* = x_i'\beta + \varepsilon_i, \qquad y_i = \begin{cases} 1, & y_i^* > 0 \\ 0, & y_i^* \le 0 \end{cases}$$

Introduction to the Probit model – latent variables

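 Probit is based on a latent model:
$$\begin{aligned} P(y_i = 1 \mid x) &= P(y_i^* > 0 \mid x) \\ &= P(x_i'\beta + \varepsilon_i > 0 \mid x) \\ &= P(\varepsilon_i > -x_i'\beta \mid x) \\ &= 1 - F(-x_i'\beta) \end{aligned}$$
Assumption: Error terms are independent and normally distributed:
$$\begin{aligned} P(y_i = 1 \mid x) &= 1 - \Phi\left(-\frac{x_i'\beta}{\sigma}\right), \quad \sigma \equiv 1 \\ &= \Phi(x_i'\beta) \quad \text{because of symmetry} \end{aligned}$$
[Figure: density φ(ε) with the points -x_i'β and x_i'β marked on the horizontal axis.]
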
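The latent-variable derivation can be illustrated by simulation: draw standard normal errors, count how often y* = x_i'β + ε is positive, and compare that share with Φ(x_i'β). This is only a sketch; the index value 0.84, the number of draws and the seed are hypothetical choices, and the function names are illustrative.

```python
import math
import random

def norm_cdf(z):
    """Standard normal cdf Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate_share_of_ones(index, draws=200000, seed=0):
    """Share of draws with y* = index + eps > 0, where eps ~ N(0, 1)."""
    rng = random.Random(seed)
    ones = sum(1 for _ in range(draws) if index + rng.gauss(0.0, 1.0) > 0)
    return ones / draws

index = 0.84  # hypothetical value of x_i' beta
share = simulate_share_of_ones(index)

# Simulated P(y = 1), then Phi(x'b), then 1 - Phi(-x'b); the last two are equal by symmetry
print(round(share, 3), round(norm_cdf(index), 3), round(1.0 - norm_cdf(-index), 3))
```

The simulated share should be close to Φ(x_i'β), with the remaining gap being Monte Carlo noise.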
Introduction to the Probit model – CDF

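 Example:
[Figure: CDF Φ(z) plotted against z = x_i'β, with the points -z, 0 and z on the horizontal axis and the values 0.2, 0.5 and 0.8 on the vertical axis; by symmetry 1 - 0.2 = 0.8.]
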
Introduction to the Probit model – CDF Probit vs. Logit

 F(z) lies between zero and one
[Figures: CDF of Probit and CDF of Logit, each plotted against z = x_i'β.]

Introduction to the Probit model – PDF Probit vs. Logit

[Figures: PDF of Probit and PDF of Logit.]

Introduction to the Probit model – The ML principle

 Joint density:
$$f(y \mid x, \beta) = \prod_i F(x_i'\beta)^{y_i} \left[1 - F(x_i'\beta)\right]^{(1 - y_i)} = \prod_i F_i^{y_i} (1 - F_i)^{1 - y_i}$$
 Log likelihood function:
$$\ln L = \sum_i \left[ y_i \ln F_i + (1 - y_i) \ln(1 - F_i) \right]$$

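The log likelihood function translates almost directly into code. The sketch below takes F to be the standard normal cdf (the Probit case); the function name `probit_loglik` and the four-observation toy sample are hypothetical and serve only to show the mechanics.

```python
import math

def norm_cdf(z):
    """Standard normal cdf Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_loglik(beta, X, y):
    """ln L = sum_i [ y_i ln F_i + (1 - y_i) ln(1 - F_i) ] with F_i = Phi(x_i' beta)."""
    total = 0.0
    for xi, yi in zip(X, y):
        F = norm_cdf(sum(b * v for b, v in zip(beta, xi)))
        total += yi * math.log(F) + (1 - yi) * math.log(1.0 - F)
    return total

# Hypothetical toy sample: a constant and one regressor
X = [(1.0, 0.5), (1.0, -1.2), (1.0, 2.0), (1.0, 0.1)]
y = [1, 0, 1, 0]

print(probit_loglik((0.0, 0.0), X, y))  # all F_i = 0.5, so ln L = 4 * ln(0.5)
print(probit_loglik((0.0, 1.0), X, y))  # a slope of 1 fits this toy sample better
```

Maximum likelihood estimation then amounts to searching for the β that makes this function as large as possible.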
Introduction to the Probit model – The ML principle

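 The principle of ML: Which value of β maximizes the probability of observing the given sample?
$$\begin{aligned} \frac{\partial \ln L}{\partial \beta} &= \sum_i \left[ \frac{y_i f_i}{F_i} + \frac{(1 - y_i)(-f_i)}{1 - F_i} \right] x_i \\ &= \sum_i \left[ \frac{y_i - F_i}{F_i (1 - F_i)} \right] f_i x_i \\ &= 0 \end{aligned}$$
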
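The step from the first to the second line of the first-order condition is a common-denominator argument. Written out for a single observation, with $f_i$ taken to denote the density belonging to $F_i$ (a definition the slide leaves implicit):

```latex
\frac{y_i f_i}{F_i} - \frac{(1 - y_i) f_i}{1 - F_i}
  = f_i \, \frac{y_i (1 - F_i) - (1 - y_i) F_i}{F_i (1 - F_i)}
  = f_i \, \frac{y_i - F_i}{F_i (1 - F_i)}
```

In the Probit case, $F_i = \Phi(x_i'\beta)$ and $f_i = \phi(x_i'\beta)$.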
Introduction to the Probit model – Example

 Example taken from Greene, Econometric Analysis, 5th ed. 2003, ch. 17.3.
 10 observations of a discrete distribution
 Random sample: 5, 0, 1, 1, 0, 3, 2, 3, 4, 1
 PDF:
$$f(x_i, \theta) = \frac{e^{-\theta} \theta^{x_i}}{x_i!}$$
 Joint density:
$$f(x_1, x_2, \dots, x_{10} \mid \theta) = \prod_{i=1}^{10} f(x_i, \theta) = \frac{e^{-10\theta} \cdot \theta^{\sum_i x_i}}{\prod_{i=1}^{10} x_i!} = \frac{e^{-10\theta} \cdot \theta^{20}}{207{,}360}$$
 Which value of θ makes occurrence of the observed sample most probable?

Introduction to the Probit model – Example

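$$\ln L(\theta) = -10\theta + 20 \ln \theta - 12.242$$
$$\frac{d \ln L(\theta)}{d\theta} = -10 + \frac{20}{\theta} = 0 \quad \Rightarrow \quad \hat{\theta} = 2$$
$$\frac{d^2 \ln L(\theta)}{d\theta^2} = -\frac{20}{\theta^2} < 0 \quad \Rightarrow \quad \text{Maximum}$$
[Figure: L(θ | x) and ln L(θ | x) plotted against θ, both with their maximum at θ = 2.]
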
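The Poisson example can be reproduced in a few lines. The sketch below uses the ten observations from the slide and checks the constant 12.242 (the log of the product of the factorials) and the maximizer θ = 2; the function name `poisson_loglik` is illustrative.

```python
import math

sample = [5, 0, 1, 1, 0, 3, 2, 3, 4, 1]  # the ten observations from the example

def poisson_loglik(theta, xs):
    """ln L(theta) = -n * theta + (sum of x_i) * ln(theta) - sum of ln(x_i!)."""
    n = len(xs)
    log_factorials = sum(math.log(math.factorial(x)) for x in xs)
    return -n * theta + sum(xs) * math.log(theta) - log_factorials

# The first-order condition -n + sum(x_i) / theta = 0 gives the sample mean
theta_hat = sum(sample) / len(sample)

# Constant term of the log likelihood: ln(prod x_i!)
constant = sum(math.log(math.factorial(x)) for x in sample)

print(theta_hat, round(constant, 3), round(poisson_loglik(theta_hat, sample), 3))
```

Evaluating `poisson_loglik` slightly to the left and right of `theta_hat` gives smaller values, in line with the negative second derivative.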
Application

 Analysis of the effect of a new teaching method in economics
 Data:
Observation  GPA   TUCE  PSI  Grade    Observation  GPA   TUCE  PSI  Grade
 1           2.66  20    0    0        17           2.75  25    0    0
 2           2.89  22    0    0        18           2.83  19    0    0
 3           3.28  24    0    0        19           3.12  23    1    0
 4           2.92  12    0    0        20           3.16  25    1    1
 5           4.00  21    0    1        21           2.06  22    1    0
 6           2.86  17    0    0        22           3.62  28    1    1
 7           2.76  17    0    0        23           2.89  14    1    0
 8           2.87  21    0    0        24           3.51  26    1    0
 9           3.03  25    0    0        25           3.54  24    1    1
10           3.92  29    0    1        26           2.83  27    1    1
11           2.63  20    0    0        27           3.39  17    1    1
12           3.32  23    0    0        28           2.67  24    1    0
13           3.57  23    0    0        29           3.65  21    1    1
14           3.26  25    0    1        30           4.00  23    1    1
15           3.53  26    0    0        31           3.10  21    1    0
16           2.74  19    0    0        32           2.39  19    1    1

Source: Spector, L. and M. Mazzeo, Probit Analysis and Economic Education. In:
Journal of Economic Education, 11, 1980, pp. 37-44

Application – Variables

 Grade
Dependent variable. Indicates whether a student's grade improved after the new teaching method PSI had been introduced (0 = no, 1 = yes).
 PSI
Indicates whether a student attended courses that used the new method (0 = no, 1 = yes).
 GPA
Grade point average of the student.
 TUCE
Score on an intermediate test which shows previous knowledge of a topic.

Application – Estimation

 Estimation results of the model (output from Stata):
[Stata output: Probit estimation of Grade on GPA, TUCE and PSI.]

Application – Discussion

 ML estimator: Parameters were obtained by maximization of the log likelihood function.
Here: 5 iterations were necessary to find the maximum of the log likelihood function (-12.818803).
 Interpretation of the estimated coefficients:
   Estimated coefficients do not quantify the influence of the rhs variables on the probability that the lhs variable takes on the value one.
   Estimated coefficients are parameters of the latent model.

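To make the maximization concrete, the following Python sketch fits the Probit model to the 32 observations of the data table and evaluates the log likelihood at the optimum, which should come out close to the reported -12.818803. It uses Fisher scoring (the score divided by the expected information) as one possible algorithm; this is an assumption of the sketch and not necessarily what Stata does, so the iteration count need not be 5. All function names are illustrative.

```python
import math

# (GPA, TUCE, PSI, Grade) for the 32 students, transcribed from the data table
DATA = [
    (2.66, 20, 0, 0), (2.89, 22, 0, 0), (3.28, 24, 0, 0), (2.92, 12, 0, 0),
    (4.00, 21, 0, 1), (2.86, 17, 0, 0), (2.76, 17, 0, 0), (2.87, 21, 0, 0),
    (3.03, 25, 0, 0), (3.92, 29, 0, 1), (2.63, 20, 0, 0), (3.32, 23, 0, 0),
    (3.57, 23, 0, 0), (3.26, 25, 0, 1), (3.53, 26, 0, 0), (2.74, 19, 0, 0),
    (2.75, 25, 0, 0), (2.83, 19, 0, 0), (3.12, 23, 1, 0), (3.16, 25, 1, 1),
    (2.06, 22, 1, 0), (3.62, 28, 1, 1), (2.89, 14, 1, 0), (3.51, 26, 1, 0),
    (3.54, 24, 1, 1), (2.83, 27, 1, 1), (3.39, 17, 1, 1), (2.67, 24, 1, 0),
    (3.65, 21, 1, 1), (4.00, 23, 1, 1), (3.10, 21, 1, 0), (2.39, 19, 1, 1),
]

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def loglik(beta, X, y):
    """Probit log likelihood: sum of y ln F + (1 - y) ln(1 - F)."""
    total = 0.0
    for xi, yi in zip(X, y):
        F = norm_cdf(sum(b * v for b, v in zip(beta, xi)))
        total += yi * math.log(F) + (1 - yi) * math.log(1.0 - F)
    return total

def fit_probit(X, y, max_iter=100, tol=1e-10):
    """Fisher scoring: beta <- beta + I(beta)^(-1) * score(beta), starting at zero."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            z = sum(b * v for b, v in zip(beta, xi))
            F, f = norm_cdf(z), norm_pdf(z)
            weight = f / (F * (1.0 - F))
            for a in range(k):
                score[a] += (yi - F) * weight * xi[a]       # score contribution
                for c in range(k):
                    info[a][c] += f * weight * xi[a] * xi[c]  # expected information
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

X = [(1.0, gpa, tuce, psi) for gpa, tuce, psi, _ in DATA]  # constant, GPA, TUCE, PSI
y = [grade for _, _, _, grade in DATA]
beta_hat = fit_probit(X, y)
print([round(b, 4) for b in beta_hat], round(loglik(beta_hat, X, y), 6))
```

The printed coefficients are parameters of the latent model, so they should not be read as effects on the probability.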
Coefficients and marginal effects

 The marginal effect of a rhs variable is the effect of a unit change of this variable on the probability P(Y = 1|X = x), given that all other rhs variables are constant:
$$\frac{\partial P(y_i = 1 \mid x_i)}{\partial x_i} = \frac{\partial E(y_i \mid x_i)}{\partial x_i} = \phi(x_i'\beta)\beta$$
 Recap: The slope parameter of the linear regression model directly measures the marginal effect of the rhs variable on the lhs variable.

Coefficients and marginal effects

 The marginal effect depends on the values of the rhs variables, because the density φ(x_i'β) is evaluated at each person's own index x_i'β.
 Therefore, there exists an individual marginal effect for each person of the sample.

Coefficients and marginal effects – Computation

 Two different types of marginal effects can be calculated:
   Average marginal effect
    Stata command: margin
   Marginal effect at the mean
    Stata command: mfx compute

Coefficients and marginal effects – Computation

 Principle of the computation of the average marginal effects: average of the individual marginal effects.

Coefficients and marginal effects – Computation

 Computation of average marginal effects depends on the type of rhs variable:
 Continuous variables like TUCE and GPA:
$$AME = \frac{1}{n} \sum_{i=1}^{n} \phi(x_i'\beta)\beta$$
 Dummy variable like PSI:
$$AME = \frac{1}{n} \sum_{i=1}^{n} \left[ \Phi(x_i'\beta \mid x_{ik} = 1) - \Phi(x_i'\beta \mid x_{ik} = 0) \right]$$

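The two formulas can be sketched in Python as follows. The rows of X are observations 1, 5, 20 and 32 of the data table, while the coefficient vector `beta` is hypothetical and is not the estimate from the slides, so the printed numbers will not reproduce the reported marginal effects. Function names are illustrative.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def index(beta, x):
    """Linear index x' beta."""
    return sum(b * v for b, v in zip(beta, x))

def ame_continuous(beta, X, k):
    """(1/n) * sum_i phi(x_i' beta) * beta_k for a continuous regressor k."""
    return sum(norm_pdf(index(beta, x)) for x in X) / len(X) * beta[k]

def ame_dummy(beta, X, k):
    """(1/n) * sum_i [Phi(x_i' beta | x_ik = 1) - Phi(x_i' beta | x_ik = 0)]."""
    total = 0.0
    for x in X:
        x1 = list(x)
        x1[k] = 1.0  # switch the dummy on for this person
        x0 = list(x)
        x0[k] = 0.0  # switch the dummy off for this person
        total += norm_cdf(index(beta, x1)) - norm_cdf(index(beta, x0))
    return total / len(X)

# Columns: constant, GPA, TUCE, PSI (observations 1, 5, 20 and 32 of the data table)
X = [(1.0, 2.66, 20.0, 0.0), (1.0, 4.00, 21.0, 0.0),
     (1.0, 3.16, 25.0, 1.0), (1.0, 2.39, 19.0, 1.0)]
beta = (-7.0, 1.5, 0.05, 1.4)  # hypothetical coefficients, for illustration only

print(ame_continuous(beta, X, 1))  # GPA
print(ame_continuous(beta, X, 2))  # TUCE
print(ame_dummy(beta, X, 3))       # PSI
```

Because both continuous effects share the same average density, their ratio equals the ratio of the two coefficients.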
Coefficients and marginal effects – Interpretation

 Interpretation of average marginal effects:
 Continuous variables like TUCE and GPA:
A marginal (infinitesimal) change of TUCE or GPA changes the probability that the lhs variable takes the value one at a rate of X percentage points per unit of the variable.
 Dummy variable like PSI:
A change of PSI from zero to one changes the probability that the lhs variable takes the value one by X percentage points.

Coefficients and marginal effects – Interpretation
Variable  Estimated marginal effect  Interpretation
GPA       0.364                      If the grade point average of a student rises marginally, the probability that Grade takes the value one rises at a rate of 36.4 percentage points per grade point.
TUCE      0.011                      Analogous to GPA, with a rate of 1.1 percentage points per test point.
PSI       0.374                      If the dummy variable changes from zero to one, the probability that Grade takes the value one rises by 37.4 percentage points.

Coefficients and marginal effects – Significance
 Significance of a coefficient: test of the null hypothesis that a parameter is equal to zero.
 The decision rule is similar to that of the t-test, except that the Probit test statistic follows a standard normal distribution. The z-value is equal to the estimated parameter divided by its standard error.
 Stata computes a p-value which directly shows the significance of a parameter:

Variable  z-value  p-value  Interpretation
GPA       3.22     0.001    significant
TUCE      0.62     0.533    insignificant
PSI       2.67     0.008    significant

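The p-values follow from the z-values through the standard normal cdf. The sketch below recomputes them from the rounded z-values in the table, so the last digit may differ slightly from the Stata output; the 5 percent threshold used for the label is an assumption, since the slide does not state a level.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_p(z):
    """P(|Z| >= |z|) for Z ~ N(0, 1)."""
    return 2.0 * (1.0 - norm_cdf(abs(z)))

# z-values as reported in the table
for name, z in (("GPA", 3.22), ("TUCE", 0.62), ("PSI", 2.67)):
    p = two_sided_p(z)
    label = "significant" if p < 0.05 else "insignificant"  # assumed 5 percent level
    print(name, z, round(p, 3), label)
```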
Coefficients and marginal effects

 Only the average of the marginal effects is displayed.
 The individual marginal effects show large variation.
Stata command: margin, table

Coefficients and marginal effects

 The sampling variation of the marginal effects may be quantified by their confidence intervals.
 In which range can the population value be expected to lie?
 In our example:

Variable  Estimated marginal effect  Confidence interval (95%)
GPA       0.364                      [-0.055, 0.782]
TUCE      0.011                      [-0.002, 0.025]
PSI       0.374                      [0.121, 0.626]

Coefficients and marginal effects

 What is calculated by mfx?
 Estimation of the marginal effect at the sample mean.
[Figure: marginal effect evaluated at the point labelled "Sample mean".]

Goodness of fit

 Goodness of fit may be judged by McFadden's Pseudo R².
 Measure for the proximity of the model to the observed data.
 Comparison of the estimated model with a model which contains only a constant as rhs variable.
 $\ln \hat{L}(M_{Full})$: Log likelihood of the model of interest.
 $\ln \hat{L}(M_{Intercept})$: Log likelihood with all coefficients except that of the intercept restricted to zero.
 It always holds that $\ln \hat{L}(M_{Full}) \ge \ln \hat{L}(M_{Intercept})$.

Goodness of fit

 The Pseudo R² is defined as:
$$\text{Pseudo } R^2 = R^2_{McF} = 1 - \frac{\ln \hat{L}(M_{Full})}{\ln \hat{L}(M_{Intercept})}$$
 Similar to the R² of the linear regression model, it holds that $0 \le R^2_{McF} \le 1$.
 A higher Pseudo R² may indicate a better fit of the model, but unlike the R² of the linear regression model it has no simple interpretation.

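For the teaching-method example, the Pseudo R² can be computed from two numbers. The sketch below takes the reported maximum of the log likelihood (-12.818803) for the full model and uses the fact that in an intercept-only binary model the fitted probability equals the sample share of ones, which is 11 out of 32 in the data table. Function names are illustrative.

```python
import math

def intercept_only_loglik(n_ones, n_obs):
    """Log likelihood of a binary model with a constant only: p_hat = n_ones / n_obs."""
    p = n_ones / n_obs
    return n_ones * math.log(p) + (n_obs - n_ones) * math.log(1.0 - p)

def mcfadden_r2(ll_full, ll_intercept):
    """McFadden's Pseudo R-squared: 1 - ln L(full) / ln L(intercept)."""
    return 1.0 - ll_full / ll_intercept

ll_full = -12.818803                     # reported maximum of the log likelihood
ll_null = intercept_only_loglik(11, 32)  # 11 of the 32 students have Grade = 1
r2 = mcfadden_r2(ll_full, ll_null)

print(round(ll_null, 4), round(r2, 4))
```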
Goodness of fit

 A high value of $R^2_{McF}$ does not necessarily indicate a good fit, however, as $R^2_{McF} = 1$ if $\ln \hat{L}(M_{Full}) = 0$.
 $R^2_{McF}$ does not decrease when rhs variables are added. Therefore, an adjusted measure may be appropriate:
$$\text{Pseudo } R^2_{adjusted} = \bar{R}^2_{McF} = 1 - \frac{\ln \hat{L}(M_{Full}) - K}{\ln \hat{L}(M_{Intercept})}$$
 Further goodness of fit measures: R² of McKelvey and Zavoina, Akaike Information Criterion (AIC), etc. See also the Stata command fitstat.

Hypothesis tests

 Likelihood ratio test: possibility for hypothesis testing, for example for variable relevance.
 Basic principle: Comparison of the log likelihood function of the unrestricted model (ln L_U) with that of the restricted model (ln L_R)
 Test statistic:
$$LR = -2 \ln \lambda = -2(\ln L_R - \ln L_U) \sim \chi^2(K), \qquad \lambda = \frac{L_R}{L_U}, \quad 0 \le \lambda \le 1$$
 Under the null hypothesis, the test statistic asymptotically follows a χ² distribution with degrees of freedom equal to the number of restrictions.

Hypothesis tests

 Null hypothesis: All coefficients except that of the intercept are equal to zero.
 In the example: LR χ²(3) = 15.55
 Prob > chi2 = 0.0014
 Interpretation: The hypothesis that all slope coefficients are equal to zero can be rejected at the 1 percent significance level.

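The reported test statistic and p-value can be retraced with the standard library. The sketch below rebuilds the restricted log likelihood from the sample share of ones (11 of 32 in the data table), takes the unrestricted value -12.818803 from the estimation, and uses the closed-form upper tail of the χ² distribution that holds for exactly 3 degrees of freedom. Function names are illustrative.

```python
import math

def chi2_sf_3df(x):
    """Upper tail probability of a chi-squared distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def lr_statistic(ll_restricted, ll_unrestricted):
    """LR = -2 (ln L_R - ln L_U)."""
    return -2.0 * (ll_restricted - ll_unrestricted)

ll_u = -12.818803                               # unrestricted model: GPA, TUCE, PSI and constant
p = 11 / 32                                     # share of students with Grade = 1
ll_r = 11 * math.log(p) + 21 * math.log(1 - p)  # restricted model: constant only

lr = lr_statistic(ll_r, ll_u)
print(round(lr, 2), round(chi2_sf_3df(lr), 4))
```

The three restrictions correspond to the three slope coefficients set to zero under the null hypothesis.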
