
1. LINEAR REGRESSION MODEL
APPLIED
PART 3

References:
Verbeek: Chapter 2 (2.5)
Wooldridge: Chapter 4

1
The examples use the INJURY dataset.

library(tidyverse)      # loads readr (read_csv), dplyr, ggplot2, ...
library(stargazer)      # formatted regression tables
library(summarytools)   # descriptive statistics
library(readxl)         # only needed if the data come as an Excel file
INJURY <- read_csv("INJURY.csv")

2
3. OLS: Estimator and Estimates

a. Hypothesis Testing
- test involving one parameter
- test involving more parameters: linear restrictions
- test involving more parameters: joint significance
- general test: Wald test

b. Size, power and p-values

3
a. Hypothesis Testing
Often, economic theory implies certain restrictions on our coefficients. For example,
MPC = 1, or $\beta_k = 0$.

We can check whether our estimates deviate “significantly” from these restrictions by
means of a statistical test.

To perform a test, we need to:


1. Formulate the hypotheses
- null: a restriction on our model
- alternative: a broader statement about the model
They are mutually exclusive.

4
2. Define a rejection region and an acceptance region:
- what data will support the model restrictions and what data will not
- how far from zero my estimate of $\beta_k$ needs to be to conclude that $\beta_k$ is not zero

3. Check whether the estimate obtained with the data, $\hat{\beta}_k$, falls in the rejection/acceptance region.

• Define an appropriate test statistic; its distribution is known if the null hypothesis is true.
• Choose a significance level α and determine whether the value of the statistic falls in the rejection region, i.e. the statistic is unlikely to have come from that distribution.
5
Tests involving one parameter

Given (A1) to (A5) we have that $\hat{\beta} \sim N(\beta, \sigma^2 (X'X)^{-1})$, therefore

$$ z = \frac{\hat{\beta}_k - \beta_k}{\sigma \sqrt{c_{kk}}} \sim N(0,1) $$

where $c_{kk}$ is the k-th diagonal element of $(X'X)^{-1}$.

We need to replace $\sigma$ with $s$ (note that $(N-K)s^2/\sigma^2$ has a $\chi^2$ distribution), hence:

$$ t_k = \frac{\hat{\beta}_k - \beta_k}{s \sqrt{c_{kk}}} \sim t_{N-K} $$

(see handout)
The Student's t distribution is like the normal, but with fatter tails when N − K is small.

6
William Sealy Gosset (1876- 1937)
Statistician & Guinness brewer in Dublin & London
Published papers on small sample inference as “Student”

7
As the sample size increases, the tails get thinner
As the degrees of freedom increase, the tails get thinner
→ the t distribution tends to the normal distribution

8
Simple test: $H_0: \beta_k = \beta_k^0$
             $H_1: \beta_k \neq \beta_k^0$ or $\beta_k > \beta_k^0$

Under the null hypothesis $t_k = \dfrac{\hat{\beta}_k - \beta_k^0}{se(\hat{\beta}_k)} \sim t_{N-K}$

Set α = 0.05 or lower (the higher N, the lower α should be)

• $P(|t_k| > t_{N-K,\,\alpha/2}) = \alpha$ for a two-sided test
• $P(t_k > t_{N-K,\,\alpha}) = \alpha$ for a one-sided test

For N − K ≥ 50, the critical values of $t_{N-K}$ are very similar to those of the normal distribution.
For α = 0.05:
- two-sided test: $t_{N-K,\,\alpha/2} \approx 1.96$
- one-sided test: $t_{N-K,\,\alpha} \approx 1.64$
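These critical values are easy to reproduce in R with qt(); a minimal sketch, using as an illustration the N − K = 5360 − 15 degrees of freedom from the Kentucky example later in these slides:

qt(0.975, df = 5360 - 15)   # two-sided critical value for alpha = 0.05, about 1.96
qt(0.95,  df = 5360 - 15)   # one-sided critical value for alpha = 0.05, about 1.645
qnorm(0.975)                # normal benchmark for comparison, 1.96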
9
10
Reject $H_0$ if the statistic is greater (in absolute value) than the critical value: basically we consider values "too large" if they are unlikely to come from the distribution implied by the null hypothesis $\beta_k = \beta_k^0$.

NOTE: the test is about $\beta_k$, the unknown parameter; $\hat{\beta}_k$ is the estimate!

11
Even simpler test: $H_0: \beta_k = 0$
                   $H_1: \beta_k \neq 0$ or $\beta_k > 0$ or $\beta_k < 0$

Then under the null hypothesis $t_k = \dfrac{\hat{\beta}_k}{se(\hat{\beta}_k)} \sim t_{N-K}$ (the t-ratio!)
If $|t_k| > 2$ the variable is significant (at the 5% level, two-sided).

NOTE!
1. An estimate could be "statistically significant" but be tiny, i.e. not economically significant
2. With huge datasets nearly everything is "significant"
3. What qualifies as statistically significant depends on the choice of α, which is arbitrary

12
Setting α = 0.05 means that a two standard deviation difference is considered sufficient for statistical significance (i.e. |t| > 2).

In particle physics experiments to detect a particle (like the Higgs boson at CERN), a three standard deviation difference is required for "evidence of a particle".

A five standard deviation difference is required for a "discovery" (5σ).

This corresponds to α = 0.0000003.
13
• Recall the example with the injury data, for Kentucky only (afhigh is the interaction afchnge × highearn):
reg3 <- lm(log(durat) ~ afchnge + highearn + afhigh, data = subset(INJURY, ky == 1))
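To connect this with the formulas above, a minimal sketch of how one could inspect the output and reproduce one t-ratio and its two-sided p-value by hand:

summary(reg3)                                    # coefficients, standard errors, t-ratios, p-values

b    <- coef(reg3)["afhigh"]                     # estimated coefficient
se_b <- sqrt(diag(vcov(reg3)))["afhigh"]         # its standard error
t_k  <- b / se_b                                 # t-ratio for H0: beta_afhigh = 0
2 * pt(abs(t_k), df = df.residual(reg3), lower.tail = FALSE)   # two-sided p-value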

14
Add some controls: gender and marital status, ln(age), ln(prewage), injury type dummies.
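The slides do not show the code for this augmented model; a minimal sketch, assuming the usual variable names of the INJURY data (male, married, age, prewage and the injury-type dummies used later in these slides), might be:

reg4 <- lm(log(durat) ~ afchnge + highearn + afhigh +
             male + married + log(age) + log(prewage) +
             head + neck + upextr + trunk + lowback + lowextr + occdis,
           data = subset(INJURY, ky == 1))
summary(reg4)

This specification has 15 estimated coefficients (including the intercept), consistent with the N − K = 5360 − 15 used in the manual F test later.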

- the coefficient on afchnge is not significant
- the coefficient on highearn is now not significant
- the coefficient on afhigh remains significant and positive, slightly higher
- head injuries have a significant negative impact on the duration of the claim

15
Tests involving more parameters: linear restriction
Suppose we want to test a restriction involving more than one parameter.
$H_0: r_1\beta_1 + \dots + r_K\beta_K = r'\beta = q$
$H_1: r'\beta \neq q$
q is a scalar, r' a (1 x K) vector.

$r'\hat{\beta}$ is a linear combination of the random variables $\hat{\beta}$, and $V(r'\hat{\beta}) = r'V(\hat{\beta})r$.

Test statistic (under $H_0$, $r'\beta = q$):

$$ t = \frac{r'\hat{\beta} - q}{se(r'\hat{\beta})} \sim t_{N-K} $$

And use 2 as an approximate critical value at the 5% level.

16
Sometimes, instead of performing the test you can transform the original
model to incorporate the constraint and perform a simple test of
significance on one parameter.

Ex: equality of two coefficients
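For instance, to test whether the head and neck dummies have the same coefficient, write $\beta_{head}\,head + \beta_{neck}\,neck = \theta\,head + \beta_{neck}(head + neck)$ with $\theta = \beta_{head} - \beta_{neck}$, and t-test $\theta = 0$. A minimal sketch, using the augmented Kentucky specification assumed earlier:

reg_eq <- lm(log(durat) ~ afchnge + highearn + afhigh +
               male + married + log(age) + log(prewage) +
               head + I(head + neck) + upextr + trunk + lowback + lowextr + occdis,
             data = subset(INJURY, ky == 1))
summary(reg_eq)   # the t-ratio on head now tests beta_head = beta_neck

The same hypothesis is tested with a Wald test later in these slides.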

17
Tests involving more parameters: joint significance
Suppose we want to test whether J coefficients are jointly equal to zero.
$H_0: \beta_{K-J+1} = \dots = \beta_K = 0$
$H_1$: at least one of them $\neq 0$

The easiest way to obtain a test statistic for this is to estimate the model
twice:
• once without the restrictions: full set of regressors (model 1)
• once with the restrictions imposed: omit the corresponding x variables
(model 0)

18
Test is based on the residual sum of squares.
$S_0$: residual sum of squares from restricted model
$S_1$: residual sum of squares from unrestricted model
$S_0 > S_1$ but if $H_0$ is correct, the difference should be close to zero.

Under (A1)-(A5), if the null hypothesis is true, the statistic

$$ \frac{S_0 - S_1}{\sigma^2} \sim \chi^2_J $$

Substituting our estimate for $\sigma^2$ (see handout) we get

$$ \frac{(S_0 - S_1)/J}{S_1/(N-K)} \sim F_{(J,\,N-K)} = F^J_{N-K} $$

19
Test can also be based on $R^2$.
$R_0^2$: from restricted model
$R_1^2$: from unrestricted model
$R_1^2 > R_0^2$ but if $H_0$ is correct, the difference should be close to 0.

$$ \frac{(S_0 - S_1)/J}{S_1/(N-K)} = \frac{(R_1^2 - R_0^2)/J}{(1 - R_1^2)/(N-K)} \sim F_{(J,\,N-K)} = F^J_{N-K} $$

In both cases, one-sided test:
• F high → the difference between the two S (or $R^2$) is "large", which means that the restrictions are not valid
• F low → the difference between the two S (or $R^2$) is not large enough to reject the null hypothesis

20
• If testing that all $\beta_2 = \dots = \beta_K = 0$, then $S_0$ = total sum of squares and $R_0^2 = 0$, so the statistic becomes

$$ \frac{R_1^2/J}{(1 - R_1^2)/(N-K)} $$

with J = K − 1; this is the overall F statistic routinely reported with regression output (see the sketch after this list).

• Sometimes t-test on each of the J vars does not reject the null, and yet F-test on
joint significance rejects the null

• Also the reverse may be true
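A minimal sketch of this special case in R, using the reg4 specification assumed earlier: summary() stores the overall F statistic and its degrees of freedom.

s4 <- summary(reg4)
s4$fstatistic                        # F value, numerator df (K - 1), denominator df (N - K)
pf(s4$fstatistic["value"], s4$fstatistic["numdf"], s4$fstatistic["dendf"],
   lower.tail = FALSE)               # p-value of the overall F test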

21
Example: test joint significance of injury type dummies:
Two ways to do a test of joint significance in R:
1. Use the linearHypothesis function from the car package

install.packages("car")   # car = Companion to Applied Regression
library(car)
Hnull1 <- c("head=0", "neck=0", "upextr=0", "trunk=0",
            "lowback=0", "lowextr=0", "occdis=0")
linearHypothesis(reg4, Hnull1)

22
23
2. Manually
- Estimate the unrestricted model → get 𝑆1 → 8328.2

24
- Estimate the restricted model → get 𝑆0 → 8426.5

25
- Calculate the F statistic

$$ \frac{(S_0 - S_1)/J}{S_1/(N-K)} = \frac{(8426.5 - 8328.2)/7}{8328.2/(5360 - 15)} = 9.06 $$
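A minimal sketch of this manual calculation in R, assuming reg4 is the augmented model sketched earlier and building the restricted model by dropping the seven injury-type dummies:

reg_restr <- lm(log(durat) ~ afchnge + highearn + afhigh +
                  male + married + log(age) + log(prewage),
                data = subset(INJURY, ky == 1))

S1 <- deviance(reg4)         # residual sum of squares, unrestricted model
S0 <- deviance(reg_restr)    # residual sum of squares, restricted model
J  <- 7                      # number of restrictions
F_stat <- ((S0 - S1) / J) / (S1 / df.residual(reg4))
F_stat
pf(F_stat, J, df.residual(reg4), lower.tail = FALSE)   # p-value

The one-line equivalent is anova(reg_restr, reg4), which reports the same F test for the two nested models.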

26
QUESTION:

Suppose you wanted to test whether the variable previous wages had a
different effect on the duration of benefits depending on whether the
worker is a higher earner or not; how would you augment your
specification and how would you perform the test?

27
Create an interaction variable lprewagehe = lprewage × highearn (a possible implementation is sketched below).
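A minimal sketch of how this could be set up, assuming the same variable names as in the earlier reg4 specification:

INJURY_ky <- INJURY %>%
  filter(ky == 1) %>%
  mutate(lprewagehe = log(prewage) * highearn)   # interaction: lprewage x highearn

reg5 <- lm(log(durat) ~ afchnge + highearn + afhigh +
             male + married + log(age) + log(prewage) + lprewagehe +
             head + neck + upextr + trunk + lowback + lowextr + occdis,
           data = INJURY_ky)
summary(reg5)   # the t-test on lprewagehe answers the question on the previous slide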

The coefficient on that interaction is significant at 5% → the wages before the injury have an even higher positive impact on the duration of the claim for the higher earners than for the lower earners:

for low earners a 1% increase in previous earnings increases the duration by 0.19%, while for high earners it increases the duration by (0.19 + 0.38)%.

28
General Test: Wald Test
In general, the previous test can be represented as
$H_0: R\beta = q$
$H_1: R\beta \neq q$
where R is (J x K) and q is (J x 1)
Under (A1)-(A5), and if the null hypothesis is true,

$$ R\hat{\beta} \sim N\!\left(R\beta,\; R\,V(\hat{\beta})\,R'\right) $$

and its quadratic form is distributed as

$$ \left(R\hat{\beta} - q\right)' \left[R\,V(\hat{\beta})\,R'\right]^{-1} \left(R\hat{\beta} - q\right) \sim \chi^2_J, \qquad J = \text{number of restrictions} $$

29
Using $s^2$ for the estimated variance of $\hat{\beta}$ we have the test statistic

$$ \xi = \left(R\hat{\beta} - q\right)' \left[R\,\hat{V}(\hat{\beta})\,R'\right]^{-1} \left(R\hat{\beta} - q\right) $$

In large samples the difference between $s^2$ and $\sigma^2$ is negligible and $\xi \sim \chi^2_J$.

Or you can use $\dfrac{\xi}{J} \sim F^J_{N-K}$.

The advantage of this test is that it does not require estimating two models: just estimate the unrestricted model!
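A minimal sketch of computing ξ by hand for a single restriction, $\beta_{head} = \beta_{neck}$ (so R has one row with +1 on head and −1 on neck, and q = 0), using the reg4 assumed earlier:

R <- matrix(0, nrow = 1, ncol = length(coef(reg4)),
            dimnames = list(NULL, names(coef(reg4))))
R[1, "head"] <- 1
R[1, "neck"] <- -1
q <- 0

d  <- R %*% coef(reg4) - q                                   # R*betahat - q
xi <- drop(t(d) %*% solve(R %*% vcov(reg4) %*% t(R)) %*% d)  # Wald statistic
xi
pchisq(xi, df = nrow(R), lower.tail = FALSE)                 # chi-square p-value (J = 1)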

30
Test whether head and neck injuries have the same effect on the duration of benefits.

The two models are nested.

Add test = "Chisq" to obtain the chi-square statistic; here there is no difference since J = 1.

F = 24.34 → reject the null hypothesis.
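A possible way to run this test with linearHypothesis (a sketch; the slides do not show the exact call):

linearHypothesis(reg4, "head = neck")                   # F version (default)
linearHypothesis(reg4, "head = neck", test = "Chisq")   # chi-square version; same value here since J = 1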

31
Test also that gender and marital status have no effect.

Chi-square = 27.51 → reject the null hypothesis.
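Again a sketch of a possible call, assuming the gender and marital status dummies are named male and married:

linearHypothesis(reg4, c("male = 0", "married = 0"), test = "Chisq")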

32
NOTE!!!

1. The distributions of all the statistics seen in the previous tests are derived based
   - on the assumption of normality of the error term (hence $\hat{\beta}_k$ is normally distributed),
   OR
   - on asymptotic theory ($\hat{\beta}_k$ is asymptotically normally distributed)

2. Asymptotically, the Student's t distribution tends to a normal distribution

33
b. Size, power and p-values
Type I error: we reject the null hypothesis, while it is actually true.
The probability of a type I error (the size α of the test) is directly
controllable by the researcher by choosing the confidence level.
E.g. a confidence level of 95% corresponds with a size of 5%.

Type II error: we do not reject the null hypothesis while it is false.


The probability of not making a type II error is called the power of a test.
We would like the power of a test to be high.

34
By reducing the size of a test to e.g. 1%, the probability of rejecting the null
hypothesis will decrease, even if it is false.

Thus, a lower probability of a type I error will imply a higher probability of a type
II error.
(There is a trade off between the two error types.)

If you want to be really sure of not convicting an innocent person, then you are
more likely to acquit a guilty one.
Or get more evidence/data

In general, larger samples imply better power properties. Accordingly, in large samples we may prefer to work with a size of 1% rather than the "standard" 5%.
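To make the size/power trade-off concrete, a tiny simulation sketch (made-up data, not from the slides): the rejection rate of the t-test under the null approximates the size, and under an alternative it approximates the power.

set.seed(123)
reject_rate <- function(beta, N, alpha, reps = 2000) {
  mean(replicate(reps, {
    x <- rnorm(N)
    y <- beta * x + rnorm(N)                  # true model: y = beta*x + error
    summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"] < alpha
  }))
}
reject_rate(beta = 0,   N = 100, alpha = 0.05)   # about 0.05: the size of the test
reject_rate(beta = 0.2, N = 100, alpha = 0.05)   # power against beta = 0.2
reject_rate(beta = 0.2, N = 100, alpha = 0.01)   # smaller size, lower power
reject_rate(beta = 0.2, N = 400, alpha = 0.01)   # a larger sample restores power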

35
[Figure: the t distribution, with degrees of freedom N − K]

36
[Figure: the F distribution $F^J_{N-K}$]

37
Note that we say:
“We reject the null hypothesis” (at the 95% confidence level) or
“We do not reject the null hypothesis”.

We typically do not say:


“We accept the null hypothesis”.

Why?
Two mutually exclusive hypotheses (e.g. β2 = 0 and β2 = 0.01) may not be
rejected by the data, but it is silly to accept both hypotheses.
(Sometimes, tests are just not very powerful)

38
P-values
The p-value denotes the marginal significance level for which the null
hypothesis is rejected.

If a p-value is smaller than the size α (e.g. 0.05) we reject the null hypothesis.

Many modern software packages provide p-values with their tests. This allows
you to perform the test without checking tables of critical values. It also
allows you to perform the test without understanding what is going on.

Note that a p-value of 0.08 indicates that the null is rejected at the 10% level
but not at the 5% level.

39
Important!
The p-value gives the probability, under the null hypothesis, of finding a
test statistic that exceeds the value we have found in the data, i.e. of finding
more extreme results.
This is not the probability that the null hypothesis is true, given
the data.
A small p-value does not mean that an effect is economically important.
With a new sample, the p-value will be different (because its value
depends upon the randomness of the sample).

40
