Topic 3a

Uploaded by

Edlyn Linet

Topic 3

Simple Linear Regression

Learning objectives
• Explain the assumptions of classical linear regression using the Gauss-Markov Theorem (GMT) framework.
• Estimate the parameters of a regression using the ordinary least squares (OLS) method.
• Interpret the coefficients of a regression.
• Understand the steps of hypothesis testing
• Standard errors
• Confidence intervals

Gauss-Markov Theorem:
Under the 5 Gauss-Markov assumptions, the OLS estimator is the best linear unbiased estimator of the true parameters (β’s), conditional on the sample values of the explanatory variables. In other words, the OLS estimator is BLUE.

5 Gauss-Markov Assumptions for Simple Linear Model (Wooldridge, p.65)
1. Linear in parameters: $y = \beta_0 + \beta_1 x + u$
2. Random sampling of n observations: $\{(x_i, y_i) : i = 1, 2, \ldots, n\}$
3. Sample variation in the explanatory variable: the sample values $x_1, x_2, x_3, \ldots, x_n$ are not all the same value.
4. Zero conditional mean: $E(u \mid x) = 0$. The error u has an expected value of 0, given any value of the explanatory variable.
5. Homoskedasticity: $\mathrm{Var}(u \mid x) = \sigma^2$. The error has the same variance given any value of the explanatory variable.

How Good are the Estimates?
Properties of Estimators
• Small Sample Properties
  ◦ True regardless of how much data we have
  ◦ Most desirable characteristics
    ◦ Unbiased
    ◦ Efficient
    ◦ BLUE (Best Linear Unbiased Estimator)

“Second Best” Properties of Estimators
• Asymptotic (or large sample) Properties
  ◦ True in the hypothetical instance of infinite data
  ◦ In practice, applicable if N > 50 or so
  ◦ Asymptotically unbiased
  ◦ Consistency
  ◦ Asymptotic efficiency

Bias
• An estimator is unbiased if
  $E(\hat\beta_j) = \beta_j, \quad j = 0, 1, \ldots, k$
• In other words, the average value of the estimator in repeated sampling equals the true parameter.
• Note that whether an estimator is biased or not implies nothing about its dispersion.

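The repeated-sampling idea behind unbiasedness can be sketched with a small simulation. This is an illustration only, not part of the original slides: the true parameters (intercept 1, slope 2), the error standard deviation, the fixed x values, and the helper names `ols_fit` and `simulate_slopes` are all assumptions made for the example, and the slope is computed with the usual least-squares formula.

```python
import random
import statistics


def ols_fit(xs, ys):
    """Return (intercept, slope) from the least-squares formulas."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = ss_xy / ss_xx
    return y_bar - slope * x_bar, slope


def simulate_slopes(beta0, beta1, xs, sigma, reps, seed=0):
    """Draw `reps` samples from y = beta0 + beta1*x + u and refit each one."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(ols_fit(xs, ys)[1])
    return slopes


# Hypothetical true parameters and a fixed design of 20 x values.
xs = list(range(20))
slopes = simulate_slopes(beta0=1.0, beta1=2.0, xs=xs, sigma=1.0, reps=2000)
mean_slope = statistics.fmean(slopes)
print(f"mean of slope estimates:   {mean_slope:.3f} (true slope 2.0)")
print(f"spread of slope estimates: {statistics.stdev(slopes):.3f}")
```

The average of the estimates sits close to the true slope, while the second printed number describes their dispersion, which unbiasedness alone says nothing about.
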
Efficiency
• An estimator is efficient if its variance is smaller than that of any other estimator of the parameter.
• This criterion is only useful in combination with others (e.g. the constant estimator $\hat\beta_j = 2$ has low variance, but is biased).
• $\hat\beta_j$ is the “best” unbiased estimator if
  $\mathrm{Var}(\hat\beta_j) \le \mathrm{Var}(\tilde\beta_j)$,
  where $\tilde\beta_j$ is any other unbiased estimator of $\beta_j$.

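[Figure: sampling distributions of three estimators of β on a common axis. Labels: “Unbiased and efficient estimator of β” (centered at the true β); “Biased estimator of β” (centered at β + bias); “High sampling variance means inefficient estimator of β”.]
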
BLUE
(Best Linear Unbiased Estimator)
• An estimator $\hat\beta_j$ is BLUE if:
  ◦ $\hat\beta_j$ is a linear function of the data
  ◦ $\hat\beta_j$ is unbiased: $E(\hat\beta_j) = \beta_j, \quad j = 0, 1, \ldots, k$
  ◦ $\hat\beta_j$ is the most efficient: $\mathrm{Var}(\hat\beta_j) \le \mathrm{Var}(\tilde\beta_j)$

Large Sample Properties
• Asymptotically Unbiased
  ◦ As n becomes larger, $E(\hat\beta_j)$ tends toward $\beta_j$
• Consistency
  ◦ If the bias and variance both shrink toward zero as n gets larger, the estimator is consistent.
• Asymptotic Efficiency
  ◦ The estimator has an asymptotic distribution with finite mean and variance
  ◦ It is consistent
  ◦ No other consistent estimator has a smaller asymptotic variance

[Figure: Demonstration of consistency. Sampling distributions of the estimator for n = 4, n = 16, and n = 50, plotted around the true β.]

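A rough way to see consistency numerically is to repeat the estimation at the three sample sizes named in the demonstration (n = 4, 16, 50) and compare the spread of the estimates. This is a sketch under stated assumptions: the true parameters, the error standard deviation, the design x = 1, ..., n, and the function names are hypothetical, and with this design the variance falls faster than it would if x were drawn at random.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope: SS_xy / SS_xx."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    return ss_xy / ss_xx


def slope_spread(n, reps=1000, beta0=1.0, beta1=2.0, sigma=3.0, seed=0):
    """Return (mean, variance) of the slope estimates over `reps` samples of size n."""
    rng = random.Random(seed)
    xs = list(range(1, n + 1))  # assumed fixed design
    slopes = []
    for _ in range(reps):
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(ols_slope(xs, ys))
    return statistics.fmean(slopes), statistics.pvariance(slopes)


results = {n: slope_spread(n) for n in (4, 16, 50)}
for n, (mean, var) in results.items():
    print(f"n = {n:2d}: mean estimate = {mean:.3f}, variance = {var:.5f}")
```

The variance of the estimates drops as n grows, so the sampling distribution tightens around the true slope of 2.
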
Linear Regression Model

Types of Regression Models
Regression Models
• 1 explanatory variable: Simple
  ◦ Linear
  ◦ Non-linear
• 2+ explanatory variables: Multiple
  ◦ Linear
  ◦ Non-linear

Linear Equations
$Y = mX + b$
m = Slope = (Change in Y) / (Change in X)
b = Y-intercept
[Figure: straight line plotted with Y on the vertical axis and X on the horizontal axis.]

Linear Regression Model
• 1. Relationship between variables is a linear function
  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
  ◦ $Y_i$: dependent (response) variable (e.g., CD+ c.)
  ◦ $\beta_0$: population Y-intercept
  ◦ $\beta_1$: population slope
  ◦ $X_i$: independent (explanatory) variable (e.g., Years s. serocon.)
  ◦ $\varepsilon_i$: random error

Population & Sample Regression Models
[Figure: a random sample is drawn from the population. The population relationship is unknown: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$.]

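Population Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$E(Y) = \beta_0 + \beta_1 X_i$
$\varepsilon_i$ = random error
[Figure: Y against X with the population regression line $E(Y) = \beta_0 + \beta_1 X_i$; each observed value differs from the line by its random error $\varepsilon_i$.]
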
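Sample Linear Regression Model
$\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$
$\hat e_i = Y_i - \hat Y_i$ = random error
[Figure: Y against X with the fitted sample regression line, the observed values, and an unsampled observation.]
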
Estimating Parameters: Ordinary Least Squares Method

Scatter plot
• 1. Plot of all $(X_i, Y_i)$ pairs
• 2. Suggests how well model will fit
[Figure: scatter plot of Y against X, both axes running from 0 to 60.]

Thinking Challenge
How would you draw a line through the points? How do you determine which line ‘fits best’?
[Figure: scatter plot of Y against X (both axes 0 to 60), repeated with three candidate lines: slope changed with intercept unchanged; slope unchanged with intercept changed; slope changed with intercept changed.]

Ordinary Least Squares
• 1. ‘Best fit’ means the differences between actual Y values and predicted Y values are a minimum. But positive differences offset negative ones, so square the errors!
  $\sum_{i=1}^{n} (Y_i - \hat Y_i)^2 = \sum_{i=1}^{n} \hat e_i^2$
• 2. LS minimizes the sum of the squared differences (errors) (SSE)

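The point about offsetting errors can be checked on a few numbers. The sketch below uses hypothetical data (not from the slides) and three hand-picked candidate lines; the function names `residuals` and `sse` are illustrative. Every candidate passes through the point of means, so the raw errors sum to zero for all three, and only the squared errors tell them apart.

```python
def residuals(b0, b1, xs, ys):
    """Actual Y minus predicted Y for the line Y = b0 + b1 * X."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sse(b0, b1, xs, ys):
    """Sum of squared errors for the line Y = b0 + b1 * X."""
    return sum(e ** 2 for e in residuals(b0, b1, xs, ys))


# Hypothetical data, chosen so the arithmetic is easy to follow.
xs = [10, 20, 30, 40, 50]
ys = [12, 25, 28, 41, 54]

# (intercept, slope) pairs for three candidate lines.
candidates = {
    "flat line at the mean of Y": (32.0, 0.0),
    "steeper line": (-13.0, 1.5),
    "least-squares line": (2.0, 1.0),
}

for label, (b0, b1) in candidates.items():
    errs = residuals(b0, b1, xs, ys)
    print(f"{label}: sum of errors = {sum(errs):.1f}, SSE = {sse(b0, b1, xs, ys):.1f}")
```

The sum of errors cannot rank the lines, whereas the SSE is smallest for the least-squares line.
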
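Ordinary Least Squares Graphically
OLS minimizes $\sum_{i=1}^{n} \hat e_i^2 = \hat e_1^2 + \hat e_2^2 + \hat e_3^2 + \hat e_4^2$
[Figure: four observations plotted around the fitted line $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$, with the residuals $\hat e_1, \ldots, \hat e_4$ drawn as vertical distances to the line; for example, $\hat e_2 = Y_2 - \hat Y_2$.]
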
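Coefficient Equations
• Prediction equation
  $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$
• Sample slope
  $\hat\beta_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2}$
• Sample Y-intercept
  $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$
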
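The three coefficient equations translate almost line for line into code. The following is a minimal sketch; the function names `ols_fit` and `predict` and the four data points are hypothetical, chosen so the result is easy to verify by hand.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) using SS_xy / SS_xx and y_bar - slope * x_bar."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    if ss_xx == 0:
        # No sample variation in x: the slope is not defined.
        raise ValueError("x values must not all be the same")
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x):
    """Prediction equation: y_hat = intercept + slope * x."""
    return intercept + slope * x


# Hypothetical points lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
b0, b1 = ols_fit(xs, ys)
print(b0, b1)              # 1.0 2.0
print(predict(b0, b1, 5))  # 11.0
```

The check on `ss_xx` mirrors the requirement that the x values are not all identical, since otherwise the slope formula divides by zero.
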
Interpretation of Coefficients
• 1. Slope ($\hat\beta_1$)
  ◦ Estimated Y changes by $\hat\beta_1$ for each 1-unit increase in X
  ◦ A 1-unit increase in X leads to a (+/-) $\hat\beta_1$-unit change in Y
  ◦ If $\hat\beta_1 = 2$, then Y is expected to increase by 2 for each 1-unit increase in X
• 2. Y-intercept ($\hat\beta_0$)
  ◦ Average value of Y when X = 0
  ◦ If $\hat\beta_0 = 4$, then average Y is expected to be 4 when X is 0

Parameter Estimation Example
• Obstetrics: What is the relationship between mother’s estriol level and birthweight using the following data?

Estriol (mg/24h)   Birthweight (g/1000)
1                  1
2                  1
3                  2
4                  2
5                  4

Exercise
• Plot a scatter diagram
• Estimate the linear regression
• Interpret your results based on economic theory
• Show that
• Show that the SSE ≈ 0

=0

Scatterplot
[Figure: Birthweight vs. Estriol level. Birthweight on the vertical axis (0 to 4), estriol level on the horizontal axis (0 to 6).]

Parameter Estimation Solution Table
        Xi    Yi    Xi²   Yi²   XiYi
        1     1     1     1     1
        2     1     4     1     2
        3     2     9     4     6
        4     2     16    4     8
        5     4     25    16    20
Total   15    10    55    26    37

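Parameter Estimation Solution
$\hat\beta_1 = \dfrac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \dfrac{37 - \dfrac{(15)(10)}{5}}{55 - \dfrac{(15)^2}{5}} = 0.70$

$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X = 2 - (0.70)(3) = -0.10$
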
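The hand calculation can be reproduced from the column totals of the estriol data. This sketch uses only the five observations given in the example; the variable names are illustrative. It also computes the residuals, whose sum is zero up to floating-point rounding, while their squares sum to about 1.10 for this data.

```python
# Estriol (mg/24h) and birthweight (g/1000) from the example.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
n = len(xs)

# Column totals, as in the solution table.
sum_x = sum(xs)                               # 15
sum_y = sum(ys)                               # 10
sum_xx = sum(x * x for x in xs)               # 55
sum_xy = sum(x * y for x, y in zip(xs, ys))   # 37

# Computational form of the slope and intercept formulas.
slope = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
intercept = sum_y / n - slope * sum_x / n

# Residuals: observed minus fitted birthweight.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")  # slope = 0.70, intercept = -0.10
print(f"SSE = {sse:.2f}")                                   # SSE = 1.10
```
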
Coefficient Interpretation Solution
• 1. Slope ($\hat\beta_1$)
  ◦ Birthweight (Y) is expected to increase by 0.7 units for each 1-unit increase in estriol (X)
• 2. Intercept ($\hat\beta_0$)
  ◦ Average birthweight (Y) is -0.10 units when estriol level (X) is 0
  ◦ Difficult to explain
  ◦ The birthweight (or any weight for that matter) should always be positive

Limitations of simple linear regression
• Only considers one independent variable.
• The dependent variable must be continuous.
• Cannot show causation.
• Sensitive to outliers.
• Can only describe linear relationships.
• Only looks at the mean of the dependent variable.

Statistical Inference

Hypothesis testing
Confidence intervals
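Hypothesis Testing
Two-Tailed Test about a Population Mean: Small n
[Figure: t distribution centered at 0, with “Reject H0” regions of area $\alpha/2$ in each tail beyond the critical values $-t_{\alpha/2}$ and $t_{\alpha/2}$. Source: Anderson, Sweeney, and Williams.]
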
Student’s t-test
• The t-test is used to test hypotheses about means when the population variance is unknown (the usual case). The t distribution is closely related to z, the unit normal.
• Remember: If the sample is small (n < 30) and the population variance $\sigma^2$ is unknown, then we use the t-test and not the z-test.
Steps of Hypothesis Testing
1. Determine the null and alternative hypotheses.
2. Specify the level of significance $\alpha$.
3. Select and calculate the test statistic that will be used to test the hypothesis.
Using the Test Statistic
4. Use $\alpha$ to determine the critical value for the test statistic. The critical value comes from the Student’s t-distribution table.
5. State the rejection rule for $H_0$. Use the value of the test statistic and the rejection rule to determine whether to reject $H_0$.
6. Make a conclusion on the statistical significance of the coefficient.
Source: Anderson, Sweeney, and Williams

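How do we compute the test statistic?
$t = \dfrac{\hat\beta_j - \beta_j}{SE(\hat\beta_j)}$
For our cases $\beta_j = 0$ under $H_0$, so $t = \hat\beta_j / SE(\hat\beta_j)$.

How do we get the t-critical?
• From the Student’s t-distribution tables, with n - 2 degrees of freedom in simple regression
• Recall that this is a 2-tailed test, so check $\alpha = 0.05$ from tables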
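Finding the Standard Errors
$\mathrm{Var}(\hat\beta_0) = \dfrac{\hat\sigma^2 \sum x_i^2}{n \, SS_{xx}}$ and $SE(\hat\beta_0) = \sqrt{\mathrm{Var}(\hat\beta_0)}$

$\mathrm{Var}(\hat\beta_1) = \dfrac{\hat\sigma^2}{SS_{xx}}$ and $SE(\hat\beta_1) = \sqrt{\mathrm{Var}(\hat\beta_1)} = \dfrac{\hat\sigma}{\sqrt{SS_{xx}}}$
Where: $\hat\sigma^2 = \dfrac{SSE}{n - 2} = \dfrac{\sum \hat e_i^2}{n - 2}$ and $SS_{xx} = \sum (x_i - \bar x)^2$
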
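As a worked illustration of the standard errors and the test statistic, the sketch below applies the usual textbook formulas for simple regression to the estriol data. The formulas are an assumption about what this slide showed: error variance estimated as SSE / (n - 2), SE of the slope as the square root of that variance over SS_xx, and SE of the intercept as the square root of the variance times the sum of x squared over n times SS_xx. The critical value 3.182 (two-tailed, 5% level, 3 degrees of freedom) is taken from a standard t table, not from the table in these slides.

```python
import math

# Estriol (mg/24h) and birthweight (g/1000) from the example.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

slope = ss_xy / ss_xx
intercept = y_bar - slope * x_bar

# Estimated error variance: SSE divided by n - 2 degrees of freedom.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
sigma2_hat = sse / (n - 2)

# Standard errors of the slope and the intercept (assumed textbook formulas).
se_slope = math.sqrt(sigma2_hat / ss_xx)
se_intercept = math.sqrt(sigma2_hat * sum(x * x for x in xs) / (n * ss_xx))

# Test statistic for H0: slope = 0 against a two-sided alternative.
t_stat = slope / se_slope
T_CRIT = 3.182  # assumed: two-tailed 5% critical value for n - 2 = 3 d.f.
reject_h0 = abs(t_stat) > T_CRIT

print(f"SE(slope) = {se_slope:.4f}, SE(intercept) = {se_intercept:.4f}")
print(f"t = {t_stat:.3f}, reject H0 at 5%: {reject_h0}")
```

With only five observations there are just 3 degrees of freedom, so the critical value is much larger than the familiar 1.96 from the normal distribution.
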
How do we find the critical values?
t distribution values, with comparison to the Z value

Confidence Level   t (10 d.f.)   t (20 d.f.)   t (30 d.f.)   Z
.80                1.372         1.325         1.310         1.28
.90                1.812         1.725         1.697         1.64
.95                2.228         2.086         2.042         1.96
.99                3.169         2.845         2.750         2.58

Note: t → Z as n increases

from “Statistics for Managers Using Microsoft® Excel”, 4th Edition, Prentice-Hall 2004
Confidence Intervals
Confidence Interval: An interval of values computed from the sample that covers the true population value with a stated level of confidence (e.g., 95%).

We make confidence intervals using values computed from the sample, not the unknown values from the population.

The confidence level is the probability that we do not find a statistically significant effect if the true effect of an independent variable is zero.

It is related to the significance level $\alpha$ and is defined as $1 - \alpha$.

Confidence Intervals
Interpretation: In 95% of the samples we take, the interval will contain the true population proportion (or mean).

We are 95% confident that the true parameter lies between the lower limit and the upper limit.

This is the same as saying we are 95% confident that the true population proportion (or mean) will be in the interval.
How do we compute the intervals?
Single population mean (small n, normally distributed)
$\bar x \pm t_{\alpha/2,\, n-1} \dfrac{s}{\sqrt{n}}$
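For a single population mean with small n, the usual interval is the sample mean plus or minus the t critical value times s divided by the square root of n. The sketch below assumes that formula and a hypothetical sample of 11 measurements, so that the 10 d.f. column of the t table above applies; the function name `mean_confidence_interval` is illustrative.

```python
import math
import statistics


def mean_confidence_interval(sample, t_crit):
    """Return (lower, upper) = mean -/+ t_crit * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical sample of n = 11 measurements, giving 10 degrees of freedom.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9, 5.0]
T_CRIT_95_10DF = 2.228  # .95 confidence level, 10 d.f., from the t table
lower, upper = mean_confidence_interval(sample, T_CRIT_95_10DF)
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

The same pattern applies to a regression coefficient, with the estimate in place of the mean, its standard error in place of s / sqrt(n), and n - 2 degrees of freedom.
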
Hypothesis Testing Example
• Obstetrics: What is the relationship between mother’s estriol level and birthweight using the following data?

Estriol (mg/24h)   Birthweight (g/1000)
1                  1
2                  1
3                  2
4                  2
5                  4

Exercise 2
• Compute the standard errors for $\hat\beta_0$ and $\hat\beta_1$
• Test the statistical significance of the slope at 5% level ($\alpha = 0.05$)
• Compute the confidence intervals for $\beta_0$ and $\beta_1$
• Write out the compact form of the regression equation:
  $\hat y = \hat\beta_0 + \hat\beta_1 x$
  (SE) (SE)
  =?
  n = ?

Exercise 3
• The following data relate to the quantity demanded and price of a commodity collected from five markets.

Price               1    2    3    4    5
Quantity demanded   15   10   14   8    3

Exercise 3
• Plot a scatter diagram
• Estimate the linear regression
• Interpret your results based on economic theory
• Show that
• Show that the SSE ≈ 0
• Compute the standard error for $\hat\beta_1$
• Test the statistical significance of the slope at 5% level ($\alpha = 0.05$)
• Write out the compact form of the regression equation
• Compute the confidence intervals for $\beta_1$

Conclusion from Statistical Analysis
Types of Statistical Errors
Type I and Type II Error

• False Positive (Type I Error)
  ◦ Interpretation: You predicted positive and it’s false.
  ◦ You predicted that a man is pregnant but he actually is not.
• False Negative (Type II Error)
  ◦ Interpretation: You predicted negative and it’s false.
  ◦ You predicted that a woman is not pregnant but she actually is.
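The two error types can be written as a small decision rule. This is only an illustrative sketch; the function name `classify_outcome` and its labels are hypothetical and simply restate the definitions above.

```python
def classify_outcome(predicted_positive: bool, actually_positive: bool) -> str:
    """Label a prediction as correct, a Type I error, or a Type II error."""
    if predicted_positive and not actually_positive:
        return "false positive (Type I error)"
    if not predicted_positive and actually_positive:
        return "false negative (Type II error)"
    return "correct"


# The two examples from the slide.
print(classify_outcome(predicted_positive=True, actually_positive=False))
print(classify_outcome(predicted_positive=False, actually_positive=True))
```
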
