
ECON 7310 Elements of Econometrics

Lecture 2: Linear Regression with One Regressor

1 / 28
Outline:

▶ The population linear regression model (LRM)
▶ The ordinary least squares (OLS) estimator and the sample regression line
▶ Measures of fit of the sample regression
▶ The least squares assumptions
▶ The sampling distribution of the OLS estimator

2 / 28
Linear Regression

▶ Linear regression lets us estimate the slope of the population regression line.
▶ The slope of the population regression line is the expected effect on Y of
a unit change in X .
▶ Ultimately our aim is to estimate the causal effect on Y of a unit change
in X – but for now, just think of the problem of fitting a straight line to data
on two variables, Y and X .

3 / 28
Linear Regression

▶ The problem of statistical inference for linear regression is, at a general level, the same as for estimation of the mean or of the difference between two means.
▶ Statistical, or econometric, inference about the slope entails:
▶ Estimation:
How should we draw a line through the data to estimate the population
slope? Answer: ordinary least squares (OLS).
What are the advantages and disadvantages of OLS?
▶ Hypothesis testing:
How to test if the slope is zero?
▶ Confidence intervals:
How to construct a confidence interval for the slope?

4 / 28
The Linear Regression Model (SW Section 4.1)

▶ The population regression line:

Test Score = β0 + β1 STR


▶ β1 = slope of population regression line
= change in test score for a unit change in student-teacher ratio (STR)
▶ Why are β0 and β1 “population” parameters?
▶ We would like to know the population value of β1 .
▶ We don’t know β1 , so must estimate it using data.

5 / 28
The Population Linear Regression Model

Consider
Yi = β0 + β1 Xi + ui
for i = 1, . . . , n
▶ We have n observations, (Xi, Yi), i = 1, . . . , n.
▶ X is the independent variable or regressor or right-hand-side variable
▶ Y is the dependent variable or left-hand-side variable
▶ β0 = intercept
▶ β1 = slope
▶ ui = the regression error
▶ The regression error consists of omitted factors. In general, these
omitted factors are other factors that influence Y , other than the variable
X . The regression error also includes error in the measurement of Y .

6 / 28
The population regression model in a picture

▶ Observations on Y and X (n = 7); the population regression line; and the regression error (the “error term”):

7 / 28
The Ordinary Least Squares Estimator (SW Section 4.2)

▶ How can we estimate β0 and β1 from data? Recall that the least squares estimator of µY solves

  min_m Σ_{i=1}^n (Yi − m)²

  and that the solution is the sample mean, Ȳ.

▶ By analogy, we will focus on the least squares (“ordinary least squares” or “OLS”) estimator of the unknown parameters β0 and β1. The OLS estimator solves

  min_{b0, b1} Σ_{i=1}^n [Yi − (b0 + b1 Xi)]²

▶ In fact, we estimate the conditional expectation function E[Y |X] under the assumption that E[Y |X] = β0 + β1 X.
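To make the analogy concrete, here is a minimal R sketch (simulated data, not the course dataset): minimizing Σ(Yi − m)² numerically recovers the sample mean, and minimizing the OLS criterion recovers the same coefficients as lm().

# Minimal R sketch with simulated data (not the course dataset)
set.seed(1)
n <- 100
X <- rnorm(n, mean = 20, sd = 2)
Y <- 700 - 2 * X + rnorm(n, sd = 10)

# Minimising sum((Y - m)^2) over m returns the sample mean
m_hat <- optimize(function(m) sum((Y - m)^2), interval = range(Y))$minimum
c(m_hat, mean(Y))

# Minimising the OLS criterion returns the same coefficients as lm()
ols_obj <- function(b) sum((Y - b[1] - b[2] * X)^2)
optim(c(mean(Y), 0), ols_obj, control = list(reltol = 1e-12))$par
coef(lm(Y ~ X))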

8 / 28
Mechanics of OLS

▶ The population regression line:

Test Score = β0 + β1 STR

9 / 28
Mechanics of OLS

▶ The OLS estimator minimizes the average squared difference between the actual values of Yi and the prediction (“predicted value”) based on the estimated line.
▶ This minimization problem can be solved using calculus (Appendix 4.2).
▶ The result is the OLS estimators of β0 and β1 .

10 / 28
OLS estimator, predicted values, and residuals

▶ The OLS estimators are

  β̂1 = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^n (Xi − X̄)²

  β̂0 = Ȳ − β̂1 X̄

▶ These are estimates of the unknown population parameters β0 and β1.
▶ The OLS predicted (fitted) values Ŷi and residuals ûi are

  Ŷi = β̂0 + β̂1 Xi
  ûi = Yi − Ŷi

▶ The estimated intercept β̂0, slope β̂1, and residuals ûi are computed from a sample of n observations (Xi, Yi), i = 1, . . . , n.
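A minimal R sketch (simulated data) applying these formulas directly and checking the results against lm():

# R sketch: closed-form OLS estimates, fitted values, and residuals (simulated data)
set.seed(2)
n <- 50
X <- rnorm(n, 20, 2)
Y <- 700 - 2 * X + rnorm(n, sd = 10)

b1_hat <- sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
b0_hat <- mean(Y) - b1_hat * mean(X)

Y_hat <- b0_hat + b1_hat * X   # predicted (fitted) values
u_hat <- Y - Y_hat             # residuals

fit <- lm(Y ~ X)               # same numbers from lm()
c(b0_hat, b1_hat)
coef(fit)
all.equal(unname(fitted(fit)), Y_hat)
all.equal(unname(residuals(fit)), u_hat)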

11 / 28
Predicted values & residuals

▶ One of the districts in the data set is Antelope, CA, for which
STR = 19.33 and TestScore = 657.8

predicted value: 698.9 − 2.28 × 19.33 = 654.8

residual: 657.8 − 654.8 = 3.0
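A quick R check of this arithmetic:

# Check the Antelope, CA numbers from the estimated line
pred  <- 698.9 - 2.28 * 19.33   # about 654.8
resid <- 657.8 - pred           # about 3.0
c(predicted = pred, residual = resid)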

12 / 28
OLS regression: R output

TestScore = 698.93 − 2.28 × STR


We will discuss the rest of this output later.
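A sketch of the kind of R call that produces output like this; the data frame and variable names (caschool, testscr, str) are placeholders rather than necessarily the ones used in the course files:

# Hypothetical data frame / variable names for the California school districts data
fit <- lm(testscr ~ str, data = caschool)
summary(fit)   # intercept about 698.93, slope on str about -2.28, plus R^2, SER, etc.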

13 / 28
Measures of Fit (SW Section 4.3)

▶ Two regression statistics provide complementary measures of how well the regression line “fits” or explains the data:
  ▶ The regression R² measures the fraction of the variance of Y that is explained by X; it is unit free and ranges between zero (no fit) and one (perfect fit).
  ▶ The standard error of the regression (SER) measures the magnitude of a typical regression residual in the units of Y.

14 / 28
Regression R²

▶ The sample variance of Yi is (1/n) Σ_{i=1}^n (Yi − Ȳ)².
▶ The sample variance of Ŷi is (1/n) Σ_{i=1}^n (Ŷi − Ȳ)², where in fact the sample mean of Ŷ equals Ȳ.
▶ R² is simply the ratio of those two sample variances.
▶ Formally, we define R² as follows (two equivalent definitions):

  R² := Explained Sum of Squares (ESS) / Total Sum of Squares (TSS) = Σ_{i=1}^n (Ŷi − Ȳ)² / Σ_{i=1}^n (Yi − Ȳ)²

  R² := 1 − Residual Sum of Squares (RSS) / Total Sum of Squares = 1 − Σ_{i=1}^n ûi² / Σ_{i=1}^n (Yi − Ȳ)²

▶ R² = 0 ⟺ ESS = 0 and R² = 1 ⟺ ESS = TSS. Also, 0 ≤ R² ≤ 1.
▶ For regression with a single X, R² = the square of the sample correlation coefficient between X and Y.
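A minimal R sketch (simulated data) verifying that the two definitions agree with each other, with summary()'s R², and with the squared sample correlation:

# R sketch: R^2 from its two equivalent definitions (simulated data)
set.seed(3)
X <- rnorm(100, 20, 2)
Y <- 700 - 2 * X + rnorm(100, sd = 18)
fit <- lm(Y ~ X)

TSS <- sum((Y - mean(Y))^2)
ESS <- sum((fitted(fit) - mean(Y))^2)
RSS <- sum(residuals(fit)^2)

c(ESS / TSS, 1 - RSS / TSS, summary(fit)$r.squared, cor(X, Y)^2)   # all four agree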

15 / 28
The Standard Error of the Regression (SER)

▶ The SER measures the spread of the distribution of u. The SER is (almost) the sample standard deviation of the OLS residuals:

  SER := sqrt( (1/(n − 2)) Σ_{i=1}^n ûi² )

▶ The SER:
  ▶ has the units of ui, which are the units of Yi
  ▶ measures the average “size” of the OLS residual (the average “mistake” made by the OLS regression line)
▶ The root mean squared error (RMSE) is closely related to the SER:

  RMSE := sqrt( (1/n) Σ_{i=1}^n ûi² )

▶ When n is large, SER ≈ RMSE.¹

¹ Here, n − 2 is the degrees of freedom – we need to subtract 2 because there are two parameters to estimate. For details, see Section 18.4.
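A minimal R sketch (simulated data) computing the SER and the RMSE from the OLS residuals; sigma() is R's built-in name for the SER of an lm fit:

# R sketch: SER and RMSE from the OLS residuals (simulated data)
set.seed(4)
X <- rnorm(100, 20, 2)
Y <- 700 - 2 * X + rnorm(100, sd = 18)
fit <- lm(Y ~ X)

u_hat <- residuals(fit)
n <- length(u_hat)
SER  <- sqrt(sum(u_hat^2) / (n - 2))
RMSE <- sqrt(sum(u_hat^2) / n)
c(SER = SER, RMSE = RMSE, sigma = sigma(fit))   # sigma(fit) reproduces the SER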
16 / 28
Example of the R² and the SER

▶ TestScore = 698.9 − 2.28 × STR, R² = 0.05, SER = 18.6


▶ STR explains only a small fraction of the variation in test scores.
▶ Does this make sense?
▶ Does this mean the STR is unimportant in a policy sense?

17 / 28
Least Squares Assumptions (SW Section 4.4)

▶ What, in a precise sense, are the properties of the sampling distribution of the OLS estimator? When will it be unbiased? What is its variance?
▶ To answer these questions, we need to make some assumptions about
how Y and X are related to each other, and about how they are collected
(the sampling scheme)
▶ These assumptions – there are three – are known as the Least Squares
Assumptions.

18 / 28
Least Squares Assumptions (SW Section 4.4)

Yi = β0 + β1 Xi + ui , i = 1, . . . , n

1. The conditional distribution of u given X has mean zero, that is,


E(u|X = x) = 0.
▶ This implies that OLS estimators are unbiased
2. (Xi , Yi ), i = 1, · · · , n, are i.i.d.
▶ This is true if (X , Y ) are collected by simple random sampling
▶ This delivers the sampling distribution of β̂0 and β̂1
3. Large outliers in X and/or Y are rare.
▶ Technically, X and Y have finite fourth moments
▶ Outliers can result in meaningless values of β̂1

19 / 28
Least squares assumption #1: E(u|X = x) = 0.

For any given value of X , the mean of u is zero:

Example: TestScorei = β0 + β1 STRi + ui , ui = other factors


▶ What are some of these “other factors”?
▶ Is E(u|X = x) = 0 plausible for these other factors?

20 / 28
Least squares assumption #1: E(u|X = x) = 0 (continued)

▶ A benchmark for thinking about this assumption is to consider an ideal randomized controlled experiment:
▶ X is randomly assigned to people (students randomly assigned to
different size classes; patients randomly assigned to medical
treatments). Randomization is done by computer – using no information
about the individual.
▶ Because X is assigned randomly, all other individual characteristics –
the things that make up u – are distributed independently of X , so u and
X are independent
▶ Thus, in an ideal randomized controlled experiment, E(u|X = x) = 0
(that is, LSA #1 holds)
▶ In actual experiments, or with observational data, we will need to think
hard about whether E(u|X = x) = 0 holds.

21 / 28
Least squares assumption #2: (Xi , Yi ), i = 1, · · · , n are i.i.d.

▶ This arises automatically if the entity (individual, district) is sampled by simple random sampling:
▶ The entities are selected from the same population, so (Xi , Yi ) are
identically distributed for all i = 1, . . . , n.
▶ The entities are selected at random, so the values of (X , Y ) for different
entities are independently distributed.
▶ The main place we will encounter non-i.i.d. sampling is when data are
recorded over time for the same entity (panel data and time series data)
– we will deal with that complication when we cover panel data.

22 / 28
Least squares assumption #3: Large outliers are rare
Technical statement: E(X 4 ) < ∞ and E(Y 4 ) < ∞

▶ A large outlier is an extreme value of X or Y


▶ On a technical level, if X and Y are bounded, then they have finite fourth
moments. (Standardized test scores automatically satisfy this; STR,
family income, etc. satisfy this too.)
▶ The substance of this assumption is that a large outlier can strongly
influence the results – so we need to rule out large outliers.
▶ Look at your data! If you have a large outlier, is it a typo? Does it belong
in your data set? Why is it an outlier?

23 / 28
OLS can be sensitive to an outlier

▶ Is the lone point an outlier in X or Y ?


▶ In practice, outliers are often data glitches (coding or recording
problems). Sometimes they are observations that really shouldn’t be in
your data set. Plot your data before running regressions!
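A small simulated R example of how a single extreme observation can move the OLS slope:

# R sketch: one large outlier can shift the OLS slope substantially (simulated data)
set.seed(5)
X <- rnorm(50, 20, 2)
Y <- 700 - 2 * X + rnorm(50, sd = 10)
coef(lm(Y ~ X))                         # slope close to -2

X_out <- c(X, 35)                       # add one extreme point
Y_out <- c(Y, 900)
coef(lm(Y_out ~ X_out))                 # slope pulled well away from -2

plot(X_out, Y_out)                      # plot your data before running regressions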

24 / 28
The Sampling Distribution of the OLS Estimator (SW Section 4.5)

The OLS estimator is computed from a sample of data. A different sample yields a different value of β̂1. This is the source of the “sampling uncertainty” of β̂1. We want to:
▶ quantify the sampling uncertainty associated with β̂1
▶ use β̂1 to test hypotheses such as β1 = 0
▶ construct a confidence interval for β1
▶ All these require figuring out the sampling distribution of the OLS
estimator.

25 / 28
Sampling Distribution of β̂1

▶ We can show that β̂1 is unbiased, i.e., E[β̂1] = β1. Similarly for β̂0.
▶ We do not derive V(β̂1), as it requires some tedious algebra. Moreover, we do not need to memorize its formula. Here, we just emphasize two aspects of V(β̂1).
▶ First, V(β̂1) is inversely proportional to n, just like V(Ȳ). Combined with E[β̂1] = β1, this suggests that β̂1 →p β1, i.e., β̂1 is consistent. That is, as the sample size grows, β̂1 gets closer to β1.
▶ Second, V(β̂1) is inversely proportional to the variance of X; see the graphs below.
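A Monte Carlo sketch in R (with true β1 = −2) illustrating both aspects: β̂1 is centred at β1 across repeated samples, and its variance falls as n grows.

# R sketch: sampling distribution of beta1_hat across repeated samples (true beta1 = -2)
set.seed(6)
slope_draws <- function(n, reps = 2000) {
  replicate(reps, {
    X <- rnorm(n, 20, 2)
    Y <- 700 - 2 * X + rnorm(n, sd = 10)
    coef(lm(Y ~ X))[2]
  })
}
b_small <- slope_draws(n = 25)
b_large <- slope_draws(n = 400)
c(mean(b_small), mean(b_large))   # both close to -2 (unbiasedness)
c(var(b_small), var(b_large))     # variance shrinks as n grows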

26 / 28
Sampling Distribution of β̂1

[Two scatterplots: low X variation ⇒ low precision; high X variation ⇒ high precision]

▶ Intuitively, if there is more variation in X, then there is more information in the data that you can use to fit the regression line.
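The same Monte Carlo idea in R, now holding n fixed and varying only the spread of X: the slope estimate is much more precise when X varies more.

# R sketch: precision of the slope under low vs high X variation (same error variance)
set.seed(7)
slope_sd <- function(sd_x, n = 100, reps = 2000) {
  draws <- replicate(reps, {
    X <- rnorm(n, 20, sd_x)
    Y <- 700 - 2 * X + rnorm(n, sd = 10)
    coef(lm(Y ~ X))[2]
  })
  sd(draws)
}
c(low_x_variation = slope_sd(sd_x = 0.5), high_x_variation = slope_sd(sd_x = 4))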

27 / 28
Sampling Distribution of β̂1

▶ The exact sampling distribution is complicated – it depends on the population distribution of (Y, X) – but when n is large we get some simple (and good) approximations:
▶ Let SE(β̂1) be the standard error (SE) of β̂1, i.e., a consistent estimator of the standard deviation of β̂1, which is sqrt(V(β̂1)).
▶ Then, it turns out that

  (β̂1 − β1) / SE(β̂1) ~approx N(0, 1)

▶ Using this approximate distribution, we can conduct statistical inference on β1, i.e., hypothesis testing and confidence intervals ⇒ Ch 5.
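A final simulation sketch in R checking the normal approximation: standardising β̂1 by its standard error gives draws that look approximately N(0, 1) when n is large.

# R sketch: the standardised slope estimate is approximately N(0, 1) (simulated data)
set.seed(8)
n <- 200; beta1 <- -2
z <- replicate(2000, {
  X <- rnorm(n, 20, 2)
  Y <- 700 + beta1 * X + rnorm(n, sd = 10)
  fit <- lm(Y ~ X)
  (coef(fit)[2] - beta1) / summary(fit)$coefficients[2, "Std. Error"]
})
c(mean(z), sd(z))                                    # roughly 0 and 1
hist(z, freq = FALSE); curve(dnorm(x), add = TRUE)   # histogram close to the N(0,1) density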

28 / 28
