
STAT 445

REGRESSION ANALYSIS
Dr. Godwin Debrah
Linear Regression
Introduction
• Regression analysis is a statistical technique for investigating and modeling the relationship between variables.

• Regression analysis is applied in many fields, including engineering, the physical and chemical sciences, and, very widely, the social sciences such as economics.

• A researcher who wants to study the effect of fertilizer on soybean yield, for example, can employ regression analysis.
Introduction Contd.
• As we try to explain one variable, say $y$, in terms of another variable, say $x$, we must confront three issues:

First, since there is never an exact relationship between two variables, how do we allow for other factors to affect $y$?

Second, what is the functional relationship between $y$ and $x$?

Third, how can we be sure we are capturing a ceteris paribus relationship between $y$ and $x$ (if that is the desired goal)?
Introduction Contd.
• We can resolve these issues by plotting some data that may be available and writing down an equation relating $y$ to $x$. The scatter diagram shows delivery time ($y$) against delivery volume ($x$) for Coca-Cola trucks.
Introduction Contd.
• A simple equation to capture the relationship we see from the scatter diagram can be written as

$$y = \beta_0 + \beta_1 x + u \qquad (1)$$

• Equation (1), which is assumed to hold in the population of interest, defines the simple linear regression model.

• It is also called the two-variable linear regression model or bivariate linear regression model, because it relates the two variables $x$ and $y$.
The Simple Regression Model
• Definition of the simple linear regression model
• The parameters $\beta_0$ (the intercept) and $\beta_1$ (the slope parameter) are often called regression coefficients.

$$y = \beta_0 + \beta_1 x + u$$

• $y$ is the dependent variable, also called the explained variable, response variable, predicted variable, or regressand.
• $x$ is the independent variable, also called the explanatory variable, control variable, predictor variable, or regressor.
• $u$ is the error term, also called the disturbance or the unobservables.
The Simple Regression Model
• The errors are assumed to have mean zero and unknown variance $\sigma^2$.

• We usually assume that the errors are uncorrelated.

• It is convenient to view the regressor $x$ as controlled by the data analyst and measured with negligible error, while the response $y$ is a random variable. That is, there is a probability distribution for $y$ at each possible value of $x$.
The Simple Regression Model
• Interpretation of the simple linear regression model: it studies how $y$ varies with changes in $x$:

$$\frac{\Delta y}{\Delta x} = \beta_1 \quad \text{as long as} \quad \frac{\Delta u}{\Delta x} = 0$$

This answers the question: by how much does the dependent variable change if the independent variable is increased by one unit? The interpretation is only correct if all other things remain equal when the independent variable is increased by one unit.

• The simple linear regression model is rarely applicable in practice, but its discussion is useful for pedagogical reasons.
The Simple Regression Model
• Example: Soybean yield and fertilizer

$$yield = \beta_0 + \beta_1\,fertilizer + u$$

Here $\beta_1$ measures the effect of fertilizer on yield, holding all other factors fixed; the error term $u$ captures factors such as rainfall, land quality, and the presence of parasites.

• Example: A simple wage equation

$$wage = \beta_0 + \beta_1\,educ + u$$

Here $\beta_1$ measures the change in hourly wage given another year of education, holding all other factors fixed; $u$ captures labor force experience, tenure with the current employer, work ethic, intelligence, and so on.
The Simple Regression Model
• A few things to note:

Regression equations are valid only over the region of the regressor variables contained in the observed data.

A regression model does not imply a cause-and-effect relationship between the variables, even though a strong empirical relationship may exist.

To establish causality, the relationship between the regressors and the response variable must have a basis outside the sample data; for example, the relationship may be suggested by theoretical considerations. Regression analysis can aid in confirming a cause-and-effect relationship, but it cannot be the sole basis of such a claim.
The Simple Regression Model
• Population regression function (PRF)
• Recall that there is a probability distribution for $y$ at each possible value of $x$. The mean of this distribution is

$$E(y \mid x) = E(\beta_0 + \beta_1 x + u \mid x) = \beta_0 + \beta_1 x + E(u \mid x) = \beta_0 + \beta_1 x,$$

provided we assume $E(u \mid x) = 0$.
• This means that the average value of the dependent variable can be expressed as a linear function of the explanatory variable.
The Simple Regression Model
• Population regression function: for individuals with $x = x_2$, the average value of $y$ is $\beta_0 + \beta_1 x_2$.
The Simple Regression Model
• Deriving the ordinary least squares (OLS) estimates
• In order to estimate the regression model, one needs data.
• A random sample of $n$ observations, $\{(x_i, y_i) : i = 1, \dots, n\}$, where $x_i$ is the value of the explanatory variable and $y_i$ the value of the dependent variable for the $i$-th observation: $(x_1, y_1)$ is the first observation, $(x_2, y_2)$ the second, $(x_3, y_3)$ the third, and so on up to $(x_n, y_n)$.
The Simple Regression Model
• Fit a regression line through the data points, e.g. the $i$-th data point $(x_i, y_i)$, as well as possible.
The Simple Regression Model
• The least squares criterion: from the sample regression model, we minimize the sum of squared residuals.
• Note: residuals are not the same as errors.

Population regression model: $y = \beta_0 + \beta_1 x + u$
Sample regression model: $y_i = \beta_0 + \beta_1 x_i + u_i, \quad i = 1, 2, \dots, n$
The Simple Regression Model
The least squares criterion
• Regression residuals: $\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$

• Minimize the sum of squared residuals:

$$\min \sum_{i=1}^{n} \hat{u}_i^2 \;\rightarrow\; \hat{\beta}_0, \hat{\beta}_1$$
The Simple Regression Model: Least Squares Criterion

$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

The least-squares estimators of $\beta_0$ and $\beta_1$, say $\hat{\beta}_0$ and $\hat{\beta}_1$, must satisfy

$$\left.\frac{\partial S}{\partial \beta_0}\right|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0$$

and

$$\left.\frac{\partial S}{\partial \beta_1}\right|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) x_i = 0$$

Simplifying these two equations yields:

The Simple Regression Model: Least Squares Criterion
• These are called the least-squares normal equations:

$$n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

$$\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$
The Simple Regression Model: Least Squares Criterion
• The solution to the least-squares normal equations is

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

• We often conveniently write $\hat{\beta}_1 = S_{xy}/S_{xx}$ or $\hat{\beta}_1 = \sum_{i=1}^{n} c_i y_i$, where

$$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2, \quad S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n}(x_i - \bar{x})\,y_i, \quad c_i = \frac{x_i - \bar{x}}{S_{xx}}$$

• Using simple algebra, we can also write $\hat{\beta}_1 = \hat{\rho}_{xy}\,(\hat{\sigma}_y/\hat{\sigma}_x)$, where $\hat{\rho}_{xy}$ is the sample correlation between $x$ and $y$ and $\hat{\sigma}_x$, $\hat{\sigma}_y$ denote the sample standard deviations.
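
The slides contain no code, but these formulas are easy to reproduce. A minimal Python sketch follows; the function name ols_simple and the data values are invented here purely for illustration:

```python
import numpy as np

def ols_simple(x, y):
    """Least-squares estimates for the simple model y = b0 + b1*x + u."""
    x_bar, y_bar = x.mean(), y.mean()
    s_xx = np.sum((x - x_bar) ** 2)           # S_xx
    s_xy = np.sum((x - x_bar) * (y - y_bar))  # S_xy
    b1_hat = s_xy / s_xx                      # slope: S_xy / S_xx
    b0_hat = y_bar - b1_hat * x_bar           # intercept: y_bar - b1_hat * x_bar
    return b0_hat, b1_hat

# Hypothetical delivery-volume (x) and delivery-time (y) data, for illustration
x = np.array([7.0, 3.0, 3.0, 4.0, 6.0, 7.0, 2.0, 7.0, 30.0, 5.0])
y = np.array([16.7, 11.5, 12.0, 14.9, 19.8, 18.1, 8.0, 17.8, 79.2, 16.1])

b0, b1 = ols_simple(x, y)
print(f"fitted line: y_hat = {b0:.3f} + {b1:.3f} x")
```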
Properties of the Least-Squares Estimators and the Fitted Regression Model
• The sum of the residuals, $\sum \hat{u}_i$, in any regression model that contains an intercept is always zero. This property follows directly from the first normal equation.

• The sum of the observed values $y_i$ is equal to the sum of the fitted values $\hat{y}_i$.

• The least-squares regression line always passes through the centroid $(\bar{x}, \bar{y})$ of the data.

• The sum of the residuals weighted by the corresponding value of the regressor variable always equals zero; that is, $\sum_{i=1}^{n} x_i \hat{u}_i = 0$.

• The sum of the residuals weighted by the corresponding fitted value always equals zero; that is, $\sum_{i=1}^{n} \hat{u}_i \hat{y}_i = 0$.
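
These properties are straightforward to check numerically. A hedged sketch using the same hypothetical data as above:

```python
import numpy as np

# Hypothetical data (same as the previous sketch)
x = np.array([7.0, 3.0, 3.0, 4.0, 6.0, 7.0, 2.0, 7.0, 30.0, 5.0])
y = np.array([16.7, 11.5, 12.0, 14.9, 19.8, 18.1, 8.0, 17.8, 79.2, 16.1])

x_c = x - x.mean()
b1 = np.sum(x_c * (y - y.mean())) / np.sum(x_c ** 2)   # OLS slope
b0 = y.mean() - b1 * x.mean()                          # OLS intercept
y_hat = b0 + b1 * x                                    # fitted values
u_hat = y - y_hat                                      # residuals

print(np.isclose(u_hat.sum(), 0.0))              # residuals sum to zero
print(np.isclose(y.sum(), y_hat.sum()))          # sum of observed = sum of fitted
print(np.isclose(np.sum(x * u_hat), 0.0))        # regressor-weighted residuals = 0
print(np.isclose(np.sum(y_hat * u_hat), 0.0))    # fitted-value-weighted residuals = 0
print(np.isclose(b0 + b1 * x.mean(), y.mean()))  # line passes through the centroid
```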
The Simple Regression Model
• CEO salary and return on equity

$$salary = \beta_0 + \beta_1\,roe + u$$

where $salary$ is measured in thousands of Ghana cedis and $roe$ is the average return on equity of the CEO's firm.

• Fitted regression:

$$\widehat{salary} = 963.191 + 18.501\,roe$$

The intercept is 963.191. If the return on equity increases by 1 percentage point, then salary is predicted to change by ₵18,501.
The Simple Regression Model
• The fitted regression line depends on the sample; it estimates the unknown population regression line.
The Simple Regression Model
• Wage and education

$$wage = \beta_0 + \beta_1\,educ + u$$

where $wage$ is the hourly wage in cedis and $educ$ is years of education.

• Fitted regression:

$$\widehat{wage} = -0.90 + 0.54\,educ$$

The intercept is $-0.90$. In the sample, one more year of education is associated with an increase in hourly wage of ₵0.54.
The Simple Regression Model
• Voting outcomes and campaign expenditures (two parties)

$$voteA = \beta_0 + \beta_1\,shareA + u$$

where $voteA$ is the percentage of the vote for candidate A and $shareA$ is candidate A's percentage of total campaign expenditures.

• Fitted regression:

$$\widehat{voteA} = 26.81 + 0.464\,shareA$$

The intercept is 26.81. If candidate A's share of spending increases by one percentage point, he or she receives 0.464 percentage points more of the total vote.
The Simple Regression Model
• Expected values and variances of the OLS estimators
• The estimated regression coefficients are random variables because they are calculated from a random sample:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

The data are random and depend on the particular sample that has been drawn.

• The question is what the estimators estimate on average and how large their variability is in repeated samples:

$$E(\hat{\beta}_0) = ?, \quad E(\hat{\beta}_1) = ?, \quad Var(\hat{\beta}_0) = ?, \quad Var(\hat{\beta}_1) = ?$$
The Simple Regression Model
• Standard assumptions for the linear regression model

• Assumption SLR.1 (linear in parameters):

$$y = \beta_0 + \beta_1 x + u$$

The population model is linear in parameters.
The Simple Regression Model
• Assumptions for the linear regression model (cont.)

• Assumption SLR.2 (sample variation in the explanatory variable):

$$\sum_{i=1}^{n}(x_i - \bar{x})^2 > 0$$

The values of the explanatory variable are not all the same (otherwise it would be impossible to study how different values of the explanatory variable lead to different values of the dependent variable).

• Assumption SLR.3 (zero conditional mean):

$$E(u_i \mid x_i) = 0$$

The value of the explanatory variable must contain no information about the mean of the unobserved factors.
The Simple Regression Model
• Unbiasedness of OLS:

$$\text{SLR.1–SLR.3} \;\Rightarrow\; E(\hat{\beta}_0) = \beta_0, \quad E(\hat{\beta}_1) = \beta_1$$

• Interpretation of unbiasedness
• The estimated coefficients may be smaller or larger, depending on the sample, which is the result of a random draw.
• However, on average, they will be equal to the values that characterize the true relationship between $y$ and $x$ in the population.
• "On average" means: if sampling were repeated, i.e. if drawing the random sample and doing the estimation were repeated many times. A small simulation of this repeated-sampling idea is sketched below.
• In a given sample, estimates may differ considerably from the true values.
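
To make the repeated-sampling interpretation concrete, here is a small Monte Carlo sketch. The parameter values, sample size, and error distribution below are arbitrary choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(445)
beta0, beta1, n, reps = 1.0, 2.0, 100, 10_000  # illustrative true parameters

b1_hats = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0.0, 10.0, n)      # a fresh random sample each repetition
    u = rng.normal(0.0, 3.0, n)        # errors with E(u|x) = 0 (SLR.3)
    y = beta0 + beta1 * x + u          # population model (SLR.1)
    x_c = x - x.mean()
    b1_hats[r] = (x_c @ (y - y.mean())) / (x_c @ x_c)  # OLS slope

# The average of the estimates across repeated samples is close to beta1 = 2,
# illustrating unbiasedness; any single estimate can still be far off.
print(b1_hats.mean())
```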
Properties of the Least-Squares Estimators and the Fitted Regression Model
• Recall we can write $\hat{\beta}_1 = \sum_{i=1}^{n} c_i y_i$, where $c_i = (x_i - \bar{x})/S_{xx}$, and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.
• Thus the least-squares estimators are linear combinations of the observations $y_i$.
• The least-squares estimators of $\beta_0$ and $\beta_1$ are UNBIASED ESTIMATORS.

• To show these desirable properties, we first need to note the following:

We have assumed $E(u_i) = 0$, $var(u_i) = var(y_i) = \sigma^2$, and that the $u_i$'s, and hence the $y_i$'s, are uncorrelated.
Finally, it can be shown that $\sum_{i=1}^{n} c_i x_i = 1$ and $\sum_{i=1}^{n} c_i = 0$.
Properties of the Least-Squares Estimators
• Showing that $E(\hat{\beta}_0) = \beta_0$ is left as an exercise.

$$E(\hat{\beta}_1) = E\left(\sum_{i=1}^{n} c_i y_i\right) = \sum_{i=1}^{n} c_i E(y_i) = \sum_{i=1}^{n} c_i (\beta_0 + \beta_1 x_i) = \beta_0 \sum_{i=1}^{n} c_i + \beta_1 \sum_{i=1}^{n} c_i x_i$$

Since $\sum_{i=1}^{n} c_i = 0$ and $\sum_{i=1}^{n} c_i x_i = 1$, it follows that $E(\hat{\beta}_1) = \beta_1$.
The Simple Regression Model
• Variances of the OLS estimators
• Depending on the sample, the estimates will be nearer to or farther from the true population values.
• How far can we expect our estimates to be from the true population values on average? This is their sampling variability.
• Sampling variability is measured by the estimators' variances, $var(\hat{\beta}_1)$ and $var(\hat{\beta}_0)$.

• Assumption SLR.4 (homoskedasticity):

$$Var(u_i \mid x_i) = \sigma^2$$

The value of the explanatory variable must contain no information about the variability of the unobserved factors.
The Simple Regression Model
• Graphical illustration of homoskedasticity: the variability of the unobserved influences does not depend on the value of the explanatory variable.
The Simple Regression Model
• An example of heteroskedasticity: wage and education. The variance of the unobserved determinants of wages increases with the level of education.
Gauss-Markov Theorem
• Assumptions SLR.1–SLR.4 are called the Gauss-Markov assumptions for simple linear regression.

• Under assumptions SLR.1–SLR.4, the Gauss-Markov theorem states that the least-squares estimators are unbiased and have minimum variance when compared with all other unbiased estimators that are linear combinations of the $y_i$.
• We often say the least-squares estimators are best linear unbiased estimators (BLUE), where "best" implies minimum variance.
The Simple Regression Model
• Variances of the OLS estimators
Under assumptions SLR.1–SLR.4:

$$Var(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\sigma^2}{SST_x}$$

$$Var(\hat{\beta}_0) = \frac{\sigma^2\, n^{-1} \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\sigma^2\, n^{-1} \sum_{i=1}^{n} x_i^2}{SST_x}$$

• Conclusion: the sampling variability of the estimated regression coefficients is higher, the larger the variability of the unobserved factors, and lower, the higher the variation in the explanatory variable.
Variances of the OLS Estimators

$$Var(\hat{\beta}_1) = Var\left(\sum_{i=1}^{n} c_i y_i\right) = \sum_{i=1}^{n} c_i^2\, Var(y_i) = \sigma^2 \sum_{i=1}^{n} c_i^2 = \frac{\sigma^2 \sum_{i=1}^{n}(x_i - \bar{x})^2}{S_{xx}^2} = \frac{\sigma^2}{S_{xx}}$$

$$Var(\hat{\beta}_0) = Var(\bar{y} - \hat{\beta}_1 \bar{x}) = Var(\bar{y}) + \bar{x}^2\, Var(\hat{\beta}_1) - 2\bar{x}\, Cov(\bar{y}, \hat{\beta}_1)$$

Since $Cov(\bar{y}, \hat{\beta}_1) = 0$,

$$Var(\hat{\beta}_0) = Var(\bar{y}) + \bar{x}^2\, Var(\hat{\beta}_1) = \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)$$
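
As a sanity check (not from the slides), the sampling variance of $\hat{\beta}_1$ can be compared with $\sigma^2/S_{xx}$ by simulation. All values below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma, reps = 1.0, 2.0, 3.0, 20_000  # illustrative values
x = np.linspace(0.0, 10.0, 50)      # regressor held fixed across samples
x_c = x - x.mean()
s_xx = np.sum(x_c ** 2)             # S_xx

b1_hats = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, x.size)
    b1_hats[r] = (x_c @ y) / s_xx   # OLS slope for this sample

print(b1_hats.var())                # simulated sampling variance
print(sigma ** 2 / s_xx)            # theoretical value sigma^2 / S_xx
```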
The Simple Regression Model
• Estimating the error variance
• $Var(u_i \mid x_i) = \sigma^2 = Var(u_i)$: the variance of $u$ does not depend on $x$, i.e. it is equal to the unconditional variance.
• One could estimate the variance of the errors by calculating the variance of the residuals in the sample,

$$\tilde{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (\hat{u}_i - \bar{\hat{u}})^2 = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2,$$

but unfortunately this estimate would be biased.
• An unbiased estimate of the error variance is obtained by subtracting the number of estimated regression coefficients from the number of observations:

$$\hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2$$
The Simple Regression Model
• Unbiasedness of the error variance:

$$\text{SLR.1–SLR.4} \;\Rightarrow\; E(\hat{\sigma}^2) = \sigma^2$$

• Calculation of standard errors for regression coefficients (plug in $\hat{\sigma}^2$ for the unknown $\sigma^2$):

$$se(\hat{\beta}_1) = \sqrt{\widehat{Var}(\hat{\beta}_1)} = \sqrt{\hat{\sigma}^2 / SST_x}$$

$$se(\hat{\beta}_0) = \sqrt{\widehat{Var}(\hat{\beta}_0)} = \sqrt{\hat{\sigma}^2\, n^{-1} \textstyle\sum_{i=1}^{n} x_i^2 \,/\, SST_x}$$

The estimated standard deviations of the regression coefficients are called "standard errors." They measure how precisely the regression coefficients are estimated.
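
A minimal Python sketch putting the error-variance and standard-error formulas together; the function name and structure are illustrative, not from the slides:

```python
import numpy as np

def ols_with_standard_errors(x, y):
    """OLS estimates and their standard errors for y = b0 + b1*x + u."""
    n = x.size
    x_c = x - x.mean()
    sst_x = np.sum(x_c ** 2)                      # SST_x = S_xx
    b1 = np.sum(x_c * (y - y.mean())) / sst_x     # slope estimate
    b0 = y.mean() - b1 * x.mean()                 # intercept estimate
    u_hat = y - b0 - b1 * x                       # residuals
    sigma2_hat = np.sum(u_hat ** 2) / (n - 2)     # unbiased error variance
    se_b1 = np.sqrt(sigma2_hat / sst_x)
    # n^{-1} * sum(x_i^2) is just the mean of x^2:
    se_b0 = np.sqrt(sigma2_hat * np.mean(x ** 2) / sst_x)
    return (b0, se_b0), (b1, se_b1)
```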
The Simple Regression Model
• Goodness-of-fit: "How well does the explanatory variable explain the dependent variable?"

• Measures of variation:

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

The total sum of squares ($SST$) represents the total variation in the dependent variable; the regression sum of squares ($SSR$) represents the variation explained by the regression; the residual sum of squares ($SSE$) represents the variation not explained by the regression.
The Simple Regression Model
• Decomposition of total variation:

$$SST = SSR + SSE$$

(total variation = explained part + unexplained part)

• Goodness-of-fit measure (R-squared):

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

R-squared measures the fraction of the total variation that is explained by the regression.
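
As a short supplementary illustration (not part of the slides), the $R^2$ computation in Python:

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST: fraction of variation in y explained by the fit."""
    sse = np.sum((y - y_hat) ** 2)       # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return 1.0 - sse / sst
```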
The Simple Regression Model
• CEO salary and return on equity:

$$\widehat{salary} = 963.191 + 18.501\,roe, \qquad n = 209, \quad R^2 = 0.0132$$

The regression explains only 1.3% of the total variation in salaries.

• Voting outcomes and campaign expenditures:

$$\widehat{voteA} = 26.81 + 0.464\,shareA, \qquad n = 173, \quad R^2 = 0.856$$

The regression explains 85.6% of the total variation in election outcomes.

• Caution: a high R-squared does not necessarily mean that the regression has a causal interpretation!
The Simple Regression Model
• Regression through the origin

• There are certain relationships for which this is reasonable. For example, if income is zero, then income tax revenues must also be zero.
• In addition, there are settings where a model that originally has a nonzero intercept is transformed into a model without an intercept.
• You will be asked to work through this on problem set 1; a brief numerical sketch is given below.
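
As background only (the derivation itself is on problem set 1): for the no-intercept model $y = \beta_1 x + u$, minimizing $\sum_i (y_i - \beta_1 x_i)^2$ gives the standard estimator $\tilde{\beta}_1 = \sum_i x_i y_i / \sum_i x_i^2$. A minimal sketch:

```python
import numpy as np

def ols_through_origin(x, y):
    """Slope estimate for the no-intercept model y = b1*x + u.

    Uses the standard result b1 = sum(x*y) / sum(x**2), obtained by
    setting the derivative of sum((y - b1*x)**2) to zero.
    """
    return np.sum(x * y) / np.sum(x ** 2)
```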
