STAT 445-Lecture 1_2021
REGRESSION ANALYSIS
Dr. Godwin Debrah
Linear Regression
Introduction
• Regression analysis is a statistical technique for investigating and modeling the relationship between variables

y = β₀ + β₁ x + u

y: dependent variable, explained variable, response variable, predicted variable, regressand
x: independent variable, explanatory variable, control variable, predictor variable, regressor, …
u: error term, disturbance, unobservables, …
The Simple Regression Model
• The errors are assumed to have mean zero and unknown variance σ²
• Example: in a regression of crop yield on fertilizer applied, β₁ measures the effect of fertilizer on yield, holding all other factors fixed; the error term captures unobserved factors such as rainfall, land quality, and the presence of parasites, …
• A regression model does not imply a cause-and-effect relationship between the variables, even though a strong empirical relationship may exist
• To establish causality, the relationship between the regressors and the response variable must have a basis outside the sample data; for example, the relationship may be suggested by theoretical considerations
• Regression analysis can aid in confirming a cause-and-effect relationship, but it cannot be the sole basis of such a claim
The Simple Regression Model
• Population regression function (PRF)
• Recall that there is a probability distribution for y at each possible value of x. The mean of this distribution is

E(y | x) = E(β₀ + β₁ x + u | x)
         = β₀ + β₁ x + E(u | x)
         = β₀ + β₁ x

provided we assume E(u | x) = 0
• This means that the average value of the dependent variable can be
expressed as a linear function of the explanatory variable
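To make the population regression function concrete, here is a minimal simulation sketch in Python/NumPy (the parameter values and error distribution are invented for illustration): for each fixed x, the sample mean of y settles on β₀ + β₁x because E(u | x) = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 2.0, 0.5            # illustrative population parameters

# For each fixed x, draw many y's and compare their average to beta0 + beta1*x
for x in [1.0, 2.0, 3.0]:
    u = rng.normal(0.0, 1.0, size=100_000)   # errors with E(u|x) = 0
    y = beta0 + beta1 * x + u
    print(x, y.mean(), beta0 + beta1 * x)    # sample mean tracks E(y|x)
```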
The Simple Regression Model
β̂₁ = ∑ᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / ∑ᵢ₌₁ⁿ (xᵢ − x̄)²,  β̂₀ = ȳ − β̂₁ x̄

• We often conveniently write β̂₁ = S_xy/S_xx or β̂₁ = ∑ᵢ₌₁ⁿ cᵢ yᵢ, where
• S_xx = ∑ᵢ₌₁ⁿ (xᵢ − x̄)², S_xy = ∑ᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) = ∑ᵢ₌₁ⁿ (xᵢ − x̄) yᵢ, and cᵢ = (xᵢ − x̄)/S_xx
• Using simple algebra, we can also write β̂₁ = ρ̂_xy (σ̂_y/σ̂_x), where ρ̂_xy is the sample correlation between x and y and σ̂_x, σ̂_y denote the sample standard deviations
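As a concrete check of these formulas, here is a minimal Python/NumPy sketch (the x and y arrays are invented example data): it computes β̂₁ and β̂₀ from S_xy and S_xx and confirms that the correlation form gives the same slope.

```python
import numpy as np

# Invented example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1])

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))

b1 = Sxy / Sxx                     # slope estimate
b0 = ybar - b1 * xbar              # intercept estimate

# Equivalent form: slope = sample correlation * (sd_y / sd_x)
rho = np.corrcoef(x, y)[0, 1]
b1_alt = rho * (y.std(ddof=1) / x.std(ddof=1))

print(b0, b1, b1_alt)              # b1 and b1_alt agree
```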
Properties of the Least-Squares Estimators and the Fitted Regression Model
• The sum of the residuals, ûᵢ, in any regression model that contains an intercept is always zero. This property follows directly from the first normal equation
• The sum of the observed values yᵢ is equal to the sum of the fitted values ŷᵢ
• The least-squares regression line always passes through the centroid (x̄, ȳ) of the data
• The sum of the residuals weighted by the corresponding value of the regressor variable always equals zero, that is, ∑ᵢ₌₁ⁿ xᵢ ûᵢ = 0
• The sum of the residuals weighted by the corresponding fitted value always equals zero, that is, ∑ᵢ₌₁ⁿ ûᵢ ŷᵢ = 0
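All five properties can be verified numerically; a sketch using the same invented data as above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
uhat = y - yhat                     # residuals

print(uhat.sum())                   # ~0: residuals sum to zero
print(y.sum(), yhat.sum())          # equal: observed and fitted totals match
print(b0 + b1 * x.mean(), y.mean()) # line passes through the centroid
print(np.sum(x * uhat))             # ~0: regressor-weighted residual sum
print(np.sum(yhat * uhat))          # ~0: fitted-value-weighted residual sum
```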
The Simple Regression Model
• CEO salary and return on equity
• Fitted regression: salary^ = 963.191 + 18.501 roe
• Fitted regression: wage^ = −0.90 + 0.54 educ
• Fitted regression: vote^ = 26.81 + 0.464 shareA
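As an illustration of how such a fitted line is used (the value roe = 30 is chosen arbitrarily, in the units of the original data), the first equation predicts salary^ = 963.191 + 18.501(30) ≈ 1518.2.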
β̂₁ = ∑ᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / ∑ᵢ₌₁ⁿ (xᵢ − x̄)²,  β̂₀ = ȳ − β̂₁ x̄

• The data are random and depend on the particular sample that has been drawn
• The question is what the estimators will estimate on average and how large their
variability in repeated samples is
• (Unbiasedness of OLS): SLR.1–SLR.3 ⇒ E(β̂₀) = β₀, E(β̂₁) = β₁
• Interpretation of unbiasedness
• The estimated coefficients may be smaller or larger, depending on the sample
that is the result of a random draw
• However, on average, they will be equal to the values that characterize the true relationship between y and x in the population
• “On average” means: if sampling were repeated, i.e. if drawing the random sample and doing the estimation were repeated many times
• In a given sample, estimates may differ considerably from true values
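A minimal repeated-sampling sketch of this idea (the true parameter values and data-generating process are invented for illustration): estimate β̂₀ and β̂₁ on many independently drawn samples and average the estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, n = 1.0, 3.0, 50     # assumed true population values

b0s, b1s = [], []
for _ in range(5_000):             # repeat: draw a sample, estimate, record
    x = rng.uniform(0.0, 10.0, n)
    u = rng.normal(0.0, 2.0, n)    # errors with E(u|x) = 0
    y = beta0 + beta1 * x + u
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0s.append(y.mean() - b1 * x.mean())
    b1s.append(b1)

# Averages over repeated samples are close to (1.0, 3.0): unbiasedness
print(np.mean(b0s), np.mean(b1s))
```

Individual estimates scatter around the true values; only their average across repeated samples pins them down.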
Properties of the Least-Squares Estimators and the Fitted Regression Model
• Recall we can write β̂₁ = ∑ᵢ₌₁ⁿ cᵢ yᵢ, where cᵢ = (xᵢ − x̄)/S_xx, and β̂₀ = ȳ − β̂₁ x̄
• Thus the least-squares estimators are linear combinations of the observations yᵢ
• The least-squares estimators of β₀ and β₁ are UNBIASED ESTIMATORS: since ∑ᵢ₌₁ⁿ cᵢ = 0 and ∑ᵢ₌₁ⁿ cᵢ xᵢ = 1, taking expectations gives E(β̂₁) = ∑ᵢ₌₁ⁿ cᵢ (β₀ + β₁ xᵢ) = β₁

⇒ E(β̂₁) = β₁
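The two weight identities that drive this argument, ∑cᵢ = 0 and ∑cᵢxᵢ = 1, hold for any data with non-constant x; a quick numerical check on invented values:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])                 # invented regressor values
c = (x - x.mean()) / np.sum((x - x.mean()) ** 2)   # OLS weights c_i

print(c.sum())        # ~0: weights sum to zero
print(np.sum(c * x))  # ~1: weights applied to x sum to one
```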
The Simple Regression Model
• Variances of the OLS estimators
• Depending on the sample, the estimates will be nearer or farther away from
the true population values
• How far can we expect our estimates to be away from the true population
values on average (= sampling variability)?
• Sampling variability is measured by the estimators’ variances, var(β̂₁) and var(β̂₀)
• Conclusion:
• The sampling variability of the estimated regression coefficients increases with the variability of the unobserved factors, and decreases with the variation in the explanatory variable
Variances of the OLS estimators

Var(β̂₁) = σ²/S_xx,  Var(β̂₀) = Var(ȳ) + x̄² Var(β̂₁) = σ² (1/n + x̄²/S_xx)
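A sketch checking the slope-variance formula by simulation (parameter values invented; x is held fixed across replications to match the conditional variance):

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma = 1.0, 3.0, 2.0
x = np.linspace(0.0, 10.0, 40)             # regressor held fixed
Sxx = np.sum((x - x.mean()) ** 2)

b1s = []
for _ in range(20_000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, x.size)
    b1s.append(np.sum((x - x.mean()) * (y - y.mean())) / Sxx)

# Empirical variance of the slope estimates vs. sigma^2 / S_xx
print(np.var(b1s), sigma**2 / Sxx)
```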
The Simple Regression Model
• Estimating the error variance
• The variance of u does not depend on x, i.e. Var(uᵢ | xᵢ) = σ² = Var(uᵢ), the unconditional variance
• One could estimate the variance of the errors by calculating the variance of the residuals in the sample,

σ̂² = (1/n) ∑ᵢ₌₁ⁿ (ûᵢ − û̄)² = (1/n) ∑ᵢ₌₁ⁿ ûᵢ²

(the two forms agree because residuals from a model with an intercept have mean zero); unfortunately, this estimate would be biased
• An unbiased estimate of the error variance,

σ̂² = (1/(n − 2)) ∑ᵢ₌₁ⁿ ûᵢ²,

can be obtained by subtracting the number of estimated regression coefficients from the number of observations in the divisor
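A sketch of why the n − 2 divisor matters (all values invented): across simulated samples, dividing ∑û² by n underestimates σ², while dividing by n − 2 hits it on average.

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, sigma, n = 1.0, 3.0, 2.0, 10

biased, unbiased = [], []
for _ in range(20_000):
    x = rng.uniform(0.0, 10.0, n)
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    r = y - (b0 + b1 * x)
    biased.append(np.sum(r**2) / n)          # divides by n: biased downward
    unbiased.append(np.sum(r**2) / (n - 2))  # divides by n-2: unbiased

print(np.mean(biased), np.mean(unbiased), sigma**2)  # ~3.2, ~4.0, 4.0
```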
The Simple Regression Model
• (Unbiasedness of the error variance): SLR.1–SLR.4 ⇒ E(σ̂²) = σ²
• Calculation of standard errors for regression coefficients:

se(β̂₁) = √(σ̂²/S_xx)   (the formula for Var(β̂₁) with σ̂² plugged in for the unknown σ²)

• The estimated standard deviations of the regression coefficients are called “standard errors.” They measure how precisely the regression coefficients are estimated.
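A sketch of the standard-error calculation on a single (invented) sample:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 2.8, 4.1, 4.4, 5.2, 6.1])
n = x.size

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)
sigma2_hat = np.sum(resid ** 2) / (n - 2)   # unbiased error-variance estimate

se_b1 = np.sqrt(sigma2_hat / Sxx)                            # se of slope
se_b0 = np.sqrt(sigma2_hat * (1 / n + x.mean() ** 2 / Sxx))  # se of intercept
print(se_b0, se_b1)
```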
The Simple Regression Model
• Goodness-of-Fit
“How well does the explanatory variable explain the dependent variable?”
• Measures of variation:

SST = ∑ᵢ₌₁ⁿ (yᵢ − ȳ)²,  SSR = ∑ᵢ₌₁ⁿ (ŷᵢ − ȳ)²,  SSE = ∑ᵢ₌₁ⁿ ûᵢ²,  with SST = SSR + SSE

R² = SSR/SST = 1 − SSE/SST

• R-squared measures the fraction of the total variation that is explained by the regression
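A sketch of the decomposition and of R² on the same invented data as above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 2.8, 4.1, 4.4, 5.2, 6.1])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)       # total variation in y
SSR = np.sum((yhat - y.mean()) ** 2)    # variation explained by the fit
SSE = np.sum((y - yhat) ** 2)           # variation left in the residuals

print(SST, SSR + SSE)                   # decomposition: SST = SSR + SSE
print(SSR / SST, 1 - SSE / SST)         # two equal ways to compute R^2
```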
The Simple Regression Model
• CEO salary and return on equity

salary^ = 963.191 + 18.501 roe,  R² ≈ 0.013

• The regression explains only 1.3% of the total variation in salaries