Lect2 Part1
Javier Abellán, Màxim Ventura and Carlos Suárez (UPF) Topic 1 April 3, 2024 1 / 64
Fundamentals of Regression Analysis
Contents
Fundamentals of Regression Analysis The role of Econometrics: simple regression model
• Using all this information, we want to know how the area (in square meters) of an apartment in Barcelona affects its price (in euros)
• We could then draw a line through the data points:
Pricei = β0 + βarea Areai
clear all
set more off
use habitatge_BCN_1920_12.dta, clear
twoway (scatter preu superf, msize(tiny) mcolor(edkblue)) ///
(lfit preu superf, lwidth(medium) lcolor(black))
corr superf preu
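The course file habitatge_BCN_1920_12.dta is not reproduced here, so as a rough sketch the same scatter-plus-fitted-line exercise can be mimicked in Python on synthetic data (all parameter values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the course dataset: area ("superf") in m2 and
# price ("preu") in euros, with made-up parameters.
superf = rng.uniform(30, 150, size=500)
preu = 20_000 + 1_600 * superf + rng.normal(0, 35_000, size=500)

# Counterpart of "lfit preu superf": fitted line preu = b0 + b1 * superf
b1, b0 = np.polyfit(superf, preu, deg=1)

# Counterpart of "corr superf preu"
r = np.corrcoef(superf, preu)[0, 1]

print(f"slope = {b1:.1f} euros/m2, intercept = {b0:.1f}, corr = {r:.3f}")
```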
• In general
Yi = β0 + β1 Xi + ui (1)
• This is the simple linear regression model with just one regressor, where
Yi is the dependent variable for unit i, Xi is the regressor or independent
variable for unit i and ui is the error term for unit i.
• The first part β0 + β1 Xi is the population regression line, that is, the
average relation between X and Y that we see in the population
• If we knew the values of β0 and β1 , then for a given X we could use the population regression line to predict the corresponding Y
• Last but not least, ui is the difference between Yi and the corresponding
regression line, and arises from all the other factors affecting Yi that have
not been included in our model
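A quick way to internalise equation (1) is to simulate it: pick values for β0 and β1, draw X and u, and construct Y. A minimal sketch with hypothetical parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population parameters (made up for illustration)
beta0, beta1 = 50_000.0, 1_600.0
n = 1_000

X = rng.uniform(30, 150, size=n)        # regressor, e.g. area in m2
u = rng.normal(0.0, 30_000.0, size=n)   # error term: all omitted factors
Y = beta0 + beta1 * X + u               # Yi = beta0 + beta1*Xi + ui

# beta0 + beta1*X is the population regression line (the average relation);
# u is each unit's deviation from that line.
line = beta0 + beta1 * X
print(np.mean(Y - line))  # sample average of u, close to zero
```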
• So, how do we estimate the slope of a line that goes through the scatterplot of Size and Price?
• Of course, there is no line that will go through all the data points and we
can actually draw an infinite number of different lines that go through the
data points
• So, which criterion should we use to pick one among all the possibilities?
Fundamentals of Regression Analysis Different types of data
Experimental vs Observational
Fundamentals of Regression Analysis The Ordinary Least Squares (OLS) estimator
• The ordinary least squares (OLS) estimator extends this idea to the linear regression model
• Let’s assume that b0 and b1 are some estimators of the unknown
parameters β0 and β1
• Based on those estimators, the regression line is b0 + b1 X
• From that estimation, the predicted value of Yi is Ŷi = b0 + b1 Xi
• Therefore, the residual from the ith prediction will be
ûi = Yi − Ŷi = Yi − (b0 + b1 Xi ) = Yi − b0 − b1 Xi
• Note: The residual ûi can be interpreted as the sample counterpart of ui
• OLS looks for the pair (b0 , b1 ) that minimises the sum of squared residuals Σi ûi²
• We will refer to the pair that minimises the sum of squares as (β̂0 , β̂1 )
β̂1 = Σi (Xi − X̄)(Yi − Ȳ) / Σi (Xi − X̄)² = sXY / s²X

β̂0 = Ȳ − β̂1 X̄

ûi = Yi − Ŷi = Yi − β̂0 − β̂1 Xi
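These formulas can be implemented directly. A sketch on synthetic data, cross-checked against numpy's own least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(30, 150, size=400)
Y = 50_000 + 1_600 * X + rng.normal(0, 30_000, size=400)  # synthetic data

# OLS formulas: slope = s_XY / s_X^2, intercept = Ybar - slope * Xbar
b1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0_hat = Y.mean() - b1_hat * X.mean()
uhat = Y - b0_hat - b1_hat * X  # residuals

# Cross-check against numpy's least-squares line
b1_ref, b0_ref = np.polyfit(X, Y, deg=1)
assert np.allclose([b1_hat, b0_hat], [b1_ref, b0_ref])
print(b1_hat, b0_hat)
```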
• The first two equations are the OLS estimators of the unknown
population parameters β0 and β1 ; the third is the residual from model
prediction (sample counterpart of the error term, but we should never
interpret them as equivalent)
• Different samples will generate different estimates of β̂0 and β̂1 (that is, different estimated values)
• That is, the estimators are random variables and their particular value
will depend on the sample
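This sampling variability can be seen by re-estimating β̂1 on many samples drawn from the same simulated population (the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
beta1 = 1_600.0  # true slope in the simulated population

def ols_slope(X, Y):
    return np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)

# Each sample from the same population yields a different estimate of beta1
slopes = []
for _ in range(2_000):
    X = rng.uniform(30, 150, size=100)
    Y = 50_000 + beta1 * X + rng.normal(0, 30_000, size=100)
    slopes.append(ols_slope(X, Y))
slopes = np.array(slopes)

print(f"mean of estimates: {slopes.mean():.1f}, std: {slopes.std():.1f}")
```

The estimates centre on the true slope but spread around it: the estimator is a random variable.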
Notation
Fundamentals of Regression Analysis First order conditions of the OLS estimators
∂/∂b0 : −2 Σi (Yi − b0 − b1 Xi ) = 0

∂/∂b1 : −2 Σi (Yi − b0 − b1 Xi ) Xi = 0
• The appendix of these notes contains the formal proofs of these properties; let's take a look at them using the data from our example.
• In order to do so, we will:
1. Run the OLS regression of price on area
2. Predict the value of Ŷi using X̄: β̂0 + β̂1 X̄
3. Predict the OLS residuals ûi as the difference Yi − Ŷi , where Ŷi = β̂0 + β̂1 Xi
4. Calculate Σi ûi
5. Calculate Corr(Xi , ûi )
6. Calculate Corr(Ŷi , ûi )
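The same six steps can be sketched in Python, with synthetic data standing in for the course dataset:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(30, 150, size=500)                        # area
Y = 50_000 + 1_600 * X + rng.normal(0, 30_000, size=500)  # price (synthetic)

# 1. OLS regression of price on area
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

# 2. Prediction at Xbar: the fitted line passes through (Xbar, Ybar)
print(b0 + b1 * X.mean(), Y.mean())

# 3. Residuals
yhat = b0 + b1 * X
uhat = Y - yhat

# 4-6. First order condition properties
print(uhat.sum())                     # sum of residuals: ~0
print(np.corrcoef(X, uhat)[0, 1])     # Corr(X, uhat): ~0
print(np.corrcoef(yhat, uhat)[0, 1])  # Corr(yhat, uhat): ~0
```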
Stata code
reg preu superf
sum superf
local msuperf = r(mean)
disp _b[_cons] + _b[superf]*`msuperf'
sum preu
predict yhat, xb
predict resid, r
sum resid
corr superf resid
corr yhat resid
Property # 1
Property # 2
Property # 3
Property # 4
Fundamentals of Regression Analysis Interpretation of the OLS coefficients
• In Stata, the command reg (or regress) will estimate a linear model of Y on X
• Let's take a look at the result of using that command for the housing example, where Y is the price of the dwelling in euros and X its area in square meters
• According to the results, β̂1 = 1641.242 euros/m²
• That is, according to our estimation, if the size of the dwelling is one square meter larger, we expect the price of the dwelling to be 1641.24 euros higher
• When X is a binary 0/1 variable, β̂1 is the difference of the sample mean
of the price of the dwelling between the two groups (large dwellings vs
small dwellings)
• That is, β̂1 = (Ȳ | X = 1) − (Ȳ | X = 0) = Ȳ1 − Ȳ0
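This equivalence is easy to verify numerically; a sketch with a simulated 0/1 regressor (the "large vs small dwelling" labels and all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated 0/1 regressor: 1 = "large dwelling", 0 = "small dwelling"
X = rng.integers(0, 2, size=600).astype(float)
Y = 150_000 + 80_000 * X + rng.normal(0, 40_000, size=600)  # synthetic prices

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
diff_in_means = Y[X == 1].mean() - Y[X == 0].mean()

# The OLS slope on a dummy equals the difference in group means
assert np.isclose(b1, diff_in_means)
print(b1, diff_in_means)
```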
1. By construction, ûi (the residuals) are not correlated with Xi . This can be interpreted as the OLS estimator extracting all the linear information in X that is useful to predict Y
2. ûi ̸= ui . The first is the difference between the value of Yi and the
predicted value Ŷi ; the second is the unknown error from the population
regression line
3. The units of β̂1 are the units of Y over the units of X, while the units of β̂0
are in Y’s units. Therefore, a different unit of X or a different unit of Y will
have consequences on the estimated coefficients (but not on the
underlying conclusions of the model)
4. The interpretation of β̂0 is meaningful only if X = 0 is a possible value of the regressor (that is, only if the probability of observing X = 0 is larger than zero)
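Remark 3 can be checked numerically: converting X to different units rescales β̂1 by exactly the conversion factor, while the fitted line and conclusions are unchanged (synthetic data; the m²-to-ft² conversion is just an example):

```python
import numpy as np

rng = np.random.default_rng(6)
X_m2 = rng.uniform(30, 150, size=400)                        # area in m2
Y = 50_000 + 1_600 * X_m2 + rng.normal(0, 30_000, size=400)  # price in euros

def ols_slope(X, Y):
    return np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)

FT2_PER_M2 = 10.7639  # unit-conversion factor
b1_m2 = ols_slope(X_m2, Y)                # euros per m2
b1_ft2 = ols_slope(X_m2 * FT2_PER_M2, Y)  # euros per ft2

# The slope rescales by exactly the conversion factor; predictions and
# substantive conclusions are unchanged.
assert np.isclose(b1_m2, b1_ft2 * FT2_PER_M2)
print(b1_m2, b1_ft2)
```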
Fundamentals of Regression Analysis Measures of goodness of fit
• A natural question is how well the regression line "fits" or explains the
data
• How much of the observed variation in Y is explained by our model?
How close is the regression line to the observations?
• There are two regression statistics that provide complementary
measures of the quality of the fit:
▶ The R2 statistic measures the fraction of the variance of Y that is
explained by X; it is unitless and ranges between zero (no fit) and
one (perfect fit)
▶ The standard error of the regression (SER) measures the typical
size of a regression residual in the units of Y.
R2
• Therefore R2 = 1 − RSS/TSS
Rewriting the equation for the residual, we have that Yi = Ŷi + ûi . Therefore,
R2 = Corr(X, Y)²
• One interesting thing is that the R2 is the same as the sample correlation
of X and Y squared
• That is because, in the end, the slope of the linear regression model just rescales the sample covariance between X and Y by the variance of X. So the model fits only as well as these two variables are correlated
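Both the definition R2 = 1 − RSS/TSS and this identity can be checked in one numerical sketch (synthetic data):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.uniform(30, 150, size=500)
Y = 50_000 + 1_600 * X + rng.normal(0, 30_000, size=500)  # synthetic data

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
uhat = Y - b0 - b1 * X

TSS = np.sum((Y - Y.mean()) ** 2)  # total sum of squares
RSS = np.sum(uhat ** 2)            # residual sum of squares
R2 = 1 - RSS / TSS

r = np.corrcoef(X, Y)[0, 1]
assert np.isclose(R2, r ** 2)  # R2 equals the squared sample correlation
print(R2, r ** 2)
```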
R2 : warning
• In our case, R2 = 0.6867, which means that the area of the dwelling explains 68.67% of the variance in the price of the dwelling
• According to the model, since the RSS is 1545445061347.875 and we have 1267 observations, the SER is 34953 euros and the RMSE is 34925 euros
• Word of caution: Stata calculates the RMSE differently, dividing by (n − 2) instead of n. So what Stata reports as the RMSE is actually the SER
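The two numbers reported above can be reproduced from the stated RSS and sample size:

```python
import math

# Reproduce the reported SER and RMSE from the stated RSS and n
RSS = 1_545_445_061_347.875
n = 1267

SER = math.sqrt(RSS / (n - 2))   # divides by n - 2 (what Stata labels "Root MSE")
RMSE = math.sqrt(RSS / n)        # divides by n

print(round(SER), round(RMSE))   # 34953 34925
```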
Fundamentals of Regression Analysis Appendix
Appendix
∂/∂b0 : −2 Σi (Yi − b0 − b1 Xi ) = 0

∂/∂b1 : −2 Σi (Yi − b0 − b1 Xi ) Xi = 0
Σi (Yi − b0 − b1 Xi ) = 0

Σi Yi − N b0 − b1 Σi Xi = 0

N b0 = Σi Yi − b1 Σi Xi

b0 = Ȳ − b1 X̄
Σi (Yi − b0 − b1 Xi ) Xi = 0

Σi Yi Xi − b0 Σi Xi − b1 Σi Xi² = 0
Substituting b0 = Ȳ − b1 X̄:

Σi Yi Xi − Ȳ Σi Xi + b1 X̄ Σi Xi − b1 Σi Xi² = 0

b1 (Σi Xi² − X̄ Σi Xi ) = Σi Yi Xi − Ȳ Σi Xi

b1 Σi Xi (Xi − X̄) = Σi Xi (Yi − Ȳ)
Σi (Xi − X̄)² = Σi Xi² + N X̄² − 2 X̄ Σi Xi
            = Σi Xi² + N X̄² − 2 N X̄²
            = Σi Xi² − N X̄²
            = Σi Xi² − X̄ Σi Xi
            = Σi Xi (Xi − X̄)
b1 = Σi (Xi − X̄)(Yi − Ȳ) / Σi (Xi − X̄)²

β̂1 = sXY / s²X

β̂0 = Ȳ − β̂1 X̄
Proof #1
Proof #2
Σi ûi Xi = Σi ûi (Xi − X̄)    (since Σi ûi = 0)
        = Σi ((Yi − Ȳ) − β̂1 (Xi − X̄))(Xi − X̄)
        = Σi (Yi − Ȳ)(Xi − X̄) − β̂1 Σi (Xi − X̄)²

Since β̂1 = Σi (Yi − Ȳ)(Xi − X̄) / Σi (Xi − X̄)², we have

Σi ûi Xi = Σi (Yi − Ȳ)(Xi − X̄) − [Σi (Yi − Ȳ)(Xi − X̄) / Σi (Xi − X̄)²] Σi (Xi − X̄)² = 0
Proof #5
TSS = Σi (Yi − Ȳ)² = Σi (Yi − Ŷi + Ŷi − Ȳ)²
    = Σi (Yi − Ŷi )² + Σi (Ŷi − Ȳ)² + 2 Σi (Yi − Ŷi )(Ŷi − Ȳ)
    = RSS + ESS + 2 Σi ûi (Ŷi − Ȳ) = RSS + ESS

since Σi ûi = 0 and Σi ûi Ŷi = 0 by the first order conditions
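The decomposition TSS = RSS + ESS can be confirmed numerically (synthetic data; ESS denotes the explained sum of squares):

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.uniform(30, 150, size=400)
Y = 50_000 + 1_600 * X + rng.normal(0, 30_000, size=400)  # synthetic data

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
yhat = b0 + b1 * X
uhat = Y - yhat

TSS = np.sum((Y - Y.mean()) ** 2)
ESS = np.sum((yhat - Y.mean()) ** 2)
RSS = np.sum(uhat ** 2)

# The cross term 2*sum(uhat*(yhat - Ybar)) vanishes, so TSS = RSS + ESS
assert np.isclose(TSS, RSS + ESS)
print(TSS, RSS + ESS)
```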
• OLS estimator:

β̂1 = Σi (Xi − X̄)(Yi − Ȳ) / Σi (Xi − X̄)² = Σi Xi (Yi − Ȳ) / Σi Xi (Xi − X̄)
• Let's rewrite the OLS estimator in terms of this binary or dummy variable X:

β̂1 = Σi 1(i ∈ T)(Yi − Ȳ) / Σi 1(i ∈ T)[1(i ∈ T) − NT/N]

• NT/N is the result of:

X̄ = (1/N) Σi 1(i ∈ T) = (1/N)(NT × 1 + NNT × 0)

where NT is the number of treated units and NNT = N − NT the number of non-treated units
• Thus, X̄ = NT/N
• The numerator is Σi 1(i ∈ T)(Yi − Ȳ) = NT [(1/NT) Σi 1(i ∈ T)Yi ] − NT Ȳ, that is:
• NT E[Yi | Xi = 1] − (NT/N)[NT E[Yi | Xi = 1] + NNT E[Yi | Xi = 0]]
• = E[Yi | Xi = 1](NT − NT²/N) − (NT/N) NNT E[Yi | Xi = 0]
• Since (NT − NT²/N) = NT(1 − NT/N) and (NT/N) NNT = (NT/N)(N − NT) = NT(1 − NT/N)
• = NT(1 − NT/N)[E[Yi | Xi = 1] − E[Yi | Xi = 0]]
• Therefore,

β̂1 = NT(1 − NT/N)[E[Yi | Xi = 1] − E[Yi | Xi = 0]] / [NT(1 − NT/N)] = E[Yi | Xi = 1] − E[Yi | Xi = 0]

• β̂1 = ȲT − ȲNT