MLR Note

Multiple Regression Model


Multiple Linear Regression
Estimation
• the model

$y = \beta_0 + \beta_1 x_1 + \dots + \beta_K x_K + u$

with $\beta_0$: intercept and $\beta_k$: parameter associated with $x_k$

• advantages of MLR

\ > incorporate more explanatory factors into the model

\ explicitly
I
> hold fixed other factors that otherwise would be in u

> allow for more flexible functional forms

'

key assumption
• the key assumption for the general multiple regression model is:

$E(u \mid x_1, x_2, \dots, x_K) = 0$ > zero conditional mean assumption

> all factors in the unobserved error term are uncorrelated with the explanatory variables

> the functional relationships between the explained and explanatory variables are correctly accounted for

> any problem that causes u to be correlated with any independent variable > key assumption FAILS


Estimation

Sample regression function (SRF): $\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \dots + \hat\beta_K x_K$

↳ OLS estimates are chosen to minimize the sum of squared residuals

$\min_{\hat\beta_0, \dots, \hat\beta_K} \sum_{i=1}^{n} \hat{u}_i^2 \;=\; \min_{\hat\beta_0, \dots, \hat\beta_K} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \;=\; \min_{\hat\beta_0, \dots, \hat\beta_K} \sum_{i=1}^{n} \big(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \dots - \hat\beta_K x_{iK}\big)^2$

• OLS first order conditions

↳ closed form solution for $(\hat\beta_0, \hat\beta_1, \dots, \hat\beta_K)$ is cumbersome without the use of matrix algebra
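A minimal sketch (not from the notes) of the matrix-algebra solution $\hat\beta = (X'X)^{-1}X'y$ in NumPy, using simulated data purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 200, 2                                   # illustrative sample size and number of regressors
X = np.column_stack([np.ones(n),                # column of ones for the intercept beta_0
                     rng.normal(size=(n, K))])
beta_true = np.array([1.0, 0.5, -2.0])
y = X @ beta_true + rng.normal(size=n)          # y = X beta + u

# OLS: beta_hat = (X'X)^{-1} X'y; solving the normal equations is more stable than inverting
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

u_hat = y - X @ beta_hat                        # residuals u_hat_i = y_i - y_hat_i
print(beta_hat, u_hat @ u_hat)                  # estimates and the minimized sum of squared residuals
```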

Interpretation
general case:

$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \dots + \hat\beta_K x_K$

$\hat\beta_0$ represents the average value of $\hat{y}$ when $x_1 = 0, x_2 = 0, \dots, x_K = 0$ > all explanatory variables = 0

$\hat\beta_k$ for $k > 0$ have partial effect interpretations

$\Delta\hat{y} = \hat\beta_1 \Delta x_1 + \hat\beta_2 \Delta x_2 + \dots + \hat\beta_K \Delta x_K$

↳ if we hold $x_2, \dots, x_K$ fixed ($\Delta x_k = 0$ for $k \neq 1$), then $\Delta\hat{y} = \hat\beta_1 \Delta x_1$ > ceteris paribus interpretation

> we have controlled for the variables $x_2, x_3, \dots, x_K$ when estimating the effect of $x_1$ on y

example

wage equation: $\widehat{wage} = \hat\beta_0 + \hat\beta_1\, educ + \hat\beta_2\, exper$

$\hat\beta_1$ measures the effect of education on hourly wage, explicitly holding experience fixed

↳ if the assumption $E(u \mid educ, exper) = 0$ holds > this estimator is quite powerful

• able to estimate the effect of a variable as if in a controlled environment

• a controlled laboratory setting is used to keep other factors fixed

> can we change more than one independent variable simultaneously? Yes, but we wouldn't be able to identify the effect that comes from a specific variable
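A sketch of how such a wage equation could be estimated with statsmodels; the wage, educ and exper values below are simulated stand-ins, not the data behind these notes:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# simulated stand-in for a wage data set (illustration only)
rng = np.random.default_rng(1)
n = 500
educ = rng.integers(8, 20, size=n)
exper = rng.integers(0, 40, size=n)
wage = 1.0 + 0.6 * educ + 0.1 * exper + rng.normal(scale=2.0, size=n)
df = pd.DataFrame({"wage": wage, "educ": educ, "exper": exper})

# the coefficient on educ estimates the effect of one more year of education,
# explicitly holding exper fixed
model = smf.ols("wage ~ educ + exper", data=df).fit()
print(model.params)
```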

Goodness of fit

decomposition of total variation

SST = SSE + SSR

$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$

↳ SST $= \sum_{i=1}^{n} (y_i - \bar{y})^2$

SSE $= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$

SSR $= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{u}_i^2$

> $R^2$ doesn't reflect on a causal estimate of the model
· R-squared

> a high R-squared does not mean there is a causal interpretation

> a low R-squared does not preclude precise estimation of partial effects

> adding more variables (relevant or irrelevant) never decreases, and usually increases, R-squared

· adjusted R-squared

$\bar{R}^2 = 1 - \frac{SSR/(n-K-1)}{SST/(n-1)}$

↳ adjusted R-squared imposes a penalty for adding new regressors

> increases if and only if the t-statistic of a newly added regressor is greater than one in absolute value

> can take a negative value when the model has no explanatory power at all
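As a sketch, both measures can be computed directly from the residuals; r_squared_stats is a hypothetical helper name, with k the number of regressors excluding the intercept:

```python
import numpy as np

def r_squared_stats(y, y_hat, k):
    """R-squared and adjusted R-squared for a fit with k regressors plus an intercept."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    n = len(y)
    ssr = np.sum((y - y_hat) ** 2)                        # sum of squared residuals
    sst = np.sum((y - y.mean()) ** 2)                     # total sum of squares
    r2 = 1 - ssr / sst
    adj_r2 = 1 - (ssr / (n - k - 1)) / (sst / (n - 1))    # penalty for adding regressors
    return r2, adj_r2
```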

Gauss-Markov assumptions
· assumption MLR.1: linear in parameters

↳ the DGP can be written as $y = \beta_0 + \beta_1 x_1 + \dots + \beta_K x_K + u$

· assumption MLR.2: random sampling

↳ random sample $\{(y_i, x_{i1}, \dots, x_{iK}) : i = 1, 2, \dots, n\}$ with $n \geq K + 1$

· assumption MLR.3: no perfect collinearity

↳ there is variation in each x and no x can be written as a linear combination of other regressors

· example

> shareA = 1 - shareB > shareA is a linear function of shareB (shareA + shareB = 1) > perfect collinearity > OLS fails (see the sketch after this list)

> a non-linear relationship between regressors does not violate the no perfect collinearity assumption

> but $\beta_2 \log(inc^2)$ equals $2\beta_2 \log(inc)$ > a linear function of $\log(inc)$ > VIOLATION

> needs full attention when we're working on a small data set
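A small sketch (simulated shares, illustration only) of why the shareA/shareB case breaks OLS: the columns of X become linearly dependent, so X'X is singular:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
shareA = rng.uniform(size=n)
shareB = 1 - shareA                        # shareA + shareB = 1: perfectly collinear with the constant

X = np.column_stack([np.ones(n), shareA, shareB])
print(np.linalg.matrix_rank(X.T @ X))      # 2 instead of 3: X'X is singular
# np.linalg.solve(X.T @ X, X.T @ y) would raise LinAlgError here, i.e. OLS fails
```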

· assumption MLR.4: zero conditional mean (most important)

$E(u \mid x_1, \dots, x_K) = 0$

* Note:
> an explanatory variable is endogenous when it is correlated with the error term > endogeneity is a violation of the zero conditional mean assumption
> the assumption is more likely to hold when fewer things end up in the error term (x is exogenous to the unobservable variables)
> exogeneity is the key assumption for a causal interpretation of the regression and for the unbiasedness of the OLS estimators

· assumption MLR.5: homoskedasticity

$Var(u \mid x_1, \dots, x_K) = \sigma^2$

> often doesn't make sense realistically

Simple vs multiple linear regression

$y = \tilde\beta_0 + \tilde\beta_1 x_1$ compared to $y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$

in general $\tilde\beta_1 \neq \hat\beta_1$, except:

> $\hat\beta_2 = 0$, or
> $x_1$, $x_2$ are not correlated: $Cov(x_1, x_2) = 0$

· example

$\widetilde{bwght} = \tilde\beta_0 + \tilde\beta_1\, cigs$

but maybe the assumed model is $\widehat{bwght} = \hat\beta_0 + \hat\beta_1\, cigs + \hat\beta_2\, faminc$ > since faminc and cigs are correlated and $\hat\beta_2 \neq 0$, the two estimates of the cigs coefficient differ
• scenarios

↳ an independent variable $x_2$ is said to be irrelevant if its coefficient equals zero: $\beta_2 = 0$

> true model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$ > in the population

> estimation of the model: $\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$ > including the irrelevant variable > $\hat\beta_0$ and $\hat\beta_1$ will not be biased

> unbiased > if it still fulfills the assumptions

> can have undesirable effects on the variances of the OLS estimators > variances will be higher > not confident in the estimate

↳ omitted variable bias: excluding a relevant variable from a regression will lead to a bias

> true model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$ > both variables are important to explain y

> $x_2$ is excluded > simple regression model: $y = \tilde\beta_0 + \tilde\beta_1 x_1 + v$

> $\tilde\beta_1$ is biased

example (see the sketch after this list)

$x_1$ and $x_2$ are correlated; assume they have a linear relationship

$x_2 = \delta_0 + \delta_1 x_1 + v$

true model: $y = \beta_0 + \beta_1 x_1 + \beta_2 (\delta_0 + \delta_1 x_1 + v) + u$

$y = (\beta_0 + \beta_2 \delta_0) + (\beta_1 + \beta_2 \delta_1) x_1 + (\beta_2 v + u)$

estimated model: $E(\tilde\beta_1) = E(\beta_1 + \beta_2 \delta_1) = \beta_1 + \beta_2 \delta_1$ > omitted variable bias because we exclude $x_2$ from the regression model

with $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \hat\delta_1$, where $\hat{x}_2 = \hat\delta_0 + \hat\delta_1 x_1$

> the sign of the bias is determined by the values of $\beta_2$ and $\delta_1$: if $\beta_2 \delta_1$ is positive > overestimate; if $\beta_2 \delta_1$ is negative > underestimate

> relationship between the variable that is omitted and the outcome variable (relationship between $x_2$ and y) > effect of $x_2$ on y

> relationship between $x_1$ and $x_2$ > correlation of $x_1$ and $x_2$

* Note: if there are more than one omitted variables > focus on the most important variable in the determination of the outcome variable that is correlated with $x_1$
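A sketch (simulated data, illustration only) that checks the omitted variable algebra numerically; in the sample the identity $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \hat\delta_1$ holds exactly:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)              # x2 = delta_0 + delta_1*x1 + v with delta_1 = 0.8
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

def ols(X, y):
    """OLS with an intercept column added; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.solve(X.T @ X, X.T @ y)

b_long = ols(np.column_stack([x1, x2]), y)      # beta_hat_0, beta_hat_1, beta_hat_2
b_short = ols(x1, y)                            # beta_tilde_0, beta_tilde_1 (x2 omitted)
d = ols(x1, x2)                                 # delta_hat_0, delta_hat_1 from regressing x2 on x1

# beta_tilde_1 equals beta_hat_1 + beta_hat_2 * delta_hat_1
print(b_short[1], b_long[1] + b_long[2] * d[1])
```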

Homoskedasticity
if assumptions 1-5 are fulfilled > sampling variance of the OLS estimators, with $\sigma^2$ the variance of the error term:

$Var(\hat\beta_j) = \frac{\sigma^2}{SST_j (1 - R_j^2)}$

> $R_j^2$: R-squared from a regression of explanatory variable $x_j$ on all other independent variables (including a constant)

> $SST_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$

Components of OLS variances

> the error variance $\sigma^2$

• a high error variance increases the sampling variance > due to more noise in the equation

• a large error variance necessarily makes estimates imprecise

• the error variance does not decrease with the sample size

> the total sample variation in the explanatory variable: $SST_j$

• more sample variation leads to more precise estimates

• total sample variation automatically increases with the sample size

• increasing the sample size is thus a way to get more precise estimates

> linear relationships among the independent variables: $R_j^2$

• $R_j^2$ measures how well $x_j$ can be linearly explained by the other independent variables

• the sampling variance of $\hat\beta_j$ will be higher, the better $x_j$ can be linearly explained by the other independent variables

• perfect collinearity > $R_j^2 = 1$ > $1 - R_j^2 = 0$ > variance will be very large > low precision

• multicollinearity is the problem of almost linearly dependent explanatory variables ($R_j^2 \to 1$ for some j)
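A sketch (simulated data, hypothetical variable names) that computes $Var(\hat\beta_1)$ from this decomposition and checks it against the corresponding diagonal entry of $\hat\sigma^2 (X'X)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)              # correlated regressors, so R_1^2 > 0
y = 1.0 + 0.5 * x1 - 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (n - 2 - 1)        # error variance estimate, K = 2 regressors

# Var(beta_hat_1) = sigma^2 / (SST_1 * (1 - R_1^2))
sst1 = np.sum((x1 - x1.mean()) ** 2)
Z = np.column_stack([np.ones(n), x2])           # regress x1 on the other regressors
g = np.linalg.solve(Z.T @ Z, Z.T @ x1)
r1_sq = 1 - np.sum((x1 - Z @ g) ** 2) / sst1
var_b1 = sigma2_hat / (sst1 * (1 - r1_sq))

# same number as the x1 diagonal entry of sigma2_hat * (X'X)^{-1}
print(var_b1, (sigma2_hat * np.linalg.inv(X.T @ X))[1, 1])
```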

key takeaway

fulfills MLR.1-MLR.4: unbiased estimate $\hat\beta$
fulfills MLR.1-MLR.5: best linear unbiased estimator (BLUE)

• the OLS estimator has the smallest variance among linear unbiased estimators
