L2 SLR model
\[
\frac{\partial S}{\partial \hat\beta_1}
= -2\sum_{i=1}^{n}\bigl[y_i - \hat\beta_0 - \hat\beta_1 x_i\bigr]x_i = 0
\quad\Rightarrow\quad
\sum_{i=1}^{n} x_i y_i = \hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2
\tag{2}
\]
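Combining condition (2) with the analogous first-order condition for
β̂0, ∂S/∂β̂0 = −2 Σ [yi − β̂0 − β̂1 xi] = 0, and solving the two normal
equations gives the standard closed forms (restated here so that the
numerator and denominator referenced in the Stata check below are
explicit):
\[
\hat\beta_0 = \bar y - \hat\beta_1 \bar x,
\qquad
\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n}(x_i - \bar x)^2}.
\]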
Eco 221 students of 2024 were asked how many samosas they would buy
per month at various prices. Here are the estimation results:
------------------------------------------------------------------------------
quantity | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
price | -.2929574 .062975 -4.65 0.000 -.416835 -.1690799
_cons | 11.99239 1.012088 11.85 0.000 10.00152 13.98326
------------------------------------------------------------------------------
. disp -11.00085*336 // This is the sum of q1p1 (the numerator of the OLS estimator)
-3696.2856
. disp 37.5512*336 // This is the sum of p1sq (the denominator of the OLS estimator)
12617.203
. disp -3696.2856/12617.203 // This is the estimated slope coefficient (with some rounding error)
-.29295602
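The variables q1p1 and p1sq used above are not constructed on this
slide; a minimal Stata sketch of one way to build them and reproduce
the slope by hand, assuming the samosa data with variables quantity
and price are in memory (the names p1, q1, q1p1 and p1sq are
illustrative):

quietly summarize price
generate double p1 = price - r(mean)      // deviation of price from its sample mean
quietly summarize quantity
generate double q1 = quantity - r(mean)   // deviation of quantity from its sample mean
generate double q1p1 = q1*p1              // cross-product terms (numerator of the slope)
generate double p1sq = p1^2               // squared deviations (denominator of the slope)
quietly summarize q1p1
scalar num = r(sum)                       // sum of q1p1
quietly summarize p1sq
scalar den = r(sum)                       // sum of p1sq
disp num/den                              // should match -.2929574 up to rounding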
Algebraic properties of the SLR model
\[
\begin{aligned}
\mathrm{SST} = \sum_{i=1}^{n}(y_i - \bar y)^2
&= \sum_{i=1}^{n}\bigl[(y_i - \hat y_i) + (\hat y_i - \bar y)\bigr]^2 \\
&= \sum_{i=1}^{n}\bigl[\hat u_i + (\hat y_i - \bar y)\bigr]^2 \\
&= \sum_{i=1}^{n}\hat u_i^2 + 2\sum_{i=1}^{n}\hat u_i(\hat y_i - \bar y) + \sum_{i=1}^{n}(\hat y_i - \bar y)^2 \\
&= \mathrm{SSR} + 0 + \mathrm{SSE}
\end{aligned}
\]
The second (cross-product) term is zero by application of algebraic
property 3 noted above.
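One way to see the cross term vanish explicitly (the numbering of the
properties may differ from the earlier slide): using ŷi = β̂0 + β̂1 xi
together with Σ ûi = 0 and Σ xi ûi = 0,
\[
\sum_{i=1}^{n}\hat u_i(\hat y_i - \bar y)
= \hat\beta_0\sum_{i=1}^{n}\hat u_i
+ \hat\beta_1\sum_{i=1}^{n}\hat u_i x_i
- \bar y\sum_{i=1}^{n}\hat u_i
= 0.
\]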
Goodness of fit
Define R² = SSE/SST = 1 − SSR/SST. Clearly, 0 ≤ R² ≤ 1.
It is the proportion of the variation in y that is explained by x. It
provides an assessment of how good a job we did in minimizing the
distance (scaled by the total variation in y) represented by Σ ûi².
The higher the R², the better the fit.
SST, SSR, SSE and R² in our model
------------------------------------------------------------------------------
quantity | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
price | -.2929574 .062975 -4.65 0.000 -.416835 -.1690799
_cons | 11.99239 1.012088 11.85 0.000 10.00152 13.98326
------------------------------------------------------------------------------
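The panel of the regress output that reports the Model, Residual and
Total sums of squares is not reproduced above. A hedged sketch of how
to recover them (and R²) from Stata's stored results after the
regression, keeping this document's convention that SSE is the
explained and SSR the residual sum of squares:

quietly regress quantity price
disp "SSE (explained) = " e(mss)                     // model (explained) sum of squares
disp "SSR (residual)  = " e(rss)                     // residual sum of squares
disp "SST             = " e(mss) + e(rss)            // total sum of squares
disp "R-squared       = " e(mss)/(e(mss) + e(rss))   // equals e(r2)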
The OLS estimators β̂0 and β̂1 again
Note that while β0 and β1 are population parameters and are
therefore fixed scalars, the OLS estimators β̂0 and β̂1 are random
variables. The actual magnitudes of the estimated slope and
intercept depend on the particular sample that happens to be drawn.
To see this, let’s draw 15 random samples from our data set of 336
observations, and run the same OLS regression on each of these 15
samples.
. splitsample, generate(samp) nsplit(15)
. tab samp
------------------------------------------------------------------------------
quantity | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
price | -.4074795 .2300374 -1.77 0.092 -.8873291 .0723702
_cons | 13.16623 4.239979 3.11 0.006 4.321793 22.01068
------------------------------------------------------------------------------
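The output above is presumably for one of the 15 subsamples. A sketch
of the full exercise, looping the same regression over every subsample
to see how the estimated slope moves around (_b[price] is the slope
stored by regress):

forvalues s = 1/15 {
    quietly regress quantity price if samp == `s'
    disp "sample `s': slope = " _b[price]
}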
• Unbiasedness
• Efficiency (minimum variance) [we will cover this later]
These are finite sample properties.
In the SLR case, we want β̂0 and β̂1 to be unbiased, i.e., that
E[β̂1] = β1 and E[β̂0] = β0.
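A hedged Monte Carlo sketch of what unbiasedness means in practice:
the true intercept (12) and slope (-0.3) below are assumptions chosen
to mimic the samosa estimates, and the error standard deviation and
price range are likewise illustrative. Averaging the estimated slope
over many repeated samples should land close to the true β1:

capture program drop slrsim
program define slrsim, rclass
    drop _all
    set obs 336
    generate price    = runiform(1, 30)                  // assumed range of prices
    generate quantity = 12 - 0.3*price + rnormal(0, 3)   // assumed true data-generating process
    regress quantity price
    return scalar b1 = _b[price]
end
simulate b1 = r(b1), reps(1000) nodots: slrsim
summarize b1    // the mean of b1 should be close to the true slope of -0.3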