
Stat 206: Linear Models

Lecture 4

October 7, 2019
Recap: Sampling Distributions of LS Estimators

Under the Normal error model:


• β̂0, β̂1 are normally distributed:

  β̂0 ∼ N(β0, σ²{β̂0}),  β̂1 ∼ N(β1, σ²{β̂1}).

• SSE/σ² follows a χ² distribution with n − 2 degrees of freedom, denoted by χ²(n−2).
• Moreover, SSE is independent of both β̂0 and β̂1 (because the residuals ei are independent of β̂0 and β̂1).
Recap: Confidence Interval

(1 − α)-confidence interval for β1:

  β̂1 ± t(1 − α/2; n − 2) s{β̂1},

where t(1 − α/2; n − 2) is the (1 − α/2)th percentile of the t(n−2) distribution.

How to construct confidence intervals for β0 ?


Interpretation of Confidence Intervals

Figure: A simulation study. 90% CIs for β1 computed from repeated samples (horizontal axis: β1).
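To make the coverage interpretation concrete, here is a minimal simulation sketch in Python (not from the slides; the true values β0 = 1, β1 = 2, σ = 1, the uniform design, and n = 30 are illustrative assumptions). It repeatedly draws data from one fixed model, forms a 90% CI for β1 each time, and records how often the interval covers the true slope:

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(0)
  beta0, beta1, sigma, n, reps = 1.0, 2.0, 1.0, 30, 1000   # assumed settings
  tcrit = stats.t.ppf(0.95, n - 2)                         # 90% CI -> 0.95 quantile
  covered = 0
  for _ in range(reps):
      X = rng.uniform(0, 10, n)
      Y = beta0 + beta1 * X + rng.normal(0, sigma, n)      # Normal error model
      Xbar, Ybar = X.mean(), Y.mean()
      Sxx = np.sum((X - Xbar) ** 2)
      b1 = np.sum((X - Xbar) * (Y - Ybar)) / Sxx           # LS slope
      b0 = Ybar - b1 * Xbar                                # LS intercept
      mse = np.sum((Y - b0 - b1 * X) ** 2) / (n - 2)       # MSE = SSE/(n-2)
      s_b1 = np.sqrt(mse / Sxx)                            # s{beta1-hat}
      covered += (b1 - tcrit * s_b1 <= beta1 <= b1 + tcrit * s_b1)
  print(f"empirical coverage: {covered / reps:.3f}")       # close to 0.90

About 90% of the intervals should cover the true slope, which is exactly what the figure illustrates.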
Heights

• Recall n = 928, X̄ = 68.316, Σ_{i=1}^n Xi² = 4334058, and

  Σ_{i=1}^n (Xi − X̄)² = Σ_{i=1}^n Xi² − n(X̄)² = 3038.761.

  Also β̂0 = 24.54, β̂1 = 0.637, MSE = 5.031.
• So

  s{β̂1} = √(5.031/3038.761) = 0.0407.
• 95%-confidence interval for β1:

  0.637 ± t(0.975; 926) × 0.0407 = 0.637 ± 1.963 × 0.0407 = [0.557, 0.717].

• We are 95% confident that the regression slope β1 lies between 0.557 and 0.717.
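A minimal Python sketch of this interval from the summary statistics above (numbers taken from the slide; scipy supplies the t quantile):

  import numpy as np
  from scipy import stats

  n, Sxx, mse, b1 = 928, 3038.761, 5.031, 0.637   # from the slide
  s_b1 = np.sqrt(mse / Sxx)                       # 0.0407
  tcrit = stats.t.ppf(0.975, n - 2)               # t(0.975; 926) ≈ 1.963
  lo, hi = b1 - tcrit * s_b1, b1 + tcrit * s_b1
  print(f"s{{b1}} = {s_b1:.4f}, 95% CI = [{lo:.3f}, {hi:.3f}]")  # [0.557, 0.717]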
T-tests

• Null hypothesis: H0: β1 = β1^(0), where β1^(0) is a given constant.
• T-statistic:

  T* = (β̂1 − β1^(0)) / s{β̂1}.

• Null distribution of the T-statistic:

  Under H0: β1 = β1^(0),  T* ∼ t(n−2).

Decision rule at significance level α. The decision rule depends on the form of the alternative hypothesis:
• Two-sided alternative Ha: β1 ≠ β1^(0): Reject H0 if and only if |T*| > t(1 − α/2; n − 2), or equivalently, reject H0 if and only if p-value := P(|t(n−2)| > |T*|) < α.
• Left-sided alternative Ha: β1 < β1^(0): Reject H0 if and only if T* < t(α; n − 2), or equivalently, reject H0 if and only if p-value := P(t(n−2) < T*) < α.
• Right-sided alternative Ha: β1 > β1^(0): Reject H0 if and only if T* > t(1 − α; n − 2), or equivalently, reject H0 if and only if p-value := P(t(n−2) > T*) < α.
Heights

Test whether there is a linear association between parent's height and child's height. Use significance level α = 0.01.
• The hypotheses: H0: β1 = 0 vs. Ha: β1 ≠ 0.
• T-statistic:

  T* = (β̂1 − 0)/s{β̂1} = 0.637/0.0407 = 15.7.

• Critical value: t(1 − 0.01/2; 928 − 2) = 2.58. Since the observed |T*| = 15.7 > 2.58, reject the null hypothesis at level 0.01.
• Or: the p-value = P(|t(926)| > 15.7) ≈ 0. Since p-value < α = 0.01, reject the null hypothesis at level 0.01.
• Conclude that there is a significant association between
parent’s height and child’s height.
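A sketch of the same test in Python from the summary statistics (a two-sided p-value via scipy's t survival function):

  import numpy as np
  from scipy import stats

  n, b1, s_b1, alpha = 928, 0.637, 0.0407, 0.01   # from the slide
  t_star = (b1 - 0.0) / s_b1                      # T* ≈ 15.7
  tcrit = stats.t.ppf(1 - alpha / 2, n - 2)       # t(0.995; 926) ≈ 2.58
  pvalue = 2 * stats.t.sf(abs(t_star), n - 2)     # two-sided p-value ≈ 0
  print(f"T* = {t_star:.1f}, critical value = {tcrit:.2f}, p = {pvalue:.2e}")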
Estimation of Mean Response
Given X = Xh, the mean response is E(Yh) = β0 + β1 Xh.
• An unbiased point estimator for E(Yh) is:

  Ŷh = β̂0 + β̂1 Xh.

• Variance of Ŷh is:

  σ²{Ŷh} = σ² [ 1/n + (Xh − X̄)² / Σ_{i=1}^n (Xi − X̄)² ].

Notes: Use the fact that Ȳ and β̂1 are uncorrelated (a derivation sketch follows).
• The larger the sample size and/or the larger the dispersion of the Xi's, the smaller the variance of Ŷh.
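Since the slide leaves the derivation as a note, here is a short sketch (standard material, written out here rather than taken verbatim from the slides), using Ŷh = Ȳ + β̂1(Xh − X̄) and Cov(Ȳ, β̂1) = 0:

  Var(Ŷh) = Var(Ȳ + β̂1(Xh − X̄))
          = Var(Ȳ) + (Xh − X̄)² Var(β̂1)
          = σ²/n + (Xh − X̄)² σ²/Σ_{i=1}^n (Xi − X̄)²
          = σ² [ 1/n + (Xh − X̄)² / Σ_{i=1}^n (Xi − X̄)² ].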
Figure: Effects of the distance of Xh from X̄ on the variability of Ŷh.

From Applied Linear Statistical Models by Kutner, Nachtsheim, Neter and Li

The further Xh is from X̄, the larger is the variance of Ŷh: the variability in the estimated slope β̂1 has a larger effect on Ŷh when Xh is further away from the sample mean X̄.
• Standard error of Ŷh:

  s{Ŷh} = √( MSE [ 1/n + (Xh − X̄)² / Σ_{i=1}^n (Xi − X̄)² ] ).

• Under the Normal error model, Ŷh is normally distributed.
• Studentized quantity:

  (Ŷh − E(Yh)) / s{Ŷh} ∼ t(n−2).

• (1 − α)-C.I.:

  Ŷh ± t(1 − α/2; n − 2) s{Ŷh}.

• The half-width of the (1 − α)-C.I., t(1 − α/2; n − 2) s{Ŷh}, increases with the confidence coefficient (1 − α) and with the standard error s{Ŷh}.
Heights
Estimate the average height of children of 70in tall parents.
• Recall: n = 928, X̄ = 68.316, Σ_{i=1}^n (Xi − X̄)² = 3038.761, Ŷ = 24.54 + 0.637X, and MSE = 5.031.
• Ŷh = 24.54 + 0.637 × 70 = 69.2.
• Standard error:

  s{Ŷh} = √( 5.031 × [ 1/928 + (70 − 68.316)²/3038.761 ] ) = 0.1.

• 95%-confidence interval for E(Yh), using t(0.975; 926) = 1.963:

  69.2 ± 1.963 × 0.1 ≈ [68.96, 69.35]

  (the endpoints are computed with unrounded intermediate values).
• We are 95% confident that the average height of children of 70in parents is between 68.96in and 69.35in.
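A sketch of this interval in Python from the summary statistics (numbers as on the slide):

  import numpy as np
  from scipy import stats

  n, Xbar, Sxx, mse = 928, 68.316, 3038.761, 5.031            # from the slide
  b0, b1, Xh = 24.54, 0.637, 70.0
  Yh_hat = b0 + b1 * Xh                                       # ≈ 69.2
  se_mean = np.sqrt(mse * (1 / n + (Xh - Xbar) ** 2 / Sxx))   # ≈ 0.1
  tcrit = stats.t.ppf(0.975, n - 2)                           # ≈ 1.963
  print(f"95% CI for E(Yh): [{Yh_hat - tcrit * se_mean:.2f}, {Yh_hat + tcrit * se_mean:.2f}]")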
Prediction of New Observation

Predict a new observation Yh(new) of the response variable corresponding to a given level of the predictor variable X = Xh.
• Yh(new) = β0 + β1 Xh + εh(new).
• This is a new observation, so εh(new) is assumed to be independent of the εi's.
• Consequently, Yh(new) is independent of the observed Yi's.
• The predicted value for Yh(new) is Ŷh = β̂0 + β̂1 Xh.
Distinction between prediction and mean estimation.
• Yh(new) is a "moving target" as it is a random variable. On the contrary, E(Yh) is a fixed, non-random quantity.
• There are two sources of variation in the prediction process: the variation in estimating the mean E(Yh) by Ŷh, and the variation of Yh(new) around its mean. Combining them (see the sketch below):

  s²{predh} = MSE [ 1 + 1/n + (Xh − X̄)² / Σ_{i=1}^n (Xi − X̄)² ].

Note the difference between s²{Ŷh} and s²{predh}.
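A short sketch of where s²{predh} comes from (standard material; it uses the independence of Yh(new) from the observed sample, and hence from Ŷh):

  Var(Ŷh − Yh(new)) = Var(Ŷh) + Var(Yh(new))
                    = σ² [ 1/n + (Xh − X̄)²/Σ_{i=1}^n (Xi − X̄)² ] + σ²
                    = σ² [ 1 + 1/n + (Xh − X̄)²/Σ_{i=1}^n (Xi − X̄)² ],

and replacing σ² with MSE gives s²{predh}.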
Prediction Intervals
• Studentized quantity:

  (Ŷh − Yh(new)) / s{predh}.

• Under the Normal error model, it follows a t(n−2) distribution.
• (1 − α)-prediction interval for Yh(new):

  Ŷh ± t(1 − α/2; n − 2) s{predh}.

• The prediction interval is wider than the corresponding confidence interval of the mean response.
• As the sample size becomes very large, the width of the confidence interval tends to zero, but this does not happen for the prediction interval.
Heights

What would be the predicted height of the child of a 70in tall couple?
• Predicted height: Ŷh = 24.54 + 0.637 × 70 = 69.2.
• Standard error:

  s{predh} = √( 5.031 × [ 1 + 1/928 + (70 − 68.316)²/3038.761 ] ) = 2.25.

• 95% prediction interval, using t(0.975; 926) = 1.963:

  69.2 ± 1.963 × 2.25 ≈ [64.75, 73.56]

  (the endpoints are computed with unrounded intermediate values).
• We are 95% confident that the child's height will be between 64.75in and 73.56in.
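The same computation as a Python sketch; note the extra "1 +" term that distinguishes the prediction standard error from s{Ŷh}:

  import numpy as np
  from scipy import stats

  n, Xbar, Sxx, mse = 928, 68.316, 3038.761, 5.031               # from the slide
  Yh_hat, Xh = 69.2, 70.0
  se_pred = np.sqrt(mse * (1 + 1 / n + (Xh - Xbar) ** 2 / Sxx))  # ≈ 2.25
  tcrit = stats.t.ppf(0.975, n - 2)                              # ≈ 1.963
  print(f"95% PI: [{Yh_hat - tcrit * se_pred:.2f}, {Yh_hat + tcrit * se_pred:.2f}]")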
Extrapolation
Extrapolation occurs when predicting the response variable for values of the predictor variable lying outside the range of the observed data.
• Every model has a range of validity. In particular, a model may be inappropriate when it is extended outside of the range of the observations upon which it was built.
• Extrapolations are often much less reliable than interpolations and need to be handled with caution, even though they can be of more interest to us (e.g., fortune telling).
• In the Heights example: Extrapolation would happen if we used the fitted regression line to predict the heights of children of extremely short or extremely tall parents, i.e., parents whose heights fall outside the observed range.
Analysis of Variance Approach

The basic idea of ANOVA is to attribute variation in the data to different sources.
• In regression, the variation in the observations Yi is attributed to two sources: the variation explained by the regression on X, and the residual (error) variation around the fitted line.
• ANOVA is performed through:
  • Partitioning sums of squares;
  • Partitioning degrees of freedom.
Partition of Total Deviations

• Total deviations: Difference between Yi and the sample mean Ȳ:

  Yi − Ȳ, i = 1, · · · , n.

• Total deviations can be decomposed into the sum of two terms:

  Yi − Ȳ = (Yi − Ŷi) + (Ŷi − Ȳ),

  i.e., the deviation of the observed value around the fitted regression line plus the deviation of the fitted value from the mean.
Figure: Partition of total deviation.

From Applied Linear Statistical Models by Kutner, Nachtsheim, Neter and Li


Decomposition of Total Variation
Sum of Squares
• Total sum of squares (SSTO):

  SSTO = Σ_{i=1}^n (Yi − Ȳ)².

  This is the variation of the observed Yi's around their sample mean.
• Error sum of squares (SSE):

  SSE = Σ_{i=1}^n (Yi − Ŷi)².

  This is the variation of the observed Yi's around the fitted regression line.
• Regression sum of squares (SSR):

  SSR = Σ_{i=1}^n (Ŷi − Ȳ)².

  This is the variation of the fitted values around the sample mean. The larger the magnitude of the regression slope and the larger the dispersion in the Xi's, the larger is SSR.
• SSR = SSTO − SSE measures the effect of X in explaining the variation in Y through linear regression.
• In other words, SSR is the reduction in the total variation achieved by predicting Y with the predictor X through a linear regression model.
What is (1/n) Σ_{i=1}^n Ŷi? (It equals Ȳ.)
Expected Values of SS
• Expected values of SS:

  E(SSE) = (n − 2)σ²,  E(SSR) = σ² + β1² Σ_{i=1}^n (Xi − X̄)².

  What is E(SSTO)? (By the decomposition, E(SSTO) = E(SSE) + E(SSR) = (n − 1)σ² + β1² Σ_{i=1}^n (Xi − X̄)².)
• Mean squares (MS) = SS / d.f.(SS):

  MSE = SSE/d.f.(SSE) = SSE/(n − 2),  MSR = SSR/d.f.(SSR) = SSR/1.

• Expected values of MS:

  E(MSE) = σ²,  E(MSR) = σ² + β1² Σ_{i=1}^n (Xi − X̄)².

Under the Normal error model:
• SSE ∼ σ² χ²(n−2);
• SSE and SSR are independent.
Notes: Recall SSE and β̂1 are independent.
F Test

• H0: β1 = 0 versus Ha: β1 ≠ 0.
• F ratio:

  F* = MSR/MSE = (SSR/1) / (SSE/(n − 2)).

• F* fluctuates around 1 + β1² Σ_{i=1}^n (Xi − X̄)² / σ².
• A large value of F* means evidence against H0.
• Null distribution of F*:

  Under H0: β1 = 0,  F* ∼ F(1, n−2).

Notes: Use the fact that if Z1 ∼ χ²(df1), Z2 ∼ χ²(df2), and Z1, Z2 are independent, then (Z1/df1)/(Z2/df2) ∼ F(df1, df2).
• Decision rule at level α:

  reject H0 if F* > F(1 − α; 1, n − 2),

  where F(1 − α; 1, n − 2) is the (1 − α)-percentile of the F(1, n−2) distribution.
• In simple linear regression, the F-test is equivalent to the t-test for testing H0: β1 = 0 versus Ha: β1 ≠ 0. Check the following:
  • F* = (T*)², where T* = β̂1/s{β̂1} is the T-statistic.
  • F(1 − α; 1, n − 2) = t(1 − α/2; n − 2)².
ANOVA Table

ANOVA table for simple linear regression.


  Source of Variation | SS                         | d.f.  | MS = SS/d.f.        | F*
  --------------------|----------------------------|-------|---------------------|--------------
  Regression          | SSR = Σ_{i=1}^n (Ŷi − Ȳ)²  | 1     | MSR = SSR/1         | F* = MSR/MSE
  Error               | SSE = Σ_{i=1}^n (Yi − Ŷi)² | n − 2 | MSE = SSE/(n − 2)   |
  Total               | SSTO = Σ_{i=1}^n (Yi − Ȳ)² | n − 1 | MSTO = SSTO/(n − 1) |
Heights

n = 928, X̄ = 68.31578, Ȳ = 68.08227, Σ_{i=1}^n Xi² = 4334058, Σ_{i=1}^n Yi² = 4307355, Σ_{i=1}^n Xi Yi = 4318152, β̂1 = 0.637, β̂0 = 24.54.

  SSTO = Σ_{i=1}^n (Yi − Ȳ)² = Σ_{i=1}^n Yi² − n(Ȳ)²
       = 4307355 − 928 × 68.08227² = 5893.

  SSR = Σ_{i=1}^n (Ŷi − Ȳ)² = β̂1² Σ_{i=1}^n (Xi − X̄)²
      = 0.637² × [4334058 − 928 × 68.31578²] = 1234.

  SSE = SSTO − SSR = 4659.


Heights (Cont’d)

  Source of Variation | SS          | d.f. | MS = SS/d.f. | F*
  --------------------|-------------|------|--------------|--------------------
  Regression          | SSR = 1234  | 1    | MSR = 1234   | F* = MSR/MSE = 245
  Error               | SSE = 4659  | 926  | MSE = 5.03   |
  Total               | SSTO = 5893 | 927  | MSTO = 6.36  |

• Test whether there is a linear association between parent's height and child's height. Use significance level α = 0.01.
• F(0.99; 1, 926) = 6.66 < F* = 245, so reject H0: β1 = 0 and conclude that there is a significant linear association between parent's height and child's height.
• Recall T* = 15.66, t(0.995; 926) = 2.58, and check:

  15.66² ≈ 245,  2.58² ≈ 6.66.
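As a closing sketch, the whole ANOVA computation can be reproduced in Python from the summary statistics above (values from the slides; scipy supplies the F quantile), including the F* = (T*)² check:

  import numpy as np
  from scipy import stats

  # Summary statistics from the slides
  n, Xbar, Ybar = 928, 68.31578, 68.08227
  sum_X2, sum_Y2 = 4334058, 4307355
  b1 = 0.637

  Sxx = sum_X2 - n * Xbar ** 2            # Σ(Xi − X̄)² ≈ 3039
  ssto = sum_Y2 - n * Ybar ** 2           # ≈ 5893
  ssr = b1 ** 2 * Sxx                     # ≈ 1234
  sse = ssto - ssr                        # ≈ 4659
  msr, mse = ssr / 1, sse / (n - 2)       # MSE ≈ 5.03
  f_star = msr / mse                      # ≈ 245
  fcrit = stats.f.ppf(0.99, 1, n - 2)     # F(0.99; 1, 926) ≈ 6.66
  t_star = b1 / np.sqrt(mse / Sxx)        # ≈ 15.66; check F* = (T*)²
  print(f"F* = {f_star:.0f} > {fcrit:.2f}; (T*)² = {t_star ** 2:.0f}")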
