
Stat 206: Linear Models

Lecture 4

October 7, 2019
Recap: Sampling Distributions of LS Estimators

Under the Normal error model:


• β̂0, β̂1 are normally distributed:

  β̂0 ∼ N(β0, σ²{β̂0}),  β̂1 ∼ N(β1, σ²{β̂1}).

• SSE/σ² follows a χ² distribution with n − 2 degrees of freedom, denoted by χ²(n−2).
• Moreover, SSE is independent of both β̂0 and β̂1 (because the residuals ei are independent of β̂0 and β̂1).
Recap: Confidence Interval

(1 − α)-confidence interval for β1:

  β̂1 ± t(1 − α/2; n − 2) s{β̂1},

where t(1 − α/2; n − 2) is the (1 − α/2)th percentile of the t(n−2) distribution.

How to construct confidence intervals for β0 ?


Interpretation of Confidence Intervals

Figure: A simulation study. 90% CIs for β1 computed from repeated samples (horizontal axis: β1).
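To make the coverage interpretation concrete, here is a minimal simulation sketch in Python (not from the slides; the true values β0 = 1, β1 = 2, σ = 1, the uniform design, and n = 30 are illustrative assumptions). It repeatedly draws data from one fixed model, forms a 90% CI for β1 each time, and records how often the interval covers the true slope:

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(0)
  beta0, beta1, sigma, n, reps = 1.0, 2.0, 1.0, 30, 1000   # assumed settings
  tcrit = stats.t.ppf(0.95, n - 2)                         # 90% CI -> 0.95 quantile
  covered = 0
  for _ in range(reps):
      X = rng.uniform(0, 10, n)
      Y = beta0 + beta1 * X + rng.normal(0, sigma, n)      # Normal error model
      Xbar, Ybar = X.mean(), Y.mean()
      Sxx = np.sum((X - Xbar) ** 2)
      b1 = np.sum((X - Xbar) * (Y - Ybar)) / Sxx           # LS slope
      b0 = Ybar - b1 * Xbar                                # LS intercept
      mse = np.sum((Y - b0 - b1 * X) ** 2) / (n - 2)       # MSE = SSE/(n-2)
      s_b1 = np.sqrt(mse / Sxx)                            # s{beta1-hat}
      covered += (b1 - tcrit * s_b1 <= beta1 <= b1 + tcrit * s_b1)
  print(f"empirical coverage: {covered / reps:.3f}")       # close to 0.90

About 90% of the intervals should cover the true slope, which is exactly what the figure illustrates.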
Heights

• Recall n = 928, X̄ = 68.316, Σ_{i=1}^n Xi² = 4334058, and

  Σ_{i=1}^n (Xi − X̄)² = Σ_{i=1}^n Xi² − n(X̄)² = 3038.761.

  Also β̂0 = 24.54, β̂1 = 0.637, MSE = 5.031.
• So

  s{β̂1} = √(5.031/3038.761) = 0.0407.
• 95%-confidence interval for β1:

  0.637 ± t(0.975; 926) × 0.0407 = 0.637 ± 1.963 × 0.0407 = [0.557, 0.717].

• We are 95% confident that the regression slope β1 lies between 0.557 and 0.717.
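A minimal Python sketch of this interval from the summary statistics above (numbers taken from the slide; scipy supplies the t quantile):

  import numpy as np
  from scipy import stats

  n, Sxx, mse, b1 = 928, 3038.761, 5.031, 0.637   # from the slide
  s_b1 = np.sqrt(mse / Sxx)                       # 0.0407
  tcrit = stats.t.ppf(0.975, n - 2)               # t(0.975; 926) ≈ 1.963
  lo, hi = b1 - tcrit * s_b1, b1 + tcrit * s_b1
  print(f"s{{b1}} = {s_b1:.4f}, 95% CI = [{lo:.3f}, {hi:.3f}]")  # [0.557, 0.717]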
T-tests

• Null hypothesis: H0: β1 = β1^(0), where β1^(0) is a given constant.
• T-statistic:

  T* = (β̂1 − β1^(0)) / s{β̂1}.

• Null distribution of the T-statistic:

  Under H0: β1 = β1^(0),  T* ∼ t(n−2).

Decision rule at significance level α. The decision rule depends on the form of the alternative hypothesis:
• Two-sided alternative Ha: β1 ≠ β1^(0): Reject H0 if and only if |T*| > t(1 − α/2; n − 2), or equivalently, reject H0 if and only if p-value := P(|t(n−2)| > |T*|) < α.
• Left-sided alternative Ha: β1 < β1^(0): Reject H0 if and only if T* < t(α; n − 2), or equivalently, reject H0 if and only if p-value := P(t(n−2) < T*) < α.
• Right-sided alternative Ha: β1 > β1^(0): Reject H0 if and only if T* > t(1 − α; n − 2), or equivalently, reject H0 if and only if p-value := P(t(n−2) > T*) < α.
Heights

Test whether there is a linear association between parent's height and child's height. Use significance level α = 0.01.
• The hypotheses: H0: β1 = 0 vs. Ha: β1 ≠ 0.
• T-statistic:

  T* = (β̂1 − 0)/s{β̂1} = 0.637/0.0407 = 15.7.

• Critical value: t(1 − 0.01/2; 928 − 2) = 2.58. Since the observed |T*| = 15.7 > 2.58, reject the null hypothesis at level 0.01.
• Or: the p-value = P(|t(926)| > 15.7) ≈ 0. Since p-value < α = 0.01, reject the null hypothesis at level 0.01.
• Conclude that there is a significant association between
parent’s height and child’s height.
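A sketch of the same test in Python from the summary statistics (a two-sided p-value via scipy's t survival function):

  import numpy as np
  from scipy import stats

  n, b1, s_b1, alpha = 928, 0.637, 0.0407, 0.01   # from the slide
  t_star = (b1 - 0.0) / s_b1                      # T* ≈ 15.7
  tcrit = stats.t.ppf(1 - alpha / 2, n - 2)       # t(0.995; 926) ≈ 2.58
  pvalue = 2 * stats.t.sf(abs(t_star), n - 2)     # two-sided p-value ≈ 0
  print(f"T* = {t_star:.1f}, critical value = {tcrit:.2f}, p = {pvalue:.2e}")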
Estimation of Mean Response
Given X = Xh, the mean response is E(Yh) = β0 + β1 Xh.
• An unbiased point estimator for E(Yh) is:

  Ŷh = β̂0 + β̂1 Xh.

• Variance of Ŷh is:

  σ²{Ŷh} = σ² [ 1/n + (Xh − X̄)² / Σ_{i=1}^n (Xi − X̄)² ].

Notes: Use the fact that Ȳ and β̂1 are uncorrelated (a derivation sketch follows).
• The larger the sample size and/or the larger the dispersion of the Xi's, the smaller the variance of Ŷh.
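Since the slide leaves the derivation as a note, here is a short sketch (standard material, written out here rather than taken verbatim from the slides), using Ŷh = Ȳ + β̂1(Xh − X̄) and Cov(Ȳ, β̂1) = 0:

  Var(Ŷh) = Var(Ȳ + β̂1(Xh − X̄))
          = Var(Ȳ) + (Xh − X̄)² Var(β̂1)
          = σ²/n + (Xh − X̄)² σ²/Σ_{i=1}^n (Xi − X̄)²
          = σ² [ 1/n + (Xh − X̄)² / Σ_{i=1}^n (Xi − X̄)² ].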
Figure: Effects of the distance of Xh from X̄ on the variability of Ŷh.

From Applied Linear Statistical Models by Kutner, Nachtsheim, Neter and Li

The further Xh is from X̄, the larger is the variance of Ŷh: the variability in the estimated slope β̂1 has a larger effect on Ŷh when Xh is further away from the sample mean X̄.
• Standard error of Ŷh:

  s{Ŷh} = √( MSE [ 1/n + (Xh − X̄)² / Σ_{i=1}^n (Xi − X̄)² ] ).

• Under the Normal error model, Ŷh is normally distributed.
• Studentized quantity:

  (Ŷh − E(Yh)) / s{Ŷh} ∼ t(n−2).

• (1 − α)-C.I.:

  Ŷh ± t(1 − α/2; n − 2) s{Ŷh}.

• The half-width of the (1 − α)-C.I., t(1 − α/2; n − 2) s{Ŷh}, increases with the confidence coefficient (1 − α) and with the standard error s{Ŷh}.
Heights
Estimate the average height of children of 70in tall parents.
• Recall: n = 928, X̄ = 68.316, Σ_{i=1}^n (Xi − X̄)² = 3038.761, Ŷ = 24.54 + 0.637X, and MSE = 5.031.
• Ŷh = 24.54 + 0.637 × 70 = 69.2.
• Standard error:

  s{Ŷh} = √( 5.031 × [ 1/928 + (70 − 68.316)²/3038.761 ] ) = 0.1.

• 95%-confidence interval for E(Yh), using t(0.975; 926) = 1.963:

  69.2 ± 1.963 × 0.1 ≈ [68.96, 69.35]

  (the endpoints are computed with unrounded intermediate values).
• We are 95% confident that the average height of children of 70in parents is between 68.96in and 69.35in.
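A sketch of this interval in Python from the summary statistics (numbers as on the slide):

  import numpy as np
  from scipy import stats

  n, Xbar, Sxx, mse = 928, 68.316, 3038.761, 5.031            # from the slide
  b0, b1, Xh = 24.54, 0.637, 70.0
  Yh_hat = b0 + b1 * Xh                                       # ≈ 69.2
  se_mean = np.sqrt(mse * (1 / n + (Xh - Xbar) ** 2 / Sxx))   # ≈ 0.1
  tcrit = stats.t.ppf(0.975, n - 2)                           # ≈ 1.963
  print(f"95% CI for E(Yh): [{Yh_hat - tcrit * se_mean:.2f}, {Yh_hat + tcrit * se_mean:.2f}]")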
Prediction of New Observation

Predict a new observation Yh(new) of the response variable corresponding to a given level of the predictor variable X = Xh.
• Yh(new) = β0 + β1 Xh + εh(new).
• This is a new observation, so εh(new) is assumed to be independent of the εi's.
• Consequently, Yh(new) is independent of the observed Yi's.
• The predicted value for Yh(new) is Ŷh = β̂0 + β̂1 Xh.
Distinction between prediction and mean estimation.
• Yh(new) is a "moving target" as it is a random variable. On the contrary, E(Yh) is a fixed, non-random quantity.
• There are two sources of variation in the prediction process: the variation in estimating the mean E(Yh) by Ŷh, and the variation of Yh(new) around its mean. Combining them (see the sketch below):

  s²{predh} = MSE [ 1 + 1/n + (Xh − X̄)² / Σ_{i=1}^n (Xi − X̄)² ].

Note the difference between s²{Ŷh} and s²{predh}.
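A short sketch of where s²{predh} comes from (standard material; it uses the independence of Yh(new) from the observed sample, and hence from Ŷh):

  Var(Ŷh − Yh(new)) = Var(Ŷh) + Var(Yh(new))
                    = σ² [ 1/n + (Xh − X̄)²/Σ_{i=1}^n (Xi − X̄)² ] + σ²
                    = σ² [ 1 + 1/n + (Xh − X̄)²/Σ_{i=1}^n (Xi − X̄)² ],

and replacing σ² with MSE gives s²{predh}.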
Prediction Intervals
• Studentized quantity:

  (Ŷh − Yh(new)) / s{predh}.

• Under the Normal error model, it follows a t(n−2) distribution.
• (1 − α)-prediction interval for Yh(new):

  Ŷh ± t(1 − α/2; n − 2) s{predh}.

• The prediction interval is wider than the corresponding confidence interval of the mean response.
• As the sample size becomes very large, the width of the confidence interval tends to zero, but this does not happen for the prediction interval.
Heights

What would be the predicted height of the child of a 70in tall couple?
• Predicted height: Ŷh = 24.54 + 0.637 × 70 = 69.2.
• Standard error:

  s{predh} = √( 5.031 × [ 1 + 1/928 + (70 − 68.316)²/3038.761 ] ) = 2.25.

• 95% prediction interval, using t(0.975; 926) = 1.963:

  69.2 ± 1.963 × 2.25 ≈ [64.75, 73.56]

  (the endpoints are computed with unrounded intermediate values).
• We are 95% confident that the child's height will be between 64.75in and 73.56in.
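The same computation as a Python sketch; note the extra "1 +" term that distinguishes the prediction standard error from s{Ŷh}:

  import numpy as np
  from scipy import stats

  n, Xbar, Sxx, mse = 928, 68.316, 3038.761, 5.031               # from the slide
  Yh_hat, Xh = 69.2, 70.0
  se_pred = np.sqrt(mse * (1 + 1 / n + (Xh - Xbar) ** 2 / Sxx))  # ≈ 2.25
  tcrit = stats.t.ppf(0.975, n - 2)                              # ≈ 1.963
  print(f"95% PI: [{Yh_hat - tcrit * se_pred:.2f}, {Yh_hat + tcrit * se_pred:.2f}]")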
Extrapolation
Extrapolation occurs when predicting the response variable for values of the predictor variable lying outside the range of the observed data.
• Every model has a range of validity. In particular, a model may be inappropriate when it is extended outside of the range of the observations upon which it was built.
• Extrapolations are often much less reliable than interpolations and need to be handled with caution, even though they can be of more interest to us (e.g., fortune telling).
• In the Heights example: Extrapolation would happen if we used the fitted regression line to predict the heights of children of extremely short or extremely tall parents, i.e., parents whose heights fall outside the observed range.
Analysis of Variance Approach

The basic idea of ANOVA is to attribute variation in the data to different sources.
• In regression, the variation in the observations Yi is attributed to two sources: the variation explained by the regression on X, and the residual (error) variation around the fitted line.
• ANOVA is performed through:
  • Partitioning sums of squares;
  • Partitioning degrees of freedom.
Partition of Total Deviations

• Total deviations: Difference between Yi and the sample mean Ȳ:

  Yi − Ȳ, i = 1, · · · , n.

• Total deviations can be decomposed into the sum of two terms:

  Yi − Ȳ = (Yi − Ŷi) + (Ŷi − Ȳ),

  i.e., the deviation of the observed value around the fitted regression line plus the deviation of the fitted value from the mean.
Figure: Partition of total deviation.

From Applied Linear Statistical Models by Kutner, Nachtsheim, Neter and Li


Decomposition of Total Variation
Sum of Squares
• Total sum of squares (SSTO):

  SSTO = Σ_{i=1}^n (Yi − Ȳ)².

  This is the variation of the observed Yi's around their sample mean.
• Error sum of squares (SSE):

  SSE = Σ_{i=1}^n (Yi − Ŷi)².

  This is the variation of the observed Yi's around the fitted regression line.
• Regression sum of squares (SSR):

  SSR = Σ_{i=1}^n (Ŷi − Ȳ)².

  This is the variation of the fitted values around the sample mean. The larger the magnitude of the regression slope and the larger the dispersion in the Xi's, the larger is SSR.
• SSR = SSTO − SSE measures the effect of X in explaining the variation in Y through linear regression.
• In other words, SSR is the reduction in the total variation achieved by predicting Y with the predictor X through a linear regression model.
What is (1/n) Σ_{i=1}^n Ŷi? (It equals Ȳ.)
Expected Values of SS
• Expected values of SS:

  E(SSE) = (n − 2)σ²,  E(SSR) = σ² + β1² Σ_{i=1}^n (Xi − X̄)².

  What is E(SSTO)? (By the decomposition, E(SSTO) = E(SSE) + E(SSR) = (n − 1)σ² + β1² Σ_{i=1}^n (Xi − X̄)².)
• Mean squares (MS) = SS / d.f.(SS):

  MSE = SSE/d.f.(SSE) = SSE/(n − 2),  MSR = SSR/d.f.(SSR) = SSR/1.

• Expected values of MS:

  E(MSE) = σ²,  E(MSR) = σ² + β1² Σ_{i=1}^n (Xi − X̄)².

Under the Normal error model:
• SSE ∼ σ² χ²(n−2);
• SSE and SSR are independent.
Notes: Recall SSE and β̂1 are independent.
F Test

• H0: β1 = 0 versus Ha: β1 ≠ 0.
• F ratio:

  F* = MSR/MSE = (SSR/1) / (SSE/(n − 2)).

• F* fluctuates around 1 + β1² Σ_{i=1}^n (Xi − X̄)² / σ².
• A large value of F* means evidence against H0.
• Null distribution of F*:

  Under H0: β1 = 0,  F* ∼ F(1, n−2).

Notes: Use the fact that if Z1 ∼ χ²(df1), Z2 ∼ χ²(df2), and Z1, Z2 are independent, then (Z1/df1)/(Z2/df2) ∼ F(df1, df2).
• Decision rule at level α:

  reject H0 if F* > F(1 − α; 1, n − 2),

  where F(1 − α; 1, n − 2) is the (1 − α)-percentile of the F(1, n−2) distribution.
• In simple linear regression, the F-test is equivalent to the t-test for testing H0: β1 = 0 versus Ha: β1 ≠ 0. Check the following:
  • F* = (T*)², where T* = β̂1/s{β̂1} is the T-statistic.
  • F(1 − α; 1, n − 2) = t(1 − α/2; n − 2)².
ANOVA Table

ANOVA table for simple linear regression.


  Source of Variation | SS                         | d.f.  | MS = SS/d.f.        | F*
  --------------------|----------------------------|-------|---------------------|--------------
  Regression          | SSR = Σ_{i=1}^n (Ŷi − Ȳ)²  | 1     | MSR = SSR/1         | F* = MSR/MSE
  Error               | SSE = Σ_{i=1}^n (Yi − Ŷi)² | n − 2 | MSE = SSE/(n − 2)   |
  Total               | SSTO = Σ_{i=1}^n (Yi − Ȳ)² | n − 1 | MSTO = SSTO/(n − 1) |
Heights

n = 928, X̄ = 68.31578, Ȳ = 68.08227, Σ_{i=1}^n Xi² = 4334058, Σ_{i=1}^n Yi² = 4307355, Σ_{i=1}^n Xi Yi = 4318152, β̂1 = 0.637, β̂0 = 24.54.

  SSTO = Σ_{i=1}^n (Yi − Ȳ)² = Σ_{i=1}^n Yi² − n(Ȳ)²
       = 4307355 − 928 × 68.08227² = 5893.

  SSR = Σ_{i=1}^n (Ŷi − Ȳ)² = β̂1² Σ_{i=1}^n (Xi − X̄)²
      = 0.637² × [4334058 − 928 × 68.31578²] = 1234.

  SSE = SSTO − SSR = 4659.


Heights (Cont’d)

  Source of Variation | SS          | d.f. | MS = SS/d.f. | F*
  --------------------|-------------|------|--------------|--------------------
  Regression          | SSR = 1234  | 1    | MSR = 1234   | F* = MSR/MSE = 245
  Error               | SSE = 4659  | 926  | MSE = 5.03   |
  Total               | SSTO = 5893 | 927  | MSTO = 6.36  |

• Test whether there is a linear association between parent's height and child's height. Use significance level α = 0.01.
• F(0.99; 1, 926) = 6.66 < F* = 245, so reject H0: β1 = 0 and conclude that there is a significant linear association between parent's height and child's height.
• Recall T* = 15.66, t(0.995; 926) = 2.58, and check:

  15.66² ≈ 245,  2.58² ≈ 6.66.
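As a closing sketch, the whole ANOVA computation can be reproduced in Python from the summary statistics above (values from the slides; scipy supplies the F quantile), including the F* = (T*)² check:

  import numpy as np
  from scipy import stats

  # Summary statistics from the slides
  n, Xbar, Ybar = 928, 68.31578, 68.08227
  sum_X2, sum_Y2 = 4334058, 4307355
  b1 = 0.637

  Sxx = sum_X2 - n * Xbar ** 2            # Σ(Xi − X̄)² ≈ 3039
  ssto = sum_Y2 - n * Ybar ** 2           # ≈ 5893
  ssr = b1 ** 2 * Sxx                     # ≈ 1234
  sse = ssto - ssr                        # ≈ 4659
  msr, mse = ssr / 1, sse / (n - 2)       # MSE ≈ 5.03
  f_star = msr / mse                      # ≈ 245
  fcrit = stats.f.ppf(0.99, 1, n - 2)     # F(0.99; 1, 926) ≈ 6.66
  t_star = b1 / np.sqrt(mse / Sxx)        # ≈ 15.66; check F* = (T*)²
  print(f"F* = {f_star:.0f} > {fcrit:.2f}; (T*)² = {t_star ** 2:.0f}")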
