Introduction
In this chapter, we will cover:
• Inferences concerning the regression parameters β0 and β1: interval
  estimation and tests about them
• Interval estimation of the mean E{Y} of the probability distribution
  of Y, for given X
• Prediction intervals for a new observation Y
• Confidence bands for the regression line
• The analysis of variance approach to regression analysis
• The general linear test approach
• The correlation coefficient, a measure of association when both X
  and Y are random variables
Normal regression model
• We assume the normal regression model:
  Yi = β0 + β1Xi + εi
  where:
  β0 and β1 are parameters
  Xi are known constants
  εi are independent N(0, σ²)
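To make the model concrete, here is a minimal simulation sketch (not from
the text; the parameter values and X levels below are hypothetical):

```python
# A minimal sketch: simulating the normal regression model
# Yi = beta0 + beta1 * Xi + eps_i, with illustrative values only.
import numpy as np

rng = np.random.default_rng(0)

beta0, beta1, sigma = 62.0, 3.5, 48.0  # hypothetical parameter values
X = np.array([80, 30, 50, 90, 70, 60, 120, 80, 100, 50], dtype=float)  # known constants

eps = rng.normal(loc=0.0, scale=sigma, size=X.size)  # independent N(0, sigma^2) errors
Y = beta0 + beta1 * X + eps                          # responses under the model
```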
2.1 Inferences concerning β1
• β1 is the slope of the regression line
  Yi = β0 + β1Xi + εi
• Its point estimator is
  b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
Sampling Distribution of b1
• For the normal regression model Yi = β0 + β1Xi + εi,
  the sampling distribution of b1 is normal, with mean and variance
  defined as follows:
  E{b1} = β1
  σ²{b1} = σ² / Σ(Xi − X̄)²
• To show this, express b1 as a linear combination of the Yi:
  b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² = Σ kiYi
  where:
  ki = (Xi − X̄) / Σ(Xi − X̄)²
• The coefficients ki have the properties:
  Σ ki = 0
  Σ kiXi = 1
  Σ ki² = 1 / Σ(Xi − X̄)²
• Replacing σ² with its point estimator MSE gives the estimated
  variance:
  s²{b1} = MSE / Σ(Xi − X̄)²
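As an illustration, a minimal sketch that computes b1 and s²{b1} directly
from these formulas, assuming paired data arrays X and Y:

```python
# A sketch: slope estimate b1 and its estimated variance
# s^2{b1} = MSE / Sum (Xi - Xbar)^2, from raw data arrays.
import numpy as np

def slope_inference(X: np.ndarray, Y: np.ndarray):
    Xbar, Ybar = X.mean(), Y.mean()
    Sxx = np.sum((X - Xbar) ** 2)               # Sum (Xi - Xbar)^2
    b1 = np.sum((X - Xbar) * (Y - Ybar)) / Sxx  # least squares slope
    b0 = Ybar - b1 * Xbar
    resid = Y - (b0 + b1 * X)
    mse = np.sum(resid ** 2) / (X.size - 2)     # MSE = SSE / (n - 2)
    s2_b1 = mse / Sxx                           # estimated variance of b1
    return b1, s2_b1
```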
Sampling Distribution of (b1 − β1)/s{b1}
• Since b1 is normally distributed, the standardized statistic
  (b1 − β1)/σ{b1} is a standard normal variable.
• Normally, we need to estimate σ{b1} by s{b1}, hence we are
  interested in the distribution of the statistic (b1 − β1)/s{b1}.
• When a statistic is standardized using an estimated standard
  deviation, it is called a studentized statistic.
• An important theorem in statistics states that, for the normal
  regression model,
  (b1 − β1)/s{b1} is distributed as t(n − 2)
Confidence interval for β1
• Since (b1 − β1)/s{b1} follows a t distribution:
  P{t(α/2; n − 2) ≤ (b1 − β1)/s{b1} ≤ t(1 − α/2; n − 2)} = 1 − α   (*)
• Rearranging (*), the 1 − α confidence limits for β1 are:
  b1 ± t(1 − α/2; n − 2) s{b1}
• For the Toluca Co. example, s{b1} = .3470.
One-sided test:
• To test whether or not β1 is positive, at the level of significance
  α = .05, the alternatives are:
  H0: β1 ≤ 0
  Ha: β1 > 0
• For the Toluca Co. example, with α = .05 we require
  t(.95; 23) = 1.714. Since t* = b1/s{b1} = 10.29 > 1.714, we conclude
  Ha (i.e., β1 > 0).
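A computational sketch of the interval and the one-sided test, using only
the figures quoted above (s{b1} = .3470, t* = 10.29, n = 25); the value of
b1 is back-computed from t* and s{b1}, so treat it as illustrative:

```python
# A sketch: CI and one-sided t test for beta1 from quoted summary figures.
from scipy import stats

n, alpha = 25, 0.05
s_b1 = 0.3470
b1 = 10.29 * s_b1                         # since t* = b1 / s{b1}

# Two-sided 95% confidence limits: b1 +/- t(1 - alpha/2; n - 2) s{b1}
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
print(b1 - t_crit * s_b1, b1 + t_crit * s_b1)

# One-sided test of H0: beta1 <= 0 vs Ha: beta1 > 0
t_star = b1 / s_b1
print(t_star > stats.t.ppf(1 - alpha, df=n - 2))  # True -> conclude Ha
```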
Comments
• The P-value is sometimes called the observed
level of significance
Exercise
Grade point average. The director of admissions of a small college
selected 120 students at random from the new freshman class in a study
to determine whether a student's grade point average (GPA) at the end
of the freshman year (Y) can be predicted from the ACT test score (X).
Assume that the first-order regression model is appropriate. It is also
given that b1 = 0.03883, t(.995; 118) = 2.61814, and s{b1} = .01277.
a) Obtain a 99% C.I. for β1. Interpret the C.I. Does it include zero?
   Why might the director of admissions be interested in whether the
   C.I. includes zero?
b) Test, using the test statistic t*, whether or not a linear
   association exists between a student's ACT score (X) and GPA at the
   end of the freshman year (Y). Use a level of significance of .01.
   State the alternatives, decision rule, and conclusion.
c) What is the P-value of your test in part (b)? How does it support
   the conclusion reached in part (b)?
2.2 Inferences Concerning β0
• Inferences concerning β0 are rare.
• They arise only when the scope of the model includes X = 0.
• β0 is the intercept of the regression line.
Sampling Distribution of b0
• The point estimator b0 is given as follows:
  b0 = Ȳ − b1X̄
Sampling Distribution of b0
• For the normal regression model, the sampling distribution of b0 is
  normal, with mean and variance:
  E{b0} = β0
  σ²{b0} = σ² [1/n + X̄² / Σ(Xi − X̄)²]
Sampling Distribution of (b0 − β0)/s{b0}
• Replacing σ² with MSE gives the estimated variance s²{b0}, and
  (b0 − β0)/s{b0} is distributed as t(n − 2).
• For the Toluca Co. example:
  s²{b0} = MSE [1/n + X̄² / Σ(Xi − X̄)²]
         = 2,384 [1/25 + (70.00)²/19,800] = 685.34
  s{b0} = 26.18
2.3 Some Considerations on Making
Inferences Concerning β0 and β1
• Effects of departures from normality
  – If the probability distributions of Y are not exactly normal but
    do not depart seriously from normality, the sampling distributions
    of b0 and b1 will be approximately normal, and the use of the t
    distribution will provide approximately the specified confidence
    coefficient or level of significance.
  – For large samples, the t value is replaced by the z value for the
    standard normal distribution.
Some considerations on making
inferences concerning β0 and β1 (cont.)
• Interpretation of the confidence coefficient
  – A 95% C.I. is interpreted to mean that if many independent samples
    are taken where the levels of X are the same, and a 95% C.I. is
    constructed for each sample, then 95% of the intervals will
    contain the true value of β1.
Some considerations on making
inferences concerning β0 and β1 (cont.)
• Spacing of the X levels
  – The variances of b1 and b0 are affected by the spacing of the X
    levels in the observed data.
  – The greater the spread in the X levels, the larger the quantity
    Σ(Xi − X̄)² and the smaller the variance of b1.
• Power of tests
  – The power of a test is the probability that the decision rule will
    lead to conclusion Ha when Ha in fact holds.
2.4 Interval Estimation of E{Yh}
• Let Xh denote the level of X for which we wish to
estimate the mean response.
• Xh may be a value which occurred in the sample
or it may be some other value of the predictor
variable within the scope of the model.
• The mean response when X = Xh is denoted by
E{Yh}.
• The point estimator for E{Yh} is
Ŷh = b0 + b1Xh
Sampling Distribution of Ŷh
• Refers to the different values of Ŷh that would be obtained if
  repeated samples were selected, each holding the levels of the
  predictor variable X constant, and Ŷh were calculated for each
  sample.
• Mean
  – Ŷh is an unbiased estimator of E{Yh}
• Variance
  – σ²{Ŷh} = σ² [1/n + (Xh − X̄)² / Σ(Xi − X̄)²]   (**)
  – The estimated variance of Ŷh is obtained by substituting σ² in
    (**) with MSE:
  s²{Ŷh} = MSE [1/n + (Xh − X̄)² / Σ(Xi − X̄)²]
Sampling Distribution of (Ŷh − E{Yh})/s{Ŷh}
• (Ŷh − E{Yh})/s{Ŷh} is distributed as t(n − 2):
  (Ŷh − E{Yh}) / s{Ŷh} ~ t(n − 2)
• Toluca Co. example
  Find a 90% C.I. for E{Yh} when the lot size is Xh = 65 units
  (a computational sketch follows below).
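A sketch of this calculation, assuming the Toluca summary values quoted
elsewhere in these slides (n = 25, X̄ = 70.0, Σ(Xi − X̄)² = 19,800,
MSE = 2,384); the fitted coefficients are back-computed from t* = 10.29,
s{b1} = .3470, and Ŷh = 419.4 at Xh = 100, so they are approximate:

```python
# A sketch: 90% confidence limits for E{Yh} at Xh = 65 from quoted
# Toluca summary figures; b0 and b1 are back-computed, not quoted.
from scipy import stats

n, Xbar, Sxx, MSE = 25, 70.0, 19_800.0, 2_384.0
b1 = 10.29 * 0.3470        # t* = b1 / s{b1}  ->  b1 ~= 3.57
b0 = 419.4 - b1 * 100.0    # from Yhat(100) = 419.4  ->  b0 ~= 62.3

Xh = 65.0
Yhat_h = b0 + b1 * Xh                                     # point estimate of E{Yh}
s_Yhat = (MSE * (1 / n + (Xh - Xbar) ** 2 / Sxx)) ** 0.5  # s{Yhat_h}
t_crit = stats.t.ppf(0.95, df=n - 2)                      # 90% two-sided -> t(.95; 23)
print(Yhat_h - t_crit * s_Yhat, Yhat_h + t_crit * s_Yhat)
```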
2.5 Prediction Interval for Yh(new) When
Parameters Known
• To predict the college GPA of an applicant whose high school GPA is
  Xh = 3.5: the mean response is E{Yh} = 3.425, and with σ = .12 the
  limits E{Yh} ± 3σ give
  3.425 ± 3(.12)
  Thus, the prediction interval is
  3.065 ≤ Yh(new) ≤ 3.785
Prediction Interval for Yh(new) When
Parameters Unknown
• The parameters are unknown, so they must be estimated.
• The mean of the distribution of Y is estimated by Ŷh, as usual, and
  the variance of the distribution of Y is estimated by MSE.
• The prediction limits for a new observation Yh(new) at Xh are:
  Ŷh ± t(1 − α/2; n − 2) s{pred}
  where s²{pred} = MSE + s²{Ŷh}
                 = MSE [1 + 1/n + (Xh − X̄)² / Σ(Xi − X̄)²]
  Note: s²{Ŷh} = MSE [1/n + (Xh − X̄)² / Σ(Xi − X̄)²]
Prediction Interval for Yh(new) When
Parameters Unknown (cont.)
• Toluca Co. example
  Suppose that the next lot to be produced consists of Xh = 100 units
  and a 90% prediction interval is required. Earlier, we obtained:
  Ŷh = 419.4   s²{Ŷh} = 203.72   MSE = 2,384   t(.95; 23) = 1.714
  Hence,
  s²{pred} = 2,384 + 203.72 = 2,587.72
  s{pred} = 50.87
  419.4 − 1.714(50.87) ≤ Yh(new) ≤ 419.4 + 1.714(50.87)
  332.2 ≤ Yh(new) ≤ 506.6
  With 90% confidence, we predict that the number of work hours for
  the next production run of 100 units will be somewhere between 332
  and 507 hours.
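A sketch reproducing this interval from the quantities quoted on this
slide:

```python
# A sketch: 90% prediction interval for a single new observation,
# using Yhat_h = 419.4, s^2{Yhat_h} = 203.72, MSE = 2,384, n = 25.
from scipy import stats

Yhat_h, s2_Yhat, MSE, n = 419.4, 203.72, 2_384.0, 25

s2_pred = MSE + s2_Yhat                  # s^2{pred} = MSE + s^2{Yhat_h}
t_crit = stats.t.ppf(0.95, df=n - 2)     # ~= 1.714
half = t_crit * s2_pred ** 0.5
print(Yhat_h - half, Yhat_h + half)      # ~ (332, 507)
```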
Prediction of Mean of m New
Observations for Given Xh
• Predict the mean of m new observations on Y for a given level of the
  predictor variable.
• The mean of the new Y observations to be predicted is denoted as
  Ȳh(new).
• The 1 − α prediction limits are:
  Ŷh ± t(1 − α/2; n − 2) s{predmean}
  where s²{predmean} = MSE/m + s²{Ŷh}, or equivalently
  s²{predmean} = MSE [1/m + 1/n + (Xh − X̄)² / Σ(Xi − X̄)²]
Prediction of Mean of m
New Observations for Given Xh (cont.)
• Toluca Co. example
  Find the 90% prediction interval for the mean number of work hours
  Ȳh(new) in three new production runs, each for Xh = 100 units.
  Earlier we have:
  Ŷh = 419.4   s²{Ŷh} = 203.72
  MSE = 2,384   t(.95; 23) = 1.714
  Hence,
  s²{predmean} = 2,384/3 + 203.72 = 998.4
  s{predmean} = 31.60
  The prediction interval for the mean work hours per lot is:
  Ŷh ± t(1 − α/2; n − 2) s{predmean}
  419.4 − 1.714(31.60) ≤ Ȳh(new) ≤ 419.4 + 1.714(31.60)
  365.2 ≤ Ȳh(new) ≤ 473.6
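A sketch generalizing the previous calculation: s²{predmean} = MSE/m +
s²{Ŷh}, where m = 1 recovers s²{pred} for a single new observation:

```python
# A sketch: prediction limits for the mean of m new observations.
from scipy import stats

def pred_limits(Yhat_h, s2_Yhat, MSE, n, m=1, conf=0.90):
    s2 = MSE / m + s2_Yhat                # m = 1 gives s^2{pred}
    half = stats.t.ppf(1 - (1 - conf) / 2, df=n - 2) * s2 ** 0.5
    return Yhat_h - half, Yhat_h + half

print(pred_limits(419.4, 203.72, 2_384.0, 25, m=3))  # ~ (365.2, 473.6)
```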
2.6 Confidence Band for
Regression Line
• Here we want to find the confidence band for the entire
regression line
E{Y} = β0 + β1X
2.6 Confidence Band for
Regression Line (cont.)
• The boundary points of the confidence band at any Xh are
  Ŷh ± W s{Ŷh}, where W² = 2F(1 − α; 2, n − 2)
  (the Working-Hotelling band).
• Toluca Co. example
  The 90% confidence band for the regression line when Xh = 100 is
  calculated as follows:
  Ŷh = 419.4   s{Ŷh} = 14.27
  W² = 2F(1 − α; 2, n − 2) = 2F(.90; 2, 23) = 2(2.549) = 5.098
  W = 2.258
  Hence the band at Xh = 100 is 419.4 ± 2.258(14.27), i.e., 387.2 to
  451.6.
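A sketch of the band boundary at Xh = 100, using scipy's F percentile in
place of the table lookup:

```python
# A sketch: Working-Hotelling band boundary at one Xh, from quoted values.
from scipy import stats

Yhat_h, s_Yhat, n = 419.4, 14.27, 25
W = (2 * stats.f.ppf(0.90, dfn=2, dfd=n - 2)) ** 0.5  # W^2 = 2 F(.90; 2, 23)
print(Yhat_h - W * s_Yhat, Yhat_h + W * s_Yhat)       # ~ (387.2, 451.6)
```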
2.7 Analysis of Variance Approach
to Regression Analysis
• The analysis of variance approach is based on the partitioning of
  sums of squares and degrees of freedom associated with the response
  variable Y.
• Variation in Y is conventionally measured in terms of the deviations
  of the Yi around their mean Ȳ, i.e., Yi − Ȳ.
• These variations are shown in Fig. 2.7. The measure of total
  variation, denoted by SSTO, is the sum of the squared deviations:
  SSTO = Σ(Yi − Ȳ)²
Partitioning of Total Sum of Squares
• When we utilize the predictor variable X, the variation reflecting
  the uncertainty concerning the variable Y is that of the Yi around
  the fitted regression line:
  Yi − Ŷi
  These deviations are shown in Fig. 2.7b.
• The total deviation can be decomposed as:
  Yi − Ȳ = (Ŷi − Ȳ) + (Yi − Ŷi)
  1. The deviation of the fitted value Ŷi around the mean Ȳ
  2. The deviation of the observation Yi around the fitted regression
     line
• The corresponding sums of squares satisfy:
  Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ(Yi − Ŷi)²
  i.e., SSTO = SSR + SSE
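A sketch that verifies this partitioning numerically, assuming arrays Y
(observations) and Yhat (least squares fitted values):

```python
# A sketch: SSTO = SSR + SSE, which holds for least squares fits.
import numpy as np

def partition(Y: np.ndarray, Yhat: np.ndarray):
    Ybar = Y.mean()
    ssto = np.sum((Y - Ybar) ** 2)      # total sum of squares
    ssr = np.sum((Yhat - Ybar) ** 2)    # regression sum of squares
    sse = np.sum((Y - Yhat) ** 2)       # error sum of squares
    assert np.isclose(ssto, ssr + sse)  # the partitioning identity
    return ssto, ssr, sse
```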
Analysis of Variance Table
• ANOVA table (pg 67)
  – The breakdown of the total sum of squares and associated degrees
    of freedom is displayed in the form of an analysis of variance
    table (ANOVA table), as laid out below.
  – Mean squares are also shown.
  – A column containing expected mean squares is also shown.
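For reference, the standard layout for simple linear regression is
sketched below (a standard reconstruction; the text's table on pg 67
additionally carries the expected mean squares shown on the next slide):

  Source of Variation   SS                   df      MS
  Regression            SSR = Σ(Ŷi − Ȳ)²     1       MSR = SSR/1
  Error                 SSE = Σ(Yi − Ŷi)²    n − 2   MSE = SSE/(n − 2)
  Total                 SSTO = Σ(Yi − Ȳ)²    n − 1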
Expected Mean Square
• The expected value of a mean square is the mean of its sampling
  distribution and tells us what is being estimated by the mean
  square.
  E{MSE} = σ²
  E{MSR} = σ² + β1² Σ(Xi − X̄)²
F Test of β1 = 0 vs β1 ≠ 0
• The analysis of variance provides a test for:
  H0: β1 = 0
  Ha: β1 ≠ 0
• The test statistic for the analysis of variance approach is denoted
  by F*:
  F* = MSR / MSE
• Based on the expected mean squares above, large values of F* support
  Ha and values of F* near 1 support H0.
Sampling Distribution of F*
• Cochran's theorem: if H0 (β1 = 0) holds, F* follows the F(1, n − 2)
  distribution.
• Proof: refer to the text.
• Construction of the decision rule
  – Since the test is upper-tail and F* is distributed as F(1, n − 2)
    when H0 holds, the decision rule when the risk of a Type I error
    is to be controlled at α is:
    If F* ≤ F(1 − α; 1, n − 2), conclude H0
    If F* > F(1 − α; 1, n − 2), conclude Ha
    where F(1 − α; 1, n − 2) is the (1 − α)100 percentile of the
    appropriate F distribution.
Toluca Co. example
• We shall repeat the earlier test on β1 using the F test.
  H0: β1 = 0
  Ha: β1 ≠ 0
• We require F(1 − α; 1, n − 2) [where α = .05, n = 25]
  = F(.95; 1, 23) = 4.28 (http://www.z-table.com/f-distribution-table.html)
  If F* ≤ 4.28, conclude H0
  If F* > 4.28, conclude Ha
  F* = MSR / MSE = 252,378 / 2,384 = 105.9
  Since F* = 105.9 > 4.28, we conclude Ha.
Equivalence of F Test and t Test
• Since SSR = b1² Σ(Xi − X̄)²,
  F* = (SSR/1) / (SSE/(n − 2)) = b1² Σ(Xi − X̄)² / MSE
• Since s²{b1} = MSE / Σ(Xi − X̄)²,
  F* = b1² / s²{b1} = (b1 / s{b1})² = (t*)²
• For the Toluca Co. example, (t*)² = (10.29)² ≈ 105.9 = F*, so the
  two tests are equivalent.
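A quick numerical check of this equivalence with the slide's figures:

```python
# A sketch: verify F* = (t*)^2 up to rounding, using quoted values.
MSR, MSE = 252_378.0, 2_384.0
t_star = 10.29

F_star = MSR / MSE
print(F_star)        # ~= 105.9
print(t_star ** 2)   # ~= 105.9, matching F* up to rounding
```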
2.8 General Linear Test Approach
• Involves 3 basic steps:
  1. Fit the full model Yi = β0 + β1Xi + εi and obtain the error sum
     of squares SSE(F).
  2. Fit the reduced model under H0 and obtain its error sum of
     squares SSE(R).
  3. Compute the F* test statistic comparing SSE(F) and SSE(R).
Full Model (Unrestricted Model)
• The full model is the normal regression model:
  Yi = β0 + β1Xi + εi
• We fit this full model by either
  – the method of least squares, or
  – the method of maximum likelihood,
  to obtain the error sum of squares.
• The error sum of squares is the sum of the squared deviations of
  each observation Yi around its estimated expected value.
  – It is denoted as SSE(F):
  SSE(F) = Σ(Yi − Ŷi)² = SSE
Reduced Model (Restricted Model)
• We now consider H0:
  H0: β1 = 0
  Ha: β1 ≠ 0
• The model when H0 holds is called the reduced or restricted model.
• When β1 = 0, the full model reduces to
  Yi = β0 + εi
Reduced Model (Restricted Model) (cont.)
• We fit this reduced model by either the method of least squares or
  the method of maximum likelihood to obtain its error sum of squares,
  denoted SSE(R).
• The least squares and maximum likelihood estimator of β0 is Ȳ.
  Hence, the estimated expected value for each observation is b0 = Ȳ.
  Thus,
  SSE(R) = Σ(Yi − b0)² = Σ(Yi − Ȳ)² = SSTO
Test Statistic
• The test statistic compares the two error sums of squares, SSE(F)
  and SSE(R).
• It can be shown that SSE(F) is never greater than SSE(R):
  SSE(F) ≤ SSE(R)
• The reason is that the more parameters there are in the model, the
  better one can fit the data and the smaller are the deviations
  around the fitted regression function.
• A small difference SSE(R) − SSE(F) suggests that H0 holds; a large
  difference suggests that Ha holds.
Test Statistic (cont.)
• The test statistic is:
  F* = [(SSE(R) − SSE(F)) / (dfR − dfF)] ÷ [SSE(F) / dfF]
  which follows the F(dfR − dfF, dfF) distribution when H0 holds.
• The decision rule is:
  If F* ≤ F(1 − α; dfR − dfF, dfF), conclude H0
  If F* > F(1 − α; dfR − dfF, dfF), conclude Ha
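A sketch of the general linear test as a reusable helper, assuming the
error sums of squares and degrees of freedom from the two fits are
supplied:

```python
# A sketch: general linear test statistic and decision.
from scipy import stats

def general_linear_test(sse_r, df_r, sse_f, df_f, alpha=0.05):
    F_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
    F_crit = stats.f.ppf(1 - alpha, dfn=df_r - df_f, dfd=df_f)
    return F_star, F_crit, F_star > F_crit  # True -> conclude Ha
```

For simple linear regression, dfR = n − 1 and dfF = n − 2, so this F*
reduces to the ANOVA test statistic F* = MSR/MSE.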
2.9 Descriptive Measures of Linear
Association between X and Y
Coefficient of Determination
• Denoted by R²:
  R² = SSR/SSTO = 1 − SSE/SSTO
• Since 0 ≤ SSE ≤ SSTO, it follows that 0 ≤ R² ≤ 1.
• The limiting values of R² occur as follows: R² = 1 when all
  observations fall on the fitted regression line (SSE = 0), and
  R² = 0 when the fitted line is horizontal, i.e., b1 = 0 (SSR = 0).
Coefficient of Determination (cont.)
• Limitations of R²
  – No single measure will be adequate for describing the usefulness
    of a regression model for different applications.
  – Still, the coefficient of determination is widely used, but it is
    subject to serious misunderstandings.
  – Three common misunderstandings are:
    • A high coefficient of determination indicates that useful
      predictions can be made.
    • A high coefficient of determination indicates that the estimated
      regression line is a good fit.
    • A coefficient of determination near zero indicates that X and Y
      are not related.
Coefficient of Correlation
• Defined as a measure of linear association between Y and X when both
  Y and X are random. Denoted as
  r = ±√R²
  where the sign of r follows the sign of the slope b1.
End of Chapter 2