
Chapter 2

Inferences in Regression Analysis

1
Introduction
In this chapter, we will:
• Make inferences concerning the regression parameters β0 and β1 (interval estimation and tests about them)
• Interval estimation of the mean E{Y} of the p.d. of Y, for
given X
• Prediction intervals for a new obs. Y
• Confidence bands for the regression line
• The analysis of variance approach to regression analysis
• The general linear test approach
• Correlation coefficient, a measure of association when
both X and Y are random variables
2
Normal regression model
• We assume the normal regression model:

      Yi = β0 + β1Xi + εi

  where:
      β0 and β1 are parameters
      Xi are known constants
      εi are independent N(0, σ²)

3
2.1 Inferences concerning β1
• β1 is the slope of the regression line
      Yi = β0 + β1Xi + εi

• When β1 = 0, there is no linear association between Y and X.
• This also implies no relation of any type between Y and X, since the probability distributions of Y are then identical at all levels of X.

4
Sampling Distribution of b1

      b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

• The sampling distribution of b1 refers to the different values of b1 that would be obtained with repeated sampling when the levels of the predictor variable X are held constant from sample to sample.

5
Sampling Distribution of b1
• For the normal regression model Yi = β0 + β1Xi + εi, the sampling distribution of b1 is normal, with mean and variance defined as follows:

      E{b1} = β1
      σ²{b1} = σ² / Σ(Xi − X̄)²

• To show this, we need to recognize that b1 is a linear combination of the obs. Yi.

6
      b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

• It can be shown that:

      b1 = Σ kiYi    where   ki = (Xi − X̄) / Σ(Xi − X̄)²

• The ki are a function of the Xi and thus are fixed values.
• Hence, b1 is a linear combination of the Yi where the coefficients are solely a function of the fixed Xi.

7
• The coefficients ki have a number of properties that will be used later:

      Σ ki = 0
      Σ kiXi = 1
      Σ ki² = 1 / Σ(Xi − X̄)²

• Normality. The normality of the sampling distribution of b1 follows from the fact that b1 is a linear combination of the Yi.
• If the Yi are independent and normally distributed, a linear combination of the Yi is also normally distributed.

8
Estimated variance

• We can estimate the variance of the sampling dist. of b1:

      σ²{b1} = σ² / Σ(Xi − X̄)²

  is estimated by

      s²{b1} = MSE / Σ(Xi − X̄)²

• The point estimator s²{b1} is an unbiased estimator of σ²{b1}.

9
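A minimal numerical sketch of these facts, assuming Python with numpy is available; the data below are simulated for illustration only (they are not the Toluca observations):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data for illustration only (not the Toluca observations)
n = 25
X = rng.uniform(20, 120, size=n)
Y = 60 + 3.5 * X + rng.normal(0, 50, size=n)

Xbar, Ybar = X.mean(), Y.mean()
Sxx = np.sum((X - Xbar) ** 2)

# b1 expressed as a linear combination of the Yi with coefficients ki
k = (X - Xbar) / Sxx
b1 = np.sum(k * Y)                  # same as sum((X - Xbar)*(Y - Ybar)) / Sxx
b0 = Ybar - b1 * Xbar

# Properties of the ki quoted on the slides
assert np.isclose(k.sum(), 0)
assert np.isclose(np.sum(k * X), 1)
assert np.isclose(np.sum(k ** 2), 1 / Sxx)

# Estimated variance of b1: s^2{b1} = MSE / sum((Xi - Xbar)^2)
resid = Y - (b0 + b1 * X)
MSE = np.sum(resid ** 2) / (n - 2)
print(b1, np.sqrt(MSE / Sxx))       # point estimate and s{b1}
```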
Sampling Distribution of
(b1- β1)/s{b1}
• Since b1 is normally dist., the standardized statistic (b1-
β1)/σ{b1} is a standard normal variable.
• Normally, we need to estimate σ{b1} by s{b1}, hence we
are interested in the dist. of the statistic (b1- β1)/s{b1}
• When a statistic is standardized but an estimated s.d. is
used, it is called studentized statistic
• An important theorem in statistics states that:

      (b1 − β1) / s{b1}  is distributed as t(n − 2) for the normal regression model

10
Confidence interval for β1
• Since (b1 − β1)/s{b1} follows a t distribution:

      P{t(α/2; n−2) ≤ (b1 − β1)/s{b1} ≤ t(1 − α/2; n−2)} = 1 − α     (*)

• Because of the symmetry of the t distribution around its mean 0,

      t(α/2; n−2) = −t(1 − α/2; n−2)     (**)

• Rearranging the inequalities in (*) and using (**), we obtain:

      P{b1 − t(1 − α/2; n−2) s{b1} ≤ β1 ≤ b1 + t(1 − α/2; n−2) s{b1}} = 1 − α

• Thus, the 1 − α confidence limits for β1 are:

      b1 ± t(1 − α/2; n−2) s{b1}

11
Example: Toluca Co.
• Mgt wishes an estimate of β1 with 95% confidence coefficient.
• Given the 1 − α confidence limits for β1:

      b1 ± t(1 − α/2; n−2) s{b1}

  and

      s²{b1} = MSE / Σ(Xi − X̄)² = 2,384 / 19,800 = .12040
      s{b1} = .3470

• Thus, the 95% C.I. for β1 is:

      3.5702 − 2.069(.3470) ≤ β1 ≤ 3.5702 + 2.069(.3470)
      2.85 ≤ β1 ≤ 4.29

  given that t(.975; 23) = 2.069.

12
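As a rough check of the arithmetic above, here is a short Python sketch (assuming scipy is available) that reproduces the interval from the summary figures quoted on the slide:

```python
from scipy import stats

# Summary figures quoted on the slide
b1, MSE, Sxx, n = 3.5702, 2384, 19_800, 25

s_b1 = (MSE / Sxx) ** 0.5                      # about .3470
t_crit = stats.t.ppf(0.975, df=n - 2)          # about 2.069
print(b1 - t_crit * s_b1, b1 + t_crit * s_b1)  # roughly (2.85, 4.29)
```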
Tests concerning β1
      (b1 − β1) / s{b1} ~ t(n − 2)

Two-sided test:

      H0: β1 = 0
      Ha: β1 ≠ 0     (*)

• The test of the alternatives is based on the test statistic:

      t* = b1 / s{b1}     (**)

• The decision rule for controlling the level of significance at α is:

      If |t*| ≤ t(1 − α/2; n−2), conclude H0
      If |t*| > t(1 − α/2; n−2), conclude Ha

13
• For the Toluca Co. example, where α = .05, b1 = 3.5702,
and s{b1} = .3470, we require t(.975; 23) = 2.069
• Thus, the decision rule for testing alternatives (*) is:

      If |t*| ≤ 2.069, conclude H0
      If |t*| > 2.069, conclude Ha

• Since |t*| = |3.5702/.3470| = 10.29 > 2.069, we conclude Ha (i.e. β1 ≠ 0). Thus, a linear association exists between work hrs and lot size.
• Two-sided P-value = 2 × P{t(23) > 10.29} = 2 × 0.000… ≈ 0.
• Since the two-sided P-value is less than the specified level of significance α = .05, we conclude Ha is true.

14
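A minimal sketch of the two-sided test in Python (again assuming scipy, with the Toluca summary figures from the slides):

```python
from scipy import stats

# Toluca summary figures from the slides
b1, s_b1, n = 3.5702, 0.3470, 25

t_star = b1 / s_b1                                    # about 10.29
t_crit = stats.t.ppf(0.975, df=n - 2)                 # about 2.069
p_two_sided = 2 * stats.t.sf(abs(t_star), df=n - 2)   # essentially 0
print(t_star, t_crit, p_two_sided)                    # |t*| > t_crit, so conclude Ha
```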
Tests concerning β1

One-sided test:
• To test whether or not β1 is positive, at the level of significance α = .05, the alternatives are:

      H0: β1 ≤ 0
      Ha: β1 > 0

  and the decision rule based on test statistic (**) is:

      If t* ≤ t(1 − α; n−2), conclude H0
      If t* > t(1 − α; n−2), conclude Ha

15
• For the Toluca Co., for α = .05, we require t(.95; 23) = 1.714. Since t* = 10.29 > 1.714, we conclude Ha (i.e. β1 > 0).
• The same conclusion can be derived directly from the one-sided P-value, which was noted to be essentially 0.
• Since this P-value is < .05, we conclude Ha.

16
Comments
• The P-value is sometimes called the observed
level of significance

• The P-value is normally reported together with the value of the test statistic t*. Thus, one can conduct a test at any desired level of significance α by comparing the P-value with the specified level α.

17
Exercise
Grade point average. The director of admissions of a small college selected 120 students at random from the new freshman class in a study to determine whether a student's grade point average (GPA) at the end of the freshman year (Y) can be predicted from the ACT test score (X). Assume that the first-order regression model is appropriate, and take as given b1 = 0.03883, t(.995; 118) = 2.61814, and s{b1} = .01277.

a) Obtain a 99% C.I. for β1. Interpret the C.I. Does it include zero? Why might the director of admissions be interested in whether the C.I. includes zero?
b) Test, using the test statistic t*, whether or not a linear association
exists between student’s ACT score (X) and GPA at the end of
the freshman year (Y). Use a level of significance of .01. State the
alternatives, decision rule, and conclusion.
c) What is the P-value of your test in part (b)? How does it support
the conclusion reached in part (b)?
18
2.2 Inferences Concerning β0
• Inferences concerning β0 are rare.
• They occur only when the scope of the model includes X = 0.
• β0 is the intercept of the regression line

19
Sampling Distribution of b0
• The point estimator b0 is given as follows:

      b0 = Ȳ − b1X̄

• The sampling distribution of b0 refers to the different values of b0 that would be obtained with repeated sampling when the levels of the predictor variable X are held constant from sample to sample.

20
Sampling Distribution of b0
• For the normal regression model, the sampling distribution of b0 is normal, with mean and variance:

      E{b0} = β0
      σ²{b0} = σ² [ 1/n + X̄² / Σ(Xi − X̄)² ]

• The normality of the sampling dist. of b0 follows because b0, like b1, is a linear combination of the obs. Yi.
• An estimator of σ²{b0} is obtained by replacing σ² by its point estimator MSE:

      s²{b0} = MSE [ 1/n + X̄² / Σ(Xi − X̄)² ]

21
Sampling Distribution of (b0- β0)/s{b0}

• For the normal regression model,

      (b0 − β0) / s{b0} ~ t(n − 2)

• Hence, confidence intervals for β0 and tests concerning β0 can be set up using the t dist. as usual.

Confidence interval for β0

• The 1 − α confidence limits for β0 are obtained in the same manner as for β1 earlier:

      b0 ± t(1 − α/2; n−2) s{b0}

22
Confidence interval for β0 (cont.)

• Toluca Co. example

      s²{b0} = MSE [ 1/n + X̄² / Σ(Xi − X̄)² ] = 2,384 [ 1/25 + (70.00)² / 19,800 ] = 685.34
      s{b0} = 26.18

  Thus, the 90% C.I. for β0 is:

      62.37 − 1.714(26.18) ≤ β0 ≤ 62.37 + 1.714(26.18)
      17.5 ≤ β0 ≤ 107.2

23
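The same interval can be reproduced with a short Python sketch (assuming scipy; figures as quoted on the slide):

```python
from scipy import stats

# Toluca summary figures from the slides
b0, MSE, n, Xbar, Sxx = 62.37, 2384, 25, 70.0, 19_800

s_b0 = (MSE * (1 / n + Xbar ** 2 / Sxx)) ** 0.5   # sqrt of about 685.34, i.e. about 26.18
t_crit = stats.t.ppf(0.95, df=n - 2)              # 1.714 for a 90% interval
print(b0 - t_crit * s_b0, b0 + t_crit * s_b0)     # roughly (17.5, 107.2)
```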
2.3 Some Considerations on Making
Inferences Concerning β0 and β1
• Effects of departures from normality
– If the prob. dist. of Y are not exactly normal
but do not depart seriously, the sampling
distributions of b0 and b1 will be approx.
normal and the use of the t dist. will provide
approx. the specified confidence coefficient or
level of significance.
– For large samples, the t value is replaced by
the z value for the standard normal dist.

24
Some considerations on making
inferences concerning β0 and β1 (cont.)
• Interpretation of Confidence Coefficient
– 95% C.I. is interpreted to mean that if many
indep. samples are taken where the levels of
X are the same and a 95% C.I. is constructed
for each sample, 95% of the intervals will
contain the true value of β1

25
Some considerations on making
inferences concerning β0 and β1 (cont.)
• Spacing of the X levels
– Variances of b1 and b0 are affected by the spacing of
the X levels in the observed data.
– The greater the spread in the X levels, the larger the quantity Σ(Xi − X̄)² and the smaller the variance of b1.

• Power of tests
– Probability that the decision rule will lead to
conclusion Ha when Ha in fact holds.

26
2.4 Interval Estimation of E{Yh}
• Let Xh denote the level of X for which we wish to
estimate the mean response.
• Xh may be a value which occurred in the sample
or it may be some other value of the predictor
variable within the scope of the model.
• The mean response when X = Xh is denoted by
E{Yh}.
• The point estimator for E{Yh} is
Ŷh = b0 + b1Xh
27
Sampling Distribution of Ŷh
• Refers to the different values of Ŷh that would be
obtained if repeated samples were selected, each
holding the levels of the predictor variable X constant
and calculating Ŷh for each sample.

• For the normal error regression model,

      Yi = β0 + β1Xi + εi

  the sampling distribution of Ŷh is normal, with mean and variance defined as follows:

      E{Ŷh} = E{Yh}
      σ²{Ŷh} = σ² [ 1/n + (Xh − X̄)² / Σ(Xi − X̄)² ]     (**)

28
Sampling Distribution of Ŷh
• Normality
– The normality of the sampling distribution of Ŷh follows directly
from the fact that Ŷh is a linear combination of the obs. Yi.

• Mean
– Ŷh is an unbiased estimator of E{Yh}

• Variance
  – The estimated variance of Ŷh is obtained by substituting σ² in (**) with MSE:

      s²{Ŷh} = MSE [ 1/n + (Xh − X̄)² / Σ(Xi − X̄)² ]

29
Sampling Distribution of
(Ŷh – E{Yh}) / s{Ŷh}
• (Ŷh − E{Yh}) / s{Ŷh} is distributed as t(n − 2):

      (Ŷh − E{Yh}) / s{Ŷh} ~ t(n − 2)

• Confidence interval for E{Yh}
  – The 1 − α confidence limits are:

      Ŷh ± t(1 − α/2; n−2) s{Ŷh}

30
• Toluca Co. example
  Find a 90% C.I. for E{Yh} when the lot size is Xh = 65 units.

  First, find the point estimate Ŷh:

      Ŷh = 62.37 + 3.5702(65) = 294.4

  Next, find the estimated standard deviation s{Ŷh}:

      s²{Ŷh} = 2,384 [ 1/25 + (65 − 70.00)² / 19,800 ] = 98.37
      s{Ŷh} = 9.918

  For a 90% confidence coefficient, we require t(.95; 23) = 1.714. Hence, the 90% C.I. is:

      294.4 − 1.714(9.918) ≤ E{Yh} ≤ 294.4 + 1.714(9.918)
      277.4 ≤ E{Yh} ≤ 311.4

  We conclude with 90% confidence that the mean number of work hrs required to produce lots of 65 units is somewhere between 277.4 and 311.4 hrs.

31
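A brief Python sketch (assuming scipy) reproducing this interval from the slide's summary figures:

```python
from scipy import stats

# Toluca summary figures from the slides
b0, b1, MSE, n, Xbar, Sxx = 62.37, 3.5702, 2384, 25, 70.0, 19_800
Xh = 65

Yh_hat = b0 + b1 * Xh                                    # about 294.4
s_Yh = (MSE * (1 / n + (Xh - Xbar) ** 2 / Sxx)) ** 0.5   # sqrt of about 98.37
t_crit = stats.t.ppf(0.95, df=n - 2)                     # 1.714
print(Yh_hat - t_crit * s_Yh, Yh_hat + t_crit * s_Yh)    # roughly (277.4, 311.4)
```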
2.5 Prediction of New Observation
• Example: In the Toluca co., the next lot to be
produced consists of 100 units and mgmt wishes
to predict the no. of hrs for this particular lot.
• The new obs. on Y to be predicted is viewed as
the result of a new independent trial.
• The level of X for the new trial is denoted as Xh and the new obs. on Y as Yh(new).
• Prediction of a new response Yh(new) is an
individual outcome drawn from the distribution of
Y.
32
Prediction Interval for Yh(new) When
Parameters Known
• Suppose that in the college admissions example, the parameters are known:

      β0 = .10    β1 = .95
      E{Y} = .10 + .95X
      σ = .12

• The admissions officer is considering an applicant whose high school GPA is Xh = 3.5. The mean college GPA for students whose high school average is 3.5 is:

      E{Yh} = .10 + .95(3.5) = 3.425

• The college GPA of the applicant whose high school GPA is Xh = 3.5 is predicted to fall within:

      E{Yh} ± 3σ = 3.425 ± 3(.12)

  Thus, the prediction interval is:

      3.065 ≤ Yh(new) ≤ 3.785

33
Prediction Interval for Yh(new) When
Parameters Known (cont.)

• The basic idea of a prediction interval is to choose a range in the distribution of Y wherein most of the obs. will fall, and then to declare that the next obs. will fall in this range.
• In general, when the regression parameters are known, the 1 − α prediction limits for Yh(new) are:

      E{Yh} ± z(1 − α/2) σ

34
Prediction Interval for Yh(new) When
Parameters Unknown
• Parameters are unknown so they must be estimated.
• The mean of the distribution of Y is estimated by Ŷh as usual and the
variance of the dist. of Y is estimated by MSE.
• Prediction limits for a new obs. Yh(new) at Xh are:

      Ŷh ± t(1 − α/2; n−2) s{pred}

  where

      s²{pred} = MSE + s²{Ŷh}
               = MSE [ 1 + 1/n + (Xh − X̄)² / Σ(Xi − X̄)² ]

  Note: s²{Ŷh} = MSE [ 1/n + (Xh − X̄)² / Σ(Xi − X̄)² ]

35
Prediction Interval for Yh(new) When
Parameters Unknown (cont.)
• Toluca Co. example
  Suppose that the next lot to be produced consists of Xh = 100 units and a 90% prediction interval is required. Earlier, we have:

      Ŷh = 419.4    s²{Ŷh} = 203.72    MSE = 2,384    t(.95; 23) = 1.714

      s²{pred} = 2,384 + 203.72 = 2,587.72
      s{pred} = 50.87

  Hence, the 90% prediction interval for Yh(new) is:

      419.4 − 1.714(50.87) ≤ Yh(new) ≤ 419.4 + 1.714(50.87)
      332.2 ≤ Yh(new) ≤ 506.6

  With 90% confidence, we predict that the number of work hrs for the next production run of 100 units will be somewhere between 332 and 507 hrs.

36
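The interval can be checked with a minimal Python sketch (assuming scipy; figures from the slide):

```python
from scipy import stats

# Toluca figures from the slides: predict Yh(new) at Xh = 100
Yh_hat, s2_Yh, MSE, n = 419.4, 203.72, 2384, 25

s_pred = (MSE + s2_Yh) ** 0.5                               # about 50.87
t_crit = stats.t.ppf(0.95, df=n - 2)                        # 1.714
print(Yh_hat - t_crit * s_pred, Yh_hat + t_crit * s_pred)   # roughly (332, 507)
```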
Prediction of Mean of m New
Observations for Given Xh
• Predict the mean of m new obs. on Y for a given level of the predictor variable.
• The mean of the new Y obs. to be predicted is denoted as Ȳh(new).
• The 1 − α prediction limits are:

      Ŷh ± t(1 − α/2; n−2) s{predmean}

  where

      s²{predmean} = MSE/m + s²{Ŷh}
                   = MSE [ 1/m + 1/n + (Xh − X̄)² / Σ(Xi − X̄)² ]

37
Prediction of Mean of m
New Observations for Given Xh (cont.)
• Toluca Co. example
  Find the 90% prediction interval for the mean number of work hrs Ȳh(new) in three new production runs, each for Xh = 100 units. Earlier we have:

      Ŷh = 419.4    s²{Ŷh} = 203.72    MSE = 2,384    t(.95; 23) = 1.714

  Hence,

      s²{predmean} = 2,384/3 + 203.72 = 998.4
      s{predmean} = 31.60

  The prediction interval for the mean work hrs per lot is:

      419.4 − 1.714(31.60) ≤ Ȳh(new) ≤ 419.4 + 1.714(31.60)
      365.2 ≤ Ȳh(new) ≤ 473.6

38
2.6 Confidence Band for
Regression Line
• Here we want a confidence band for the entire regression line:

      E{Y} = β0 + β1X

• The Working-Hotelling 1 − α confidence band for the regression line has the following two boundary values at any level Xh:

      Ŷh ± W s{Ŷh}

  where:

      W² = 2 F(1 − α; 2, n−2)

39
2.6 Confidence Band for
Regression Line (cont.)
• Toluca Co. example
  The 90% confidence band for the regression line when Xh = 100 is calculated as follows:

      Ŷh = 419.4    s{Ŷh} = 14.27
      W² = 2 F(1 − α; 2, n−2) = 2 F(.90; 2, 23) = 2(2.549) = 5.098
      W = 2.258

  Hence, the boundary values of the confidence band for the regression line at Xh = 100 are 419.4 ± 2.258(14.27), and the confidence band there is:

      387.2 ≤ β0 + β1Xh ≤ 451.6    for Xh = 100

40
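A short Python sketch of the Working-Hotelling boundary values at Xh = 100 (assuming scipy; figures from the slide):

```python
from scipy import stats

# Toluca figures from the slides
Yh_hat, s_Yh, n, alpha = 419.4, 14.27, 25, 0.10

W = (2 * stats.f.ppf(1 - alpha, dfn=2, dfd=n - 2)) ** 0.5   # about 2.258
print(Yh_hat - W * s_Yh, Yh_hat + W * s_Yh)                 # roughly (387.2, 451.6)
```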
2.7 Analysis of Variance Approach
to Regression Analysis
• The analysis of variance approach is based on
the partitioning of sums of squares and degrees
of freedom associated with the response
variable Y.
• Variation in Y is conventionally measured in terms of the deviations of the Yi around their mean Ȳ, i.e. Yi − Ȳ.
• These variations are shown in Fig. 2.7. The measure of total variation, denoted by SSTO, is the sum of the squared deviations:

      SSTO = Σ(Yi − Ȳ)²

41
Partitioning of Total Sum of Squares

Fig. 2.7 Illustration of partitioning of total deviations Yi − Ȳ (Toluca Co. example)

42
Partitioning of Total Sum of Squares
• When we utilize the predictor variable X, the variation reflecting the uncertainty concerning the variable Y is:

      Yi − Ŷi

  These deviations are shown in Fig. 2.7b.

• The measure of variation in the Yi obs. when the predictor variable X is used is the error sum of squares (SSE):

      SSE = Σ(Yi − Ŷi)²

• The measure of that part of the variability of the Yi which is associated with the regression line (Fig. 2.7c) is the regression sum of squares (SSR):

      SSR = Σ(Ŷi − Ȳ)²

43
Formal development of partitioning
• The total deviation Yi − Ȳ, used in the measure of the total variation of the obs. Yi without taking the predictor variable into account, can be decomposed into two components:

      Yi − Ȳ  =  (Ŷi − Ȳ)  +  (Yi − Ŷi)
      (total deviation = deviation of fitted regression value around mean + deviation around fitted regression line)

  1. The deviation of the fitted value Ŷi around the mean Ȳ
  2. The deviation of the obs. Yi around the fitted regression line

• The corresponding sums of squares partition in the same way:

      Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ(Yi − Ŷi)²
      SSTO = SSR + SSE

44
Breakdown of Degrees of Freedom
• SSTO has n-1 degrees of freedom (df).
– One df is lost because the deviations Yi − Ȳ are subject to one constraint: they must sum to zero.
– Equivalently, one degree of freedom is lost because the
sample mean Y is used to estimate the population mean.
• SSE has n-2 degrees of freedom.
– Two degrees of freedom are lost because the two parameters
β0 and β1 are estimated in obtaining the fitted values Ŷi.
• SSR has one degree of freedom.
– Two degrees of freedom are associated with the regression
line (intercept and slope). One of the two degrees of freedom
is lost because the deviations Ŷi − Ȳ are subject to a constraint: they must sum to zero.
45
Mean Squares
• A sum of squares divided by its associated
degrees of freedom is called a mean
square (MS).
• Regression mean square (MSR)
MSR = SSR/1 = SSR
• Error mean square (MSE)
MSE = SSE / (n-2)

46
Analysis of Variance Table
• ANOVA table (pg 67)
– The breakdown of the total sum of squares
and associated degrees of freedom are
displayed in the form of an analysis of
variance table (ANOVA table).
– Mean squares are also shown.
– A column containing expected mean squares
is also shown.

47
Expected Mean Square
• The expected value of a mean square is the mean of its
sampling distribution and tells us what is being estimated
by the mean square.
      E{MSE} = σ²
      E{MSR} = σ² + β1² Σ(Xi − X̄)²

• Two important implications of the expected mean squares above:
– The mean of the sampling distribution of MSE is σ2
whether or not X and Y are linearly related, i.e. whether
or not β1 = 0.
– The mean of the sampling distribution of MSR is also σ2
for β1 = 0 but for β1 ≠ 0, the mean of the sampling
distribution of MSR is greater than σ2.
48
Expected Mean Square (cont.)

• This means that a comparison of MSR and MSE is useful for testing whether or not β1 = 0.
β1 = 0.
– If MSR = MSE, then β1 = 0.
– If MSR > MSE, then β1 ≠ 0.
• Proof of expected mean squares
– (refer to textbook)

49
F Test of β1= 0 vs β1≠ 0
• Analysis of variance provides a test for:
H0: β1 = 0
Ha: β1 ≠ 0
• Test statistic for the analysis of variance
approach is denoted by F*.
F* = MSR / MSE
• Based on the basic ANOVA table, large
values of F* support Ha and values of F*
near 1 support H0.
50
Sampling Distribution of F*
• Cochran’s theorem: If H0 (β1 = 0) holds, F* follows the
F(1, n-2) dist.
• Proof (refer to text)
• Construction of decision rule
– Since the test is upper-tailed and F* is distributed as F(1, n−2) when H0 holds, the decision rule when the risk of a Type I error is to be controlled at α is:
      If F* ≤ F(1 − α; 1, n−2), conclude H0
      If F* > F(1 − α; 1, n−2), conclude Ha
  where F(1 − α; 1, n−2) is the (1 − α)100 percentile of the appropriate F distribution.
51
Toluca co. example
• We shall repeat the earlier test on β1 using the F test.

      H0: β1 = 0
      Ha: β1 ≠ 0

• We require F(1 − α; 1, n−2), where α = .05 and n = 25:
      F(.95; 1, 23) = 4.28   (http://www.z-table.com/f-distribution-table.html)

      If F* ≤ 4.28, conclude H0
      If F* > 4.28, conclude Ha

      F* = MSR / MSE = 252,378 / 2,384 = 105.9

• Since F* = 105.9 > 4.28, we conclude Ha, or that there is a linear association between work hrs and lot size.

52
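A minimal sketch of the ANOVA-based F test, assuming scipy and using the figures quoted on this slide:

```python
from scipy import stats

# Toluca figures quoted on the slides
SSR, MSE, n = 252_378, 2384, 25

MSR = SSR / 1
F_star = MSR / MSE                              # about 105.9
F_crit = stats.f.ppf(0.95, dfn=1, dfd=n - 2)    # about 4.28
print(F_star, F_crit, F_star > F_crit)          # F* > F_crit, so conclude Ha
```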
Equivalence of F test and t test
• For a given level of α, the F test of β1 = 0 vs β1 ≠ 0 is equivalent algebraically to the two-tailed t test.

      SSR = b1² Σ(Xi − X̄)²

      F* = SSR/1 ÷ SSE/(n−2) = b1² Σ(Xi − X̄)² / MSE

  Since s²{b1} = MSE / Σ(Xi − X̄)²,

      F* = b1² / s²{b1} = (b1 / s{b1})² = (t*)²

• From earlier work, t* = 10.29. Thus (t*)² = 10.29² = 105.9 = F*.
• Similarly, [t(1 − α/2; n−2)]² = F(1 − α; 1, n−2),
  e.g. [t(.975; 23)]² = 2.069² = 4.28 = F(.95; 1, 23).

53
Equivalence of F test and t test (cont.)

• Note that the t test is two-tailed (2-sided) whereas the F test is one-tailed (1-sided).
• Thus, at any given level of α, we can use either the t test or the F test for testing β1 = 0 vs β1 ≠ 0.
• If one leads to H0, so will the other, and correspondingly for Ha.
• However, the t test is more flexible, as it can be used for one-sided alternatives (β1 ≤ 0 vs β1 > 0, or β1 ≥ 0 vs β1 < 0), while the F test cannot.

54
2.8 General Linear Test Approach
• Involves 3 basic steps:
  1. Fit the full model

         Yi = β0 + β1Xi + εi

     and obtain the error sum of squares SSE(F).
  2. Fit the reduced model under H0 and obtain the error sum of squares SSE(R):

         Yi = β0 + εi     (reduced model)

  3. Test statistic: compare the two error sums of squares SSE(F) and SSE(R).

55
Full Model (Unrestricted Model)
• The full model is the normal regression model:

      Yi = β0 + β1Xi + εi

• We fit this full model by either
  – the method of least squares, or
  – the method of maximum likelihood
  to obtain the error sum of squares.
• The error sum of squares is the sum of the squared deviations of each observation Yi around its estimated expected value. It is denoted as SSE(F):

      SSE(F) = Σ(Yi − Ŷi)² = SSE

56
Reduced Model (Restricted
Model)
• We now consider:

      H0: β1 = 0
      Ha: β1 ≠ 0

• The model when H0 holds is called the reduced or restricted model.
• When β1 = 0, the full model reduces to

      Yi = β0 + εi

57
Reduced Model (Restricted
Model)
• We fit this reduced model by either
– Method of least squares
– Method of maximum likelihood
to obtain the error sum of squares.
• It is denoted as SSE(R).
• The least squares and maximum likelihood estimator of β0 is Ȳ. Hence, the estimated expected value for each observation is b0 = Ȳ. Thus,

      SSE(R) = Σ(Yi − b0)² = Σ(Yi − Ȳ)² = SSTO

58
Test Statistic
• We compare the two error sums of squares SSE(F) and SSE(R).
• It can be shown that SSE(F) is never greater than SSE(R):

      SSE(F) ≤ SSE(R)

• The reason is that the more parameters there are in the model, the better one can fit the data and the smaller are the deviations around the fitted regression function.
• When the difference between SSE(R) and SSE(F) is small, using the full model does not account for much more of the variability of the Yi than does the reduced model. In this case, the data suggest that the reduced model is adequate (i.e. H0 holds).

59
Test Statistic
• A small difference SSE(R) − SSE(F) suggests that H0 holds.
• A large difference suggests that Ha holds, because the additional parameters in the model do help in reducing the variation of the observations Yi.

• The test statistic is given as:

      F* = [SSE(R) − SSE(F)] / (dfR − dfF)  ÷  SSE(F) / dfF

  which follows the F distribution with (dfR − dfF, dfF) degrees of freedom when H0 holds.

60
Test Statistic
• The decision rule is:

      If F* ≤ F(1 − α; dfR − dfF, dfF), conclude H0
      If F* > F(1 − α; dfR − dfF, dfF), conclude Ha

• For testing whether or not β1 = 0:

      SSE(R) = SSTO    SSE(F) = SSE
      dfR = n − 1      dfF = n − 2

• So F* now becomes:

      F* = (SSTO − SSE) / [(n−1) − (n−2)]  ÷  SSE / (n−2)
         = SSR / 1  ÷  SSE / (n−2)  =  MSR / MSE

61
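A minimal Python sketch of the general linear test for H0: β1 = 0, using the Toluca sums of squares quoted elsewhere on these slides (SSTO = 307,203 and SSR = 252,378); this is an illustrative check, not part of the original slides:

```python
# Toluca figures quoted on the slides
SSTO, SSR, n = 307_203, 252_378, 25

SSE_full = SSTO - SSR            # SSE(F) for the full model, about 54,825
SSE_red = SSTO                   # SSE(R) for the reduced model Yi = beta0 + eps_i
df_R, df_F = n - 1, n - 2

F_star = ((SSE_red - SSE_full) / (df_R - df_F)) / (SSE_full / df_F)
print(F_star)                    # about 105.9, matching MSR/MSE from the ANOVA approach
```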
2.9 Descriptive Measures of Linear
Association between X and Y

• We will look at two descriptive measures of the degree of linear association between X and Y:
– Coefficient of determination
– Coefficient of correlation

62
Coefficient of Determination
• Denoted by R²:

      R² = SSR / SSTO = 1 − SSE / SSTO

• Since 0 ≤ SSE ≤ SSTO, it follows that 0 ≤ R² ≤ 1.
• The limiting values of R² occur as follows: (refer to whiteboard)

• Example: For the Toluca Co., SSTO = 307,203 and SSR = 252,378:

      R² = 252,378 / 307,203 = .822

63
Coefficient of Determination
• Limitations of R2
– Note that no single measure will be adequate for
describing the usefulness of a regression model for
different applications.
– Still, the coefficient of determination is widely used but
it is subject to serious misunderstanding.
– Three common misunderstandings are:
• A high coefficient of determination indicates that useful
predictions can be made.
• A high coefficient of determination indicates that the
estimated regression line is a good fit.
• A coefficient of determination near zero indicates that X and
Y are not related.
64
Coefficient of Correlation
• Defined as the measure of linear association between Y and X when both Y and X are random. Denoted as:

      r = ±√R²

• A plus or minus sign is attached to this measure according to whether the slope of the fitted regression line is positive or negative. The range is −1 ≤ r ≤ 1.
• For the Toluca Co. example:

      r = +√.822 = .907

• Regression models do not contain a parameter to be estimated by R² or r. These are simply descriptive measures of the degree of linear association between X and Y in the sample observations.

65
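A two-line Python check of R² and r from the Toluca figures quoted on the preceding slides (a sketch; the sign of r is taken from the positive slope b1 = 3.5702):

```python
SSTO, SSR, b1 = 307_203, 252_378, 3.5702   # Toluca figures from the slides

R2 = SSR / SSTO                            # about .822
r = (1 if b1 > 0 else -1) * R2 ** 0.5      # about +.907
print(R2, r)
```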
End of Chapter 2

66
