10-Regression

Uploaded by

xabc0983

PROBABILITY & STATISTICS

Chapter 11: Simple Linear Regression and Correlation

Learning objectives

1. Empirical Models
2. Simple Linear Regression
3. Correlation

07/12/2023 Department of Mathematics 1/40


Empirical Models

• Many problems in engineering and science involve exploring the relationships between two or more variables.
• Regression analysis is a statistical technique that is very useful for these types of problems.
• For example, in a chemical process, suppose that the yield of the product is related to the process-operating temperature.
• Regression analysis can be used to build a model to predict yield at a given temperature level.


Empirical Models

Figure 11-1 Scatter diagram of oxygen purity versus hydrocarbon level from Table 11-1.
Empirical Models

Based on the scatter diagram, it is probably reasonable to assume that the mean of the random variable Y is related to x by the following straight-line relationship:

$$E(Y \mid x) = \mu_{Y|x} = \beta_0 + \beta_1 x$$

where $\beta_0$ and $\beta_1$ are the regression coefficients.

The simple linear regression model is given by

$$Y = \beta_0 + \beta_1 x + \varepsilon$$

where $\varepsilon$ is the random error.
Empirical Models

Suppose that the mean and variance of $\varepsilon$ are 0 and $\sigma^2$, respectively. Then

$$E(Y \mid x) = E(\beta_0 + \beta_1 x + \varepsilon) = \beta_0 + \beta_1 x + E(\varepsilon) = \beta_0 + \beta_1 x$$

The variance of Y given x is

$$V(Y \mid x) = V(\beta_0 + \beta_1 x + \varepsilon) = V(\beta_0 + \beta_1 x) + V(\varepsilon) = 0 + \sigma^2 = \sigma^2$$

The true regression model is a line of mean values:

$$\mu_{Y|x} = \beta_0 + \beta_1 x$$
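The mean and variance results above can be checked numerically. The following is a minimal simulation sketch, not part of the original slides: the parameter values are hypothetical, and the errors are assumed to be normal even though the derivation only requires mean 0 and variance $\sigma^2$.

```python
import random
import statistics


def simulate_responses(beta0, beta1, sigma, x, n_draws, seed=0):
    """Draw n_draws observations Y = beta0 + beta1*x + eps at one fixed x."""
    rng = random.Random(seed)
    # Assumption: eps ~ Normal(0, sigma^2). The slide only fixes the mean
    # and variance of eps, not its distribution.
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for _ in range(n_draws)]


# Hypothetical parameter values chosen only for illustration.
beta0, beta1, sigma, x = 2.0, 0.5, 1.5, 4.0
draws = simulate_responses(beta0, beta1, sigma, x, n_draws=50_000)

sample_mean = statistics.fmean(draws)    # should be near beta0 + beta1*x = 4.0
sample_var = statistics.variance(draws)  # should be near sigma^2 = 2.25
print(sample_mean, sample_var)
```

With many draws, the sample mean should approach $\beta_0 + \beta_1 x$ and the sample variance should approach $\sigma^2$, regardless of the value of x.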


Empirical Models

Figure 11-2 The distribution of Y for a given value of x for the oxygen purity-hydrocarbon data.
Simple Linear Regression

• The case of simple linear regression considers a single regressor or predictor x and a dependent or response variable Y.
• The observation Y at each level of x is a random variable, with expected value

$$E(Y \mid x) = \beta_0 + \beta_1 x$$

• We assume that each observation, Y, can be described by the model

$$Y = \beta_0 + \beta_1 x + \varepsilon$$


Simple Linear Regression

Suppose that we have n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n$$

Figure 11-3 Deviations of the data from the estimated regression model.


Simple Linear Regression

The method of least squares is used to estimate the parameters $\beta_0$ and $\beta_1$ by minimizing the sum of the squares of the vertical deviations in Figure 11-3.


Simple Linear Regression

The sum of the squares of the deviations of the observations from the true regression line is

$$L = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
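The minimization step is not spelled out in the extracted text. As a sketch of the standard argument (an addition, not taken from the slides), setting the partial derivatives of L to zero at the estimates $\hat\beta_0$, $\hat\beta_1$ gives two equations, usually called the least squares normal equations:

```latex
\begin{align}
\left.\frac{\partial L}{\partial \beta_0}\right|_{\hat\beta_0,\hat\beta_1}
  &= -2\sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) = 0, \\
\left.\frac{\partial L}{\partial \beta_1}\right|_{\hat\beta_0,\hat\beta_1}
  &= -2\sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)x_i = 0,
\end{align}
% which rearrange to
\begin{align}
n\hat\beta_0 + \hat\beta_1\sum_{i=1}^{n} x_i &= \sum_{i=1}^{n} y_i, \\
\hat\beta_0\sum_{i=1}^{n} x_i + \hat\beta_1\sum_{i=1}^{n} x_i^2 &= \sum_{i=1}^{n} x_i y_i .
\end{align}
```

Solving this pair of linear equations for $\hat\beta_0$ and $\hat\beta_1$ gives the least squares estimates.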


Simple Linear Regression

Simplifying these two equations yields

$$\hat\beta_0, \ \hat\beta_1 = \ ?$$

Notation

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$

$$S_{xy} = \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
Simple Linear Regression

Theorem

The least squares estimates of the intercept and slope in the simple linear regression model are

$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$$

$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}}$$

The estimated regression line is

$$\hat{y} = \hat\beta_0 + \hat\beta_1 x$$
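The theorem translates directly into a few lines of code. The sketch below is illustrative only: the function name `least_squares_fit` and the five data points are made up for this example and are not the oxygen purity data of Table 11-1.

```python
def least_squares_fit(x, y):
    """Return (beta0_hat, beta1_hat) for the line y = beta0 + beta1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # S_xx and S_xy in their centered (deviation) form.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


# Hypothetical data, chosen so the arithmetic is easy to follow by hand:
# x_bar = 3, y_bar = 4, S_xx = 10, S_xy = 6.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

beta0_hat, beta1_hat = least_squares_fit(x, y)
print(f"fitted line: y_hat = {beta0_hat:.3f} + {beta1_hat:.3f} x")
```

For these points the slope is 6/10 = 0.6 and the intercept is 4 - 0.6(3) = 2.2.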


Simple Linear Regression

Example

We will fit a simple linear regression model to the oxygen purity data in Table 11-1. The following quantities may be computed:

Therefore, the least squares estimates of the slope and intercept are

The fitted simple linear regression model is


Simple Linear Regression

Estimating $\sigma^2$

The error sum of squares is

$$SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

We have

$$E(SS_E) = (n - 2)\sigma^2$$

$$SS_E = SS_T - \hat\beta_1 S_{xy}$$

where $SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares.


Simple Linear Regression

Estimating $\sigma^2$

Theorem

An unbiased estimator of $\sigma^2$ is

$$\hat\sigma^2 = \frac{SS_E}{n - 2}$$

where

$$SS_E = SS_T - \hat\beta_1 S_{xy}$$

The square root, $\hat\sigma$, is the standard error.
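A short sketch of this estimator follows, using the shortcut $SS_E = SS_T - \hat\beta_1 S_{xy}$. The function name and the data are hypothetical and serve only to illustrate the formula.

```python
def estimate_error_variance(x, y):
    """Return the unbiased estimate sigma2_hat = SS_E / (n - 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares
    beta1_hat = s_xy / s_xx
    ss_e = ss_t - beta1_hat * s_xy              # error sum of squares
    return ss_e / (n - 2)


# Hypothetical data (not Table 11-1): SS_T = 6, S_xy = 6, beta1_hat = 0.6.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

sigma2_hat = estimate_error_variance(x, y)
print(sigma2_hat)
```

Here $SS_E = 6 - 0.6(6) = 2.4$, so the estimate is 2.4/3 = 0.8.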


Hypothesis Tests in Simple Linear Regression

Test on $\beta_1$

$$H_0: \beta_1 = \beta_{1,0} \qquad H_1: \beta_1 \neq \beta_{1,0}$$

Test statistic

$$T_0 = \frac{\hat\beta_1 - \beta_{1,0}}{\sqrt{\hat\sigma^2 / S_{xx}}}$$

has the t distribution with n - 2 degrees of freedom.

If $|t_0| > t_{\alpha/2,\,n-2}$: reject $H_0$
If $|t_0| \le t_{\alpha/2,\,n-2}$: fail to reject $H_0$
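The test can be sketched in code as follows. This is an illustration under stated assumptions: the statistic is taken to be $(\hat\beta_1 - \beta_{1,0})/\sqrt{\hat\sigma^2/S_{xx}}$, the data are made up, and the critical value is passed in from a t table (3.182 is the tabulated $t_{0.025,3}$) rather than computed.

```python
import math


def slope_t_statistic(x, y, beta1_null=0.0):
    """Return t0 for testing H0: beta1 = beta1_null."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    beta1_hat = s_xy / s_xx
    sigma2_hat = (ss_t - beta1_hat * s_xy) / (n - 2)
    return (beta1_hat - beta1_null) / math.sqrt(sigma2_hat / s_xx)


# Hypothetical data (not Table 11-1), n = 5, so n - 2 = 3 degrees of freedom.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

t0 = slope_t_statistic(x, y)
t_crit = 3.182                      # table value t_{0.025, 3} for alpha = 0.05
reject_h0 = abs(t0) > t_crit
print(t0, reject_h0)
```

For these five points $t_0 = 0.6/\sqrt{0.8/10} \approx 2.12$, which does not exceed 3.182, so $H_0: \beta_1 = 0$ is not rejected at $\alpha = 0.05$.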


Hypothesis Tests in Simple Linear Regression

Test on $\beta_1$

An important special case

$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$

These hypotheses relate to the significance of regression. Failure to reject $H_0$ is equivalent to concluding that there is no linear relationship between x and Y.


Hypothesis Tests in Simple Linear Regression

Test on $\beta_1$

Figure 11-5 The hypothesis $H_0: \beta_1 = 0$ is not rejected.

Figure 11-6 The hypothesis $H_0: \beta_1 = 0$ is rejected.
Hypothesis Tests in Simple Linear Regression

Example

We will test for significance of regression using the model for the oxygen purity data from Table 11-1. The hypotheses are

$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$

and we will use α = 0.01.

Recall

Test statistic


Hypothesis Tests in Simple Linear Regression

$$t_{\alpha/2,\,n-2} = t_{0.005,\,18} = 2.88 < |t_0|$$

Decision rule: reject $H_0$ if $|t_0| > t_{\alpha/2,\,n-2}$; otherwise fail to reject $H_0$. Here $|t_0|$ exceeds 2.88, so we reject $H_0$.

Test on $\beta_0$

$$H_0: \beta_0 = \beta_{0,0} \qquad H_1: \beta_0 \neq \beta_{0,0}$$

Test statistic

$$T_0 = \frac{\hat\beta_0 - \beta_{0,0}}{\sqrt{\hat\sigma^2\left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right]}}$$


Confidence Intervals

Confidence Intervals on the Slope and Intercept

Under the assumption that the observations are normally and independently distributed, a 100(1-α)% confidence interval on the slope $\beta_1$ in simple linear regression is

$$\hat\beta_1 - t_{\alpha/2,\,n-2}\sqrt{\frac{\hat\sigma^2}{S_{xx}}} \le \beta_1 \le \hat\beta_1 + t_{\alpha/2,\,n-2}\sqrt{\frac{\hat\sigma^2}{S_{xx}}}$$

Similarly, a 100(1-α)% confidence interval on the intercept $\beta_0$ is

$$\hat\beta_0 - t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]} \le \beta_0 \le \hat\beta_0 + t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]}$$
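Both intervals can be sketched in one small function. This is an illustration, not the slides' worked example: the data are hypothetical, the function name is invented here, and the critical value $t_{\alpha/2,n-2}$ is supplied from a t table (3.182 for α = 0.05 and 3 degrees of freedom).

```python
import math


def coefficient_intervals(x, y, t_crit):
    """Return ((slope_lo, slope_hi), (intercept_lo, intercept_hi))."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    sigma2_hat = (ss_t - beta1_hat * s_xy) / (n - 2)
    # Half-widths follow the two interval formulas term by term.
    half_slope = t_crit * math.sqrt(sigma2_hat / s_xx)
    half_intercept = t_crit * math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / s_xx))
    return (
        (beta1_hat - half_slope, beta1_hat + half_slope),
        (beta0_hat - half_intercept, beta0_hat + half_intercept),
    )


# Hypothetical data (not Table 11-1); t_crit is the table value t_{0.025, 3}.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

slope_ci, intercept_ci = coefficient_intervals(x, y, t_crit=3.182)
print(slope_ci, intercept_ci)
```

With only five points the intervals are wide: the slope interval is roughly (-0.30, 1.50), which contains 0, consistent with failing to reject $H_0: \beta_1 = 0$ at the same α.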


Confidence Intervals

Example

We will find a 95% confidence interval on the slope of the regression line using the data in Table 11-1.

Recall

95% CI for $\beta_1$:


Confidence Intervals

Confidence Interval on the Mean Response

A 100(1-α)% confidence interval about the mean response at the value $x = x_0$ is given by

$$\hat\mu_{Y|x_0} - t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]} \le \mu_{Y|x_0} \le \hat\mu_{Y|x_0} + t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]}$$

where $\hat\mu_{Y|x_0} = \hat\beta_0 + \hat\beta_1 x_0$.
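A minimal sketch of the mean-response interval follows. The function name, the data, and the choice $x_0 = 4$ are hypothetical; the critical value is read from a t table (3.182 is $t_{0.025,3}$).

```python
import math


def mean_response_interval(x, y, x0, t_crit):
    """Return (lower, upper) confidence limits on the mean response at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    sigma2_hat = (ss_t - beta1_hat * s_xy) / (n - 2)
    mu_hat = beta0_hat + beta1_hat * x0          # point estimate of the mean
    half = t_crit * math.sqrt(sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / s_xx))
    return mu_hat - half, mu_hat + half


# Hypothetical data (not Table 11-1).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

lower, upper = mean_response_interval(x, y, x0=4.0, t_crit=3.182)
print(lower, upper)
```

The half-width depends on $(x_0 - \bar{x})^2$, so the interval is narrowest at $x_0 = \bar{x}$ and widens as $x_0$ moves away from the center of the data.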


Confidence Intervals

Example

We will find a 95% confidence interval about the mean response for the data in Table 11-1.

The fitted model is

If we are interested in predicting mean oxygen purity when x0 = 100% then

95% CI on $\mu_{Y|x_0}$:

Figure: Scatter diagram of oxygen purity with fitted regression line and 95% confidence limits on $\mu_{Y|x_0}$.


Prediction of New Observations

A 100(1-α)% prediction interval on a future observation $Y_0$ at the value $x_0$ is given by

$$\hat{y}_0 - t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]} \le Y_0 \le \hat{y}_0 + t_{\alpha/2,\,n-2}\sqrt{\hat\sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]}$$

Returning to Table 11-1, find a 95% prediction interval on $Y_0$ at x0 = 100%.
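The prediction interval differs from the mean-response interval only by the extra 1 inside the bracket. The sketch below illustrates this; the function name, the data, and $x_0 = 4$ are hypothetical, and the critical value is a t-table lookup (3.182 is $t_{0.025,3}$).

```python
import math


def prediction_interval(x, y, x0, t_crit):
    """Return (lower, upper) prediction limits for a new observation at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    sigma2_hat = (ss_t - beta1_hat * s_xy) / (n - 2)
    y0_hat = beta0_hat + beta1_hat * x0
    # The leading "1 +" accounts for the variance of the new observation itself.
    half = t_crit * math.sqrt(
        sigma2_hat * (1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    )
    return y0_hat - half, y0_hat + half


# Hypothetical data (not Table 11-1).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

lower, upper = prediction_interval(x, y, x0=4.0, t_crit=3.182)
print(lower, upper)
```

Because of the added variance term, a prediction interval at a given $x_0$ is always wider than the confidence interval on the mean response at the same point.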


Prediction of New Observations

Figure: Scatter diagram of oxygen purity data from Table 11-1 with fitted regression line, 95% prediction limits, and 95% confidence limits on $\mu_{Y|x_0}$.
Adequacy of the Regression Model

• Fitting a regression model requires several assumptions:
1. Errors are uncorrelated random variables with mean zero;
2. Errors have constant variance; and
3. Errors are normally distributed.
• The analyst should always consider the validity of these assumptions to be doubtful and conduct analyses to examine the adequacy of the model.


Adequacy of the Regression Model

Coefficient of Determination ($R^2$)

• The quantity

$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T}$$

is called the coefficient of determination and is often used to judge the adequacy of a regression model.
• $0 \le R^2 \le 1$;
• We often refer to $R^2$ as the proportion of variability in the data explained or accounted for by the regression model.
Adequacy of the Regression Model

Example

For the oxygen purity regression model,

$$R^2 = \frac{SS_R}{SS_T} = \frac{152.13}{173.38} = 0.877$$

Thus, the model accounts for 87.7% of the variability in the data.
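The arithmetic in this example is easy to verify. The short sketch below uses only the two sums of squares quoted on the slide; the function name is made up for illustration.

```python
def coefficient_of_determination(ss_r, ss_t):
    """Return R^2 = SS_R / SS_T."""
    return ss_r / ss_t


# Values quoted in the example for the oxygen purity model.
ss_r = 152.13
ss_t = 173.38

r_squared = coefficient_of_determination(ss_r, ss_t)
ss_e = ss_t - ss_r                       # error sum of squares, SS_T - SS_R
r_squared_alt = 1 - ss_e / ss_t          # equivalent form of R^2
print(round(r_squared, 3), round(r_squared_alt, 3))
```

Both forms give the same value, since $SS_T = SS_R + SS_E$.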


Correlation

Definition

The sample correlation coefficient is

$$R = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \frac{S_{XY}}{\sqrt{S_{XX}\, SS_T}}$$

Note that

We may also write:


Correlation

Properties of the Linear Correlation Coefficient r

1. $-1 \le r \le 1$
2. The value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by the choice of x and y. Interchange all x- and y-values and the value of r will not change.
4. r measures the strength of a linear relationship.
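Properties 1 to 3 can be checked numerically. The sketch below uses made-up data and an invented function name; the change of scale is taken to be a linear conversion with a positive factor (like a change of units), which is an assumption about what "different scale" means here.

```python
import math


def sample_correlation(x, y):
    """Return r = S_xy / sqrt(S_xx * SS_T)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * ss_t)


# Hypothetical data: S_xy = 6, S_xx = 10, SS_T = 6.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = sample_correlation(x, y)

# Property 3: interchanging x and y leaves r unchanged.
r_swapped = sample_correlation(y, x)

# Property 2: a change of scale (assumed linear with positive factor).
x_rescaled = [100.0 * xi + 32.0 for xi in x]
r_rescaled = sample_correlation(x_rescaled, y)

print(r, r_swapped, r_rescaled)
```

For these points $r = 6/\sqrt{60} \approx 0.775$, and $r^2 = 0.6$ equals the $R^2$ of the fitted line for the same data.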


Correlation

Figure: Four scatter plots of y versus x, illustrating strong negative correlation, strong positive correlation, weak positive correlation, and nonlinear correlation.
Correlation

Test on ρ

$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$

Test statistic

$$T_0 = \frac{R\sqrt{n - 2}}{\sqrt{1 - R^2}}$$

has the t distribution with n - 2 degrees of freedom.

If $|t_0| > t_{\alpha/2,\,n-2}$: reject $H_0$
If $|t_0| \le t_{\alpha/2,\,n-2}$: fail to reject $H_0$
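As a small illustration of this statistic, the sketch below evaluates $T_0$ for a hypothetical sample correlation. The values of r and n are made up, and the critical value is taken from a t table (3.182 is $t_{0.025,3}$).

```python
import math


def correlation_t_statistic(r, n):
    """Return t0 = r*sqrt(n - 2) / sqrt(1 - r^2) for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical sample: r = 6/sqrt(60) from n = 5 pairs.
r = 6 / math.sqrt(60)
n = 5

t0 = correlation_t_statistic(r, n)
t_crit = 3.182                      # table value t_{0.025, 3} for alpha = 0.05
reject_h0 = abs(t0) > t_crit
print(t0, reject_h0)
```

The result, about 2.12, matches the t statistic for testing $H_0: \beta_1 = 0$ on data with the same r and n; the two tests are algebraically equivalent in simple linear regression.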


Correlation

Test on ρ

$$H_0: \rho = \rho_0 \qquad H_1: \rho \neq \rho_0$$

Test statistic

$$Z_0 = (\operatorname{arctanh} R - \operatorname{arctanh} \rho_0)\sqrt{n - 3}$$

If $|z_0| > z_{\alpha/2}$: reject $H_0$
If $|z_0| \le z_{\alpha/2}$: fail to reject $H_0$
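This large-sample test can be sketched with the standard library alone. The numbers below (r = 0.8, $\rho_0$ = 0.5, n = 28, α = 0.05) are hypothetical, and the function name is invented for this example.

```python
import math
from statistics import NormalDist


def fisher_z_statistic(r, rho0, n):
    """Return z0 = (arctanh(r) - arctanh(rho0)) * sqrt(n - 3)."""
    return (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)


# Hypothetical inputs chosen only for illustration.
r, rho0, n, alpha = 0.8, 0.5, 28, 0.05

z0 = fisher_z_statistic(r, rho0, n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point, about 1.96
reject_h0 = abs(z0) > z_crit
print(z0, z_crit, reject_h0)
```

Here $z_0 \approx 2.75$ exceeds 1.96, so $H_0: \rho = 0.5$ would be rejected at the 5% level for these made-up inputs.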


SUMMARY

We have studied:

1. Empirical Models
2. Simple Linear Regression
3. Correlation

Homework: Read slides of the next lecture.
