
Chapter 8: Linear regression

Phan Thi Khanh Van

E-mail: [email protected]

April 28, 2025

Table of Contents

1 Introduction

2 Covariance and correlation

3 Linear regression method


Empirical Models
Least squares method
Linear regression method

4 Confidence interval

5 Hypothesis Tests in Simple Linear Regression

Introduction

Many problems in engineering and the sciences involve a study or analysis of the
relationship between two or more variables. For example:

The pressure of a gas (P) in a container is related to the temperature (T):

P = kT - deterministic linear relationship.

The displacement of a particle at a certain time is related to its velocity:

d_t = d_0 + vt - deterministic linear relationship.

The electrical energy consumption of a house (y) is related to the size of the house (x, in square feet) - nondeterministic relationship.

The fuel usage of an automobile (y) is related to the vehicle weight x - nondeterministic relationship.

Regression analysis
The collection of statistical tools that are used to model and explore relationships between
variables that are related in a nondeterministic manner is called regression analysis.

Assume a linear relationship:

E(Y|x) = µ_{Y|x} = β0 + β1 x,   Y = β0 + β1 x + ε
Covariance and correlation

Covariance
The covariance between the random variables X and Y, denoted as cov(X, Y) or σ_XY, is

cov(X, Y) = σ_XY = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y).

Correlation
The correlation between the random variables X and Y is

ρ_XY = cov(X, Y)/√(V(X)V(Y)) = σ_XY/(σ_X σ_Y),   −1 ≤ ρ_XY ≤ 1.

Remark
If ρ_XY ≈ 1 (or −1), X and Y tend to fall along a line of positive (or negative) slope. If ρ_XY ≠ 0, X and Y are called correlated. Covariance and correlation are measures of the linear relationship between random variables. If X and Y are independent, then ρ_XY = 0.

Example
Find cov(X, Y) and ρ_XY, given the joint probability mass function of X and Y.

E(X) = Σ_i x_i f(x_i) = 1.8125,   σ_X = √(E(X²) − (E(X))²) = 0.7043.
E(Y) = Σ_j y_j f(y_j) = 2.875,   σ_Y = √(E(Y²) − (E(Y))²) = 1.3636.
E(XY) = Σ_{i,j} x_i y_j f_XY(x_i, y_j) = 6.125.
σ_XY = E(XY) − E(X)E(Y) = 6.125 − 1.8125 · 2.875 = 0.9141.
ρ_XY = σ_XY/(σ_X σ_Y) = 0.9141/(0.7043 · 1.3636) = 0.9518.
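A minimal Python sketch of the same calculation; the joint pmf values below are hypothetical stand-ins (the original table is not shown on the slide), so only the method is meant to carry over:

```python
import numpy as np

# Hypothetical joint pmf of X and Y: rows index the values of X,
# columns the values of Y; the entries must sum to 1.
x_vals = np.array([1.0, 2.0, 3.0])
y_vals = np.array([1.0, 2.0, 3.0, 4.0])
f_xy = np.array([
    [0.10, 0.05, 0.05, 0.05],
    [0.05, 0.10, 0.10, 0.10],
    [0.00, 0.05, 0.15, 0.20],
])

f_x = f_xy.sum(axis=1)   # marginal pmf of X
f_y = f_xy.sum(axis=0)   # marginal pmf of Y

EX = np.sum(x_vals * f_x)
EY = np.sum(y_vals * f_y)
EXY = np.sum(np.outer(x_vals, y_vals) * f_xy)   # E(XY)

sigma_x = np.sqrt(np.sum(x_vals**2 * f_x) - EX**2)
sigma_y = np.sqrt(np.sum(y_vals**2 * f_y) - EY**2)

cov_xy = EXY - EX * EY                  # σ_XY = E(XY) − E(X)E(Y)
rho_xy = cov_xy / (sigma_x * sigma_y)   # ρ_XY = σ_XY / (σ_X σ_Y)
print(cov_xy, rho_xy)
```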

For grouped data given as a two-way frequency table (n_ij is the number of observations with X = X_i and Y = Y_j):

X \ Y   Y_1    Y_2    ···   Y_h    n_i
X_1     n_11   n_12   ···   n_1h   n_1
X_2     n_21   n_22   ···   n_2h   n_2
···     ···    ···    ···   ···    ···
X_k     n_k1   n_k2   ···   n_kh   n_k
m_j     m_1    m_2    ···   m_h    Σ_i n_i = n

E(Y | X = X_i) = (1/n_i) Σ_j Y_j n_ij

X     E(Y | X)          n_i
X_1   E(Y | X = X_1)    n_1
X_2   E(Y | X = X_2)    n_2
···   ···               ···
X_k   E(Y | X = X_k)    n_k
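A small Python sketch of these conditional means for a two-way frequency table; the levels and counts below are made-up illustration values:

```python
import numpy as np

# Hypothetical two-way frequency table: n_ij = count of (X_i, Y_j) pairs.
x_levels = np.array([1.0, 2.0, 3.0])      # X_1, ..., X_k
y_levels = np.array([10.0, 20.0, 30.0])   # Y_1, ..., Y_h
n_ij = np.array([
    [5, 3, 2],
    [2, 6, 4],
    [1, 3, 8],
])

n_i = n_ij.sum(axis=1)   # row totals n_i
m_j = n_ij.sum(axis=0)   # column totals m_j
n = n_ij.sum()           # grand total

# E(Y | X = X_i) = (1/n_i) * sum_j Y_j * n_ij
cond_means = (n_ij @ y_levels) / n_i
for xi, m in zip(x_levels, cond_means):
    print(f"E(Y | X = {xi}) = {m:.3f}")
```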

Empirical Models

Simple linear regression model

Y = β0 + β1 x + ε,

where ε is a random error term; the slope β1 and intercept β0 of the line are called regression coefficients. The model is simple because it has only one independent (regressor) variable and one dependent (response) variable.

Suppose that ε has mean 0 and variance σ². Then

E(Y|x) = E(β0 + β1 x + ε) = β0 + β1 x + E(ε) = β0 + β1 x,
V(Y|x) = V(β0 + β1 x + ε) = V(ε) = σ².

Thus, the true regression model µ_{Y|x} = β0 + β1 x is a line of mean values.
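A quick simulation sketch of these two identities, with assumed parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values for beta0, beta1, sigma and x (not from the slides).
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = 3.0

# Simulate many responses Y = beta0 + beta1*x + eps at this fixed x.
eps = rng.normal(0.0, sigma, size=100_000)
y = beta0 + beta1 * x + eps

print(y.mean())   # ≈ beta0 + beta1*x = 3.5   (E(Y|x))
print(y.var())    # ≈ sigma^2 = 1.0           (V(Y|x))
```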


If we have no theoretical knowledge of the relationship between x and y, we base the choice of the model on inspection of a scatter diagram, as we did with the oxygen purity data. We then think of the regression model as an empirical model (based on experience).

Least squares method

The least squares method is a statistical procedure to find the best fit for a set of data points by minimizing the sum of the squares of the vertical deviations. We have to find the coefficients such that

Σ_{i=1}^n ε_i² → min.
Least squares estimates in the simple linear regression

Suppose that we have n pairs of observations (x_1, y_1), (x_2, y_2), ..., (x_n, y_n). We have to find the linear regression model for the data as

y_i = β0 + β1 x_i + ε_i,   i = 1, 2, ..., n.

The sum of the squares of the deviations of the observations from the true regression line is

L = Σ_{i=1}^n ε_i² = Σ_{i=1}^n (y_i − β0 − β1 x_i)² → min.

The least squares estimators β̂0, β̂1 must satisfy

∂L/∂β0 |_{β̂0,β̂1} = −2 Σ_{i=1}^n (y_i − β̂0 − β̂1 x_i) = 0
∂L/∂β1 |_{β̂0,β̂1} = −2 Σ_{i=1}^n (y_i − β̂0 − β̂1 x_i) x_i = 0

⇔

n β̂0 + β̂1 Σ_{i=1}^n x_i = Σ_{i=1}^n y_i
β̂0 Σ_{i=1}^n x_i + β̂1 Σ_{i=1}^n x_i² = Σ_{i=1}^n x_i y_i.
Least squares estimates in the simple linear regression

The least squares estimates of the intercept and slope in the simple linear regression model are

β̂1 = Sxy/Sxx,   β̂0 = ȳ − β̂1 x̄,

where
x̄ = (1/n) Σ_{i=1}^n x_i,   ȳ = (1/n) Σ_{i=1}^n y_i,
Sxy = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) = Σ_{i=1}^n x_i y_i − (1/n)(Σ_{i=1}^n x_i)(Σ_{i=1}^n y_i),
Sxx = Σ_{i=1}^n (x_i − x̄)² = Σ_{i=1}^n x_i² − (1/n)(Σ_{i=1}^n x_i)².

The fitted or estimated regression line is therefore

ŷ = β̂0 + β̂1 x.
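A minimal Python sketch of these formulas:

```python
import numpy as np

def least_squares_fit(x, y):
    """Return (beta0_hat, beta1_hat) via the Sxy/Sxx formulas."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    Sxy = np.sum(x * y) - np.sum(x) * np.sum(y) / n
    Sxx = np.sum(x**2) - np.sum(x)**2 / n
    beta1 = Sxy / Sxx                     # slope estimate
    beta0 = y.mean() - beta1 * x.mean()   # intercept estimate
    return beta0, beta1
```

As a sanity check, np.polyfit(x, y, 1) should return the same two numbers, in the order [β̂1, β̂0].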

Special symbols

e_i = y_i − ŷ_i is the error in the fit of the model to the i-th observation y_i; it is called the residual.

Sxx = Σ_{i=1}^n (x_i − x̄)² = Σ x_i² − (1/n)(Σ x_i)²
Sxy = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) = Σ x_i y_i − (1/n)(Σ x_i)(Σ y_i)
SSE = Σ e_i² = Σ (y_i − ŷ_i)² - the sum of squares of the residuals
SST = Σ (y_i − ȳ)² = Σ y_i² − n ȳ² = Syy - the total sum of squares of the response variable
SSR = Σ (ŷ_i − ȳ)² = β̂1 Sxy - the sum of squares for regression

Fundamental identity: SST = SSE + SSR.

r² = 1 − SSE/SST = ρ²_XY is called the coefficient of determination.
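A short sketch computing these quantities for a fitted line:

```python
import numpy as np

def sums_of_squares(x, y, beta0, beta1):
    """Return SSE, SST, SSR and r^2 for the fitted line yhat = beta0 + beta1*x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    y_hat = beta0 + beta1 * x
    e = y - y_hat                          # residuals e_i
    SSE = np.sum(e**2)                     # sum of squares of the residuals
    SST = np.sum((y - y.mean())**2)        # total sum of squares (Syy)
    SSR = np.sum((y_hat - y.mean())**2)    # sum of squares for regression
    r2 = 1 - SSE / SST                     # coefficient of determination
    return SSE, SST, SSR, r2               # SST ≈ SSE + SSR
```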

Least squares estimates in the simple linear regression

Example: Oxygen Purity

n = 20, Σ_{i=1}^{20} x_i = 23.92, Σ_{i=1}^{20} y_i = 1843.21,
x̄ = 1.196, ȳ = 92.1605,
Σ y_i² = 170044.5321, Σ x_i² = 29.2892, Σ x_i y_i = 2214.6566.
β̂1 = 14.9475, β̂0 = 74.2833.
The estimated regression line: ŷ = β̂0 + β̂1 x = 74.2833 + 14.9475 x.
If x = 1.5%, then ŷ = 74.2833 + 14.9475 · 1.5 ≈ 96.7045.

Least squares estimates in the simple linear regression

Estimator of Variance
An unbiased estimator of σ² (the variance of the error term ε) is

σ̂² = SSE/(n − 2) = (SST − β̂1 Sxy)/(n − 2).

Example: Oxygen Purity

An unbiased estimator of σ² in the Oxygen Purity example:

σ̂² = [Σ y_i² − n ȳ² − β̂1 (Σ x_i y_i − (1/n)(Σ x_i)(Σ y_i))]/(n − 2) ≈ 1.1805.
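The same numbers can be reproduced from the summary statistics given above; a minimal sketch:

```python
# Oxygen Purity summary statistics from the slides.
n = 20
sum_x, sum_y = 23.92, 1843.21
sum_x2, sum_y2 = 29.2892, 170044.5321
sum_xy = 2214.6566

Sxy = sum_xy - sum_x * sum_y / n       # ≈ 10.18
Sxx = sum_x2 - sum_x**2 / n            # ≈ 0.681
beta1 = Sxy / Sxx                      # ≈ 14.9475
beta0 = sum_y / n - beta1 * sum_x / n  # ≈ 74.2833

SST = sum_y2 - n * (sum_y / n)**2
sigma2_hat = (SST - beta1 * Sxy) / (n - 2)   # ≈ 1.18
print(beta1, beta0, sigma2_hat)
```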

Example
The biochemical oxygen demand (BOD) test is conducted over a period of time in days. The resulting data for X: Time (days) and Y: BOD (mg/liter) follow:

x 1 2 4 6 8 10 12 14 16 18 20
y 0.6 0.7 1.5 1.9 2.1 2.6 2.9 3.7 3.5 3.7 3.8

a) Assuming that a simple linear regression model is appropriate, fit the regression model relating BOD y to x. What is the estimate of σ²?
b) What is the estimate of the expected BOD level when the time is 15 days?
c) What change in mean BOD is expected when the time changes by three days?
d) Suppose that the time used is six days. Calculate the fitted value of y and the corresponding residual.
e) Calculate the fitted ŷ_i for each value of x_i used to fit the model. Then construct a graph of ŷ_i versus the corresponding observed values y_i and comment on what this plot would look like if the relationship between y and x was a deterministic (no random error) straight line.

a) Σ_{i=1}^n x_i = 111, Σ y_i = 27, Σ x_i² = 1541, Σ y_i² = 80.36, Σ x_i y_i = 347.4.
Sxy = Σ x_i y_i − (1/n)(Σ x_i)(Σ y_i) = 347.4 − (1/11) · 111 · 27 = 74.9455.
Sxx = Σ x_i² − (1/n)(Σ x_i)² = 1541 − (1/11) · 111² = 420.9091.
β̂1 = Sxy/Sxx = 0.1781.
β̂0 = ȳ − β̂1 x̄ = 0.6578.
The estimate of σ²: σ̂² = 0.0822.
b) If x = 15, then ŷ = 0.6578 + 0.1781 · 15 = 3.3294.
c) If Δx = 3, then Δŷ = 0.1781 · Δx = 0.5343.
d) If x = 6, then ŷ = 0.6578 + 0.1781 · 6 = 1.7264.
The corresponding residual: e = 1.9 − 1.7264 = 0.1736.
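A short Python check of parts a)-d) using the data table above:

```python
import numpy as np

x = np.array([1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20], dtype=float)
y = np.array([0.6, 0.7, 1.5, 1.9, 2.1, 2.6, 2.9, 3.7, 3.5, 3.7, 3.8])
n = len(x)

Sxy = np.sum(x * y) - x.sum() * y.sum() / n   # ≈ 74.9455
Sxx = np.sum(x**2) - x.sum()**2 / n           # ≈ 420.9091
beta1 = Sxy / Sxx                             # ≈ 0.1781
beta0 = y.mean() - beta1 * x.mean()           # ≈ 0.6578

SST = np.sum(y**2) - n * y.mean()**2
sigma2_hat = (SST - beta1 * Sxy) / (n - 2)    # ≈ 0.08

print(beta0 + beta1 * 15)           # b) ≈ 3.33
print(beta1 * 3)                    # c) ≈ 0.53
print(1.9 - (beta0 + beta1 * 6))    # d) residual ≈ 0.17
```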

e) Fitted values ŷ_i for each x_i:
x 1 2 4 6 8 10 12 14 16 18 20
ŷ 0.84 1.01 1.37 1.73 2.08 2.44 2.80 3.15 3.51 3.86 4.22
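A plotting sketch for the graph requested in part e), reusing x, y, beta0 and beta1 from the previous snippet; with a deterministic (no random error) straight-line relationship, every point would fall exactly on the dashed line:

```python
import matplotlib.pyplot as plt

y_hat = beta0 + beta1 * x               # fitted values
plt.scatter(y_hat, y, label="observed vs fitted")
plt.plot(y_hat, y_hat, "k--", label="deterministic case")
plt.xlabel("fitted value")
plt.ylabel("observed value")
plt.legend()
plt.show()
```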

Properties of the least squares estimators

Intercept properties
E(β̂0) = β0,  V(β̂0) = σ² µxx / Sxx,  and  T = (β̂0 − β0)/(σ̂ √(µxx/Sxx)) ∼ t_{n−2},
where µxx = (1/n) Σ_{i=1}^n x_i².

A 100(1 − α)% confidence interval for the intercept:

β̂0 − t_{α/2,n−2} σ̂ √(µxx/Sxx) ≤ β0 ≤ β̂0 + t_{α/2,n−2} σ̂ √(µxx/Sxx)

Slope properties
E(β̂1) = β1,  V(β̂1) = σ²/Sxx,  and  T = (β̂1 − β1)/(σ̂/√Sxx) ∼ t_{n−2}

A 100(1 − α)% confidence interval for the slope:

β̂1 − t_{α/2,n−2} σ̂/√Sxx ≤ β1 ≤ β̂1 + t_{α/2,n−2} σ̂/√Sxx

Example
The biochemical oxygen demand (BOD) test is conducted over a period of time in days. The resulting data for X: Time (days) and Y: BOD (mg/liter) follow:

x 1 2 4 6 8 10 12 14 16 18 20
y 0.6 0.7 1.5 1.9 2.1 2.6 2.9 3.7 3.5 3.7 3.8

Find 96% confidence intervals for β0 and β1.

α = 0.04, n = 11, t_{0.02,9} = 2.398.
σ̂ = 0.2873, Sxx = 420.9091, µxx = 140.0909.
β̂1 = 0.1781, β̂0 = 0.6578.
E_{β0} = t_{α/2,n−2} σ̂ √(µxx/Sxx) = 0.3974,  E_{β1} = t_{α/2,n−2} σ̂/√Sxx = 0.0336.

A 96% CI for β0: β0 ∈ [0.6578 − 0.3974, 0.6578 + 0.3974] = [0.2604, 1.0552].

A 96% CI for β1: β1 ∈ [0.1781 − 0.0336, 0.1781 + 0.0336] = [0.1445, 0.2117].
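A sketch reproducing these intervals with scipy, using the summary values from this slide:

```python
import numpy as np
from scipy import stats

n = 11
beta0_hat, beta1_hat = 0.6578, 0.1781
sigma_hat = 0.2873
Sxx, mu_xx = 420.9091, 140.0909   # mu_xx = (1/n) * sum of x_i^2

alpha = 0.04
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)     # ≈ 2.398

E0 = t_crit * sigma_hat * np.sqrt(mu_xx / Sxx)    # ≈ 0.3974
E1 = t_crit * sigma_hat / np.sqrt(Sxx)            # ≈ 0.0336

print(beta0_hat - E0, beta0_hat + E0)   # 96% CI for beta0
print(beta1_hat - E1, beta1_hat + E1)   # 96% CI for beta1
```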

Hypothesis testing

Hypothesis testing for the intercept

Null hypothesis H: β0 = b, where b is a given value. The alternative hypothesis H̄ is one of
H̄: β0 < b,  H̄: β0 > b,  H̄: β0 ≠ b.
We use a t-test because

T = (β̂0 − β0)/(σ̂ √(µxx/Sxx)) ∼ t_{n−2}.

The rejection regions are:
H̄: β0 < b:  W_α: T_ob = (β̂0 − b)/(σ̂ √(µxx/Sxx)) < −t_{n−2,α}
H̄: β0 > b:  W_α: T_ob = (β̂0 − b)/(σ̂ √(µxx/Sxx)) > t_{n−2,α}
H̄: β0 ≠ b:  W_α: |T_ob| = |β̂0 − b|/(σ̂ √(µxx/Sxx)) > t_{n−2,α/2}
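A minimal sketch of this test in Python; the function name and argument layout are my own, not from the slides:

```python
import numpy as np
from scipy import stats

def test_intercept(beta0_hat, b, sigma_hat, mu_xx, Sxx, n,
                   alpha=0.05, alternative="two-sided"):
    """t-test of H: beta0 = b in simple linear regression."""
    T_ob = (beta0_hat - b) / (sigma_hat * np.sqrt(mu_xx / Sxx))
    if alternative == "less":        # H-bar: beta0 < b
        reject = T_ob < -stats.t.ppf(1 - alpha, df=n - 2)
    elif alternative == "greater":   # H-bar: beta0 > b
        reject = T_ob > stats.t.ppf(1 - alpha, df=n - 2)
    else:                            # H-bar: beta0 != b
        reject = abs(T_ob) > stats.t.ppf(1 - alpha / 2, df=n - 2)
    return T_ob, reject
```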

Thank you for your attention!

