Chapter 8: Linear Regression
E-mail: [email protected]
(Phan Thi Khanh Van) Chapter 8: Linear regression April 28, 2025 1 / 22
Table of Contents
1 Introduction
4 Confidence interval
Introduction
Many problems in engineering and the sciences involve a study or analysis of the
relationship between two or more variables. For example:
The electrical energy consumption of a house (y) is related to the size of the house
(x, in square feet): a nondeterministic relationship.
Regression analysis
The collection of statistical tools that are used to model and explore relationships between
variables that are related in a nondeterministic manner is called regression analysis.
Assume a linear relationship:
E(Y | x) = µY|x = β0 + β1 x,  Y = β0 + β1 x + ε,
where ε is a random error term with mean zero.
Covariance and correlation
Covariance
The covariance between the random variables X and Y , denoted as cov (X , Y ) or σXY ,
is
cov (X , Y ) = σXY = E [(X − E (X ))(Y − E (Y ))] = E (XY ) − E (X )E (Y )
Correlation
The correlation between the random variables X and Y is
ρXY = cov(X, Y)/√(V(X)V(Y)) = σXY/(σX σY),  −1 ≤ ρXY ≤ 1.
Remark
If ρXY ≈ 1 (or −1), X and Y tend to fall along a line of positive (or negative) slope. If
ρXY ≠ 0, X and Y are said to be correlated. Covariance and correlation measure only
the linear relationship between random variables. If X and Y are independent, then
ρXY = 0 (the converse need not hold).
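As a quick numerical sketch, covariance and correlation can be computed directly from a joint pmf. The pmf below is made up for illustration, not taken from the chapter's example:

```python
# Sketch: covariance and correlation from a joint pmf.
# The pmf values below are illustrative, not from the chapter.
import math

# joint pmf f(x, y): {(x, y): probability}
pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

EX  = sum(x * p for (x, y), p in pmf.items())       # E(X)
EY  = sum(y * p for (x, y), p in pmf.items())       # E(Y)
EXY = sum(x * y * p for (x, y), p in pmf.items())   # E(XY)
EX2 = sum(x * x * p for (x, y), p in pmf.items())   # E(X^2)
EY2 = sum(y * y * p for (x, y), p in pmf.items())   # E(Y^2)

cov = EXY - EX * EY                                  # cov(X, Y) = E(XY) - E(X)E(Y)
rho = cov / math.sqrt((EX2 - EX**2) * (EY2 - EY**2)) # correlation
```

Here cov comes out slightly negative, so the (made-up) variables are weakly negatively correlated.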
Example
Find cov(X, Y) and ρXY, given the joint probability mass function of X, Y.
E(X) = ∑ xi f(xi) = 1.8125, σX = √(E(X²) − (E(X))²) = 0.7043.
E(Y) = ∑ yj f(yj) = 2.875, σY = √(E(Y²) − (E(Y))²) = 1.3636.
E(XY) = ∑ xi yj fXY(xi, yj) = 6.125.
σXY = E(XY) − E(X)E(Y) = 6.125 − 1.8125 · 2.875 = 0.9141.
ρXY = σXY/(σX σY) = 0.9141/(0.7043 · 1.3636) = 0.9518.
X \ Y    Y1     Y2     ···    Yh     ni
X1       n11    n12    ···    n1h    n1
X2       n21    n22    ···    n2h    n2
···      ···    ···    ···    ···    ···
Xk       nk1    nk2    ···    nkh    nk
mj       m1     m2     ···    mh     ∑ ni = n

The empirical conditional mean of Y given X = Xi is
E(Y | X = Xi) = (1/ni) ∑_j Yj nij

X        E(Y | X)           ni
X1       E(Y | X = X1)      n1
X2       E(Y | X = X2)      n2
···      ···                ···
Xk       E(Y | X = Xk)      nk
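The conditional means above can be computed mechanically from the counts nij. A minimal sketch with made-up counts:

```python
# Sketch: empirical conditional means E(Y | X = Xi) from a two-way
# frequency table of counts n_ij. The counts here are made up.
counts = {
    "X1": {1: 2, 2: 3, 3: 5},   # Y value -> count n_1j
    "X2": {1: 4, 2: 4, 3: 2},   # Y value -> count n_2j
}

cond_mean = {}
for xi, row in counts.items():
    ni = sum(row.values())                                # row total n_i
    cond_mean[xi] = sum(y * n for y, n in row.items()) / ni
```

Each entry of cond_mean is the weighted average of the Y values in that row, weighted by the row's counts.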
Empirical Models
Least square method
Least square method
We have to find the coefficients β0, β1 such that
∑_{i=1}^{n} εi² → min.
Least squares estimates in the simple linear regression
Suppose that we have n pairs of observations (x1, y1), (x2, y2), ..., (xn, yn). We have to
find the linear regression model for the data as
yi = β0 + β1 xi + εi,  i = 1, 2, ..., n.
The sum of the squares of the deviations of the observations from the true regression line is
L = ∑_{i=1}^{n} εi² = ∑_{i=1}^{n} (yi − β0 − β1 xi)² → min.
The least squares estimators β̂0, β̂1 must satisfy
∂L/∂β0 |(β̂0, β̂1) = −2 ∑_{i=1}^{n} (yi − β̂0 − β̂1 xi) = 0
∂L/∂β1 |(β̂0, β̂1) = −2 ∑_{i=1}^{n} (yi − β̂0 − β̂1 xi) xi = 0
⇔
n β̂0 + β̂1 ∑_{i=1}^{n} xi = ∑_{i=1}^{n} yi
β̂0 ∑_{i=1}^{n} xi + β̂1 ∑_{i=1}^{n} xi² = ∑_{i=1}^{n} xi yi.
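The two normal equations above can be solved in closed form. A minimal sketch (illustrative data; the helper name fit_line is ours, not from the slides):

```python
# Sketch: solve the normal equations for the least squares line.
def fit_line(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    # Solve:  n*b0 + sx*b1 = sy   and   sx*b0 + sxx*b1 = sxy
    b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b0 = (sy - b1 * sx) / n
    return b0, b1

# Illustrative data lying exactly on y = 1 + 2x, so the fit recovers it.
b0, b1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```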
Least squares estimates in the simple linear regression
Solving the normal equations gives the least squares estimates
β̂1 = Sxy/Sxx,  β̂0 = ȳ − β̂1 x̄.
Special symbols
ei = yi − ŷi is the error in the fit of the model to the i-th observation yi; it is
called the residual.
Sxx = ∑_{i=1}^{n} (xi − x̄)² = ∑_{i=1}^{n} xi² − (1/n)(∑_{i=1}^{n} xi)²
Sxy = ∑_{i=1}^{n} (xi − x̄)(yi − ȳ) = ∑_{i=1}^{n} xi yi − (1/n)(∑_{i=1}^{n} xi)(∑_{i=1}^{n} yi)
SSE = ∑_{i=1}^{n} ei² = ∑_{i=1}^{n} (yi − ŷi)² - the sum of squares of the residuals
SST = ∑_{i=1}^{n} (yi − ȳ)² = ∑_{i=1}^{n} yi² − n ȳ² = Syy - the total sum of squares of the
response variable
SSR = ∑_{i=1}^{n} (ŷi − ȳ)² = β̂1 Sxy - the sum of squares for regression
Fundamental identity: SST = SSE + SSR
r² = 1 − SSE/SST = ρXY² (the squared sample correlation) is called the coefficient of determination
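The fundamental identity SST = SSE + SSR can be checked numerically once the line is fitted. A sketch on illustrative data (not the chapter's):

```python
# Sketch: verify SST = SSE + SSR and compute r^2 on illustrative data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

Sxx = sum((v - xbar) ** 2 for v in x)
Sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
b1 = Sxy / Sxx                      # slope estimate
b0 = ybar - b1 * xbar               # intercept estimate
yhat = [b0 + b1 * v for v in x]     # fitted values

SSE = sum((b - h) ** 2 for b, h in zip(y, yhat))   # residual sum of squares
SST = sum((b - ybar) ** 2 for b in y)              # total sum of squares
SSR = sum((h - ybar) ** 2 for h in yhat)           # regression sum of squares
r2 = 1 - SSE / SST                                 # coefficient of determination
```

The identity holds because the residuals of a least squares fit with an intercept are orthogonal to both the constant and the fitted values.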
Least squares estimates in the simple linear regression
Estimator of Variance
An unbiased estimator of σ 2 (the variance of the error term ε) is
σ̂² = SSE/(n − 2) = (SST − β̂1 Sxy)/(n − 2)
Example
The biochemical oxygen demand (BOD) test is conducted over a period of time in days.
The resulting data for X: Time (days) and Y: BOD (mg/liter) follow:
x 1 2 4 6 8 10 12 14 16 18 20
y 0.6 0.7 1.5 1.9 2.1 2.6 2.9 3.7 3.5 3.7 3.8
a) Assuming that a simple linear regression model is appropriate, fit the regression model
relating BOD y to x. What is the estimate of σ 2 ?
b) What is the estimate of expected BOD level when the time is 15 days?
c) What change in mean BOD is expected when the time changes by three days?
d) Suppose that the time used is six days. Calculate the fitted value of y and the
corresponding residual.
e) Calculate the fitted ŷi for each value of xi used to fit the model. Then construct a
graph of ŷi versus the corresponding observed values yi and comment on what this plot
would look like if the relationship between y and x was a deterministic (no random error)
straight line.
a) n = 11, ∑ xi = 111, ∑ yi = 27, ∑ xi² = 1541, ∑ yi² = 80.36, ∑ xi yi = 347.4.
Sxy = ∑ xi yi − (1/n) ∑ xi ∑ yi = 347.4 − 111 · 27/11 = 74.9455
Sxx = ∑ xi² − (1/n)(∑ xi)² = 1541 − 111²/11 = 420.9091
β̂1 = Sxy/Sxx = 0.1781
β̂0 = ȳ − β̂1 x̄ = 0.6578
The fitted model: ŷ = 0.6578 + 0.1781 x.
The estimate of σ²: σ̂² = (SST − β̂1 Sxy)/(n − 2) = 0.0822.
b) If x = 15, then ŷ = 0.6578 + 0.1781 · 15 = 3.3294.
c) If ∆x = 3 then ∆ŷ = 0.1781 · ∆x = 0.5343.
d) If x = 6, then ŷ = 0.6578 + 0.1781 · 6 = 1.7264.
The corresponding residual: e = 1.9 − 1.7264 = 0.1736.
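The arithmetic in part a) can be verified in a few lines (the slide values are rounded to four decimals):

```python
# Check of part (a) using the chapter's BOD data.
x = [1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
y = [0.6, 0.7, 1.5, 1.9, 2.1, 2.6, 2.9, 3.7, 3.5, 3.7, 3.8]
n = len(x)

Sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(a * a for a in x) - sum(x) ** 2 / n
b1 = Sxy / Sxx                         # slope estimate beta_1
b0 = sum(y) / n - b1 * sum(x) / n      # intercept estimate beta_0
SST = sum(v * v for v in y) - n * (sum(y) / n) ** 2
sigma2 = (SST - b1 * Sxy) / (n - 2)    # unbiased estimate of sigma^2
```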
e)
x   1     2     4     6     8     10    12    14    16    18    20
ŷ   0.84  1.01  1.37  1.73  2.08  2.44  2.80  3.15  3.51  3.86  4.22
If the relationship between y and x were deterministic, the points (yi, ŷi) would fall
exactly on the line ŷ = y (a 45° line through the origin).
Properties of the least squares estimators
Intercept properties
E(β̂0) = β0, V(β̂0) = σ² µxx/Sxx, and T = (β̂0 − β0)/(σ̂ √(µxx/Sxx)) ∼ tn−2,
where µxx = (1/n) ∑_{i=1}^{n} xi².
Slope properties
E(β̂1) = β1, V(β̂1) = σ²/Sxx, and T = (β̂1 − β1)/(σ̂/√Sxx) ∼ tn−2.
Example
The biochemical oxygen demand (BOD) test is conducted over a period of time in days.
The resulting data for X: Time (days) and Y: BOD (mg/liter) follow:
x 1 2 4 6 8 10 12 14 16 18 20
y 0.6 0.7 1.5 1.9 2.1 2.6 2.9 3.7 3.5 3.7 3.8
Find 96% confidence intervals for β0 and β1 .
A 96% CI for β0: β̂0 ± tn−2,0.02 · σ̂ √(µxx/Sxx)
A 96% CI for β1: β̂1 ± tn−2,0.02 · σ̂/√Sxx
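A sketch of how the two intervals could be computed. The critical value t_{9,0.02} ≈ 2.398 is taken from a t-table; it is an assumption entered by hand, not computed by the code:

```python
# Sketch: 96% confidence intervals for beta_0 and beta_1 (BOD data).
import math

x = [1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
y = [0.6, 0.7, 1.5, 1.9, 2.1, 2.6, 2.9, 3.7, 3.5, 3.7, 3.8]
n = len(x)

Sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(a * a for a in x) - sum(x) ** 2 / n
b1 = Sxy / Sxx
b0 = sum(y) / n - b1 * sum(x) / n
SST = sum(v * v for v in y) - n * (sum(y) / n) ** 2
sigma = math.sqrt((SST - b1 * Sxy) / (n - 2))   # sigma-hat
mu_xx = sum(a * a for a in x) / n               # (1/n) * sum x_i^2
t = 2.398                                       # t_{n-2, 0.02}, df = 9 (table value)

half1 = t * sigma / math.sqrt(Sxx)              # half-width for beta_1
half0 = t * sigma * math.sqrt(mu_xx / Sxx)      # half-width for beta_0
ci_b1 = (b1 - half1, b1 + half1)
ci_b0 = (b0 - half0, b0 + half0)
```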
Hypothesis testing
Testing H0: β0 = b at significance level α:
H̄: β0 < b, Wα: Tob = (β̂0 − b)/(σ̂ √(µxx/Sxx)) < −tn−2,α
H̄: β0 > b, Wα: Tob = (β̂0 − b)/(σ̂ √(µxx/Sxx)) > tn−2,α
H̄: β0 ≠ b, Wα: |Tob| = |β̂0 − b|/(σ̂ √(µxx/Sxx)) > tn−2,α/2
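A sketch of the two-sided test for β0 with b = 0 on the BOD data. The critical value t_{9,0.025} ≈ 2.262 is taken from a t-table (an assumption, not computed):

```python
# Sketch: two-sided test of H0: beta_0 = 0 at alpha = 0.05 (BOD data).
import math

x = [1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
y = [0.6, 0.7, 1.5, 1.9, 2.1, 2.6, 2.9, 3.7, 3.5, 3.7, 3.8]
n = len(x)

Sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(a * a for a in x) - sum(x) ** 2 / n
b1 = Sxy / Sxx
b0 = sum(y) / n - b1 * sum(x) / n
SST = sum(v * v for v in y) - n * (sum(y) / n) ** 2
sigma = math.sqrt((SST - b1 * Sxy) / (n - 2))
mu_xx = sum(a * a for a in x) / n

b = 0.0                                              # hypothesised intercept
T_ob = (b0 - b) / (sigma * math.sqrt(mu_xx / Sxx))   # observed test statistic
reject = abs(T_ob) > 2.262                           # t_{n-2, alpha/2}, df = 9
```

Since |T_ob| exceeds the critical value, H0: β0 = 0 is rejected at the 5% level for these data.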
Thank you for your attention!