Week 13
Topic 8
Simple Linear Regression
Tutorial Week 13
Outline
Covariance
◼ How do we measure the degree of linear association
between two variables 𝑋 and 𝑌?
◼ The answer to this question is the covariance
❑ A quantity that measures the linear association
◼ Population covariance
\[ \sigma_{XY} = \frac{\sum_{i=1}^{N} (X_i - \mu_X)(Y_i - \mu_Y)}{N} \]
◼ Sample covariance
\[ S_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} \]
❑ An estimator of 𝜎𝑋𝑌 based on 𝑛 pairs of sample values
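The two covariance formulas differ only in the divisor, which a short sketch can make concrete. The function name `covariance` and the five data pairs are hypothetical, invented for illustration; they are not taken from the slides.

```python
def covariance(xs, ys, population=False):
    """Covariance of paired values.

    Divides the sum of cross-products by N for a population
    and by n - 1 for a sample.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2 and not population:
        raise ValueError("a sample covariance needs at least two pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n if population else n - 1)


# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s_xy = covariance(x, y)                       # treats the pairs as a sample
sigma_xy = covariance(x, y, population=True)  # treats the pairs as a population
print(s_xy, sigma_xy)
```

With the same data, the sample value is larger in magnitude than the population value because it divides by n − 1 rather than N.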
Coefficient of Correlation
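◼ Population coefficient of correlation
\[ \rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \]
◼ Sample coefficient of correlation
\[ r_{XY} = \frac{S_{XY}}{S_X S_Y} \]
❑ An estimator of 𝜌𝑋𝑌
◼ The sign of 𝜌𝑋𝑌 (𝑟𝑋𝑌) is the same as that of 𝜎𝑋𝑌 (𝑆𝑋𝑌)
❑ The denominator of 𝜌𝑋𝑌 (𝑟𝑋𝑌) is a product of standard deviations, so it is always positive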
Coefficient of Correlation
Cont’d
◼ It can be shown that, in all cases,
−1 ≤ 𝜌𝑋𝑌 ≤ 1 and −1 ≤ 𝑟𝑋𝑌 ≤ 1
◼ Three special values of 𝜌𝑋𝑌 and 𝑟𝑋𝑌 are of interest
❑ When 𝜌𝑋𝑌 = 0 (𝑟𝑋𝑌 = 0), 𝑋 and 𝑌 are not linearly related, and
we say that 𝑋 and 𝑌 are uncorrelated in the population (sample)
❑ When all population (sample) values of 𝑋 and 𝑌 lie exactly on a
straight line having a positive slope, then 𝜌𝑋𝑌 = 1 (𝑟𝑋𝑌 = 1)
❑ When all population (sample) values of 𝑋 and 𝑌 lie exactly on a
straight line having a negative slope, then 𝜌𝑋𝑌 = −1 (𝑟𝑋𝑌 = −1)
◼ If the population (sample) values of 𝑋 and 𝑌 lie close to a
straight line, then 𝜌𝑋𝑌 (𝑟𝑋𝑌) will be close to 1 or −1
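The special values above can be checked numerically. The following is a minimal sketch, assuming the usual definition of the sample correlation as the sample covariance divided by the product of the sample standard deviations (the n − 1 factors cancel, so only sums of deviations are needed). The function name `correlation` and all data values are hypothetical.

```python
import math


def correlation(xs, ys):
    """Sample coefficient of correlation of paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # The (n - 1) divisors in S_XY, S_X and S_Y cancel out.
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
r_up = correlation(x, [3 + 2 * xi for xi in x])       # exact line, positive slope
r_down = correlation(x, [10 - 0.5 * xi for xi in x])  # exact line, negative slope
r_curve = correlation(x, [(xi - 3) ** 2 for xi in x]) # symmetric curve, no linear trend
r_noisy = correlation(x, [2, 4, 5, 4, 5])             # roughly increasing
print(r_up, r_down, r_curve, r_noisy)
```

The curved example has zero correlation even though 𝑌 is completely determined by 𝑋, which is why a zero value is read only as "not linearly related".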
Coefficient of Correlation
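[Figure: scatter plot divided into four quadrants by the lines 𝑋 = X̄ and 𝑌 = Ȳ; Quadrant I (upper right), Quadrant II (upper left), Quadrant III (lower left), Quadrant IV (lower right)]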
Least Squares Estimation
Cont’d
\[ b_1 = r_{XY}\,\frac{S_Y}{S_X} = r_{XY}\sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \]
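As a check on the slope formula, the sketch below computes 𝑏1 from the correlation and the ratio of standard deviations, and compares it with the equivalent form 𝑆𝑋𝑌 / 𝑆𝑋². The intercept formula 𝑏0 = Ȳ − 𝑏1 X̄ is the standard least squares result and is an assumption here, since it does not appear in the surviving slide text. The data are hypothetical.

```python
import math

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squared deviations and cross-products.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r_xy = sxy / math.sqrt(sxx * syy)

# Slope as on the slide: correlation times S_Y / S_X.
b1_from_r = r_xy * math.sqrt(syy / sxx)

# Equivalent form: sample covariance over sample variance of X.
b1_direct = sxy / sxx

# Assumed standard intercept formula (not in the surviving slide text).
b0 = mean_y - b1_direct * mean_x

print(b1_from_r, b1_direct, b0)
```

Both routes give the same slope; the form with 𝑟𝑋𝑌 makes it clear that the slope and the correlation always share the same sign.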
Developing Regression Model
in Excel Cont’d
◼ Output
[Figure: annotated Excel regression output; the callouts mark |𝑟𝑋𝑌|, SSE, 𝑏0 and 𝑏1]
Coefficient of Determination
◼ The goal is to determine how much smaller SSE is
than SST
❑ Or, the amount of improvement in using the regression line and
the independent variable 𝑋 rather than just the sample mean to
predict 𝑌
◼ This measure is provided through a statistic called the
coefficient of determination (𝑅2 )
\[ R^2 = 1 - \frac{SSE}{SST} \]
❑ 𝑅² is unit-free and takes values between 0 and 1 inclusive
❑ The higher the 𝑅², the better the fit (the stronger the linear
association between 𝑋 and 𝑌)
❑ However, it does not mean that 𝑋 causes 𝑌
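The comparison between SSE and SST can be sketched as follows. This is an illustration under stated assumptions: SSE is taken as the sum of squared residuals about the fitted line, SST as the sum of squared deviations of 𝑌 about its sample mean, and the line is fitted with the standard least squares formulas. The helper `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
mean_y = sum(y) / len(y)

# SSE: squared prediction errors when using the regression line.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# SST: squared prediction errors when using only the sample mean of Y.
sst = sum((yi - mean_y) ** 2 for yi in y)

r_squared = 1 - sse / sst
print(sse, sst, r_squared)
```

For these values the regression line removes 60% of the squared error left by the sample mean alone. In simple linear regression 𝑅² also equals the square of 𝑟𝑋𝑌.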
Inferences about the Slope
◼ At times, tests concerning 𝛽1 are of interest, particularly
those of the form H0: 𝛽1 = 0 vs H1: 𝛽1 ≠ 0
◼ If 𝛽1 = 0, there is no linear relationship between 𝑋 and 𝑌
❑ The means of the probability distributions of 𝑌 are all equal,
namely 𝐸(𝑌 | 𝑋 = 𝑥) = 𝛽0 + 0·𝑥 = 𝛽0 for all levels of 𝑋
❑ A change in 𝑋 does not induce any change in the mean of 𝑌
◼ As with the inferences in Topics 6 & 7, inferences on 𝛽1
require the sampling distribution of 𝑏1, the least squares
point estimate of 𝛽1
Inferences about the Slope
Cont’d
◼ Sampling distribution of 𝑏1
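❑ Since the 𝑌𝑖 are normal and 𝑏1 is a linear combination of the 𝑌𝑖,
the estimator 𝑏1 is also normal. It can be shown that 𝑏1 has mean and variance
\[ E(b_1) = \beta_1, \qquad \sigma_{b_1}^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \]
❑ Here 𝜎² is the variance of the error term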
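In practice 𝜎² is unknown, so the variance formula is used with an estimate. The sketch below assumes the standard results that 𝜎² is estimated by SSE / (𝑛 − 2), that the standard error of the slope is 𝑆𝑏1 = 𝑆𝑒 / √Σ(𝑋𝑖 − X̄)², and that the statistic 𝑏1 / 𝑆𝑏1 is compared with a 𝑡 distribution with 𝑛 − 2 degrees of freedom under H0: 𝛽1 = 0. These steps are not stated in the surviving slide text, and the data are hypothetical.

```python
import math

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least squares estimates.
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Assumed estimate of the error standard deviation: sqrt(SSE / (n - 2)).
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))

# Estimated standard error of the slope, replacing sigma with s_e.
s_b1 = s_e / math.sqrt(sxx)

# Test statistic for H0: beta_1 = 0, with n - 2 degrees of freedom.
t_stat = b1 / s_b1
df = n - 2
print(s_e, s_b1, t_stat, df)
```

The statistic would then be compared with a critical value from the 𝑡 table for the chosen significance level; that lookup is left out here to keep the sketch free of third-party libraries.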
Inferences about the Slope –
Exercise Cont’d
◼ Output
[Figure: annotated regression output; the callouts mark |𝑟𝑋𝑌|, 𝑅², 𝑆𝑒, 𝑛, SSE, SST, 𝑏0, 𝑏1 and 𝑆𝑏1]
Multiple Linear Regression