Chapter 2 Simple Linear Regression - Jan2023
At the end of this chapter, students should be able to understand the topics listed in the overview below.
Overview
2.1 Background
2.2 Introduction
2.3 Regression
2.4 Least Squares Method
2.5 Simple Linear Regression (SLR)
2.6 Software Output
2.7 ANOVA
2.8 Model Evaluation
2.9 Applications/Examples
2.1 Background - Regression
2.1 Background – Regression Model
2.1 Background – Types of Regression
Regression models can be grouped by the number of predictor variables:
• Simple regression: one predictor variable (linear or non-linear)
• Multiple regression: two or more predictor variables (linear or non-linear)
2.1 Background – Types of Regression
A bivariate or simple regression model relates a single predictor x (e.g., education) to a single response y (e.g., income).
2.2 Introduction – Simple Regression
The goal is to find a functional relation between the response variable y and the predictor variable x:
$y = f(x)$
[Figure: scatter plot of man-hours (y) against the predictor variable x]
2.2 Introduction - Regression Function
Regard Y as a random variable.
For each value of X, take f(x) to be the expected (mean) value of Y.
Writing E(Y) for the expected value of Y, the equation
$E(Y) = f(x)$
is called the regression function.
2.2 Introduction - Regression Application
Prediction
2.3 Regression
Scope of model
We may need to restrict the coverage of the model to some interval or region of values of the independent variable(s), depending on the needs and requirements.
2.3 Regression - Population & Sample
2.3 Regression - Regression Model
In this model, X is treated as a known constant rather than a random variable.
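For reference, a standard way of writing the simple linear regression model with X treated as a known constant is:

```latex
% Simple linear regression model (standard form):
%   y_i            : observed response
%   x_i            : known value of the predictor (treated as a constant)
%   beta_0, beta_1 : unknown regression coefficients
%   epsilon_i      : random error term with mean 0 and variance sigma^2
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i , \qquad i = 1, \dots, n
```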
2.3 Regression - Regression Coefficients
2.3 Regression - Regression Line
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
The relationship can be summarized by a straight-line plot.
2.3 What is LR model used for?
Linear regression models are used to show or predict
the relationship between two variables or factors. The
factor that is being predicted is called the dependent
variable.
2.3 Regression - Regression Line
Example of linear regression: blood pressure reading vs. stress test score.
2.4 Least Squares Method
2.4 Why is Linear Regression Used?
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
The method of least squares chooses the values of $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the sum of squared errors (SSE):
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
2.5 SLR - Computation
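For the computation, the standard closed-form least-squares estimates (stated here for reference) are:

```latex
% Least-squares estimates for simple linear regression (standard results):
\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}
              = \frac{\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)/n}
                     {\sum x_i^2 - \left(\sum x_i\right)^2/n} ,
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```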
Example 1:
The manager of a car plant wishes to investigate how the plant's electricity usage depends upon the plant's production. The data are given below.

Production ($ million), x:    4.51  3.58  4.31  5.06  5.64  4.99  5.29  5.83  4.70  5.61  4.90  4.20
Electricity usage (kWh), y:   2.48  2.26  2.47  2.77  2.99  3.05  3.18  3.46  3.03  3.26  2.67  2.53

Working sums (n = 12):
Σx  = 58.62
Σy  = 34.15
Σxy = 169.25   (products xy:  11.18  8.09  10.65  14.02  16.86  15.22  16.82  20.17  14.24  18.29  13.08  10.63)
Σx² = 291.23   (squares x²:   20.34  12.82  18.58  25.60  31.81  24.90  27.98  33.99  22.09  31.47  24.01  17.64)

Estimated Regression Line
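The estimated regression line is obtained by applying the least-squares formulas to the working sums above; the short Python sketch below (not part of the original slides; variable names are illustrative) checks the computation:

```python
# Minimal sketch: least-squares fit for Example 1 from the working sums above.
n = 12
sum_x, sum_y = 58.62, 34.15
sum_xy, sum_x2 = 169.25, 291.23

# Corrected sums of squares and cross products
Sxx = sum_x2 - sum_x**2 / n          # approx. 4.87
Sxy = sum_xy - sum_x * sum_y / n     # approx. 2.43

b1 = Sxy / Sxx                       # slope estimate, approx. 0.50
b0 = sum_y / n - b1 * sum_x / n      # intercept estimate, approx. 0.41

print(f"Estimated regression line: y_hat = {b0:.3f} + {b1:.3f} x")
```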
2.5 SLR - Estimation of Mean Response
The fitted regression line can be used to estimate the mean value of y for a given value of x.
Example: the weekly advertising expenditure (x) and the weekly sales (y).
2.5 SLR – Estimation of Mean Response
$\hat{y} = 828 + 10.8x$
This means that if the weekly advertising expenditure is increased by $1, we would expect the weekly sales to increase by $10.80.
2.5 SLR – Estimation of Mean Response
For a weekly advertising expenditure of x = 50:
Sales = 828 + 10.8(50) = 1368
This is called the point estimate (forecast) of the mean response (sales).
2.6 Software Output - Example
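One way to produce this kind of software output is with Python's statsmodels package; the sketch below (a minimal sketch, assuming statsmodels is installed) fits the Example 1 data by ordinary least squares:

```python
# Minimal sketch: regression software output for the Example 1 data using statsmodels.
import statsmodels.api as sm

x = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.70, 5.61, 4.90, 4.20]
y = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53]

X = sm.add_constant(x)          # add the intercept column
model = sm.OLS(y, X).fit()      # ordinary least squares fit
print(model.summary())          # coefficients, standard errors, R-squared, F statistic
```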
2.7 ANOVA
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
If SST = 0, all observations are the same (no variability).
The greater SST is, the greater the variation among the y values.
In a regression model, the variation of interest is the variability of the y observations around the fitted line, measured by the residuals:
$y_i - \hat{y}_i$
2.7 ANOVA – SST, SSE & SSR
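The sums of squares referred to in this heading satisfy the standard decomposition:

```latex
% ANOVA decomposition for simple linear regression (standard identity):
%   SST : total sum of squares,  SSR : regression sum of squares,  SSE : error sum of squares
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 , \quad
SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 , \quad
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 , \quad
SST = SSR + SSE
```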
2.7 ANOVA - Mean Squares (MS)
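The mean squares are obtained by dividing each sum of squares by its degrees of freedom; for simple linear regression the standard definitions are:

```latex
% Mean squares and F statistic for simple linear regression (standard definitions):
MSR = \frac{SSR}{1} , \qquad
MSE = \frac{SSE}{n-2} , \qquad
F = \frac{MSR}{MSE}
```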
2.8 Model Evaluation - (i) Standard Error of Estimate (s)
Compute the standard error of estimate from
$\hat{\sigma}^2 = \dfrac{SSE}{n-2}$
where
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
The smaller the SSE, the more successful the linear regression model is in explaining y.
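As an illustration (a minimal sketch, not from the original slides, using the rounded coefficient estimates obtained for Example 1):

```python
# Minimal sketch: SSE and standard error of estimate for the Example 1 data.
x = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.70, 5.61, 4.90, 4.20]
y = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53]
b0, b1 = 0.412, 0.498                         # rounded estimates from the earlier sketch
n = len(x)

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e**2 for e in residuals)            # sum of squared errors
sigma2_hat = sse / (n - 2)                    # estimated error variance
s = sigma2_hat ** 0.5                         # standard error of estimate
print(f"SSE = {sse:.4f}, s = {s:.4f}")
```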
2.8 Model Evaluation – (ii) Coefficient of Determination
Coefficient of determination:
$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$
R² is the proportion of variability in the observed dependent variable that is explained by the linear regression model.
The coefficient of determination measures the strength of the linear relationship.
The greater R² is, the more successful the linear model.
An R² value close to 1 indicates a good fit; a value close to 0 indicates a poor fit.
2.8 R-squared

2.8 Model Evaluation – (iii) The hypothesis test (a. t-test)
Hypothesis test
• A process that uses sample statistics to test a claim about the value of a population parameter.
• Example: an automobile manufacturer advertises that its new hybrid car has a mean mileage of 50 miles per gallon. To test this claim, a sample would be taken; if the sample mean differs enough from the advertised mean, the claim is rejected.
Test statistic (t-distribution):
$T = \dfrac{\hat{\beta}_1}{se(\hat{\beta}_1)} \sim t(n-2), \qquad se(\hat{\beta}_1) = \sqrt{\dfrac{\hat{\sigma}^2}{S_{xx}}}$
2.8 Model Evaluation – (iii) The hypothesis test (b. F-test)
This time we will use the F-test. The null and alternative hypotheses are:
$H_0: \beta_1 = 0 \quad \text{vs} \quad H_1: \beta_1 \neq 0$
Construction of the decision rule:
At the $\alpha = 5\%$ level, reject $H_0$ if $F > F(\alpha; 1, n-2)$.
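A minimal sketch (assuming SciPy is available) of how the critical value in the decision rule can be obtained:

```python
# Minimal sketch: F critical value for the decision rule, assuming SciPy is installed.
from scipy.stats import f

alpha, n = 0.05, 12
f_crit = f.ppf(1 - alpha, dfn=1, dfd=n - 2)   # F(0.05; 1, 10), approx. 4.96
print(f"Reject H0 if F > {f_crit:.2f}")
```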
2.7 Example 1
Excel results: Regression Line
[Figure: production line fit plot – electricity usage and predicted electricity usage vs. production]
2.7 Example 1
Using the F-test, the null and alternative hypotheses are:
$H_0: \beta_1 = 0 \quad \text{vs} \quad H_1: \beta_1 \neq 0$
With $\alpha = 0.05$ and n = 12, we require the critical value F(0.05; 1, 10).
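A minimal sketch (not from the original slides) that computes the ANOVA quantities and the F statistic directly from the Example 1 data:

```python
# Minimal sketch: ANOVA sums of squares and F statistic for the Example 1 data.
x = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.70, 5.61, 4.90, 4.20]
y = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares estimates
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Sums of squares and the F statistic
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ssr = sst - sse
f_stat = (ssr / 1) / (sse / (n - 2))          # F = MSR / MSE, compared with F(0.05; 1, 10)
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}, F = {f_stat:.2f}")
```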
Excel Results – Example 2
Regression Statistics
  Multiple R           0.680322
  R Square             0.462837
  Adjusted R Square    0.461716
  Standard Error       0.40947
  Observations         481

ANOVA
                df    SS          MS          F          Significance F
  Regression      1   69.19926    69.19926    412.7226   1.22E-66
  Residual      479   80.31167     0.167665
  Total         480  149.5109
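As a consistency check, the R Square and F entries above can be recovered from the ANOVA table:

```latex
% Consistency check on the Excel output above:
R^2 = \frac{SSR}{SST} = \frac{69.19926}{149.5109} \approx 0.4628 , \qquad
F = \frac{MSR}{MSE} = \frac{69.19926}{0.167665} \approx 412.7
```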
Excel Results – Example 2
[Figure: scatter plot of RQI vs. Permeability (md), with the fitted line and predicted RQI]
Fitted line: f(x) = 0.0017x + 0.3097,  R² = 0.4628
2.5.3a Interpretation of the results - Example 2
• Permeability (md) coefficient (β1 = 0.0017): each unit increase in permeability adds 0.0017 to the RQI value when all other variables are fixed.
• β1 > 0 (positive relationship): RQI increases as permeability increases.
• Intercept coefficient (β0 = 0.309): the value of RQI when permeability equals zero.
• R Square = 0.462837: the model explains about 46% of the total variability in the RQI values around their mean.
• P-value < 0.05: the regression is significant.