Week 2
Lecturer:
Wilhemina Adoma Pels
KNUST
SIMPLE LINEAR REGRESSION
REGRESSION
Economics
Social Science
Engineering
Management
SIMPLE LINEAR REGRESSION
Simple linear regression is a model that estimates the linear relationship
between a single dependent variable Y and a single independent variable X.
Model
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \dots, n \tag{2} \]
Variables:
X = Independent Variable (we provide this)
Y = Dependent Variable (we observe this)
Parameters:
β0 = Y-intercept
β1 = Slope
Error term:
εi = random error
In this model β0 and β1 are unknown parameters to be estimated, εi is an
unobserved random error, and Yi and Xi are measured values.
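To make the roles of the terms concrete, the short sketch below generates observations from the model. It assumes normally distributed errors with standard deviation `sigma`, and the parameter values and the helper name `simulate_slr` are hypothetical, chosen only for illustration.

```python
import random


def simulate_slr(x_values, beta0, beta1, sigma, seed=0):
    """Return Y_i = beta0 + beta1 * X_i + eps_i for each X_i.

    Assumption: eps_i is drawn from a normal distribution with mean 0
    and standard deviation sigma.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in x_values]


# Hypothetical inputs: X is chosen by us, Y is what we would observe.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = simulate_slr(x, beta0=2.0, beta1=0.5, sigma=0.1)

# With sigma = 0 the error term vanishes and Y lies exactly on the line.
y_noiseless = simulate_slr(x, beta0=2.0, beta1=0.5, sigma=0.0)

for xi, yi in zip(x, y):
    print(f"x = {xi:.1f}, y = {yi:.3f}")
```

Setting `sigma` to zero shows the deterministic part β0 + β1 X on its own; any scatter around that line comes from the error term.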
LEAST SQUARES ESTIMATION OF THE PARAMETERS
The Least Squares Line
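\[ L = \min \sum_{i=1}^{n} \hat{\varepsilon}_i^{2} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \tag{4} \]
\[ L = \min \sum_{i=1}^{n} \varepsilon_i^{2} = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 \tag{5} \]
\[ \frac{\partial L}{\partial \hat{\beta}_0} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \tag{6} \]
\[ \frac{\partial L}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)\, x_i = 0 \tag{7} \]
Simplifying the equations yields
\[ n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \tag{8} \]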
\[ \hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^{2} = \sum_{i=1}^{n} y_i x_i \tag{9} \]
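Solving these equations gives the least squares estimators of $\beta_0$
and $\beta_1$. The least squares estimates of the intercept and slope in the
simple linear regression model are
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \tag{10} \]
and
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_i - \dfrac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^{2} - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^{2}}{n}} \tag{11} \]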
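The formula for the slope can be written using sums of squares,
\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \tag{12} \]
where
\[ S_{xy} = \sum_{i=1}^{n} y_i x_i - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n} \tag{13} \]
and
\[ S_{xx} = \sum_{i=1}^{n} x_i^{2} - \frac{\left(\sum_{i=1}^{n} x_i\right)^{2}}{n} \tag{14} \]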
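The sum-of-squares formulas translate almost line for line into code. The sketch below is one possible implementation, not a reference one: the function name `least_squares_fit` and the small data set are hypothetical, and the intercept is computed from the relation β̂0 = ȳ − β̂1 x̄.

```python
def least_squares_fit(x, y):
    """Return (beta0_hat, beta1_hat) using S_xy / S_xx for the slope."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # S_xy = sum(x_i * y_i) - (sum x_i)(sum y_i) / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    # S_xx = sum(x_i^2) - (sum x_i)^2 / n
    s_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    beta1 = s_xy / s_xx
    beta0 = sum_y / n - beta1 * sum_x / n  # y_bar - beta1 * x_bar
    return beta0, beta1


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
beta0, beta1 = least_squares_fit(x, y)
print(f"intercept = {beta0:.3f}, slope = {beta1:.3f}")

# The estimates should satisfy the first normal equation:
# n * beta0 + beta1 * sum(x) = sum(y)
residual_eq8 = len(x) * beta0 + beta1 * sum(x) - sum(y)
```

Because the sample points lie exactly on a line, the fit recovers that line and the normal-equation residual is zero up to rounding.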
Regression Equation
The regression equation describes the regression line mathematically in terms
of the intercept $\hat{\beta}_0$ and the slope $\hat{\beta}_1$. In the graph
below, a corresponds to $\hat{\beta}_0$ and b to $\hat{\beta}_1$.
Example
x (°C)       6   5  10   7   8  12   5   9   7  11
y (grams)   21  19  31  25  28  33  20  29  22  32
Solution
       x_i    y_i    x_i y_i    x_i^2
         6     21       126       36
         5     19        95       25
        10     31       310      100
         7     25       175       49
         8     28       224       64
        12     33       396      144
         5     20       100       25
         9     29       261       81
         7     22       154       49
        11     32       352      121
Σ =     80    260      2193      694
\[ S_{xy} = 2193 - \frac{(260)(80)}{10} = 113 \]
\[ S_{xx} = 694 - \frac{(80)^{2}}{10} = 54 \]
\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{113}{54} = 2.093 \]
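The calculation above stops at the slope. As a sketch of the remaining step, the intercept follows from the relation $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ and the column totals in the table, assuming the unrounded slope 113/54 is carried through:

```latex
\[ \bar{y} = \frac{260}{10} = 26, \qquad \bar{x} = \frac{80}{10} = 8 \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 26 - \frac{113}{54}(8) \approx 9.259 \]
\[ \hat{y} \approx 9.259 + 2.093\,x \]
```

Rounding the slope to 2.093 before multiplying gives 9.256 instead, so the third decimal place depends on when the rounding is done.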
Interpretation of Coefficients in Regression Analysis
Now interpret this Regression Equation;
Line of best fit Plot
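Therefore, the unbiased estimator of $\sigma^2$ is
\[ \hat{\sigma}^{2} = \frac{SSE}{n-2} \tag{19} \]
where $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is the error sum of squares.
The standard error of estimate is
\[ S_{\varepsilon} = \sqrt{\frac{SSE}{n-2}} \tag{20} \]
If $S_{\varepsilon}$ is zero, all the points fall on the regression line. If
$S_{\varepsilon}$ is small relative to the values of y, the fit is good and
the linear model can reasonably be used for forecasting. If $S_{\varepsilon}$
is large, the model fits poorly.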
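As an illustration, the standard error of estimate can be computed for the temperature and weight data from the first example. This is a minimal sketch: it assumes the usual definition of SSE as the sum of squared residuals, and the function name `standard_error_of_estimate` is hypothetical.

```python
import math


def standard_error_of_estimate(x, y):
    """Return (SSE, S_eps) for a simple linear regression of y on x."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points so that n - 2 > 0")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    # SSE: sum of squared differences between observed and fitted values.
    sse = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    return sse, math.sqrt(sse / (n - 2))


# Data from the first worked example (x in degrees C, y in grams).
x = [6, 5, 10, 7, 8, 12, 5, 9, 7, 11]
y = [21, 19, 31, 25, 28, 33, 20, 29, 22, 32]
sse, s_eps = standard_error_of_estimate(x, y)
print(f"SSE = {sse:.3f}, S_eps = {s_eps:.3f}")
```

Whether the resulting value counts as small or large is a judgment made against the scale of y, here roughly 19 to 33 grams.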
Example
Temperature, °C (x)    0      10     20     30     40
Specific heat (y)      0.51   0.55   0.57   0.59   0.63
Find the least squares regression line of specific heat on temperature, and
hence estimate the value of the specific heat when the temperature is 25 °C.
Solution
        x       y       xy      x^2
        0     0.51      0         0
       10     0.55      5.5     100
       20     0.57     11.4     400
       30     0.59     17.7     900
       40     0.63     25.2    1600
Σ =   100     2.85     59.8    3000
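\[ S_{xy} = \sum_{i=1}^{n} xy - \frac{1}{n}\left(\sum_{i=1}^{n} x\right)\left(\sum_{i=1}^{n} y\right) = 59.8 - \frac{(100)(2.85)}{5} = 2.8 \]
\[ S_{xx} = \sum_{i=1}^{n} x^{2} - \frac{1}{n}\left(\sum_{i=1}^{n} x\right)^{2} = 3000 - \frac{(100)^{2}}{5} = 1000 \]
\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{2.8}{1000} = 0.0028 \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
\[ \bar{y} = \frac{2.85}{5} = 0.57, \qquad \bar{x} = \frac{100}{5} = 20 \]
\[ \hat{\beta}_0 = 0.57 - 0.0028(20) = 0.514 \]
\[ \hat{y} = 0.514 + 0.0028x \]
At 25 °C:
\[ \hat{y} = 0.514 + 0.0028(25) = 0.584 \]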
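The arithmetic in this example can be rechecked in a few lines of code. The sketch below uses the centred forms of S_xy and S_xx, which are algebraically equivalent to the formulas with column totals; the variable names are illustrative only.

```python
# Data from the specific heat example.
temps = [0, 10, 20, 30, 40]                    # temperature in degrees C (x)
specific_heat = [0.51, 0.55, 0.57, 0.59, 0.63]  # specific heat (y)

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(specific_heat) / n

# Centred sums of squares: equivalent to the "sum minus product over n" forms.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, specific_heat))
s_xx = sum((x - x_bar) ** 2 for x in temps)

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
prediction_at_25 = intercept + slope * 25

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"estimated specific heat at 25 C = {prediction_at_25:.4f}")
```

A useful sanity check is that the estimate at 25 °C should fall between the observed values at 20 °C and 30 °C.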
We can use the following test statistic to test our hypothesis:
\[ t = \frac{\hat{\beta}_1 - \beta_1}{S_{\hat{\beta}_1}} \tag{21} \]
If the error term ε is normally distributed, the test statistic has a
Student's t-distribution with n − 2 degrees of freedom. The rejection
region depends on whether we are doing a one-tailed or a two-tailed
test (a two-tailed test is most typical).
For a two-tailed test, we reject the null hypothesis $H_0$ if $|t_{cal}| > t_{\alpha/2,\, n-2}$.
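The test can be sketched in code for the data of the first example. Several things here are assumptions rather than part of the slides: the null hypothesis is taken to be H0: β1 = 0, the standard error of the slope is computed with the usual textbook formula S_ε / √S_xx, and the critical value 2.306 for α = 0.05 with 8 degrees of freedom is read from a standard t-table rather than computed.

```python
import math


def slope_t_statistic(x, y, beta1_null=0.0):
    """Return t = (beta1_hat - beta1_null) / S_beta1 for y regressed on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    sse = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_eps = math.sqrt(sse / (n - 2))   # standard error of estimate
    se_beta1 = s_eps / math.sqrt(s_xx)  # assumed formula for S_beta1
    return (beta1 - beta1_null) / se_beta1


# Data from the first worked example (n = 10, so n - 2 = 8 degrees of freedom).
x = [6, 5, 10, 7, 8, 12, 5, 9, 7, 11]
y = [21, 19, 31, 25, 28, 33, 20, 29, 22, 32]

t_cal = slope_t_statistic(x, y)  # tests the assumed H0: beta1 = 0
T_CRIT = 2.306                   # t-table value for alpha/2 = 0.025, df = 8
reject_h0 = abs(t_cal) > T_CRIT  # two-tailed decision rule

print(f"t = {t_cal:.2f}, reject H0: {reject_h0}")
```

The absolute value in the decision rule is what makes the test two-tailed; a one-tailed test would compare `t_cal` itself against the table value for α instead.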
Properties of the OLS estimates
GROUP ASSIGNMENT
Trial Questions
Regression methods were used to analyze the data from a study
investigating the relationship between roadway surface temperature (x)
and pavement deflection (y). Summary quantities were
$n = 20$, $\sum y_i = 12.75$, $\sum y_i^{2} = 8.86$, $\sum x_i = 1478$,
$\sum x_i^{2} = 143215.8$ and $\sum x_i y_i = 1083.67$.
a. Calculate the least squares estimates of the slope and intercept of the
linear regression line.
b. Use the equation of the fitted regression line to predict the pavement
deflection when the surface temperature is 75 °F.
Thank You.