
Introduction to Regression Analysis

Lecturer:
Wilhemina Adoma Pels

KNUST

January 24, 2024

SIMPLE LINEAR REGRESSION

REGRESSION

Regression is a statistical method used to describe the nature of the
relationship between variables, that is, whether it is positive or negative,
linear or nonlinear.
Regression analysis is used to predict the value of one variable (the
dependent variable) on the basis of other variables (the independent
variables).
The variable we are trying to predict is called the response or
dependent variable, denoted Y.
The variable used to predict it is called the explanatory or independent
variable, denoted X.
If we have only one independent variable, the model is

y = β0 + β1x + ε    (1)

This model is referred to as simple linear regression.
Applications

Economics

Social Science

Engineering

Management

Life & Biological Sciences

SIMPLE LINEAR REGRESSION
Simple linear regression is a model that estimates the linear relationship
between a single dependent variable Y and a single independent variable X.
Model
Yi = β0 + β1Xi + εi,   i = 1, · · · , n    (2)
Variables:
X = independent variable (we provide this)
Y = dependent variable (we observe this)
Parameters:
β0 = Y-intercept
β1 = slope
ε = random error
In this model β0 and β1 are parameters, εi is the random error, and Yi and
Xi are measured values.
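As an illustrative aside (not part of the original slides), the short Python sketch below simulates data from this model; the parameter values, sample size, and seed are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed, for reproducibility

beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 50  # assumed (made-up) parameter values
x = rng.uniform(0, 10, size=n)              # X: independent variable (we provide this)
eps = rng.normal(0, sigma, size=n)          # epsilon: random error, mean 0, constant sd
y = beta0 + beta1 * x + eps                 # Y: dependent variable (we observe this)
```
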
SIMPLE LINEAR REGRESSION

Required Conditions (Assumptions)

For these regression methods to be valid, the following four conditions on
the error variable ε must be met:
The probability distribution of ε is normal.
The mean of the distribution is 0; that is, E(ε) = 0.
The standard deviation of ε is σε, a constant regardless of the value of x.
The value of ε associated with any particular value of y is
independent of the ε associated with any other value of y.

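In practice these conditions are usually checked informally by examining the residuals of a fitted line. The sketch below (an illustrative addition, not from the slides) shows one way to do so; it assumes fitted coefficients b0 and b1 are already available.

```python
import numpy as np
from scipy import stats

def check_error_assumptions(x, y, b0, b1):
    """Informal checks of the conditions on the error variable epsilon."""
    resid = np.asarray(y, float) - (b0 + b1 * np.asarray(x, float))
    print("mean of residuals (should be close to 0):", resid.mean())
    # Shapiro-Wilk test: a small p-value casts doubt on normality of epsilon
    print("Shapiro-Wilk normality p-value:", stats.shapiro(resid).pvalue)
    return resid  # plot these against x to eyeball constant variance
```
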
LEAST SQUARES ESTIMATION OF THE PARAMETERS

Estimating the Coefficients


In much the same way we base estimates of µ on x̄, we estimate β0
with β̂0 and β1 with β̂1, the y-intercept and slope, respectively, of the
least squares or regression line given by:

ŷ = β̂0 + β̂1x    (3)

This is an application of the least squares method: it produces the
straight line that minimizes the sum of the squared differences
between the observations yi and the fitted line.

The Least Squares Line

Figure: Least Squares Line

LEAST SQUARES ESTIMATION OF THE PARAMETERS

The least squares criterion chooses β̂0 and β̂1 to minimize the sum of
squared residuals (all sums below run over i = 1, . . . , n):

L = Σ ε̂i² = Σ (yi − ŷi)²    (4)

L = Σ (yi − β̂0 − β̂1xi)²    (5)

Setting the partial derivatives of L to zero gives

∂L/∂β̂0 = −2 Σ (yi − β̂0 − β̂1xi) = 0    (6)

∂L/∂β̂1 = −2 Σ (yi − β̂0 − β̂1xi)xi = 0    (7)

Simplifying these equations yields

nβ̂0 + β̂1 Σ xi = Σ yi    (8)

LEAST SQUARES ESTIMATION OF THE PARAMETERS

Cont’d
β̂0 Σ xi + β̂1 Σ xi² = Σ yixi    (9)

The solution to these equations gives the least squares estimators of β0
and β1. The least squares estimates of the intercept and slope in the
simple linear regression model are

β̂0 = ȳ − β̂1x̄    (10)

and

β̂1 = [Σ yixi − (Σ yi)(Σ xi)/n] / [Σ xi² − (Σ xi)²/n]    (11)
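Equations (8) and (9) form a 2 × 2 linear system in β̂0 and β̂1. Purely as an illustration (not from the slides), the system can be set up and solved numerically, for example with NumPy:

```python
import numpy as np

def solve_normal_equations(x, y):
    """Solve the normal equations (8)-(9) for the OLS intercept and slope."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    A = np.array([[n,       x.sum()],
                  [x.sum(), (x**2).sum()]])
    b = np.array([y.sum(), (x * y).sum()])
    beta0_hat, beta1_hat = np.linalg.solve(A, b)
    return beta0_hat, beta1_hat
```
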
LEAST SQUARES ESTIMATION OF THE PARAMETERS

Cont’d
The formula for the slope can also be written using the sums of squares:

β̂1 = Sxy / Sxx    (12)

where

Sxy = Σ yixi − (Σ yi)(Σ xi)/n    (13)

and

Sxx = Σ xi² − (Σ xi)²/n    (14)

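Equivalently, equations (10) and (12)-(14) give the estimates in closed form. The following sketch (an illustrative addition) mirrors those formulas directly:

```python
import numpy as np

def least_squares_fit(x, y):
    """OLS intercept and slope via the sums of squares Sxy and Sxx."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    Sxy = (x * y).sum() - x.sum() * y.sum() / n  # equation (13)
    Sxx = (x**2).sum() - x.sum()**2 / n          # equation (14)
    beta1_hat = Sxy / Sxx                        # equation (12)
    beta0_hat = y.mean() - beta1_hat * x.mean()  # equation (10)
    return beta0_hat, beta1_hat
```
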
SIMPLE LINEAR REGRESSION

Regression Equation
The regression equation describes the regression line mathematically
through β̂0 and β̂1, the intercept and the slope. In the graph below, a is
replaced by β̂0 and b by β̂1.

REGRESSION
Cont’d

[Figure: regression line y = a + bx, showing intercept a and slope b]
Example

The amounts of a chemical compound y dissolved in 100 grams of
water at various temperatures x were recorded as follows:

x (°C)      6    5   10    7    8   12    5    9    7   11
y (grams)  21   19   31   25   28   33   20   29   22   32

1. Fit the linear regression model y = β0 + β1x + ε to these data, using
the method of least squares.
2. Estimate the amount of the chemical compound that will dissolve in
100 grams of water at 7.5 °C.

Solution

xi    yi    xiyi   xi²
 6    21     126    36
 5    19      95    25
10    31     310   100
 7    25     175    49
 8    28     224    64
12    33     396   144
 5    20     100    25
 9    29     261    81
 7    22     154    49
11    32     352   121
Σ     80    260   2193   694

Solution

Sxy = 2193 − (80)(260)/10 = 113

Sxx = 694 − (80)²/10 = 54

β̂1 = Sxy/Sxx = 113/54 = 2.093

β̂0 = ȳ − β̂1x̄ = 26 − (2.093 × 8) = 9.259

The fitted regression model is ŷ = 9.259 + 2.093x.
2. When x = 7.5,
ŷ = 9.259 + 2.093(7.5) = 24.954

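As a quick numerical check of this example (an illustrative addition, using the least_squares_fit sketch from earlier):

```python
x = [6, 5, 10, 7, 8, 12, 5, 9, 7, 11]
y = [21, 19, 31, 25, 28, 33, 20, 29, 22, 32]

b0, b1 = least_squares_fit(x, y)
print(round(b0, 3), round(b1, 3))  # approximately 9.259 and 2.093
print(round(b0 + b1 * 7.5, 3))     # approximately 24.954 grams at 7.5 °C
```
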
Interpretation of Coefficients in Regression Analysis

The coefficients describe the mathematical relationship between each
independent variable and the dependent variable.
The size of the coefficient for each independent variable gives the
size of the effect that variable has on the dependent variable,
and the sign of the coefficient (positive or negative) gives the
direction of the effect.
1. A positive coefficient indicates that as the value of the independent
variable increases, the dependent variable also tends to increase.
2. A negative coefficient indicates that as the independent variable
increases, the dependent variable tends to decrease, and vice versa.
The intercept is the expected value of the dependent variable when the
independent variable is zero. For example, in the fitted model
ŷ = 9.259 + 2.093x from the earlier example, each additional 1 °C of
temperature is associated with an estimated 2.093 g increase in the
amount dissolved.

Interpretation of Coefficients in Regression Analysis

Cont’d
Now interpret this regression equation:

ŷ = 4.692 + 0.923x    (15)

SIMPLE LINEAR REGRESSION
Line of Best Fit Plot

[Figure: line-of-best-fit plot]
SIMPLE LINEAR REGRESSION

Estimating the Variance of the Error Term ε

The residual

ε̂i = yi − ŷi    (16)

is used to obtain the estimate of the error term. The sum of squares of
the residuals (the error sum of squares) is

SSE = Σ ε̂i² = Σ (yi − ŷi)²    (17)

The expected value of the error sum of squares is

E(SSE) = (n − 2)σ²    (18)

REGRESSION

Cont’d
Therefore the unbiased estimator of σ² is

σ̂² = SSE/(n − 2)    (19)

Also, the standard error of estimate is

Sε = √(SSE/(n − 2))    (20)

If Sε is zero, all the points fall on the regression line. If Sε is small, the
fit is good and the linear model can be used for forecasting. If Sε is
large, the model fits poorly.

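A small sketch of these quantities (an illustrative addition; it computes the residuals, SSE, σ̂², and Sε from fitted coefficients):

```python
import numpy as np

def standard_error_of_estimate(x, y, b0, b1):
    """SSE, the unbiased estimate of sigma^2, and the standard error S_epsilon."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    resid = y - (b0 + b1 * x)        # residuals, equation (16)
    sse = (resid**2).sum()           # error sum of squares, equation (17)
    sigma2_hat = sse / (len(x) - 2)  # unbiased estimator, equation (19)
    s_eps = np.sqrt(sigma2_hat)      # standard error of estimate, equation (20)
    return sse, sigma2_hat, s_eps
```
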
Example

The following measurements of the specific heat of a certain chemical
were made in order to investigate the variation in specific heat with
temperature.

Temperature °C (x)    0     10    20    30    40
Specific heat (y)    0.51  0.55  0.57  0.59  0.63

Find the least squares regression line of specific heat on temperature, and
hence estimate the value of the specific heat when the temperature is
25 °C.

Solution
x     y      xy     x²
 0    0.51    0       0
10    0.55    5.5    100
20    0.57   11.4    400
30    0.59   17.7    900
40    0.63   25.2   1600
Σ    100    2.85   59.8   3000

Sxy = Σ xy − (Σ x)(Σ y)/n = 59.8 − (100)(2.85)/5 = 2.8

Sxx = Σ x² − (Σ x)²/n = 3000 − (100)²/5 = 1000

β̂1 = Sxy/Sxx = 2.8/1000 = 0.0028
Solution

β̂0 = ȳ − β̂1x̄

ȳ = 2.85/5 = 0.57

x̄ = 100/5 = 20

β̂0 = 0.57 − 0.0028(20) = 0.514

The fitted least squares regression line is ŷ = β̂0 + β̂1x:

ŷ = 0.514 + 0.0028x

Solution

At 25 °C,
ŷ = 0.514 + 0.0028(25)
ŷ = 0.584

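Again as an illustrative check (reusing the least_squares_fit sketch from earlier):

```python
x = [0, 10, 20, 30, 40]
y = [0.51, 0.55, 0.57, 0.59, 0.63]

b0, b1 = least_squares_fit(x, y)
print(round(b0, 4), round(b1, 4))  # approximately 0.514 and 0.0028
print(round(b0 + b1 * 25, 4))      # approximately 0.584 at 25 °C
```
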
REGRESSION

Testing the Slope

If no linear relationship exists between the two variables, we would
expect the regression line to be horizontal, that is, to have a slope of
zero.
We want to test whether there is a linear relationship, i.e. whether the
slope β1 is something other than zero. The hypotheses are:
H0: β1 = 0 [no linear relationship]
H1: β1 ≠ 0 [there is a linear relationship]

REGRESSION

Cont’d
We use the following test statistic for this hypothesis:

t = (β̂1 − β1) / Sβ̂1    (21)

where Sβ̂1 is the standard deviation (standard error) of β̂1, defined as

Sβ̂1 = √(σ̂²/Sxx)    (22)

where

Sxx = Σ xi² − (Σ xi)²/n    (23)

REGRESSION

Cont’d
If the error term ε is normally distributed, the test statistic has a
Student t-distribution with n − 2 degrees of freedom. The rejection
region depends on whether we are doing a one- or two-tailed test (a
two-tailed test is most typical).
For the two-tailed test, we reject the null hypothesis H0 if
|tcal| > tα/2, n−2.

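A sketch of the whole test (an illustrative addition; it reuses the least_squares_fit helper sketched above and uses SciPy only for the critical value):

```python
import numpy as np
from scipy import stats

def test_slope(x, y, alpha=0.05):
    """Two-tailed t-test of H0: beta1 = 0 against H1: beta1 != 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b0, b1 = least_squares_fit(x, y)            # helper sketched earlier
    resid = y - (b0 + b1 * x)
    sigma2_hat = (resid**2).sum() / (n - 2)
    Sxx = (x**2).sum() - x.sum()**2 / n
    se_b1 = np.sqrt(sigma2_hat / Sxx)           # equation (22)
    t_cal = b1 / se_b1                          # equation (21) with beta1 = 0
    t_crit = stats.t.ppf(1 - alpha / 2, n - 2)  # t_{alpha/2, n-2}
    return t_cal, t_crit, abs(t_cal) > t_crit   # True means reject H0
```
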
Properties of the OLS estimates

These properties can be summarized as: the OLS estimator is BLUE.
B - Best
L - Linear
U - Unbiased
E - Estimator
Note: The Gauss-Markov theorem establishes this result.

GROUP ASSIGNMENT

1. Prove that OLS is BLUE.
2. Estimate β0 and β1, showing your working.

Trial Questions

A study was made of the amount of converted sugar (y) in a certain
process at various temperatures (x). The data were coded and
recorded as follows:

x   1.0  1.1  1.2  1.3  1.4  1.5  1.6  1.7   1.8  1.9  2.0
y   8.1  7.8  8.5  9.8  9.5  8.9  8.6  10.2  9.3  9.2  10.5

a. Find the equation of the least squares regression line.
b. Estimate the converted sugar when the coded temperature is 1.75.

Trial Question

Regression methods were used to analyze the data from a study
investigating the relationship between roadway surface temperature (x)
and pavement deflection (y). Summary quantities were:

n = 20,  Σ yi = 12.75,  Σ yi² = 8.86,  Σ xi = 1478,
Σ xi² = 143215.8  and  Σ xi yi = 1083.67

a. Calculate the least squares estimates of the slope and intercept of the
linear regression line.
b. Use the equation of the fitted regression line to predict the pavement
deflection when the surface temperature is 75 °F.

Thank You.
