Chapter 9: Simple Linear Regression and Correlation

Chapter 9 discusses Simple Linear Regression and Correlation, focusing on the relationship between two variables, such as force and velocity in engineering experiments. It explains the mathematical formulation of linear regression, the estimation of regression coefficients, and the importance of residuals in assessing model fit. The chapter also includes an example of applying linear regression to experimental data to determine coefficients and make predictions.


CHAPTER 9

Simple Linear Regression and Correlation

Prepared by Dr Rania Minkara

1
Introduction
• In real life engineering practice, often a relationship is found to exist between two (or
more) variables.
• For example: the experimental data for force (N) and velocity (m/s) from a wind
tunnel experiment.
• A mechanical element/component is suspended in a wind tunnel and the force
measured for various levels of wind velocity. This relationship can be visualised by
plotting force versus velocity.
• It is frequently desirable to express this relationship in mathematical/analytical form
by establishing an equation connecting the variables. In order to determine an
equation connecting the variables, it is often necessary to collect the data depicting
the values of the variables under consideration.
• If x and y denote, respectively, the velocity and force from the wind tunnel
experiment, then a sample of n observations would give the velocities x1, x2, …, xn
and the corresponding forces y1, y2, …, yn.

2
3
Linear Regression
• Linear regression is a commonly used method for examining the relationship
between quantitative variables and for making predictions.
• In this chapter, we will learn how to find the regression equation, the
equation of the line that best fits a set of data points.

4
Linear Equation
The general form of a linear equation with one independent variable can be
written as
y = 𝛽0 + 𝛽1 𝑥
where 𝛽0 (intercept) and 𝛽1 (slope) are constants, 𝑥 is the independent
variable, and y is the dependent variable.

5
Notes
• The independent variables of your problem are called the regressors.
• If your problem consists of 1 independent variable, the analysis is called
simple regression.
• If your problem consists of more than 1 independent variable, the analysis is
called multiple regression.

6
The Simple Linear Regression (SLR)
Model
y = 𝛽0 + 𝛽1 𝑥 + 𝜖
• 𝛽0 and 𝛽1 are unknown intercept and slope parameters, respectively, and 𝜖
is a random variable that is assumed to be distributed with E(𝜖 )=0 and
Var(𝜖 )=σ2.
• The quantity σ2 is often called the error variance or residual variance.
• From the model above, several things become apparent:

1. The quantity Y is a random variable since 𝜖 is random.


2. The value x of the regressor variable is not random and, in fact, is measured
with negligible error.
3. The quantity 𝜖, often called a random error or random disturbance, has constant
variance.
7
Notes
• The fact that E(𝜖) = 0 implies that at a specific x the y-values are distributed
around the true, or population, regression line y = 𝛽0 + 𝛽1 𝑥 .
• If the model is well chosen, then positive and negative errors around the
true regression are reasonable.
• We must keep in mind that in practice 𝛽0 and 𝛽1 are not known and must be
estimated from data. In addition, the model described above is conceptual in
nature. As a result, we never observe the actual εi values in practice and
thus we can never draw the true regression line (but we assume it is there).
We can only draw an estimated line.

8
9
The Fitted Regression Line
• An important aspect of regression analysis is to estimate the parameters 𝛽0
and 𝛽1 (i.e., estimate the so-called regression coefficients).
• Suppose we denote the estimates b0 for 𝛽0 and b1 for 𝛽1 . Then the
estimated or fitted regression line is given by
ŷ = b0 + b1x
where ŷ is the predicted or fitted value.
• The fitted line is an estimate of the true regression line. We expect that the
fitted line should be closer to the true regression line when a large amount
of data is available.

10
Another Look at the Model Assumptions

11
Notes Regarding The Previous Plot
• Suppose we have a simple linear regression with n = 6 evenly spaced
values of x and a single y-value at each x.
• The line in the graph is the true regression line.
• The points plotted are actual (x, y) points scattered about the line.
Each point lies on its own normal distribution, with the center of the
distribution (i.e., the mean of y) falling on the line.
• All distributions have the same variance, which we referred to as σ2.
• The deviation between an individual y and the point on the line will be its
individual 𝜖𝑖 value.

12
Residual
• A residual is essentially the error in the fit of the model ŷ = b0 + b1x.
• Given a set of regression data {(xi, yi); i = 1, 2, ..., n} and a fitted model
ŷi = b0 + b1xi, the ith residual ei is given by

ei = yi − ŷi,   i = 1, 2, ..., n

• If the residuals are large, the fit of the model is not good. Small
residuals are a sign of a good fit.

13
Residuals ei and Conceptual Model Errors εi
• Conceptual model: yi = β0 + β1xi + εi
• Statistical (fitted) model: yi = b0 + b1xi + ei
• The 𝜖i are not observed whereas the 𝑒𝑖 are observed and play an important
role in the total analysis.

14
Least Squares
• We shall find 𝑏0 and 𝑏1 , the estimates of 𝛽0 and 𝛽1 , so that the sum of the squares
of the residuals is a minimum.
• The residual sum of squares is often called the sum of squares of the errors about
the regression line and is denoted by SSE.

SSE = Σ_{i=1}^{n} ei² = Σ_{i=1}^{n} (yi − ŷi)² = Σ_{i=1}^{n} (yi − b0 − b1xi)²

• Differentiating SSE with respect to b0 and b1:

∂SSE/∂b0 = −2 Σ_{i=1}^{n} (yi − b0 − b1xi)

∂SSE/∂b1 = −2 Σ_{i=1}^{n} (yi − b0 − b1xi)·xi
15
• Setting the partial derivatives equal to zero and rearranging the terms:

n·b0 + b1 Σ_{i=1}^{n} xi = Σ_{i=1}^{n} yi

b0 Σ_{i=1}^{n} xi + b1 Σ_{i=1}^{n} xi² = Σ_{i=1}^{n} xi·yi

• Solving these equations simultaneously yields the computing formulas
for b0 and b1.
16
Estimated Regression Coefficients
Given the sample {(xi,yi); i =1 ,2,...,n}, the least squares estimates 𝑏0 and 𝑏1 of
the regression coefficients 𝛽0 and 𝛽1 are computed from the formulas:

b1 = [n Σ_{i=1}^{n} xi·yi − (Σ_{i=1}^{n} xi)(Σ_{i=1}^{n} yi)] / [n Σ_{i=1}^{n} xi² − (Σ_{i=1}^{n} xi)²]
   = Σ_{i=1}^{n}(xi − x̄)(yi − ȳ) / Σ_{i=1}^{n}(xi − x̄)²

b0 = (Σ_{i=1}^{n} yi − b1 Σ_{i=1}^{n} xi) / n = ȳ − b1·x̄
17
Example 1
The following table gives experimental data for force (N) and velocity (m/s) for
an object suspended in a wind tunnel.

Velocity (m/s) 10 20 30 40 50 60 70 80
Force (N) 24 68 378 552 608 1218 831 1452

a) Use linear least-squares regression to determine the coefficients b0
and b1 of the function ŷ = b0 + b1x that best fits the data.
b) Estimate the force when the velocity is 55 m/s.

18
Solution

a)   x      y      x²      xy
     10     24     100     240
     20     68     400     1360
     30     378    900     11340
     40     552    1600    22080
     50     608    2500    30400
     60     1218   3600    73080
     70     831    4900    58170
     80     1452   6400    116160
Sum  360    5131   20400   312830

x̄ = 360/8 = 45,   ȳ = 5131/8 = 641.375

b1 = [8(312830) − 360(5131)] / [8(20400) − 360²] = 19.5083

b0 = 641.375 − 19.5083(45) = −236.5

ŷ = −236.5 + 19.5083x

b) x = 55 m/s ⇒ ŷ = −236.5 + 19.5083(55) ≈ 836.46 N

19
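The computations in Example 1 can be reproduced in a few lines of Python. This is an illustrative sketch (the variable names are my own); tiny differences from hand-rounded slide values are expected when full precision is carried through.

```python
# Wind-tunnel data from Example 1 (velocity in m/s, force in N)
x = [10, 20, 30, 40, 50, 60, 70, 80]
y = [24, 68, 378, 552, 608, 1218, 831, 1452]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Least-squares computing formulas for slope b1 and intercept b0
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

print(round(b1, 4), round(b0, 4))   # 19.5083 -236.5
print(round(b0 + b1 * 55, 2))       # 836.46  (predicted force at 55 m/s)
```

With unrounded b1, the intercept works out to exactly −236.5.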
Plot of Example 1

20
Mean and Variance of Least Squares
Estimators
Assuming that the error term in the model yi = β0 + β1xi + εi is a random
variable with mean 0 and constant variance σ², and that ε1, ε2, …, εn are
independent from run to run in the experiment, the estimators B0 and B1 of β0
and β1 have means and variances as follows:

μ_B1 = β1   and   σ²_B1 = σ² / Σ_{i=1}^{n}(xi − x̄)²

μ_B0 = β0   and   σ²_B0 = σ² · Σ_{i=1}^{n} xi² / [n Σ_{i=1}^{n}(xi − x̄)²]

21
Sxx, Syy and Sxy
• Let Sxx, Syy and Sxy be as follows:

Sxx = Σ_{i=1}^{n}(xi − x̄)² = Σ_{i=1}^{n} xi² − (Σ_{i=1}^{n} xi)²/n

Syy = Σ_{i=1}^{n}(yi − ȳ)² = Σ_{i=1}^{n} yi² − (Σ_{i=1}^{n} yi)²/n

Sxy = Σ_{i=1}^{n}(xi − x̄)(yi − ȳ) = Σ_{i=1}^{n} xi·yi − (Σ_{i=1}^{n} xi)(Σ_{i=1}^{n} yi)/n

22
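For the Example 1 data, these sums of squares can be checked quickly in Python (a sketch with my own variable names):

```python
# Sxx, Syy and Sxy for the Example 1 data, using the shortcut formulas
x = [10, 20, 30, 40, 50, 60, 70, 80]
y = [24, 68, 378, 552, 608, 1218, 831, 1452]
n = len(x)

Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
Syy = sum(yi**2 for yi in y) - sum(y)**2 / n
Sxy = sum(xi*yi for xi, yi in zip(x, y)) - sum(x)*sum(y) / n

print(Sxx, Syy, Sxy)   # 4200.0 1813945.875 81935.0
```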
An Unbiased Estimate of σ²

S² = SSE/(n − 2) = Σ_{i=1}^{n}(yi − ŷi)² / (n − 2) = (Syy − b1·Sxy)/(n − 2)

(Recall that the mean of the errors is 0.)

23
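Continuing with the Example 1 data, the shortcut formula S² = (Syy − b1·Sxy)/(n − 2) can be checked in Python. This sketch uses unrounded b1, so the result differs slightly from the hand-rounded value on the later solution slide.

```python
# Unbiased estimate of the error variance for the Example 1 fit
x = [10, 20, 30, 40, 50, 60, 70, 80]
y = [24, 68, 378, 552, 608, 1218, 831, 1452]
n = len(x)

Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n                    # 4200.0
Syy = sum(yi**2 for yi in y) - sum(y)**2 / n                    # 1813945.875
Sxy = sum(xi*yi for xi, yi in zip(x, y)) - sum(x)*sum(y) / n    # 81935.0

b1 = Sxy / Sxx
S2 = (Syy - b1 * Sxy) / (n - 2)   # unbiased estimate of sigma^2
s = S2 ** 0.5
print(round(S2, 2), round(s, 2))  # 35921.76 189.53
```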
Inferences Concerning the Regression
Coefficients
• Aside from estimating the linear relationship between x and Y for purposes
of prediction, the experimenter may also be interested in drawing certain
inferences about the slope and intercept.
• In order to allow for the testing of hypotheses and the construction of
confidence intervals on 𝛽0 and 𝛽1 , one must be willing to make the further
assumption that each 𝜖i 𝑖 = 1, … , 𝑛, is normally distributed. This assumption
implies that Y1,Y2,...,Yn are also normally distributed, each with probability
distribution n(yi; 𝛽0 + 𝛽1 xi, σ).

24
Confidence Interval for β1
• A 100(1 − α)% confidence interval for the parameter β1 in the regression line
μ_{Y/x} = β0 + β1x is

b1 − t_{α/2}·s/√Sxx < β1 < b1 + t_{α/2}·s/√Sxx

where t_{α/2} is a value of the t-distribution with n − 2 degrees of freedom leaving
an area of α/2 to the right.

25
Example 2
Construct a 95% confidence interval for 𝛽1 for the data in Example 1.

26
Solution

Sxx = Σ(xi − x̄)² = Σ(xi − 45)² = 4200

Syy = Σ(yi − ȳ)² = Σ(yi − 641.375)² = 1813945.875

Sxy = Σ(xi − x̄)(yi − ȳ) = Σ(xi − 45)(yi − 641.375) = 81935

27
S² = (Syy − b1·Sxy)/(n − 2) = (1813945.875 − 19.5083 × 81935)/6 = 35922.22 ⇒ s = 189.53

t_{α/2} = t_{0.025} = 2.447 for n − 2 = 6 degrees of freedom

b1 − t_{α/2}·s/√Sxx < β1 < b1 + t_{α/2}·s/√Sxx ⟹ 12.352 < β1 < 26.6646

We are 95% confident that the slope of the population regression line is between
12.352 and 26.6646.

28
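The interval in Example 2 can be verified numerically. This sketch hardcodes the table value t_{0.025,6} = 2.447 and the estimates from the earlier slides:

```python
# 95% confidence interval for the slope beta1 (Example 2)
b1 = 19.5083333   # least-squares slope estimate
s = 189.53        # estimate of the error standard deviation
Sxx = 4200
t = 2.447         # t_{alpha/2} with n - 2 = 6 degrees of freedom

margin = t * s / Sxx ** 0.5
lo, hi = b1 - margin, b1 + margin
print(round(lo, 3), round(hi, 3))   # 12.352 26.665
```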
Hypothesis Testing on the Slope 𝛽1
• To test the hypothesis that x does not determine y linearly, we test the null
hypothesis that the slope of the regression line is zero, that is, 𝛽1 = 𝛽10 = 0.
• The alternative hypothesis is one of the following:
a) x determines y, that is, 𝛽1 ≠ 0.
b) x determines y positively, that is, 𝛽1 > 0.
c) x determines y negatively, that is, 𝛽1 < 0.

• The value of the test statistic t for b1 is calculated as

t = (b1 − β10)/(s/√Sxx)

The value of β10 is substituted from the null hypothesis (here β10 = 0).
29
Example 3
Test at the 1% significance level if the slope of the regression line for the
Example 1 is positive.

30
Solution
H0: β1 = 0 (the slope is 0)
H1: β1 > 0 (the slope is positive)
Since the standard deviation of the error is unknown, we use an estimate s
and a t-distribution to perform the test about β1.

t = (b1 − β10)/(s/√Sxx) = 19.5083/(189.53/√4200) = 6.6706

t_α = t_{0.01} = 3.143 for n − 2 = 6 degrees of freedom

t > t_{0.01} ⟹ we reject H0 and conclude that the slope is positive at the 1% level
of significance.

31
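A quick numerical check of Example 3 (a sketch; the critical value t_{0.01,6} = 3.143 is hardcoded from the t-table):

```python
# One-sided t test for a positive slope (Example 3)
b1 = 19.5083333
s = 189.53
Sxx = 4200

t_stat = (b1 - 0) / (s / Sxx ** 0.5)   # test statistic under H0: beta1 = 0
t_crit = 3.143                          # t_{0.01} with 6 degrees of freedom

print(round(t_stat, 2), t_stat > t_crit)   # 6.67 True
```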
Confidence Interval for β0
A 100(1 − α)% confidence interval for the parameter β0 in the regression line
μ_{Y/x} = β0 + β1x is

b0 − t_{α/2}·s·√(Σ_{i=1}^{n} xi² / (n·Sxx)) < β0 < b0 + t_{α/2}·s·√(Σ_{i=1}^{n} xi² / (n·Sxx))

where t_{α/2} is a value of the t-distribution with n − 2 degrees of freedom leaving
an area of α/2 to the right.

32
Hypothesis Testing on β0
To test the null hypothesis H0 that β0 = β00 against a suitable alternative, we
can use the t-distribution with n − 2 degrees of freedom to establish a critical
region and then base our decision on the value of the test statistic t calculated
as:

t = (b0 − β00) / (s·√(Σ_{i=1}^{n} xi² / (n·Sxx)))

33
Coefficient of Determination
• The coefficient of determination is a measure of quality of fit.
R² = 1 − Σ_{i=1}^{n}(yi − ŷi)² / Σ_{i=1}^{n}(yi − ȳ)²,   0 ≤ R² ≤ 1

• An R² of 0 means that the dependent variable cannot be predicted from the
independent variable.
• An R² of 1 means the dependent variable can be predicted without error from the
independent variable.
• An R² between 0 and 1 indicates the extent to which the dependent variable is
predictable.
• What is an acceptable value for R²? This depends on the type of problem.
34
35
Example 4
Calculate the coefficient of determination for the data of Example 1.

36
Solution

R² = 1 − Σ(yi − ŷi)²/Σ(yi − ȳ)² = 1 − (Syy − b1·Sxy)/Syy

Velocity (m/s)  10      20      30      40      50      60      70       80
Force (N)       24      68      378     552     608     1218    831      1452
ŷi              −41.42  153.67  348.75  543.83  738.92  934.00  1129.08  1324.17

Σ(yi − ŷi)² = 215530.58

Σ(yi − ȳ)² = Syy = 1813945.875

⇒ R² = 0.8812

The force can be predicted with limited error using the line equation.
37
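Example 4 can be checked by computing the residual sum of squares directly (a sketch; my own variable names):

```python
# Coefficient of determination for the Example 1 fit
x = [10, 20, 30, 40, 50, 60, 70, 80]
y = [24, 68, 378, 552, 608, 1218, 831, 1452]
n = len(x)

Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
Syy = sum(yi**2 for yi in y) - sum(y)**2 / n
Sxy = sum(xi*yi for xi, yi in zip(x, y)) - sum(x)*sum(y) / n

b1 = Sxy / Sxx
b0 = sum(y)/n - b1 * sum(x)/n

# Residual sum of squares, then R^2 = 1 - SSE/Syy
sse = sum((yi - (b0 + b1*xi))**2 for xi, yi in zip(x, y))
R2 = 1 - sse / Syy
print(round(R2, 4))   # 0.8812
```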
Estimating The Mean Value Of Y
• The population regression model is

y = β0 + β1x + ε

and the mean value of y for a given x is denoted by μ_{y/x}.
• Since the mean value of ε is assumed to be zero, the mean value of y is given by
μ_{y/x} = β0 + β1x.
• The value of ŷ obtained from the sample regression line by substituting the
value of x is the point estimate of μ_{y/x} for that x.
• All possible samples of the same size taken from the same population will
give different regression lines and different point estimates of μ_{y/x}. Hence, a
confidence interval constructed for μ_{y/x} based on one sample gives a more
reliable estimate of μ_{y/x} than a point estimate does.
38
Confidence Interval for μ_{Y/x0}
• It can be shown that the sampling distribution of ŷ0 is normal with mean and
variance:

E(ŷ0) = β0 + β1x0 = μ_{Y/x0}

σ²_{ŷ0} = σ²·[1/n + (x0 − x̄)²/Sxx]

• A 100(1 − α)% confidence interval for the mean response μ_{Y/x0} is

ŷ0 − t_{α/2}·s·√(1/n + (x0 − x̄)²/Sxx) < μ_{Y/x0} < ŷ0 + t_{α/2}·s·√(1/n + (x0 − x̄)²/Sxx)

where t_{α/2} is a value of the t-distribution with n − 2 degrees of freedom.
39
Example 5
Based on the data of Example 1, determine a 99% confidence interval for the
mean force with a velocity of 50 m/s.

40
Solution
For x0 = 50 m/s, ŷ0 = 738.9165 N
t_{α/2} = t_{0.005} = 3.707 for n − 2 = 6 degrees of freedom

738.9165 − 3.707 × 189.53 × √(1/8 + (50 − 45)²/4200) < μ_{Y/x0} < 738.9165 + 3.707 × 189.53 × √(1/8 + (50 − 45)²/4200)

484.668 < μ_{Y/x0} < 993.165

With 99% confidence, we can state that the mean force at a velocity of 50 m/s is
between 484.668 N and 993.165 N.

41
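A numerical check of Example 5 (a sketch; t_{0.005,6} = 3.707 and the earlier point estimates are hardcoded):

```python
# 99% confidence interval for the mean force at x0 = 50 m/s (Example 5)
n, x_bar, Sxx = 8, 45.0, 4200.0
b0, b1 = -236.5, 19.5083333
s, t = 189.53, 3.707

x0 = 50.0
y0_hat = b0 + b1 * x0
margin = t * s * (1/n + (x0 - x_bar)**2 / Sxx) ** 0.5
print(round(y0_hat - margin, 2), round(y0_hat + margin, 2))   # 484.67 993.16
```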
Prediction Interval
• Another type of interval that is often misinterpreted and confused with that
given for 𝝁𝒀/𝒙𝟎 is the prediction interval for a future observed response.
Actually in many instances, the prediction interval is more relevant to the
scientist or engineer than the confidence interval on the mean.
• A 100(1 − α)% prediction interval for a single response y0 is given by

ŷ0 − t_{α/2}·s·√(1 + 1/n + (x0 − x̄)²/Sxx) < y0 < ŷ0 + t_{α/2}·s·√(1 + 1/n + (x0 − x̄)²/Sxx)

where t_{α/2} is a value of the t-distribution with n − 2 degrees of freedom.

• The prediction interval represents an interval that has a probability equal to
1 − α of containing a future value y0 of the random variable Y0.
42
Example 6
Based on the data of Example 1, find a 99% prediction interval for the force
observed at a randomly selected velocity of 50 m/s.

43
Solution
For x0 = 50 m/s, ŷ0 = 738.9165 N
t_{α/2} = t_{0.005} = 3.707 for n − 2 = 6 degrees of freedom

738.9165 − 3.707 × 189.53 × √(1 + 1/8 + (50 − 45)²/4200) < y0 < 738.9165 + 3.707 × 189.53 × √(1 + 1/8 + (50 − 45)²/4200)

−8.2603 < y0 < 1486.094

We are 99% confident that the force observed at a velocity of 50 m/s will be
between −8.2603 N and 1486.094 N.

44
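The same computation with the extra "1 +" term under the square root reproduces Example 6 (a sketch):

```python
# 99% prediction interval for a single observed force at x0 = 50 m/s (Example 6)
n, x_bar, Sxx = 8, 45.0, 4200.0
b0, b1 = -236.5, 19.5083333
s, t = 189.53, 3.707

x0 = 50.0
y0_hat = b0 + b1 * x0
# Note the "1 +" term: a single observation varies more than the mean response
margin = t * s * (1 + 1/n + (x0 - x_bar)**2 / Sxx) ** 0.5
print(round(y0_hat - margin, 2), round(y0_hat + margin, 2))   # -8.26 1486.09
```

The prediction interval is much wider than the confidence interval for the mean at the same velocity.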
Linear Correlation
• The linear correlation coefficient is a measure of the relationship between two
variables: it measures how closely the points in a scatter diagram are spread
around the regression line.
• The correlation coefficient calculated for the population is denoted by ρ and
the one calculated for sample data is denoted by r.

45
(Scatter diagrams illustrating r = 1, r = −1, and r = 0)

• If r = 1, it refers to a case of perfect positive linear correlation and all points


in the scatter diagram lie on a straight line that slopes upward from left to
right.
• If r = –1, the correlation is said to be perfect negative linear correlation and
all points in the scatter diagram fall on a straight line that slopes downward
from left to right.
• If r is close to 0, there is no linear correlation between the two variables.

46
47
• Two variables are said to have a strong positive linear correlation when the
correlation is positive and close to 1.
• If the correlation between the two variables is positive but close to zero, then
the variables have a weak positive linear correlation.
• If the correlation between two variables is negative and close to –1, then the
variables are said to have a strong negative linear correlation.
• If the correlation between two variables is negative and close to 0, the
variables are said to have weak negative linear correlation.

48
Correlation Coefficient For A Sample

The simple linear correlation coefficient, denoted by r, measures the strength of
the linear relationship between two variables for a sample and is calculated as

r = b1·√(Sxx/Syy) = Sxy/√(Sxx·Syy)

−1 ≤ r ≤ 1

49
Example 7
Determine the correlation coefficient for the data given in Example 1.

50
Solution

r = Sxy/√(Sxx·Syy) = 81935/√(4200 × 1813945.875) = 0.9387

There is a strong positive linear correlation between the force and the
velocity: when the velocity increases, the force increases, and vice versa.
Furthermore, the force values can be predicted with limited error using the
line equation we found.

51
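Example 7 can be verified directly, and squaring r recovers the coefficient of determination found in Example 4 (a sketch):

```python
# Sample correlation coefficient for the Example 1 data
x = [10, 20, 30, 40, 50, 60, 70, 80]
y = [24, 68, 378, 552, 608, 1218, 831, 1452]
n = len(x)

Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
Syy = sum(yi**2 for yi in y) - sum(y)**2 / n
Sxy = sum(xi*yi for xi, yi in zip(x, y)) - sum(x)*sum(y) / n

r = Sxy / (Sxx * Syy) ** 0.5
print(round(r, 4), round(r**2, 4))   # 0.9387 0.8812
```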
Transformations
• Many situations in science and engineering involve relationships between the
quantities of interest that are not linear. Several non-linear functions are
commonly used for curve fitting.
• A simpler alternative is to use analytical manipulations to transform the
equations into a linear form. Then linear regression can be used to fit the
equations to the data.

52
Transformation of Some Non-linear equations

53
Example 8
Fit y = c·x^m (a power function) to the data in Example 1 using a logarithmic
transformation.

54
Example 8 Solution
i Xi Yi log(Xi) log(Yi) (log(Xi))2 log(Xi) log(Yi)
1 10 24 1 1.3802 1 1.3802
2 20 68 1.301 1.8325 1.6927 2.3841
3 30 378 1.4771 2.5775 2.1819 3.8073
4 40 552 1.6021 2.7419 2.5666 4.3928
5 50 608 1.699 2.7839 2.8865 4.7298
6 60 1218 1.7782 3.0856 3.1618 5.4867
7 70 831 1.8451 2.9196 3.4044 5.387
8 80 1452 1.9031 3.162 3.6218 6.0175
σ 360 5131 12.6056 20.4832 20.5157 33.5854

55
b1 = [n Σ log(xi)·log(yi) − (Σ log(xi))(Σ log(yi))] / [n Σ (log(xi))² − (Σ log(xi))²] = 2.0055

b0 = [Σ log(yi) − b1 Σ log(xi)] / n = −0.5997

The least-squares fit is log(y) = −0.5997 + 2.0055·log(x)

Transforming back to the original coordinates, we have c = 10^(−0.5997) = 0.2514 and
m = 2.0055; hence the least-squares fit is y = 0.2514·x^2.0055

56
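The whole transformation in Example 8 can be carried out in Python. This is a sketch using base-10 logarithms; small differences from the slide's 4-decimal table values are expected because full precision is carried through.

```python
import math

# Power-law fit y = c * x**m via a log10 transformation (Example 8)
x = [10, 20, 30, 40, 50, 60, 70, 80]
y = [24, 68, 378, 552, 608, 1218, 831, 1452]
n = len(x)

lx = [math.log10(xi) for xi in x]
ly = [math.log10(yi) for yi in y]

# Ordinary least squares on the transformed (log-log) data
b1 = (n * sum(a*b for a, b in zip(lx, ly)) - sum(lx) * sum(ly)) / \
     (n * sum(a*a for a in lx) - sum(lx) ** 2)
b0 = (sum(ly) - b1 * sum(lx)) / n

m = b1           # exponent of the power law, ~2.0055
c = 10 ** b0     # multiplicative constant, ~0.2514
print(m, c)
```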
