LBSRE1021 Data Interpretation: Correlation and Regression

This document discusses correlation and linear regression. It provides an example dataset and calculates the Pearson correlation coefficient and line of best fit. It also defines the coefficient of determination and provides an example exam question.

Uploaded by

I_Prashant97
LBSRE1021 Data Interpretation
Lecture 11
Correlation and Regression

Example Data

Day   Output (tons)   Cost (£000)
1     23              58
2     17              50
3     24              54
4     35              64
5     10              40
6     16              43
7     15              42
8     24              50
9     18              53
10    30              62

The scatter diagram of the data would appear as below:
[Figure: scatter diagram of cost (£000) against output (tons), showing a positive correlation]
Alternatively, a negative correlation would appear as below:
[Figure: scatter diagram showing a negative correlation]
Alternatively, data with no correlation may appear as below:
[Figure: scatter diagram showing no correlation]
Correlation Scale

-1: perfect negative correlation
 0: no correlation
+1: perfect positive correlation
Pearson’s product moment correlation coefficient (r)

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

     x     y     xy      x²     y²
     23    58    1334    529    3364
     17    50    850     289    2500
     24    54    1296    576    2916
     …     …     …       …      …
∑    212   516   11452   5000   27242
Pearson’s product moment correlation coefficient (r) (2)

$$r = \frac{10 \times 11452 - 212 \times 516}{\sqrt{\left[10 \times 5000 - 212^2\right]\left[10 \times 27242 - 516^2\right]}} = \frac{5128}{\sqrt{5056 \times 6164}} = 0.9186$$
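
The calculation above can be checked with a short script. This is a minimal sketch in plain Python; the function name `pearson_r` and the variable names are illustrative choices, not part of the lecture.

```python
import math

# Data from the example table: daily output (tons) and cost (£000).
output = [23, 17, 24, 35, 10, 16, 15, 24, 18, 30]
cost = [58, 50, 54, 64, 40, 43, 42, 50, 53, 62]


def pearson_r(x, y):
    """Pearson's r from the raw-sums formula used on the slide."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r(output, cost)
print(round(r, 4))  # 0.9186
```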
Linear Regression
We need to establish a ‘line of best fit’.
The ‘freehand method’ has many drawbacks.
Rather than relying on crude graphical techniques, we need the line that fits the data best in a well-defined sense: the one that minimises the sum of the squared vertical distances between the points and the line. This is the ‘line of best fit’, or ‘least squares line’.
Linear Regression (2)
[Figure: scatter diagram of cost (£000) against output (tons) with the least squares line drawn through the points]
The equation for this line is Y = 30.10 + 1.014X
Linear Regression (3)
The equation of this line is Y = 30.10 + 1.014X
But how is this obtained?
The scattered points show the actual data, while the least squares line gives an estimate of Y for a given value of X. Notice the vertical distance between each scattered point and the line; this gives some idea of how good a fit the line is.
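
To make the idea of distance from the line concrete, the sketch below computes the vertical distance (residual) of each data point from the line Y = 30.10 + 1.014X. It uses the rounded coefficients quoted above; the variable names are illustrative only.

```python
# Data from the example table: daily output (tons) and cost (£000).
output = [23, 17, 24, 35, 10, 16, 15, 24, 18, 30]
cost = [58, 50, 54, 64, 40, 43, 42, 50, 53, 62]

# Rounded intercept and gradient of the line quoted in the lecture.
a, b = 30.10, 1.014

# Estimated cost for each day's output, and the gap to the actual cost.
predicted = [a + b * x for x in output]
residuals = [y - y_hat for y, y_hat in zip(cost, predicted)]

for x, y, y_hat, e in zip(output, cost, predicted, residuals):
    print(f"output={x:>2}  actual={y:>2}  estimate={y_hat:6.2f}  residual={e:+.2f}")
```

Small residuals that are scattered on both sides of the line suggest a reasonable fit; here they roughly cancel out when summed.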
Linear Regression (4)
How do we determine the least squares line?
We simply need to determine the intercept (a) and the gradient (b).
The formula is therefore Y = a + bX
You need to apply a little calculus (we will omit that process here) to develop standard equations.
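
For readers who want to see the omitted step, the following is a brief sketch of the usual textbook derivation, not taken from the lecture itself. The symbol S for the sum of squared errors is introduced here for illustration.

$$S(a, b) = \sum (y - a - bx)^2$$

Setting both partial derivatives to zero gives the two normal equations:

$$\frac{\partial S}{\partial a} = -2\sum (y - a - bx) = 0 \;\Rightarrow\; \sum y = na + b\sum x$$

$$\frac{\partial S}{\partial b} = -2\sum x\,(y - a - bx) = 0 \;\Rightarrow\; \sum xy = a\sum x + b\sum x^2$$

Solving this pair for b and a yields:

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad a = \bar{y} - b\,\bar{x}$$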
Linear Regression Equations

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

$$b = \frac{10 \times 11452 - 212 \times 516}{10 \times 5000 - 44944} = \frac{5128}{5056} = 1.0142405$$
Linear Regression Equations (2)
And

$$a = \bar{y} - b\,\bar{x}$$

$$a = 51.6 - 1.0142405 \times 21.2 = 30.098101$$

Rounding these values a little:

Y = 30.10 + 1.014X
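
The gradient and intercept can be reproduced with a few lines of code. This is a minimal sketch in plain Python; the function name `least_squares` is an illustrative choice, not something defined in the lecture.

```python
# Data from the example table: daily output (tons) and cost (£000).
output = [23, 17, 24, 35, 10, 16, 15, 24, 18, 30]
cost = [58, 50, 54, 64, 40, 43, 42, 50, 53, 62]


def least_squares(x, y):
    """Return (intercept a, gradient b) of the least squares line Y = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n  # mean of y minus b times mean of x
    return a, b


a, b = least_squares(output, cost)
print(f"Y = {a:.2f} + {b:.3f}X")  # Y = 30.10 + 1.014X
```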
Coefficient of Determination
The coefficient of determination measures the proportion of the variation in the dependent variable (y) explained by the variation in the independent variable (x).
It is reported as r², the square of the product moment correlation coefficient.
Coefficient of Determination (2)
For our previous example:

r² = 0.9186² = 0.844

This means that 84.4% of the variation in cost is explained by the variation in output volume. The remaining 15.6% of the variation is not explained.
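
As a check on this interpretation, the sketch below computes r² in two ways for the example data: as the square of r, and as one minus the ratio of unexplained variation (squared residuals about the fitted line) to total variation in cost. The variable names are illustrative, and the second formula is the standard one for a simple linear regression rather than something stated in the lecture.

```python
# Data from the example table: daily output (tons) and cost (£000).
output = [23, 17, 24, 35, 10, 16, 15, 24, 18, 30]
cost = [58, 50, 54, 64, 40, 43, 42, 50, 53, 62]

n = len(output)
mean_x = sum(output) / n
mean_y = sum(cost) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in output)
syy = sum((y - mean_y) ** 2 for y in cost)  # total variation in cost
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(output, cost))

# Least squares line.
b = sxy / sxx
a = mean_y - b * mean_x

# Unexplained variation: squared residuals about the fitted line.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(output, cost))

r_squared_from_r = sxy ** 2 / (sxx * syy)
r_squared_from_fit = 1 - sse / syy

print(round(r_squared_from_r, 3))    # 0.844
print(round(r_squared_from_fit, 3))  # 0.844
```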
Summary
Correlation is measured on a scale from -1 to +1 using Pearson’s product moment correlation coefficient (r).
Linear regression identifies the line of ‘best fit’ using the formula Y = a + bX.
The coefficient of determination (r²) measures the extent to which variation in the dependent variable is explained by the independent variable.
Exam Question – May 2008
Q. 7. The data below shows annual company income (£m) against year of trading.

Year   Income (£m)
1      20
2      23
3      26
4      28
5      35

A regression of income on year gives the following results:

r = 0.974, r squared = 0.948, intercept = 11.4, slope = 3.5

a. Explain each of the results above (1 mark each).
b. Use the results above to make a forecast for company income for year 6 (4 marks).
c. What assumption is made in making this forecast? (2 marks).
