5 Correlation & Regression
Eg Bob thinks that height and IQ are connected. He collects data on 5 people:
Height (cm)  167  173  164  170  175
IQ           120  126  118  124  129
a) Plot a scatter graph of this data and draw a line of best fit.
b) Use your line to estimate the height of someone with an IQ of 116.
Answer: b) Height 162 cm
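The estimate in part b) can be checked numerically. The sketch below is not the by-eye method the question asks for: it assumes a least squares line of height on IQ as one way of making the "line of best fit" precise, and the variable names are illustrative only.

```python
# Bob's data from the example: IQ is treated as x, height (cm) as y.
iq = [120, 126, 118, 124, 129]
height = [167, 173, 164, 170, 175]

n = len(iq)
mean_x = sum(iq) / n
mean_y = sum(height) / n

# Sums of squares and products about the means.
s_xx = sum((x - mean_x) ** 2 for x in iq)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(iq, height))

b = s_xy / s_xx          # gradient of the fitted line
a = mean_y - b * mean_x  # intercept of the fitted line

estimate = a + b * 116   # estimated height at IQ 116
print(f"height = {a:.2f} + {b:.3f} * IQ")
print(f"Estimated height at IQ 116: {estimate:.1f} cm")
```

The fitted line gives an estimate of a little over 162 cm, in line with the answer read from the graph. An IQ of 116 lies just below the smallest IQ in the data (118), so this is a mild extrapolation.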
Types of correlation
The relationship between 2 variables is called the correlation.
You must be able to recognise the different types of correlation:
[Figure: scale of the correlation coefficient, running from -1 through 0 to 1]
The student calculated the value of the product moment correlation coefficient
for each of the sets of data. The values were 0.68, -0.79 and 0.08.
Write down, with a reason, which value corresponds to which scatter diagram.
Answer: 0.68 = C, -0.79 = A, 0.08 = B
WB15 Students in Mr Brawn’s exercise class have to do press-ups and sit-ups.
The number of press-ups x and the number of sit-ups y done by a random sample
of 8 students are summarised below.
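a) $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 10164 - \frac{272^2}{8} = 916$
   $S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 13464 - \frac{320^2}{8} = 664$
   $S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 11222 - \frac{272 \times 320}{8} = 342$
b) $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{342}{\sqrt{916 \times 664}} = 0.439$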
c) There is positive correlation: students who are able to do more press-ups also tend to do more sit-ups.
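The WB15 working can be reproduced from the summary statistics alone. This is a minimal sketch using the sums quoted in the working (n = 8, Σx = 272, Σy = 320, Σx² = 10164, Σy² = 13464, Σxy = 11222); the variable names are illustrative.

```python
from math import sqrt

# Summary statistics for WB15 (x = press-ups, y = sit-ups).
n = 8
sum_x, sum_y = 272, 320
sum_x2, sum_y2, sum_xy = 10164, 13464, 11222

# S_xx, S_yy and S_xy from the formula-sheet definitions.
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Product moment correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

print(f"S_xx = {s_xx}, S_yy = {s_yy}, S_xy = {s_xy}")
print(f"r = {r:.3f}")
```

Keeping the unrounded values of S_xx, S_yy and S_xy until the final step avoids rounding error in r.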
Coding
You can code raw data in order to reduce the size of the numbers you work with.
WB16 A company owns two petrol stations P and Q along a main road. Total daily
sales in the same week for P (£p) and for Q (£q) are summarised in the table below.
When these data are coded using
$x = \frac{p - 4365}{100}$ and $y = \frac{q - 4340}{100}$
Day      p     q     x     y
Monday   4760  5380  3.95  10.4
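a) $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 486.44 - \frac{48.1^2}{7} = 155.92...$
   $S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 613.22 - \frac{52.8^2}{7} = 214.95...$
   $S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 204.95 - \frac{48.1 \times 52.8}{7} = -157.86...$
   Use the memory functions on your calculator to store these exact values.
b) $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{-157.86...}{\sqrt{155.92... \times 214.95...}} = -0.862$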
c) The same, -0.862. Coding does not affect the PMCC.
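The claim in part c) can be illustrated numerically. The sketch below takes the coding to be x = (p - 4365)/100 and y = (q - 4340)/100, which matches the Monday row of the table. Only the Monday pair (4760, 5380) comes from the worksheet; the other six pairs are hypothetical values made up for illustration, so the value of r printed here is not the worksheet's answer. The helper name `pmcc` is also an assumption.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient from raw data."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / sqrt(s_xx * s_yy)

# Daily sales in pounds. Only the first pair (Monday) is from the worksheet;
# the remaining six pairs are HYPOTHETICAL illustration data.
p = [4760, 5400, 5800, 4700, 5300, 5000, 4400]
q = [5380, 4500, 4600, 5400, 4800, 5300, 5400]

# Apply the linear coding to each variable.
x = [(v - 4365) / 100 for v in p]
y = [(v - 4340) / 100 for v in q]

r_raw = pmcc(p, q)
r_coded = pmcc(x, y)
print(f"r from raw data   = {r_raw:.6f}")
print(f"r from coded data = {r_coded:.6f}")
```

The two printed values agree because subtracting a constant and dividing by a positive constant rescales S_xy and the square root of S_xx S_yy by the same factor.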
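The least squares regression line of y on x has the equation
$y = a + bx$, where $b = \frac{S_{xy}}{S_{xx}}$ and $a = \bar{y} - b\bar{x}$
with
$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$, $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$, $\bar{x} = \frac{\sum x}{n}$, $\bar{y} = \frac{\sum y}{n}$
The above definitions, except for $\bar{x}$ and $\bar{y}$, are given on the formula sheet, but you must
be able to recognise which variable is x and which is y, even when different letters are used.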
WB17 A manufacturer stores drums of chemicals. During storage, evaporation
takes place. A random sample of 10 drums was taken and the time in storage, x
weeks, and the evaporation loss, y ml, are shown in the table below.
x 3 5 6 8 10 12 13 15 16 18
y 36 50 53 61 69 79 82 90 88 96
(a) On the grid below, draw a scatter diagram to represent these data.
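$\sum x = 106$, $\bar{x} = 10.6$, $\sum y = 704$, $\bar{y} = 70.4$
c) $S_{xy} = 8354 - \frac{106 \times 704}{10} = 891.6$, $S_{xx} = 1352 - \frac{106^2}{10} = 228.4$
   $b = \frac{891.6}{228.4} = \frac{2229}{571}$, $a = 70.4 - \frac{891.6}{228.4} \times 10.6 = \frac{16571}{571}$
   To 2dp, a = 29.02, b = 3.90, so $y = 29.02 + 3.90x$
d) For every extra week in storage, about 4 ml more evaporates (b = 3.90).
e) When x = 19, $y = 29.02 + 3.90 \times 19 = 103.12$
Note: b is the gradient of the regression line.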
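The WB17 regression line can be recomputed directly from the table of raw data. This is a minimal sketch of the formula-sheet method with illustrative variable names; it is a check on the arithmetic, not a replacement for the written working.

```python
# WB17 data: x = time in storage (weeks), y = evaporation loss (ml).
x = [3, 5, 6, 8, 10, 12, 13, 15, 16, 18]
y = [36, 50, 53, 61, 69, 79, 82, 90, 88, 96]
n = len(x)

# Formula-sheet definitions of S_xx and S_xy.
s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
s_xy = sum(u * v for u, v in zip(x, y)) - sum(x) * sum(y) / n

b = s_xy / s_xx                    # regression coefficient (gradient)
a = sum(y) / n - b * sum(x) / n    # intercept: y-bar minus b times x-bar

prediction = a + b * 19            # part e), using unrounded a and b
print(f"y = {a:.2f} + {b:.2f}x")
print(f"Predicted loss at x = 19: {prediction:.2f} ml")
```

With unrounded coefficients the prediction at x = 19 is about 103.19 ml; the value 103.12 comes from substituting the rounded a = 29.02 and b = 3.90. Since x = 19 lies just beyond the largest observed value (18), either figure is a slight extrapolation.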
An office has the heating switched on at 7.00 a.m. each morning. On a particular
day, the temperature of the office, t °C, was recorded m minutes after 7.00 a.m.
m   0  10  20  30  40  50
(b) Calculate the equation of the regression line of t on m in the form t = a + bm.
(c) Use your equation to estimate the value of t at 7.35 a.m.
(d) State, giving a reason, whether or not you would use the regression equation in
(b) to estimate the temperature
(i) at 9.00 a.m. that day, (ii) at 7.15 a.m. one month later.
b) $S_{mt} = \sum mt - \frac{\sum m \sum t}{n} = 2147 - \frac{150 \times 71.6}{6} = 357$
   $S_{mm} = \sum m^2 - \frac{(\sum m)^2}{n} = 5500 - \frac{150^2}{6} = 1750$
   $b = \frac{S_{mt}}{S_{mm}} = \frac{357}{1750} = 0.204$
   $\bar{t} = \frac{\sum t}{n} = \frac{71.6}{6}$, $\bar{m} = \frac{\sum m}{n} = \frac{150}{6}$
   $a = \bar{t} - b\bar{m} = \frac{41}{6}$
   $t = \frac{41}{6} + 0.204m$
c) When m = 35, t = 13.97 °C
d i) Using the model, when m = 120 (ie 9.00 a.m.), t = 31 °C, which is higher than you
would expect any heating system to go to. This is extrapolation well outside the range
of the data (m from 0 to 50), so the regression line is not valid.
d ii) Yes: m is reset to 0 at 7.00 a.m. each day, so 7.15 a.m. gives m = 15, which is
within the range of the data.
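The office heating working can be checked from the summary statistics that appear in it (n = 6, Σm = 150, Σt = 71.6, Σm² = 5500, Σmt = 2147). This is a sketch with illustrative names; the helper `predict` is hypothetical and simply evaluates t = a + bm.

```python
# Summary statistics for the office heating question.
n = 6
sum_m, sum_t = 150, 71.6
sum_m2, sum_mt = 5500, 2147

s_mm = sum_m2 - sum_m ** 2 / n
s_mt = sum_mt - sum_m * sum_t / n

b = s_mt / s_mm                  # regression coefficient of t on m
a = sum_t / n - b * sum_m / n    # intercept: t-bar minus b times m-bar

def predict(m):
    """Estimated temperature (deg C) m minutes after 7.00 a.m."""
    return a + b * m

print(f"t = {a:.3f} + {b:.3f}m")
print(f"7.35 a.m. (m = 35):  {predict(35):.2f} deg C")
print(f"9.00 a.m. (m = 120): {predict(120):.2f} deg C  (extrapolation)")
```

The code will evaluate the line for any m, including m = 120, which is why the judgement in part (d) about whether the value lies within the range of the data has to be made separately.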
The regression coefficient of y on x is $b = \frac{S_{xy}}{S_{xx}}$ (for t on m, $b = \frac{S_{mt}}{S_{mm}}$).
The least squares regression line of y on x is $y = a + bx$, where $a = \bar{y} - b\bar{x}$.
Interpreting a regression line
Eg the weekly growth, g, in mm, of a banana is plotted against the
amount of fertiliser used, f, in ml. A scatter graph is made of the results:
The regression line is calculated as $g = 4.3 + 2.6f$