Correlation and Regression
CORRELATION & REGRESSION
Part 1
Created by T. Madas
Question 1 (**)
The annual car sales of a small car manufacturer, c, and the annual advertising
expenditure, £a, have product moment correlation coefficient $r_{ac}$.
$x = c - 7000$ and $y = \frac{a}{1000}$ ,
Question 2 (**)
The percentage mock exam marks, of a random sample of 8 G.C.S.E. students, in
Geography and History are recorded in the table below.
Student A B C D E F G H
Geography 80 29 56 56 58 45 67 72
History 78 49 65 50 75 50 60 47
Test, at the 10% level of significance, whether there is evidence of positive correlation
between the percentage mock exam marks in Geography and History.
Question 3 (**)
The table below shows the number of Maths teachers x , working in 8 different towns
and the number of burglaries y , committed in a given month in the same 8 towns.
Town A B C D E F G H
x 35 42 21 55 33 29 39 40
y 30 28 21 38 35 27 30 k
b) Interpret the value of the product moment correlation coefficient in the context
of this question.
Question 4 (**)
The table below shows the marks obtained by a group of students, in two separate tests.
Student A B C D E F G H
Test 1 28 39 18 30 42 43 33 10
Test 2 12 23 16 16 28 18 24 7
The first test is out of 50 marks while the second test is out of 30 marks.
Let x and y represent the marks obtained in Test 1 and Test 2 , respectively.
a) Use a statistical calculator to find the value of the product moment correlation
coefficient between x and y .
b) Explain how the value of the product moment correlation coefficient between
x and y will be affected if the individual test marks were converted into
percentage marks.
A student was absent from the second test but he obtained 30 marks in the first test.
d) Use linear regression to estimate this student’s mark in the second test.
Question 5 (**)
The table below shows the maximum daytime temperature, in °C , at a certain city
centre, and the amount of a certain pollutant in mg per litre.
Maximum Temperature 10 12 14 16 18 20 22 24
Amount of Pollutant 513 475 525 530 516 520 507 521
a) Find, using a statistical calculator, the value of the product moment correlation
coefficient for the above data.
b) State, with justification, the value of the product moment correlation coefficient,
if the maximum daily temperatures were to be measured in degrees Fahrenheit.
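A short derivation of why a change of units of this kind leaves the coefficient unchanged. The coded variables $X$, $Y$ and the constants $p$, $q$, $s$, $t$ are notation introduced here for illustration only; for a Celsius to Fahrenheit conversion one would take $p = 32$, $q = 1.8$, $s = 0$, $t = 1$.

```latex
X = p + qx, \quad Y = s + ty
\;\Rightarrow\;
S_{XX} = q^{2} S_{xx}, \quad S_{YY} = t^{2} S_{yy}, \quad S_{XY} = qt\, S_{xy}

r_{XY} = \frac{S_{XY}}{\sqrt{S_{XX}\, S_{YY}}}
       = \frac{qt\, S_{xy}}{|qt| \sqrt{S_{xx}\, S_{yy}}}
       = \frac{qt}{|qt|}\, r_{xy}
```

Since $qt = 1.8 > 0$ in the Fahrenheit case, the factor $qt/|qt|$ equals 1 and the coefficient keeps both its size and its sign.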
Question 6 (**)
The table below shows the daily number of shoplifting incidents in a shopping mall,
for a given seven day week and the number of the security guards employed in each of
these seven days.
a) Find, using a statistical calculator, the value of the product moment correlation
coefficient for these data.
Question 7 (**)
An electrical appliances supplier wishes to investigate the impact of advertising on the
sales of his washing machines.
He records the number of monthly advertisements placed on the local radio station and
the number of washing machines sold.
Number of Advertisements (x) 52 37 66 45 77 27 80 19 47 40
Number of Washing Machines Sold (y) 180 115 171 166 177 99 174 100 143 164
Question 8 (**)
The table below shows the number of Maths teachers x , working in 8 different
schools and the number of students y , in each of these 8 schools.
School A B C D E F G H
x 5 9 11 17 12 10 9 8
y 225 247 334 811 382 340 285 k
Question 9 (***)
The table below shows the amount spent per month by a car dealership on marketing
and advertising m , in £1000 , and the number of cars c sold that month.
m 6 7 8 9 10
c 8 13 11 12 14
ii. … the equation of the regression line between m and c , giving the
answer in the form
c = a + bm ,
b) Use the equation of the regression line to estimate the number of cars that are
expected to be sold in a month where the amount spent on marketing and
advertising is …
i. … £8,800 .
ii. … £20,000 .
Question 10 (***)
The table below shows the maximum temperature T °C on five different days and the
corresponding ice cream sales, N , of a certain shop on those days.
T 15 20 25 30 35
N 69 165 172 200 232
a) State, with a reason, which is the explanatory variable in the above described
scenario and state the statistical name of the other variable.
ii. … the equation of the regression line between N and T , giving the
answer in the form
N = a + bT ,
d) Use the equation of the regression line to estimate the value of N when …
i. … T = 18°C .
ii. … T = 37°C .
iii. … T = 45°C
MMS-L , $r = 0.934$ , $N = 7.22T - 12.9$ , $N_{18} \approx 117$ , $N_{37} \approx 254$ , $N_{45} \approx 312$
Question 11 (***)
It is an actual fact that “sleeping with your clothes and shoes on is strongly correlated
with waking up with a headache”.
Evidently the conclusion is that “sleeping with your clothes and shoes on causes a
headache”.
Discuss the validity of the above conclusion indicating how a strong correlation is
possible in the above scenario.
CORRELATION & REGRESSION
Part 2
Question 1 (**)
The table below shows the marks obtained by a group of students, in two separate tests.
Student A B C D E F G H
Test 1 27 38 17 29 41 42 32 9
Test 2 13 24 17 17 29 19 25 8
The first test is out of 50 marks while the second test is out of 30 marks.
Let x and y represent the marks obtained in Test 1 and Test 2 , respectively.
b) Explain how the value of the product moment correlation coefficient between
x and y will be affected if the individual test marks were converted into
percentage marks.
Question 2 (**)
The table below shows the number of Maths teachers x , working in 8 different towns
and the number of burglaries y , committed in a given month in the same 8 towns.
Town A B C D E F G H
x 37 40 21 50 32 27 39 40
y 30 28 20 35 34 27 31 26
b) Interpret the value of the product moment correlation coefficient in the context
of this question.
FS2-P , r ≈ 0.692
Question 3 (**)
An electrical appliances supplier wishes to investigate the impact of advertising on the
sales of his washing machines.
He records the number of monthly advertisements placed on the local radio station and
the number of washing machines sold.
Number of Advertisements (x) 52 37 66 45 77 27 80 19 47 40
Number of Washing Machines Sold (y) 80 75 81 76 77 49 84 50 63 64
Find, by detailed calculations, the value of the product moment correlation coefficient
between x and y , and explain what conclusions the electrical appliances supplier
should make from this value.
FS2-M , r = 0.820
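The detailed calculation can be checked with a minimal Python sketch that uses the summation forms of $S_{xx}$, $S_{yy}$ and $S_{xy}$ on the advertising data of this question. The function name `pmcc` and the variable names are illustrative choices, not part of the question.

```python
from math import sqrt

def pmcc(xs, ys):
    """Return (S_xx, S_yy, S_xy, r) using the summation formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xx, s_yy, s_xy, s_xy / sqrt(s_xx * s_yy)

# Data copied from the table in this question.
adverts = [52, 37, 66, 45, 77, 27, 80, 19, 47, 40]
sold = [80, 75, 81, 76, 77, 49, 84, 50, 63, 64]

s_xx, s_yy, s_xy, r = pmcc(adverts, sold)
print(f"S_xx = {s_xx:.1f}, S_yy = {s_yy:.1f}, S_xy = {s_xy:.1f}, r = {r:.3f}")
```

The value of r printed to three decimal places agrees with the quoted answer of 0.820.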
Question 4 (**+)
An electrical tester wishes to test the accuracy of a voltmeter used in a lab.
He uses a carefully calibrated voltage source and takes readings with the voltmeter he
wishes to be tested.
Actual Voltage (x) 10 20 30 40 50 60 70 80 90 100
Voltmeter Reading (y) 9 19 34 39 54 61 68 80 92 99
b) Determine the equation of the regression line between x and y , giving the
answer in the form
y = a + bx ,
Question 5 (**+)
The table below shows the marks obtained by a group of students, in two separate tests.
Student A B C D E F G H I J
Test 1 17 11 16 9 12 12 11 4 7 15
Test 2 24 21 24 20 22 18 18 9 15 21
Let x and y represent the marks obtained in Test 1 and Test 2 , respectively.
f) Determine the equation of the regression line between x and y , giving the
answer in the form
y = a + bx ,
Question 6 (**+)
The table below shows 10 pairs of bivariate data.
a) Determine the value of $S_{xx}$ , $S_{yy}$ and $S_{xy}$ , and hence calculate the value of the
product moment correlation coefficient between x and y .
b) Find the equation of the least squares regression line between x and y , giving
the answer in the form
y = a + bx ,
y = 12.6 − 0.0850 x
Question 7 (**+)
The table below shows the heights and weights of a random sample of 10 pupils,
where the heights are given to the nearest cm and the weights to the nearest 5 kg.
Pupil A B C D E F G H I J
Height (cm) 148 164 156 172 147 184 162 155 182 165
Weight (kg) 40 60 55 75 40 80 65 50 80 70
Let x and y represent the respective heights and weights of these pupils and r the
product moment correlation coefficient between x and y .
c) State the value of r if the heights were measured in metres instead of cm.
d) Determine the equation of the regression line between x and y , giving the
answer in the form
y = bx + a ,
where a and b are constants.
Question 8 (**+)
The table below shows the midday daily temperature x , in °C , and the number of cups
of tea y , sold in a small café.
x 20 25 26 27 29 29 32 36
y 100 80 72 74 65 69 63 60
a) Find the value of $S_{xx}$ , $S_{yy}$ and $S_{xy}$ , and hence calculate the product moment
correlation coefficient between x and y .
b) Determine the equation of the regression line between x and y , giving the
answer in the form
y = a + bx ,
c) Use the equation of the regression line to estimate the value of y when …
i. … x = 40 .
ii. … x = 50 .
Question 9 (***)
The table below shows the average midday temperature x of a seaside town, in °C ,
and the number of people y , that used a certain restaurant in that town.
x 17 20 25 29 27 21 20 24
y 40 42 42 43 44 39 41 45
a) Find the value of $S_{xx}$ , $S_{yy}$ and $S_{xy}$ , and hence calculate the product moment
correlation coefficient between x and y .
b) State the value of the product moment correlation coefficient between x and y
if the temperature was measured in degrees Fahrenheit instead of Centigrade.
c) Determine the equation of the regression line between x and y , giving the
answer in the form
y = a + bx ,
d) State, with a reason, which is the explanatory variable in the above described
scenario and state the statistical name of the other variable.
f) Use the equation of the regression line to estimate the value of y when …
i. … x = 16 .
ii. … x = 35 .
Question 10 (***)
The table below shows the maximum temperature T °C on five different days and the
corresponding ice cream sales, N , of a certain shop on those days.
T 15 20 25 30 35
N 79 145 182 255 302
a) Find the value of $S_{TT}$ , $S_{NN}$ and $S_{TN}$ , and hence determine the value of the
product moment correlation coefficient between T and N .
b) State, with a reason, which is the explanatory variable in the above described
scenario and state the statistical name of the other variable.
c) Determine the equation of the regression line between N and T , giving the
answer in the form
N = a + bT ,
e) Use the equation of the regression line to estimate the value of N when …
i. … T = 18°C .
ii. … T = 37°C .
iii. … T = 45°C
Question 11 (***+)
The table below shows the marks obtained by a group of students, in two separate tests.
Student A B C D E F G H
Test 1 35 42 21 55 33 29 39 40
Test 2 30 28 21 38 35 27 30 k
Use linear regression for the test marks of the students A – G , to estimate the value of
k , for student H.
FS2-N , k ≈ 31
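The estimate can be reproduced with a minimal least squares sketch, assuming the regression of Test 2 marks (y) on Test 1 marks (x) for students A to G only. The function name `least_squares_line` and the range check are illustrative additions, not part of the question.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the regression line y = a + b*x of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return mean_y - b * mean_x, b

# Students A to G from the table; student H is left out of the fit.
test1 = [35, 42, 21, 55, 33, 29, 39]
test2 = [30, 28, 21, 38, 35, 27, 30]

a, b = least_squares_line(test1, test2)
k_estimate = a + b * 40                      # student H scored 40 in Test 1
in_range = min(test1) <= 40 <= max(test1)    # True means interpolation
print(f"y = {a:.2f} + {b:.3f}x, k = {k_estimate:.1f}, interpolation: {in_range}")
```

Because 40 lies inside the range of the Test 1 marks used for the fit, the estimate is an interpolation, and rounding it to the nearest mark gives the quoted k ≈ 31.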
Question 12 (***+)
The table below shows the amount spent per month by a car dealership on marketing
and advertising m , in £1000 , and the number of cars c sold that month.
m 7 8 9 10 11
c 7 12 10 11 13
a) Find the value of the product moment correlation coefficient between m and c .
b) Determine the equation of the regression line between m and c , giving the
answer in the form
c = a + bm ,
c) Use the equation of the regression line to estimate the number of cars that are
expected to be sold in a month where the amount spent on marketing and
advertising is …
i. … £8,800 .
ii. … £20,000 .
Question 13 (***+)
The table below shows the tomato yield obtained by a group of ten plants that were
given different amounts of fertilizer and allowed to grow in otherwise identical
conditions.
Plant A B C D E F G H I J
Amount of Fertilizer (grams) 0 10 20 30 40 50 60 70 80 90
Tomato Yield (kilograms) 1.2 1.9 2.1 2.4 2.5 2.7 3.0 k 3.2 3.1
a) Find an equation of the line of least squares using the plants A to G, I and J ,
and hence estimate the value of k , for the plant H.
Detailed workings are expected in this part
Another plant N, not included in the table, was given 200 grams of fertilizer.
Question 14 (***+)
The table below shows a set of bivariate data involving two variables x and y .
$X = \frac{x - 1012}{3}$ and $Y = 10000y - 27$
c) State with justification the value of the product moment correlation coefficient
between x and y .
d) Determine the equation of the regression line between x and y , giving the
answer in the form
y = a + bx ,
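One way to move from a regression line in the coded variables back to the original ones is sketched below. It assumes the coding $X = (x - 1012)/3$ and $Y = 10000y - 27$ and a coded line of the form $Y = A + BX$. The coefficients `A` and `B` used here are hypothetical values chosen only to exercise the algebra, since the data table is not reproduced in this text, and `decode_line` is an illustrative name.

```python
def decode_line(A, B):
    """Convert Y = A + B*X into y = a + b*x, where
    X = (x - 1012)/3 and Y = 10000*y - 27."""
    # 10000*y - 27 = A + B*(x - 1012)/3, then solve for y.
    b = B / 30000
    a = (A + 27 - 1012 * B / 3) / 10000
    return a, b

# Hypothetical coded coefficients (NOT taken from the question).
A, B = 5.0, 2.0
a, b = decode_line(A, B)

# Consistency check with one point: code x, predict Y, decode back to y.
x = 1030
X = (x - 1012) / 3
y_from_coded = (A + B * X + 27) / 10000
print(abs(y_from_coded - (a + b * x)) < 1e-12)
```

The check confirms that predicting in coded form and then decoding gives the same y as using the decoded line directly.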
Question 15 (***+)
The table below shows a set of bivariate data involving two variables t and v .
$x = \frac{t - 157}{3}$ and $y = \frac{v}{100}$
c) State with justification the value of the product moment correlation coefficient
between t and v .
d) Determine the equation of the regression line between t and v , giving the
answer in the form
v = A + Bt ,
Question 16 (***+)
Clinical trials are carried out to determine the effect of a stimulant.
Ten volunteers were given different amounts of the stimulant, X milligrams, and the
amount of nightly sleep each had on the following night, Y hours, was recorded.
• Claim 1
For every additional 60 milligrams of the stimulant, the nightly sleep typically
reduces by 40 minutes.
• Claim 2
The expected nightly sleep would have been 8 hours if no stimulant was taken.
Question 17 (***+)
Dolphins are thought to communicate with each other by high pitch noises they
produce. The frequency, v kHz , of the noise made by a dolphin is recorded at 15
different sea depths, d m . These data are summarized below.
d) Interpret the value of the product moment correlation coefficient in the context
of this question.
v = a + bd ,
Question 18 (****)
The mean and variance of 10 independent observations of a random variable x , are
66.5 and 85.8 , respectively.
y = 0.0949 x − 0.0130 .
FS2-X , r = 0.977
Question 19 (****)
A gym opened on the first day of January of a given year.
The number of new members, N , at the end of each month m , was recorded for those
12 months.
N = 34 + 35m .
Use the regression line to find the total number of members which joined that gym
during that year.
No credit will be given for adding 69, 104, 139, … , 419, 454.
FS2-Z , 3138
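The quoted total can be checked without term-by-term addition by summing the regression line $N = 34 + 35m$ over $m = 1, \dots, 12$ with the arithmetic series formula. The sketch below assumes that reading of the question; the function name `total_from_line` is illustrative.

```python
def total_from_line(a, b, months):
    """Sum of a + b*m for m = 1..months, using
    sum(m) = months*(months + 1)/2 rather than adding each term."""
    return a * months + b * months * (months + 1) // 2

total = total_from_line(34, 35, 12)
print(total)  # 3138
```

Here the constant term contributes 34 × 12 = 408 and the slope term contributes 35 × 78 = 2730.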
Question 20 (****+)
Some summary statistics for a set of bivariate data, based on two variables x and y , are
given below.
$x$ , $y$ , $x^2$ , $y^2$ .
b) Calculate the product moment correlation coefficient between x and y .
Question 21 (****+)
On a certain mountain climb, a scientist recorded the temperature, T °C , at ten
different heights, H m above sea level, and some of his results are summarized below.
FS2-T , ≈ 25.9 °C
Question 22 (****+)
The number of letters x in people’s first names and number of letters y in people’s
surnames is researched.
The summary data of the number of letters in the first names and the surnames of a
random sample of 20 individuals is shown below.
The name “Richard Edwards” is added to the sample, making the total number of
people in the sample, 21 .
i. ... show that $S_{xx}$ of the 21 first names is likely to have a different value
to the original value of $S_{xx}$ of the original 20 first names.
FS2-S , r ≈ 0.253
Question 23 (****+)
Two variables, x and y , have the following regression equations, based on 5
observations.
y on x : y = 18.5 + 0.1x
x on y : x = 16.6 + 0.4 y
proof
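The parts of this question are not reproduced in the text, so the sketch below only illustrates two standard facts that a pair of regression lines makes available: both lines pass through the point of means $(\bar{x}, \bar{y})$, and the product of the two slopes equals $r^2$. The variable names are illustrative.

```python
from math import sqrt

# y on x: y = 18.5 + 0.1x      x on y: x = 16.6 + 0.4y
a1, b1 = 18.5, 0.1
a2, b2 = 16.6, 0.4

# Substitute the first line into the second and solve for the mean of x.
x_bar = (a2 + b2 * a1) / (1 - b1 * b2)
y_bar = a1 + b1 * x_bar

# Both slopes share the sign of S_xy, so r takes that sign as well.
r = sqrt(b1 * b2) if b1 > 0 else -sqrt(b1 * b2)
print(round(x_bar, 6), round(y_bar, 6), round(r, 6))
```

With these two lines the point of means works out as (25, 21) and the coefficient as 0.2.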
SPEARMAN'S RANK
Question 1 (**)
Nine gymnasts performed in a gymnastics competition.
Their names were Arnold (A), Brian (B), Christian (C), Damon (D), Eli (E), Fabian (F),
Gordon (G), Harry (H) and Ian (I).
Rank 1 2 3 4 5 6 7 8 9
Judge 1 D C E B F A I H G
Judge 2 D E F C I B A G H
b) Test whether or not the judges are generally in agreement, at the 1% level of
significance, stating your hypotheses clearly.
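As a minimal sketch of the calculation that feeds this test, the two judges' orderings can be turned into ranks per gymnast and combined with the formula $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$, which applies here because there are no tied ranks. The string encoding of the orderings and the helper name `ranks` are illustrative choices; the critical value still has to be read from tables.

```python
# Gymnasts listed in rank order 1..9, copied from the table.
judge1 = "DCEBFAIHG"
judge2 = "DEFCIBAGH"

def ranks(order):
    """Map each gymnast's letter to the rank that judge gave."""
    return {name: position for position, name in enumerate(order, start=1)}

r1, r2 = ranks(judge1), ranks(judge2)
n = len(judge1)

# Sum of squared rank differences, gymnast by gymnast.
sum_d2 = sum((r1[g] - r2[g]) ** 2 for g in r1)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(r_s, 4))
```

For these orderings the squared differences sum to 20, giving a coefficient of 5/6.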
Question 2 (**)
The data in the table below shows the time, in seconds, for the fastest qualifying lap for
8 different Formula One racing drivers, and their finishing order in the actual race.
Fastest Qualifying Lap 49.12 49.34 49.07 48.55 49.40 49.27 49.77 48.87
Finishing Position 5 6 1 3 7 4 8 2
b) Test whether or not there is any association between the fastest qualifying lap
time and the finishing position for Formula One racing drivers, at the 5% level
of significance, stating your hypotheses clearly.
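When one variable is a measurement rather than a rank, it has to be ranked first. The sketch below ranks the lap times with the fastest as 1 (an assumed but natural convention, valid here because no two times are equal) and then applies $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$. Variable names are illustrative.

```python
lap_times = [49.12, 49.34, 49.07, 48.55, 49.40, 49.27, 49.77, 48.87]
finish = [5, 6, 1, 3, 7, 4, 8, 2]

# Rank the lap times, fastest = 1 (no tied times in this data set).
ordered = sorted(lap_times)
lap_rank = [ordered.index(t) + 1 for t in lap_times]

n = len(lap_times)
sum_d2 = sum((a - b) ** 2 for a, b in zip(lap_rank, finish))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(lap_rank, sum_d2, round(r_s, 3))
```

The resulting coefficient is then compared with the tabulated critical value for n = 8 at the stated significance level.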
Question 3 (**)
The table below shows the mileages travelled by eleven salesmen and the commission
they got paid during a given month.
Question 4 (**)
The actual ages, in complete years, of seven cats are shown below.
These seven cats were seen by a vet, during a day’s surgery, and the vet was asked to
order them according to their age by examination only.
c) Calculate Spearman's rank correlation coefficient between the actual age of the
cats and the vet’s order.
d) Test whether or not the vet has the ability to identify the age of cats, at the 1%
level of significance, stating your hypotheses clearly.
Question 5 (**+)
Six ordered pairs ( x, y ) , of bivariate data, are shown in the following set of axes.
[Figure: set of axes with origin O and horizontal axis x; the six plotted points are not recoverable]
FS2-R , $r_s = \frac{31}{35} \approx 0.886$
Question 6 (**+)
The table below shows, for a group of students in a recent mock exam, the number of
marks lost, y , and the corresponding number of papers, x , they practiced leading up
to that exam.
Student A B C D E F G H I J
Number of Papers (x) 17 39 24 26 11 22 25 10 8 6
Number of Marks Lost (y) 12 5 11 14 10 9 8 15 19 17
a) Find the value of $S_{xx}$ , $S_{yy}$ and $S_{xy}$ , and hence determine the value of the
product moment correlation coefficient between x and y .