
CHAP-5 S1

The document contains a series of exercises and answers related to statistical analysis, including correlation coefficients, regression lines, and interpretations of data. It includes various scenarios involving hospitals, washing machines, shop rents, and performance-related pay structures. The exercises require calculations and interpretations based on provided summary statistics and data points.

Answers:
Exercise 5A

Answers:
Exercise 5B

Answers:
Exercise 5C
1 a = -3, b = 6
2 y = -14 + 5.5x
3 a y = -59 + 57(6) = 283
b For each dexterity point, productivity increases by 57.
c i No, because this is extrapolation as it is outside the range of data.
ii No, because this is extrapolation as it is outside the range of data.
4 g = 1.50 + 1.44h
5 a p = 65.4 - 1.38w
b w = 47.4 - 0.72p
c The gradient of the second regression line is calculated using different summary statistics rather than just the reciprocal of the summary statistics used for the first regression line.
d i The first one ii The second one
6 a Snn = 6486 , Snp = 6344
b p = 21.0 + 0.978n
c 60,100
d Reliable, as 40000 items lies inside the range of the data.
7 a Snn = 589.6 , Snp = 1474
b p = 20 + 2.5n
c The increase in cost, in dollars, for every 100 leaflets printed.
d t > 8
8 a y = -0.07 + 1.45x
b Number of years protection per coat of paint.
c Unreliable, as 7 coats lies outside the range of the data.
d 10.08 years
e i 0.4779 + 1.247x
ii 9.2 years
iii The answer now uses interpolation not extrapolation and the number of data points has increased, which increases accuracy in prediction.

Answers:
Exercise 5D
1 s = 88 + p
2 a y = 3.5 + 0.5x b d = 35 + 2.5c
3 a Sxy = 162.2 , Sxx = 190.8 ; y = 7.87 + 0.850x
b c = 22.3 + 2.13a
c $90.46 or $90.56
4 a p = 3.03 + 1.49v b 10.1 tonnes


Answers:
Exercise 5E
May/June 2019, Q6

1. Ranpose hospital offers services to a large number of clinics that refer patients to a range of hospitals.
The manager at Ranpose hospital took a random sample of 16 clinics and recorded
• the distance, x km, of the clinic from Ranpose hospital
• the percentage, y %, of the referrals from the clinic who attend Ranpose hospital.
The data are summarised as
x̄ = 8.1 ȳ = 20.5 ∑y² = 8266 Sxx = 368.16 Sxy = -630.9
(a) Find the product moment correlation coefficient for these data. (4)
(b) Give an interpretation of your correlation coefficient. (1)
The manager at Ranpose hospital believes that there may be a linear relationship between
the distance of a clinic from the hospital and the percentage of the referrals who attend the
hospital. She drew the following scatter diagram for these data.

(c) State, giving a reason, whether or not these data support the manager’s belief. (1)
(d) Find the equation of the regression line of y on x, giving your answer in the form y = a + bx (4)
(e) Give an interpretation of the gradient of your regression line. (1)
(f) Draw your regression line on the scatter diagram. (1)
The manager believes that Ranpose hospital should be attracting an “above average”
percentage of referrals from clinics that are less than 5 km from the hospital. She proposes
to target one clinic with some extra publicity about the services Ranpose offers.
(g) On the scatter diagram circle the point representing the clinic she should target. (1)

January 2020, Q3

2. Soapern sells washing machines. When a customer buys a washing machine from Soapern, the customer
is also invited to buy a guarantee policy to cover breakdowns and repairs for the next three years.
The manager of Soapern believes that the relationship between the number of washing machines sold (m)
and the number of guarantee policies sold (p) can be modelled by a straight line. She collected data
each month for 10 months. The scatter diagram below illustrates these data.

The data are summarised by the following statistics.


∑m = 1124 ∑p = 281 ∑mp = 32958 Smm = 6046.4 Spp = 382.9
(a) Show that Smp = 1373.6 (1)
(b) Find the value of the product moment correlation coefficient for these data. (2)
(c) State, giving a reason, whether or not the data are consistent with the manager’s belief. (1)
The manager noticed that the total number of washing machines sold was k times the total
number of guarantee policies sold and suggests a model of the form p = (1/k)m, where k is an integer.
(d) Find the value of k. (2)
Jiang works for Soapern and thought that this model oversimplified the situation and
suggested that a linear regression of p on m may be more appropriate.
(e) Find the equation of the linear regression of p on m, giving your answer in the
form p = a + bm, where a and b should be given to 3 significant figures. (4)
(f) Use Jiang’s model to estimate the number of guarantee policies sold when 70 washing
machines are sold in a month. (1)
Usually about 70 washing machines are sold in January. Soapern decides to offer a bonus
to staff during January based on the number of guarantee policies sold. If the number of
guarantee policies sold is greater than the number estimated by the model, the bonus will be paid.
(g) State, giving your reasons, whether you would recommend that the staff use the
manager’s model or Jiang’s model. (2)

May/Oct 2020, Q5

3. A large company rents shops in different parts of the country. A random sample of
10 shops was taken and the floor area, x in 10 m², and the annual rent, y in thousands of
dollars, were recorded. The data are summarised by the following statistics
∑x = 900 ∑x² = 84818 ∑y = 183 ∑y² = 3434
and the regression line of y on x has equation y = 6.066 + 0.136x
(a) Use the regression line to estimate the annual rent in dollars for a shop with a floor area of 800 m² (2)
(b) Find Syy and Sxx (3)
(c) Find the product moment correlation coefficient between y and x. (4)
An 11th shop is added to the sample. The floor area is 900 m² and the annual rent is
15000 dollars.
(d) Use the formula Sxy = ∑(x − x̄)(y − ȳ) to show that the value of Sxy for the 11 shops
will be the same as it was for the original 10 shops. (2)
(e) Find the new equation of the regression line of y on x for the 11 shops. (3)
The company is considering renting a larger shop with area of 3000 m²
(f) Comment on the suitability of using the new regression line to estimate the annual
rent. Give a reason for your answer. (1)

January 2021, Q5
4. A company director wants to introduce a performance-related pay structure for her managers.
A random sample of 15 managers is taken and the annual salary, y in £1000, was recorded
for each manager. The director then calculated a performance score, x, for each of these managers.
The results are shown on the scatter diagram in the figure below.

(a) Describe the correlation between performance score and annual salary. (1)
The results are also summarised in the following statistics.

∑x = 465 ∑y = 562 Sxx = 2492 ∑y² = 23140 ∑xy = 19428

(b) (i) Show that Sxy = 2006 (1)


(ii) Find Syy (2)
(c) Find the product moment correlation coefficient between performance score and annual salary. (2)
The director believes that there is a linear relationship between performance score and annual salary.
(d) State, giving a reason, whether or not these data are consistent with the director’s belief. (1)
(e) Calculate the equation of the regression line of y on x, in the form y = a + bx
Give the value of a and the value of b to 3 significant figures. (4)
(f) Give an interpretation of the value of b. (1)
(g) Plot your regression line on the scatter diagram in the figure. (2)
The director hears that one of the managers in the sample seems to be under-performing.
(h) On the scatter diagram, circle the point that best identifies this manager. (1)
The director decides to use this regression line for the new performance related pay structure.
(i) Estimate, to 3 significant figures, the new salary of a manager with a performance score of 30 (2)

May 2021, Q6

5. Two economics students, Andi and Behrouz, are studying some data relating to unemployment,
x %, and increase in wages, y %, for a European country. The least squares regression line of y on x
has equation; y = 3.684 − 0.3242x and
∑y = 23.7 ∑y² = 42.63 ∑x² = 756.81 n = 16
(a) Show that Syy = 7.524375 (1)
(b) Find Sxx (4)
(c) Find the product moment correlation coefficient between x and y. (3)
Behrouz claims that, assuming the model is valid, the data show that when unemployment is
2% wages increase at over 3%
(d) Explain how Behrouz could have come to this conclusion. (1)
Andi uses the formula; range = mean ± 3 × standard deviation; to estimate the range of values for x.
(e) Find estimates of the minimum value and the maximum value of x in these data using Andi’s formula. (3)
(f) Comment, giving a reason, on the reliability of Behrouz’s claim. (2)
Andi suggests using the regression line with equation y = 3.684 – 0.3242x to estimate unemployment
when wages are increasing at 2%
(g) Comment, giving a reason, on Andi’s suggestion. (2)

January 2022, Q2

6. Tom’s car holds 50 litres of petrol when the fuel tank is full. For each of 10 journeys, each starting
with 50 litres of petrol in the fuel tank, Tom records the distance travelled, d kilometres, and
the amount of petrol used, p litres. The summary statistics for the 10 journeys are given below.
∑d = 1029 ∑p = 50.8 ∑dp = 5240.8 Sdd = 344.9 Spp = 0.576
(a) Calculate the product moment correlation coefficient between d and p (3)
The amount of petrol remaining in the fuel tank for each journey, w litres, is recorded.
(b) (i) Write down an equation for w in terms of p
(ii) Hence, write down the value of the product moment correlation coefficient between w and p (2)
(c) Write down the value of the product moment correlation coefficient between d and w (1)

January 2022, Q6

7. Students on a psychology course were given a pre‑test at the start of the course and a final exam
at the end of the course. The teacher recorded the number of marks achieved on the pre‑test, p, and
the number of marks achieved on the final exam, f, for 34 students and displayed them on the scatter diagram.

The equation of the least squares regression line for these data is found to be; f = 10.8 + 0.748 p
For these students, the mean number of marks on the pre‑test is 62.4
(a) Use the regression model to find the mean number of marks on the final exam. (2)
(b) Give an interpretation of the gradient of the regression line. (1)
Considering the equation of the regression line, Priya says that she would expect
someone who scored 0 marks on the pre‑test to score 10.8 marks on the final exam.
(c) Comment on the reliability of Priya’s statement. (1)
(d) Write down the number of marks achieved on the final exam for the student who
exceeded the expectation of the regression model by the largest number of marks. (1)
(e) Find the range of values of p for which this regression model, f = 10.8 + 0.748 p,
predicts a greater number of marks on the final exam than on the pre‑test. (3)
Later the teacher discovers an error in the recorded data. The student who
achieved a score of 98 on the pre‑test, scored 92 not 29 on the final exam.
The summary statistics used for the model f = 10.8 + 0.748 p are corrected to include
this information and a new least squares regression line is found. Given the original summary statistics were,
n = 34 ∑p = 2120 ∑pf = 133486 Spp = 15573.76 Spf = 11648.35
(f) calculate the gradient of the new regression line. Show your working clearly. (5)

May 2022, Q2

8. Stuart is investigating the relationship between Gross Domestic Product (GDP) and the size of
the population for a particular country. He takes a random sample of 9 years and records the
size of the population, t millions, and the GDP, g billion dollars for each of these years.
The data are summarised as
n = 9 ∑t = 7.87 ∑g = 144.84 ∑g² = 3624.41 Stt = 1.29 Stg = 40.25
(a) Calculate the product moment correlation coefficient between t and g (3)
(b) Give an interpretation of your product moment correlation coefficient. (1)
(c) Find the equation of the least squares regression line of g on t in the form g = a + bt (4)
(d) Give an interpretation of the value of b in your regression line. (1)
(e) (i) Use the regression line from part (c) to estimate the GDP, in billions of dollars, for a population of 7000000 (2)
(ii) Comment on the reliability of your answer in part (i). Give a reason, in context, for your answer. (1)
Using the regression line from part (c), Stuart estimates that for a population increase of
x million there will be an increase of 0.1 billion dollars in GDP.
(f) Find the value of x (2)

January 2023, Q6

9. A research student is investigating the maximum weight, y grams, of sugar that will
dissolve in 100 grams of water at various temperatures, x °C, where 10 ≤ x ≤ 80
The research student calculated the regression line of y on x and found it to be y = 151.2 + 2.72x
(a) Give an interpretation of the gradient of the regression line. (1)
(b) Use the regression line to estimate the maximum weight of sugar that will dissolve
in 100 grams of water when the temperature is 90 °C. (2)
(c) Comment on the reliability of your estimate, giving a reason for your answer. (2)
Using the regression line of y on x and the following summary statistics
∑y = 3119 ∑y² = 851093 ∑x² = 24500 n = 12
(d) show that the product moment correlation coefficient for these data is 0.988 to 3 decimal places. (7)

The research student’s supervisor plotted the original data on a scatter diagram, shown below.
With reference to both the scatter diagram and the correlation coefficient,
(e) discuss the suitability of a linear regression model to describe the relationship between x and y. (2)

May 2023, Q2

10. Two students, Olive and Shan, collect data on the weight, w grams, and the tail length, t cm, of 15 mice.
Olive summarised the data as follows
Stt = 5.3173 ∑w² = 6089.12 ∑tw = 2304.53 ∑w = 297.8 ∑t = 114.8
(a) Calculate the value of Stw and the value of Sww (3)
(b) Calculate the value of the product moment correlation coefficient between w and t (2)
(c) Show that the equation of the regression line of w on t can be written as w = –16.7 + 4.77t (3)
(d) Give an interpretation of the gradient of the regression line. (1)
(e) Explain why it would not be appropriate to use the regression line in part (c) to estimate
the weight of a mouse with a tail length of 2 cm. (2)
Shan decided to code the data using x = t – 6 and y = w/2 – 5

(f) Write down the value of the product moment correlation coefficient between x and y (1)
(g) Write down an equation of the regression line of y on x. You do not need to simplify your equation. (1)

Answers
1 (a) -0.837 (b) As the distance from the hospital increases the percentage of referrals decreases
e.g. smaller % of patients attend from clinics further away
(c) e.g. Points close to a straight line (of negative gradient) so does support belief (d) y = 34.4 - 1.71x
(e) [On average] each km further from the hospital reduces the % attendance by 1.7%
(f) Correct line drawn on scatter diagram. (g) Correct point circled (3.2,19)
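The arithmetic behind answers 1(a) and 1(d) can be checked with a short script. This is a minimal sketch rather than part of the mark scheme: the function names `pmcc` and `regression_line` are illustrative, and it assumes the first two values quoted in question 1 are the sample means x̄ = 8.1 and ȳ = 20.5.

```python
import math

def pmcc(sxy, sxx, syy):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    return sxy / math.sqrt(sxx * syy)

def regression_line(sxy, sxx, x_bar, y_bar):
    """Return (a, b) for the least squares line y = a + bx."""
    b = sxy / sxx          # gradient
    a = y_bar - b * x_bar  # the line passes through (x_bar, y_bar)
    return a, b

# Summary statistics from question 1 (n = 16 clinics)
n = 16
x_bar, y_bar = 8.1, 20.5   # assumed to be the sample means
sum_y2 = 8266
sxx, sxy = 368.16, -630.9

# Syy = sum(y^2) - n * y_bar^2
syy = sum_y2 - n * y_bar ** 2

r = pmcc(sxy, sxx, syy)
a, b = regression_line(sxy, sxx, x_bar, y_bar)
print(f"Syy = {syy:.0f}, r = {r:.3f}")
print(f"a = {a:.1f}, b = {b:.2f}")
```

Under these assumptions the script gives r ≈ -0.837, a ≈ 34.4 and b ≈ -1.71, in line with the answers listed for 1(a) and 1(d).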
2 (a) 1373.6 (b) 0.903
(c) In scatter diagram points are close to a line or r is close to (or near to) 1. It is consistent with the manager’s belief.
(d) k=4 (e) p = 2.57 + 0.227m (f) 18.467… accept answers in range [18, 18.6]
(g) Manager’s model (when m = 70) estimates p = 17.5 So use manager’s model since wants the lower estimate.
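The comparison between the manager's model and Jiang's model in question 2 can be reproduced from the summary statistics. The sketch below is illustrative only; the variable names are not from the source, and it takes k to be the ratio of the two totals, as the question describes.

```python
# Summary statistics from question 2 (10 months of data)
n = 10
sum_m, sum_p, sum_mp = 1124, 281, 32958
smm = 6046.4

# (a) Smp = sum(mp) - sum(m) * sum(p) / n
smp = sum_mp - sum_m * sum_p / n

# (d) Manager's model p = m / k, where total machines = k * total policies
k = sum_m // sum_p

# (e) Jiang's model: least squares regression of p on m
b = smp / smm
a = sum_p / n - b * sum_m / n

# (f), (g) Compare the two estimates for a month with 70 machines sold
m = 70
manager_estimate = m / k
jiang_estimate = a + b * m

print(f"Smp = {smp:.1f}, k = {k}")
print(f"p = {a:.3g} + {b:.3g}m")
print(f"manager: {manager_estimate:.1f}, Jiang: {jiang_estimate:.2f}")
```

The manager's model gives the lower target (17.5 against roughly 18.5), which is why the answer to (g) recommends it from the staff's point of view.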

3 (a) $ 16946 (b) Syy = 85.1 Sxx = 3818 (c) 0.911
(d) Since (new x = 90 and [original or] new x̄ = 90) the term (x − x̄) will be 0. Therefore (the 11th shop makes no change) Sxy stays the same.
(e) y = 5.766 + 0.136x (f) x = 300 is outside the range 300 >> 90 [ 300 >> 90 + 3σ = 90 + 3 × 18.63.. ≈ 146]
So not suitable (since involves extrapolation)
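The effect of the 11th shop in question 3 can be checked numerically. This is a hedged sketch: the variable names are illustrative, and it assumes the unrounded gradient can be recovered from the quoted intercept 6.066 by using the fact that the regression line passes through (x̄, ȳ).

```python
# Summary statistics from question 3 (x in units of 10 m^2, y in $1000s)
n = 10
sum_x, sum_x2 = 900, 84818
sum_y = 183
a_old = 6.066  # intercept of the original line y = 6.066 + 0.136x

x_bar = sum_x / n
y_bar = sum_y / n
sxx = sum_x2 - sum_x ** 2 / n

# Assumption: recover the unrounded gradient from y_bar = a_old + b * x_bar
b = (y_bar - a_old) / x_bar

# 11th shop: 900 m^2 -> x = 90, rent $15000 -> y = 15
x_new, y_new = 90, 15
n11 = n + 1
x_bar11 = (sum_x + x_new) / n11
y_bar11 = (sum_y + y_new) / n11
sxx11 = (sum_x2 + x_new ** 2) - (sum_x + x_new) ** 2 / n11

# x_new equals the mean, so (x - x_bar) = 0 for the new shop:
# Sxx and Sxy are unchanged, hence the gradient b is unchanged too
a_new = y_bar11 - b * x_bar11

print(f"x_bar: {x_bar} -> {x_bar11}, Sxx: {sxx} -> {sxx11}")
print(f"new line: y = {a_new:.3f} + {b:.3f}x")
```

Only the mean of y moves (from 18.3 to 18), so the line shifts down without changing slope, which matches the answer y = 5.766 + 0.136x.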
4 (a) Positive (correlation) or e.g. “salary (y) increases as performance (x) increases” (b) (i) 2006 (ii) 2080 (c) 0.880
(d) Is consistent and the points on the scatter diagram lie close to a straight line OR r is close to 1 or strong/high (positive) correlation.
(e) y = 12.5 + 0.805x (f) An increase of 1 (performance) point gives an extra £800 (1 sf) in salary.
(g) Line must cross x = 9 and x = 50 to score either of these marks, Line for 9~50 Intercept (extend line if necessary) at “12.5” (accept 11.5~13.5)
Line for 9~50 At x = 50 y = 52.8 (accept 52~54)
(h) For the point (25, 48) circled. (i) "12.5" + 30 × "0.805" [= 36 ~37] Salary of (£) 36 700 (or 36.7 thousands)

5 (a) 7.524375 (b) 18.2 (c) – 0.50399… = (– 0.49) ~ (– 0.51) (d) Sub x = 2 in the regression line gives y = 3.0356
(e) 3.5965…~ 9.9929… = 3.6 ~ 10 (f) The probability of x = 2 being in the range is very small; so Behrouz’s estimate is unreliable.
(g) Should use regression of x on y to estimate unemployment or equivalent. So Andi’s suggestion is not suitable or not to be recommended.
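Question 5 gives the regression line but not ∑x or Sxy, so the working has to go through the point (x̄, ȳ). The sketch below shows one way to do that; the variable names are illustrative and the method is a reconstruction, not quoted from the mark scheme.

```python
import math

# Given in question 5
n = 16
sum_y, sum_y2, sum_x2 = 23.7, 42.63, 756.81
a, b = 3.684, -0.3242  # regression line y = a + bx

# (a) Syy = sum(y^2) - (sum y)^2 / n
syy = sum_y2 - sum_y ** 2 / n

# (b) The line passes through (x_bar, y_bar), so x_bar = (y_bar - a) / b
y_bar = sum_y / n
x_bar = (y_bar - a) / b
sxx = sum_x2 - n * x_bar ** 2

# (c) Since b = Sxy / Sxx, r = Sxy / sqrt(Sxx * Syy) = b * sqrt(Sxx / Syy)
r = b * math.sqrt(sxx / syy)

# (e) Andi's rule: mean +/- 3 standard deviations
sd = math.sqrt(sxx / n)
x_min, x_max = x_bar - 3 * sd, x_bar + 3 * sd

print(f"Syy = {syy:.6f}, Sxx = {sxx:.1f}, r = {r:.3f}")
print(f"estimated range of x: {x_min:.1f} to {x_max:.1f}")
```

The estimated range of roughly 3.6 to 10 does not contain x = 2, which is the basis for calling Behrouz's claim unreliable in (f).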
6 (a) 0.956 (b) (i) w = 50 – p (ii) – 1 (c) – 0.956
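Parts 6(b) and 6(c) rely on how the correlation coefficient behaves under the linear coding w = 50 − p. The sketch below illustrates this with made-up journey data; the lists `d` and `p` are hypothetical values chosen for illustration and are not Tom's data, and `pmcc` is an illustrative helper name.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient computed from raw data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical journeys (NOT the data from question 6)
d = [95, 99, 101, 104, 108, 110]   # distance travelled, km
p = [4.7, 4.9, 5.0, 5.1, 5.4, 5.3]  # petrol used, litres

# Petrol remaining after starting with a full 50 litre tank
w = [50 - p_i for p_i in p]

r_dp = pmcc(d, p)
r_wp = pmcc(w, p)  # w is an exact decreasing linear function of p
r_dw = pmcc(d, w)  # multiplying p by -1 flips the sign of r

print(f"r(d, p) = {r_dp:.3f}, r(w, p) = {r_wp:.3f}, r(d, w) = {r_dw:.3f}")
```

Whatever data are used, r(w, p) comes out as −1 and r(d, w) as the negative of r(d, p), which is why the answers to (b)(ii) and (c) can be written down without further calculation.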

7 (a) (b) For each additional mark scored on the pre-test, the average mark on the final exam increases by …


(c) The statement is not reliable as there is no data below 19 (extrapolation). (d) 76 (e) p < 42.9 (f) 0.9
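The calculations for question 7 can be sketched as follows. This is an illustrative reconstruction from the figures given in the question, not the mark scheme's own working; the variable names are assumptions.

```python
# Regression model and mean from question 7
a, b = 10.8, 0.748
p_bar = 62.4

# (a) The regression line passes through the point of means
f_bar = a + b * p_bar

# (e) Model predicts f > p when a + b*p > p, i.e. p < a / (1 - b)
p_limit = a / (1 - b)

# (f) Original summary statistics
n = 34
sum_p = 2120
spp = 15573.76
spf = 11648.35

# Correction: the student with p = 98 scored f = 92, not 29
p_i, f_wrong, f_right = 98, 29, 92
delta_f = f_right - f_wrong

# Spf = sum(pf) - sum(p) * sum(f) / n, so changing one f value by delta_f
# changes sum(pf) by p_i * delta_f and sum(f) by delta_f
spf_new = spf + p_i * delta_f - sum_p * delta_f / n

# Spp depends only on the p values, so it is unchanged
b_new = spf_new / spp

print(f"mean final mark = {f_bar:.1f}, p < {p_limit:.1f}")
print(f"new Spf = {spf_new:.2f}, new gradient = {b_new:.3f}")
```

The new gradient comes out at about 0.892, consistent with the listed answer of 0.9 if that value is read as rounded.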

8 (a) 0.985 (b) As the population/t increases, GDP/g increases (c) g = – 11.2 + 31.2t
(d) The GDP/g increases by (an average of) "31.2" billion [dollars] when the population/t increases by one million.
(e) (i) 207 (ii) Unreliable as 7000000 is much greater than the mean population/t̄ for the 9 years (f) 0.0032
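The numerical parts of question 8 follow directly from the summary statistics. The sketch below is illustrative, with assumed variable names; note that t is measured in millions, so a population of 7000000 corresponds to t = 7.

```python
import math

# Summary statistics from question 8
n = 9
sum_t, sum_g, sum_g2 = 7.87, 144.84, 3624.41
stt, stg = 1.29, 40.25

# (a) Sgg and the product moment correlation coefficient
sgg = sum_g2 - sum_g ** 2 / n
r = stg / math.sqrt(stt * sgg)

# (c) Regression line of g on t: g = a + bt
b = stg / stt
a = sum_g / n - b * sum_t / n

# (e)(i) Population of 7 000 000 means t = 7 (t is in millions)
g_at_7 = a + b * 7

# (e)(ii) Mean population in the sample, for comparison with t = 7
t_bar = sum_t / n

# (f) Increase x in t that raises g by 0.1: b * x = 0.1
x = 0.1 / b

print(f"r = {r:.3f}, g = {a:.1f} + {b:.1f}t")
print(f"g(7) = {g_at_7:.0f}, mean t = {t_bar:.3f}, x = {x:.4f}")
```

The sample mean of t is below 1 million, which shows how far t = 7 lies from the data and supports the "unreliable" comment in (e)(ii).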

9 (a) An increase/change of 1°C will allow an extra 2.72 grams [of sugar] to dissolve (b) 151.2 + 2.72 × 90 = 396
(c) The temperature/90 [°C] is outside of the range ; so (may be) unreliable (d) 0.988
(e) e.g. the points lie reasonably close to a straight line/positive correlation and the PMCC is close to 1 therefore supports a linear model.

10 (a) Stw = 25.4 Sww = 177 (b) 0.827 or 0.828 (c) b = 4.771 or a = –16.66 and w = −16.7 + 4.77t *
(d) [On average,] for each cm/1 cm of tail length/t the weight/w increases by 4.77 g/grams
(e) w = − 7.16 or 9.54 < 16.7 or 2 < 3.5 which is negative/weight cannot be negative OR
for sd extrapolation since a 2 cm tail is (approx 9 sd)/(more than 3 sd) from the mean.
(f) 0.827 (g) 2y +10 = −16.7 + 4.77( x + 6)
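The working for question 10, including Shan's coding, can be sketched as below. The names `a_xy` and `b_xy` for the coded line's coefficients are illustrative, and the simplification of the coded equation goes one step beyond what part (g) asks for.

```python
import math

# Olive's summary statistics from question 10 (15 mice)
n = 15
stt = 5.3173
sum_w2, sum_tw = 6089.12, 2304.53
sum_w, sum_t = 297.8, 114.8

# (a) Stw and Sww
stw = sum_tw - sum_t * sum_w / n
sww = sum_w2 - sum_w ** 2 / n

# (b) Product moment correlation coefficient
r = stw / math.sqrt(stt * sww)

# (c) Regression line of w on t: w = a + bt
b = stw / stt
a = sum_w / n - b * sum_t / n

# (g) Coding x = t - 6, y = w/2 - 5 means t = x + 6 and w = 2y + 10.
# Substituting into w = a + bt gives 2y + 10 = a + b(x + 6),
# which rearranges to y = a_xy + b_xy * x
b_xy = b / 2
a_xy = (a + 6 * b - 10) / 2

print(f"Stw = {stw:.1f}, Sww = {sww:.0f}, r = {r:.3f}")
print(f"w = {a:.1f} + {b:.2f}t")
print(f"y = {a_xy:.3f} + {b_xy:.3f}x")
```

Both codings are linear with positive scale factors, so the correlation coefficient between x and y is the same as that between t and w, as stated in answer (f).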
