Correlation and Regression s1
Q1.
A large company is analysing how much money it spends on paper in its offices each year. The number of
employees in the office, x, and the amount spent on paper in a year, p ($ hundreds), in each of 12
randomly selected offices were recorded.
He wants each office to aim for a model of the form where a and b are the values
found in part (c).
(e) estimate the percentage saving in the amount spent on paper each year by the company using the
director's model.
(3)
Q2.
Two economics students, Andi and Behrouz, are studying some data relating to unemployment, x %, and
increase in wages, y %, for a European country. The least squares regression line of y on x has equation
y = 3.684 − 0.3242x
and
(e) Find estimates of the minimum value and the maximum value of x in these data using Andi's formula.
(3)
(f) Comment, giving a reason, on the reliability of Behrouz's claim.
(2)
Andi suggests using the regression line with equation y = 3.684 − 0.3242x to estimate unemployment
when wages are increasing at 2%.
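Andi's suggestion amounts to rearranging the y-on-x line for x. The sketch below only carries out that arithmetic for y = 2; the helper name `rearranged_estimate_x` is hypothetical. The y-on-x line minimises the squared vertical deviations, so it is designed for predicting y from x. It coincides with the x-on-y line only when |r| = 1, so in general a rearranged estimate differs from one given by the regression line of x on y.

```python
# Regression line of y on x from Q2: y = 3.684 - 0.3242x
A, B = 3.684, -0.3242

def rearranged_estimate_x(y):
    """Solve y = A + B*x for x (Andi's approach of inverting the y-on-x line)."""
    return (y - A) / B

x_est = rearranged_estimate_x(2.0)
print(round(x_est, 2))  # 5.19
```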
Q3.
Stuart is investigating the relationship between Gross Domestic Product (GDP) and the size of the
population for a particular country. He takes a random sample of 9 years and records the size of the
population, t millions, and the GDP, g billion dollars for each of these years.
Q4.
A company director wants to introduce a performance-related pay structure for her managers. A random
sample of 15 managers was taken and the annual salary, y in £1000, was recorded for each manager. The
director then calculated a performance score, x, for each of these managers.
The results are shown on the scatter diagram in Figure 1 on the next page.
(a) Describe the correlation between performance score and annual salary.
(1)
The results are also summarised in the following statistics.
(d) State, giving a reason, whether or not these data are consistent with the director's belief.
(1)
(e) Calculate the equation of the regression line of y on x, in the form y = a + bx
Give the value of a and the value of b to 3 significant figures.
(4)
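Part (e) follows the standard summary-statistic route: S_xx = Σx² − (Σx)²/n, S_xy = Σxy − ΣxΣy/n, b = S_xy/S_xx and a = ȳ − b x̄. The Q4 summary statistics are not reproduced in this extract, so this sketch uses a hypothetical four-point toy data set; `regression_from_sums` is an illustrative helper name, not part of any library.

```python
def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Least squares line y = a + b*x from the usual summary statistics."""
    s_xx = sum_x2 - sum_x**2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    b = s_xy / s_xx
    a = sum_y / n - b * sum_x / n  # the line passes through (x-bar, y-bar)
    return a, b

# Hypothetical toy data (NOT the Q4 data): x = 1, 2, 3, 4 and y = 2, 4, 5, 8
a, b = regression_from_sums(n=4, sum_x=10, sum_y=19, sum_x2=30, sum_xy=57)
print(round(a, 3), round(b, 3))
```

With the real values of a and b, the estimate asked for in part (i) is simply a + 30b.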
(f) Give an interpretation of the value of b.
(1)
(g) Plot your regression line on the scatter diagram in Figure 1
(2)
The director hears that one of the managers in the sample seems to be underperforming.
(h) On the scatter diagram, circle the point that best identifies this manager.
(1)
The director decides to use this regression line for the new performance-related pay structure.
(i) Estimate, to 3 significant figures, the new salary of a manager with a performance score of 30
(2)
Q5.
The variables x and y have the following regression equations based on the same 12 observations.
(a) (i) Find the point of intersection of these lines.
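Both least squares regression lines pass through the mean point (x̄, ȳ), so their point of intersection gives x̄ and ȳ directly. The sketch below solves y = a1 + b1x and x = a2 + b2y simultaneously for generic coefficients. The Q5 equations are missing from this extract, so the example lines are hypothetical ones fitted to the toy data x = 1, 2, 3, 4 and y = 2, 4, 5, 8, whose means are 2.5 and 4.75.

```python
def intersect_regression_lines(a1, b1, a2, b2):
    """Intersection of y = a1 + b1*x (y on x) and x = a2 + b2*y (x on y)."""
    denom = 1 - b1 * b2  # b1*b2 equals r^2, so this is 0 only when |r| = 1
    if denom == 0:
        raise ValueError("the two lines coincide (|r| = 1)")
    x = (a2 + b2 * a1) / denom
    y = a1 + b1 * x
    return x, y

# Hypothetical toy lines (NOT the Q5 equations)
b2 = 9.5 / 18.75                     # S_xy / S_yy for the toy data
x_int, y_int = intersect_regression_lines(a1=0.0, b1=1.9, a2=2.5 - b2 * 4.75, b2=b2)
print(round(x_int, 4), round(y_int, 4))  # 2.5 4.75
```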
Q6.
Two students, Olive and Shan, collect data on the weight, w grams, and the tail length, t cm, of 15 mice.
w = –16.7 + 4.77t
(3)
(d) Give an interpretation of the gradient of the regression line.
(1)
(e) Explain why it would not be appropriate to use the regression line in part (c) to estimate the weight of
a mouse with a tail length of 2 cm.
(2)
Shan decided to code the data using x = t – 6 and
(f) Write down the value of the product moment correlation coefficient between x and y
(1)
(g) Write down an equation of the regression line of y on x
You do not need to simplify your equation.
(1)
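Parts (f) and (g) rely on the fact that the product moment correlation coefficient is unchanged by a linear coding of the form (value − a)/b with b > 0 (a negative b would only flip its sign). The sketch below checks this numerically. The data are hypothetical, and because the coding of w is cut off in this extract, the y-coding used here is an assumption for illustration only.

```python
from math import isclose, sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient computed from raw data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s_xx = sum((x - mx) ** 2 for x in xs)
    s_yy = sum((y - my) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical tail lengths t (cm) and weights w (g), NOT Olive and Shan's data
t = [7.1, 8.0, 8.4, 9.2, 9.9]
w = [17.5, 21.0, 23.3, 27.1, 30.2]

x = [ti - 6 for ti in t]          # coding from the question: x = t - 6
y = [(wi - 20) / 2 for wi in w]   # hypothetical coding for w (original is cut off)
print(isclose(pmcc(t, w), pmcc(x, y)))  # True
```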
Q7.
Tom's car holds 50 litres of petrol when the fuel tank is full.
For each of 10 journeys, each starting with 50 litres of petrol in the fuel tank, Tom records the distance
travelled, d kilometres, and the amount of petrol used, p litres.
Q8.
The percentage oil content, p, and the weight, w milligrams, of each of 10 randomly selected sunflower
seeds were recorded. These data are summarised below.
Q9.
Students on a psychology course were given a pre-test at the start of the course and a final exam at the
end of the course. The teacher recorded the number of marks achieved on the pre-test, p, and the
number of marks achieved on the final exam, f, for 34 students and displayed them on the scatter
diagram.
The equation of the least squares regression line for these data is found to be
f = 10.8 + 0.748 p
For these students, the mean number of marks on the pre-test is 62.4
(a) Use the regression model to find the mean number of marks on the final exam.
(2)
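Part (a) uses the fact that the least squares regression line always passes through the point of means (p̄, f̄). A minimal sketch of the arithmetic:

```python
# Least squares line from Q9: f = 10.8 + 0.748p
INTERCEPT, GRADIENT = 10.8, 0.748
mean_p = 62.4

# Substituting the mean pre-test mark gives the mean final-exam mark
mean_f = INTERCEPT + GRADIENT * mean_p
print(round(mean_f, 1))  # 57.5
```

This gives 10.8 + 0.748 × 62.4 = 57.4752, i.e. a mean of about 57.5 marks on the final exam.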
(b) Give an interpretation of the gradient of the regression line.
(1)
Considering the equation of the regression line, Priya says that she would expect someone who scored 0
marks on the pre-test to score 10.8 marks on the final exam.
The summary statistics used for the model f = 10.8 + 0.748 p are corrected to include this information and
a new least squares regression line is found.
(f) calculate the gradient of the new regression line. Show your working clearly.
(5)
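Part (f) requires the summary statistics to be updated before the gradient is recomputed as S_pf / S_pp. The extract does not show exactly what information is added, so this sketch assumes it amounts to one extra observation (p, f) appended to the sums. The toy data and the helpers `add_point` and `gradient_from_sums` are hypothetical. If an existing observation were being replaced instead, its contributions would also have to be subtracted from each sum.

```python
def gradient_from_sums(n, sum_p, sum_f, sum_p2, sum_pf):
    """Gradient of the f-on-p least squares line, S_pf / S_pp."""
    s_pp = sum_p2 - sum_p**2 / n
    s_pf = sum_pf - sum_p * sum_f / n
    return s_pf / s_pp

def add_point(stats, p, f):
    """Return the summary statistics updated with one extra observation (p, f)."""
    n, sum_p, sum_f, sum_p2, sum_pf = stats
    return (n + 1, sum_p + p, sum_f + f, sum_p2 + p * p, sum_pf + p * f)

# Hypothetical toy data (NOT the Q9 data): (1, 2), (2, 4), (3, 5), (4, 8)
stats = (4, 10, 19, 30, 57)
new_stats = add_point(stats, p=0, f=3)  # a hypothetical extra student
print(round(gradient_from_sums(*new_stats), 4))  # 1.3
```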
Q10.
A research student is investigating the maximum weight, y grams, of sugar that will dissolve in 100 grams
of water at various temperatures, x °C, where 10 ≤ x ≤ 80.
y = 151.2 + 2.72x
(d) show that the product moment correlation coefficient for these data is 0.988 to 3 decimal places.
(7)
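For part (d), r = S_xy / √(S_xx S_yy). When the gradient b = S_xy / S_xx of the y-on-x line is already known (2.72 here), the equivalent form r = b√(S_xx / S_yy) can save work. The Q10 summary statistics are not in this extract, so the sketch below only checks that the two forms agree on hypothetical values.

```python
from math import sqrt

def pmcc_from_s(s_xx, s_yy, s_xy):
    """r = S_xy / sqrt(S_xx * S_yy)."""
    return s_xy / sqrt(s_xx * s_yy)

def pmcc_from_gradient(b, s_xx, s_yy):
    """Equivalent form using the y-on-x gradient b = S_xy / S_xx."""
    return b * sqrt(s_xx / s_yy)

# Hypothetical toy values (NOT the Q10 data)
s_xx, s_yy, s_xy = 5.0, 18.75, 9.5
r1 = pmcc_from_s(s_xx, s_yy, s_xy)
r2 = pmcc_from_gradient(s_xy / s_xx, s_xx, s_yy)
print(round(r1, 3), round(r2, 3))  # 0.981 0.981
```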
The research student's supervisor plotted the original data on a scatter diagram, below.
With reference to both the scatter diagram and the correlation coefficient,
(e) discuss the suitability of a linear regression model to describe the relationship between x and y.
(2)
Q11.
The production cost, £c million, of a film and the total ticket sales, £t million, earned by the film are
recorded for a sample of 40 films.
(a) Find the exact value of S_tt and the exact value of S_ct
(3)
(b) Calculate the value of the product moment correlation coefficient for these data.
(2)
(c) Give an interpretation of your answer to part (b)
(1)
(d) Show that the equation of the linear regression line of t on c can be written as
t = –5.84 + 0.976c
where the values of the intercept and gradient are given to 3 significant figures.
(3)
(e) Find the expected total ticket sales for a film with a production cost of £90 million.
(2)
Using the regression line in part (d)
(f) find the range of values of the production cost of a film for which the total ticket sales are less than
80% of its production cost.
(2)
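Parts (e) and (f) are direct substitutions into the line from part (d). For (f), total ticket sales below 80% of production cost means −5.84 + 0.976c < 0.8c, i.e. 0.176c < 5.84. A minimal sketch using the rounded coefficients (the helper name `expected_sales` is illustrative):

```python
# Regression line of t on c from part (d): t = -5.84 + 0.976c (£ million)
A, B = -5.84, 0.976

def expected_sales(c):
    """Expected total ticket sales (£ million) for production cost c (£ million)."""
    return A + B * c

print(round(expected_sales(90), 1))  # 82.0

# Part (f): t < 0.8c  <=>  A + B*c < 0.8*c  <=>  (B - 0.8)*c < -A
# Since B - 0.8 > 0, dividing keeps the direction: c < -A / (B - 0.8)
threshold = -A / (B - 0.8)
print(round(threshold, 1))  # 33.2
```

So, using this line, sales fall below 80% of production cost for production costs under about £33.2 million. It would be sensible to also keep the answer within the range of production costs in the sample, which this extract does not give.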
Mark Scheme
Q1.
Q2.
Q3.
Q4.
Q5.
Q6.
Q7.
Q8.
Q9.
Q10.
Q11.