Linear Regression and Correlation (Examples With Answers)
1. A comparison of the undergraduate grade point averages of 12 corporate employees with their scores on a managerial
trainee examination produced the following results:
Employee 1 2 3 4 5 6 7 8 9 10 11 12
GPA, x 2.2 2.4 3.1 2.5 3.5 3.6 2.5 2.0 2.2 2.6 2.7 3.3
Exam Score, y 76 89 83 79 91 95 82 69 66 75 80 88
2. The regional transit authority for a major metropolitan area wants to determine whether there is any relationship
between the age of a bus (X) and the annual maintenance cost (Y). A sample of 10 buses resulted in the following
data:
Bus No. 1 2 3 4 5 6 7 8 9 10
X: Age (years) 1 2 2 2 2 3 4 4 5 5
Y: Maintenance Cost ($) 350 370 480 520 590 550 750 800 790 950
3. A marketing professor is interested in the relationship between hours spent studying (X) and total points earned
(Y) in a course. Data collected on a sample of 10 students who took the course last term follow.
Student No. 1 2 3 4 5 6 7 8 9 10
X: Hours spent studying 45 30 90 60 105 65 90 80 55 75
Y: Total points earned 40 35 75 65 90 50 90 80 45 65
3.1 Set up the equation of the least squares line (SLRM) for this data set. (ŷ = 5.847 + 0.830x)
3.2 Interpret the slope (b1) of the fitted SLRM in (3.1).
3.3 Give a point estimate of the expected total points earned (ŷ) when 85 hours are spent studying. (76.4)
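The answers to 3.1 and 3.3 can be checked with a short script. This is a minimal sketch, assuming the usual least squares formulas b1 = Sxy / Sxx and b0 = ȳ − b1·x̄; the function name `fit_line` and the variable names are illustrative and not part of the worksheet.

```python
# Least squares fit for Problem 3 using the sums-of-squares formulas.
hours = [45, 30, 90, 60, 105, 65, 90, 80, 55, 75]   # X: hours spent studying
points = [40, 35, 75, 65, 90, 50, 90, 80, 45, 65]   # Y: total points earned


def fit_line(x, y):
    """Return (b0, b1) of the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxy: sum of cross-products of deviations; Sxx: sum of squared x deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(hours, points)
y_hat_85 = b0 + b1 * 85   # point estimate at x = 85 hours

print(f"y-hat = {b0:.3f} + {b1:.3f}x")
print(f"Estimate at x = 85: {y_hat_85:.1f}")
```

Rounded to three decimals the coefficients should agree with the answer given in 3.1, and the estimate at 85 hours with the answer given in 3.3.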
4. The following table shows the number of sales contacts (X) made by a sample of n = 10 salespersons during a
week and the number of sales (Y) made.
Salesperson 1 2 3 4 5 6 7 8 9 10
X: No. of sales contacts 71 64 100 105 75 59 82 68 111 90
Y: No. of sales 25 16 37 40 18 10 22 14 42 19
5. A store manager wishes to find out whether there is a relationship between the age (X) of her employees and the number of
sick days (Y) they take each year. The data for a sample of n = 6 employees are shown below:
Employee 1 2 3 4 5 6
Age (X) 18 26 39 48 53 58
Days (Y) 16 12 9 5 6 2
5.1 Set up the equation of the fitted regression line for this data set. (ŷ = 21.100 − 0.317x)
5.2 Is X (age) a significant explanatory variable (predictor) for the response variable Y (days)? Justify.
(t = -9.623; Reject H0)
5.3 Interpret the slope of the regression line in (5.1).
5.4 Give the expected number of sick days for employees with age 50. (5.3 days)
5.5 Give and briefly explain the sample coefficient of determination, r². (95.9%)
5.6 Give and briefly explain Pearson’s correlation coefficient, r. (-0.979)
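The values quoted in 5.2, 5.5 and 5.6 are linked: r² is the square of r, and the test statistic for the slope can be written as t = r·√(n − 2) / √(1 − r²). The sketch below works through that chain for the Problem 5 data. The worksheet does not state a significance level, so the two-sided 5% level and the tabulated critical value 2.776 (4 degrees of freedom) are assumptions made here for illustration.

```python
import math

age = [18, 26, 39, 48, 53, 58]   # X: age of employee
days = [16, 12, 9, 5, 6, 2]      # Y: sick days per year

n = len(age)
x_bar = sum(age) / n
y_bar = sum(days) / n

# Sums of squares and cross-products of the deviations from the means.
s_xx = sum((x - x_bar) ** 2 for x in age)
s_yy = sum((y - y_bar) ** 2 for y in days)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, days))

r = s_xy / math.sqrt(s_xx * s_yy)   # Pearson's correlation coefficient
r2 = r ** 2                         # coefficient of determination
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r2)

# Assumption: alpha = 0.05, two-sided, df = n - 2 = 4 (value from a t table).
T_CRIT = 2.776
reject_h0 = abs(t_stat) > T_CRIT

print(f"r = {r:.3f}, r^2 = {r2:.1%}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The negative sign of r matches the negative slope in 5.1: older employees in this sample tend to take fewer sick days.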
6. A warehouse manager is interested in the possible improvements to labor efficiency if air-conditioning is installed in the
warehouse. The data in the following table show the times taken to unload a fully laden truck
at various temperature levels.
6.1 Fit a linear regression model with time as the dependent variable and temperature as the explanatory
(independent/predictor) variable. Indicate the scope of regression. (ŷ = 36.194 + 0.266x)
6.2 Is X (temperature) a significant predictor for the response variable Y (unloading time)? Justify using an appropriate
significance test. (t = 1.116; Do not reject H0)
6.3 Does your analysis indicate that there is evidence that the trucks take longer to unload when the temperature is higher?
(No)
6.4 Can a case be made that the installation of air-conditioning will improve worker efficiency? (No)
6.5 Interpret the slope of the regression equation in (6.1).
6.6 Give the expected unloading time when the temperature is 80°F. (57.5 minutes)
6.7 Give and interpret the sample coefficient of determination, r². (11.1%)
6.8 Give and interpret Pearson’s correlation coefficient, r. (0.333)
7. The following data show the media expenditures, X (in millions of dollars) and the case sales, Y (in millions) for n = 7
major brands of soft drinks (Superbrands ’98, October 20, 1997).
8. For a company to maintain a competitive edge in the marketplace, spending on research and development (R & D) is essential.
To determine the optimum level for R & D spending and its effects on a company’s value, a simple linear regression analysis
was performed. Data collected for the largest R & D spenders were used to fit the straight-line model (SLRM)
y = β0 + β1x + ε,
where:
x = R & D expenditures/sales (R/S) ratio
y = Price/earnings (P/E) ratio.
The sample data for n = 20 of the companies used in the study are provided in the following table:
Company R/S Ratio P/E Ratio Company R/S Ratio P/E Ratio
x y x y
1 0.003 5.6 11 0.058 8.4
2 0.004 7.2 12 0.058 11.1
3 0.009 8.1 13 0.067 11.1
4 0.021 9.9 14 0.080 13.2
5 0.023 6.0 15 0.080 13.4
6 0.030 8.2 16 0.083 11.5
7 0.035 6.3 17 0.091 9.8
8 0.037 10.0 18 0.092 16.1
9 0.044 8.5 19 0.064 7.0
10 0.051 13.2 20 0.028 5.9
8.1 Set up the SLRM for this data set and indicate the scope of regression. (ŷ = 5.977 + 74.068x)
8.2 Estimate the expected P/E ratio of all companies with an R/S ratio of 0.070. (11.2)
8.3 Interpret the slope of the regression equation in (8.1).
8.4 Test the significance of the linear relationship between R/S ratio and P/E ratio. (t = 4.482; Reject H0)
8.5 Give and interpret the following: Pearson’s r; coefficient of determination r². (r = 0.726; r² = 52.7%)
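The test in 8.4 can also be run directly on the slope, as t = b1 / se(b1) with se(b1) = s / √Sxx and s² = SSE / (n − 2). The following is a sketch under those standard formulas for the Problem 8 data. The worksheet does not state a significance level; the two-sided 5% critical value 2.101 for 18 degrees of freedom is an assumption taken from a t table.

```python
import math

# R/S ratios (x) and P/E ratios (y) for the 20 companies in Problem 8.
rs = [0.003, 0.004, 0.009, 0.021, 0.023, 0.030, 0.035, 0.037, 0.044, 0.051,
      0.058, 0.058, 0.067, 0.080, 0.080, 0.083, 0.091, 0.092, 0.064, 0.028]
pe = [5.6, 7.2, 8.1, 9.9, 6.0, 8.2, 6.3, 10.0, 8.5, 13.2,
      8.4, 11.1, 11.1, 13.2, 13.4, 11.5, 9.8, 16.1, 7.0, 5.9]

n = len(rs)
x_bar = sum(rs) / n
y_bar = sum(pe) / n
s_xx = sum((x - x_bar) ** 2 for x in rs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(rs, pe))

b1 = s_xy / s_xx            # slope
b0 = y_bar - b1 * x_bar     # intercept

# Residual sum of squares and the estimated standard error of the slope.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(rs, pe))
s = math.sqrt(sse / (n - 2))
se_b1 = s / math.sqrt(s_xx)

t_stat = b1 / se_b1

# Assumption: alpha = 0.05, two-sided, df = n - 2 = 18 (value from a t table).
T_CRIT = 2.101
reject_h0 = abs(t_stat) > T_CRIT

print(f"y-hat = {b0:.3f} + {b1:.3f}x, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

In simple linear regression this statistic is algebraically the same as r·√(n − 2) / √(1 − r²), so either route should lead to the value quoted in 8.4.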
9. The marketing manager of a large supermarket chain would like to determine the effects of shelf space on the sales of pet
food. A random sample of n = 12 equal-sized stores is selected with the following results:
Store Shelf Space, Weekly Sales, Store Shelf Space, Weekly Sales,
X feet Y dollars X feet Y dollars
1 5 160 7 15 230
2 5 220 8 15 270
3 5 140 9 15 280
4 10 190 10 20 260
5 10 240 11 20 290
6 10 260 12 20 310
9.1 Set up the SLRM for this data set and indicate the scope of regression. (ŷ = 145.0 + 7.4x)
9.2 Estimate the expected weekly sales of all the stores with 12 feet of shelf space. ($233.80)
9.3 Interpret the slope of the regression equation in (9.1).
9.4 Test the significance of shelf space as a predictor for the mean weekly sales. (t = 4.652; Reject H0)
9.5 Give and interpret the following: Pearson’s r; coefficient of determination r². (r = 0.827; r² = 68.4%)
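Items 9.1 and 9.2 go together: the scope of regression is taken here to mean the range of observed X values, and an estimate is only trusted for an X inside that range. The sketch below uses the coefficients stated in 9.1 and refuses to extrapolate. The function name `predict` and the choice to raise an error outside the scope are illustrative assumptions, not part of the worksheet.

```python
shelf_space = [5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20]   # X: feet
weekly_sales = [160, 220, 140, 190, 240, 260,
                230, 270, 280, 260, 290, 310]                  # Y: dollars

# Fitted coefficients as stated in the answer to 9.1.
B0, B1 = 145.0, 7.4

# Scope of regression: smallest and largest observed shelf space.
scope = (min(shelf_space), max(shelf_space))


def predict(x):
    """Estimate mean weekly sales at x feet; refuse to extrapolate."""
    low, high = scope
    if not low <= x <= high:
        raise ValueError(f"x = {x} is outside the scope {low} to {high} feet")
    return B0 + B1 * x


print(f"Scope of regression: {scope[0]} to {scope[1]} feet")
print(f"Estimate at 12 feet: ${predict(12):.2f}")
```

A call such as `predict(25)` raises `ValueError`, since 25 feet lies beyond the largest shelf space in the sample.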