BS-Assignment 2 Solution
Question 1. A computer manufacturer estimates that its line of mini computers has, on average, 8.4 days
of downtime per year. To test this claim, a researcher contacts seven companies that own one of these
computers and is allowed to access company computer records. It is determined that, for the sample, the
average downtime is 5.6 days, with a sample standard deviation of 1.3 days. Assuming that the number
of downtime days is normally distributed, test whether these minicomputers actually average
8.4 days of downtime in the entire population. Use a 1% level of significance. (2)
Solution:
α = .01, α/2 = .005, n = 7, df = 7 – 1 = 6, x̄ = 5.6, s = 1.3
H0: µ = 8.4
Ha: µ ≠ 8.4
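The test statistic for these hypotheses follows from the sample values given in the question. The snippet below is a minimal sketch of that arithmetic, not the original worked solution; the critical value 3.707 is an assumption taken from a standard t-table for α/2 = .005 and df = 6.

```python
import math

# Values from Question 1
mu0 = 8.4      # hypothesized mean downtime (days per year)
x_bar = 5.6    # sample mean downtime
s = 1.3        # sample standard deviation
n = 7          # sample size

# One-sample t statistic: (x_bar - mu0) / (s / sqrt(n))
standard_error = s / math.sqrt(n)
t_stat = (x_bar - mu0) / standard_error

# Assumption: two-tailed critical value t(.005, 6) read from a standard t-table
t_crit = 3.707
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, critical = ±{t_crit}, reject H0: {reject_h0}")
```

Under these assumptions the statistic is about -5.70, which lies beyond ±3.707, so the null hypothesis would be rejected at the 1% level.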
Question 2. To test whether branded soups are priced the same as store-brand soups, an
analyst randomly samples eight stores. Each store sells its own brand and a branded soup. The
prices of a can of branded tomato soup and a can of the store-brand tomato soup are as follows:
H0: μB − μS = 0
Ha: μB − μS ≠ 0
$$t_{\text{Stat}} = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{4.5 - 0}{1.414/\sqrt{8}} = 9.001$$
For a 90% confidence level, α/2 = .05 and t.05,7 = 1.895.
Since t Stat = 9.001 > t.05,7 = 1.895, by the critical value method we reject the null hypothesis. Hence,
we can say that the prices of branded and store-branded soups are significantly different from one
another.
b) For a 90% confidence level, α/2 = .05 and t.05,7 = 1.895.
$$\bar{d} \pm t_{\alpha/2,\,n-1}\,\frac{s_d}{\sqrt{n}}$$
$$4.5 \pm 1.895 \cdot \frac{1.414}{\sqrt{8}} = 4.5 \pm 0.947$$
$$3.553 < D < 5.447$$
Hence, with 90% confidence, the average difference between the prices of branded and store-branded
soups lies between 3.553 and 5.447.
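The paired test and the interval can be checked from the summary values used in the solution (mean difference 4.5, standard deviation of differences 1.414, n = 8). This is a minimal sketch; the raw price table is not reproduced here, so the code starts from those summaries, and the critical value 1.895 is the one quoted in the solution.

```python
import math

# Summary values used in the solution to Question 2
d_bar = 4.5     # mean of the paired differences (branded - store brand)
s_d = 1.414     # standard deviation of the paired differences
n = 8           # number of stores
t_crit = 1.895  # t(.05, 7), as quoted in the solution

# Standard error of the mean difference
standard_error = s_d / math.sqrt(n)

# Paired t statistic against a hypothesized difference of 0
t_stat = (d_bar - 0) / standard_error

# 90% confidence interval for the mean difference
margin = t_crit * standard_error
ci_low, ci_high = d_bar - margin, d_bar + margin

print(f"t = {t_stat:.3f}")
print(f"90% CI: {ci_low:.3f} to {ci_high:.3f}")
```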
Question 3. Based on a recent conversation with the MD of a Unisex Grooming Service chain, a statistician
intends to verify the MD’s claim that present-day men match their female counterparts in expenditure on
grooming and self-care. To verify this, she randomly inquired about the past
month’s grooming expenditure of some of the students in her class and ran an independent-samples t-test
with unequal variances on the data collected from 5 females and 6 males. Develop and test the hypotheses
and write the interpretations from the output below. (2)
t-Test: Two-Sample Assuming Unequal Variances
                               Female     Male
Mean                           5580       4566.667
Variance                       8557000    5962667
Observations                   5          6
Hypothesized Mean Difference   0
df                             8
t Stat                         0.616105
P(T<=t) one-tail               0.277473
t Critical one-tail            1.859548
P(T<=t) two-tail               0.554946
t Critical two-tail            2.306004
Solution: The statistician finds it difficult to accept the claim made by the MD. Hence, she wants to test
whether the average spending by men on grooming and self-care is equal to the average spending by women.
Therefore, the hypotheses are
H0: μMen = μWomen
Ha: μMen ≠ μWomen
This means that it should be a two-tailed test. To test this, she has chosen a t-test for two
populations assuming unequal variances.
Decision and Conclusion: From the output, t Stat = 0.6161 lies between the two-tailed critical values,
i.e., ±2.306. Hence, by the critical value method we fail to reject H0.
Similarly, let us assume α = 5%. The two-tailed p-value from the output, 0.5549, is greater than 0.05. Hence,
by the p-value method we fail to reject H0.
Thus, the observed difference between the sample means (5580 – 4566.67) is not convincing enough to say
that the average amount spent by females on grooming and self-care differs significantly from the average
amount spent by males. Hence, the statistical evidence gives the statistician no reason to reject the
MD’s claim.
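The t Stat and df in the output can be reproduced from the means, variances, and sample sizes alone. The sketch below assumes the usual unequal-variance (Welch) statistic and the Welch-Satterthwaite approximation for the degrees of freedom, rounded to the nearest integer; the function name is illustrative, not part of any library.

```python
import math

def welch_t_from_summary(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, approximate df) for a two-sample t-test
    with unequal variances, computed from summary statistics."""
    v1 = var1 / n1  # variance of the first sample mean
    v2 = var2 / n2  # variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics from the output table (Female, Male)
t_stat, df = welch_t_from_summary(5580, 8557000, 5, 4566.667, 5962667, 6)

t_crit_two_tail = 2.306004  # from the output table
fail_to_reject = abs(t_stat) < t_crit_two_tail

print(f"t = {t_stat:.4f}, df ≈ {df:.2f} (rounded: {round(df)})")
print("Fail to reject H0" if fail_to_reject else "Reject H0")
```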
Question 4. The personnel manager of a firm wants to compare the job satisfaction level of the
employees among the firm’s Finance, Purchase and Sales departments. A questionnaire was administered
to randomly selected employees from each of the three departments resulting in the following job
satisfaction level scores. Is there a significant difference in job satisfaction level among the employees
from the three different departments? (7)
$$MST = \frac{SST}{df_{\text{Between}}} = \frac{108.4}{2} = 54.2$$
$$MSE = \frac{SSE}{df_{\text{Within}}} = \frac{43.2}{12} = 3.6$$
$$F = \frac{MST}{MSE} = \frac{54.2}{3.6} = 15.056$$
From the F-table for α = 0.05, the critical value is
$$F_{0.05,\,2,\,12} = 3.89$$
ANOVA
Source of Variation   SS      df   MS     F        F crit
Between Groups        108.4   2    54.2   15.056   3.89
Within Groups         43.2    12   3.6
Total                 151.6   14
Comparing the observed F with the critical F (F 0.05,2,12), 15.056 > 3.89. Thus, we have statistical evidence
to reject the null hypothesis, which means at least one of the departments has a different level of job satisfaction.
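The F ratio in the ANOVA table follows directly from the sums of squares and degrees of freedom. The snippet below is a small sketch of that step using the values from the table; the questionnaire scores themselves are not reproduced here, so it starts from the sums of squares.

```python
# Sums of squares and degrees of freedom from the ANOVA table
ss_between = 108.4
ss_within = 43.2
df_between = 2    # 3 departments - 1
df_within = 12    # 15 observations - 3 departments

# Mean squares and F ratio
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

f_crit = 3.89  # F(0.05, 2, 12), as quoted in the solution
reject_h0 = f_stat > f_crit

print(f"MST = {ms_between:.1f}, MSE = {ms_within:.1f}, F = {f_stat:.3f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```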
Question 5. It is a commonly accepted notion in the healthcare industry that the number of full-time
employees (FTEs) in a hospital can be estimated by counting the number of beds in the hospital (which is
a common measure of the size of the hospital). A healthcare business researcher wants to develop a
regression model to predict the number of FTEs. He surveyed 9 hospitals and collected the following data.
No. of Beds   23   29   29    35    42    46    50    54    64
FTEs          69   95   102   118   126   125   138   178   156
Without using any software, develop the regression equation, and compute all the desired assessment
parameters of the regression model. (9)
Solution: The number of beds is the independent variable (x), whereas the number of full-time employees is the
dependent variable (y) for this simple linear regression model.
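$$\mathrm{Cov}(x,y) = \frac{\sum (x-\bar{x})(y-\bar{y})}{n-1} = \frac{3212}{8} = 401.5$$
$$\text{Variance of } x = \frac{\sum (x-\bar{x})^2}{n-1} = \frac{1452}{8} = 181.5$$
$$b_1 = \frac{\mathrm{Cov}(x,y)}{\text{Variance of } x} = \frac{401.5}{181.5} = 2.212$$
$$b_0 = \bar{y} - b_1\bar{x} = 123 - 2.212 \times 41.33 = 31.58$$
Regression Equation: $\hat{y} = 31.58 + 2.212x$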
2.212 is the regression coefficient for the independent variable, number of beds. This suggests that each
additional bed is associated with an increase of 2.212 in the number of full-time employees.
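The slope and intercept can be recomputed from the nine data points as a cross-check on the hand calculation. This is a minimal sketch using only the standard library; because it does not round intermediate values, the intercept comes out near 31.57 rather than the 31.58 obtained by hand with the rounded slope.

```python
# Data from Question 5
beds = [23, 29, 29, 35, 42, 46, 50, 54, 64]
ftes = [69, 95, 102, 118, 126, 125, 138, 178, 156]

n = len(beds)
x_bar = sum(beds) / n
y_bar = sum(ftes) / n

# Sums of cross-products and squared deviations
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beds, ftes))
s_xx = sum((x - x_bar) ** 2 for x in beds)

# Sample covariance and variance (divide by n - 1)
cov_xy = s_xy / (n - 1)
var_x = s_xx / (n - 1)

# Least-squares slope and intercept
b1 = cov_xy / var_x
b0 = y_bar - b1 * x_bar

print(f"Cov(x, y) = {cov_xy:.1f}, Var(x) = {var_x:.1f}")
print(f"b1 = {b1:.3f}, b0 = {b0:.2f}")
```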
No. of Beds (x)   FTEs (y)   Predicted y (ŷ)   Error (y − ŷ)   Error Squared (y − ŷ)²   (ŷ − ȳ)²
23                69         82.456            -13.456         181.06                   1643.82
29                95         95.728            -0.728          0.53                     743.76
29                102        95.728            6.272           39.34                    743.76
35                118        109               9               81                       196
42                126        124.484           1.516           2.3                      2.20
46                125        133.332           -8.332          69.42                    106.75
50                138        142.18            -4.18           17.47                    367.87
54                178        151.028           26.972          727.49                   785.57
64                156        173.148           -17.148         294.05                   2514.82
Sum of Squares                                                 1412.66                  7104.55
For the first row: ŷ = 31.58 + 2.212*23 = 82.456, error = 69 − 82.456 = −13.456, and (ŷ − ȳ)² = (82.456 − 123)² = 1643.82.
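SSE = 1412.66
Standard Error of Estimate:
$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{1412.66}{9-2}} = \sqrt{201.81} = 14.205$$
H0: β1 = 0
Ha: β1 ≠ 0
$$s_{b_1} = \frac{s_e}{\sqrt{(n-1)\,s_x^2}} = \frac{14.205}{\sqrt{8 \times 181.5}} = 0.3727$$
$$t = \frac{b_1 - 0}{s_{b_1}} = \frac{2.212 - 0}{0.3727} = 5.94$$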
df for the t-test for slope = n − 2 = 9 − 2 = 7
t-observed = 5.94 > t-critical = 2.365 (two-tailed, α = 0.05). Hence, we reject the null hypothesis, which
suggests a linear relationship between the number of beds and the number of full-time employees.
Overall Significance of the model:
ANOVA
Sum of Squares Regression (SSR) = sum of the last column of the table in the solution = 7104.55
df Reg = No. of independent variables = 1
Sum of Squares Error (SSE) = 1412.66
df Err = No. of observations − No. of independent variables − 1 = 9 − 1 − 1 = 7
             df   SS        MS        F        F-Critical
Regression   1    7104.55   7104.55   35.204   5.59
Residual     7    1412.66   201.81
Total        8    8517.21
Since F = 35.204 > F-critical = 5.59, the regression model is significant overall.
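The assessment quantities above (SSE, SSR, standard error of estimate, standard error of the slope, t and F) can be recomputed from the raw data as a check. This sketch keeps full precision throughout, so its results differ slightly from the hand-calculated figures, which use the rounded coefficients 31.58 and 2.212; for example, it gives SSR of about 7105.33 and F of about 35.21.

```python
import math

# Data from Question 5
beds = [23, 29, 29, 35, 42, 46, 50, 54, 64]
ftes = [69, 95, 102, 118, 126, 125, 138, 178, 156]

n = len(beds)
x_bar = sum(beds) / n
y_bar = sum(ftes) / n
s_xx = sum((x - x_bar) ** 2 for x in beds)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beds, ftes))

# Least-squares coefficients (unrounded)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in beds]

# Sums of squares
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ftes, predicted))
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)

# Standard error of estimate and of the slope
df_error = n - 2
mse = sse / df_error
s_e = math.sqrt(mse)
s_b1 = s_e / math.sqrt(s_xx)  # s_xx equals (n - 1) * variance of x

# Test statistics for the slope and for the overall model
t_stat = (b1 - 0) / s_b1
f_stat = (ssr / 1) / mse

print(f"SSE = {sse:.2f}, SSR = {ssr:.2f}, s_e = {s_e:.3f}")
print(f"s_b1 = {s_b1:.4f}, t = {t_stat:.2f}, F = {f_stat:.2f}")
```

With a single independent variable, F equals t squared, which gives a quick consistency check between the two tests.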
Question 6. A startup has recently hired many sales professionals. To understand which qualities of
a sales professional actually affect sales performance (measured as sales per week),
they collected data on IQ scores (assessed during post-recruitment training), extroversion scores (assessed
during post-recruitment training), total work experience and sales per week. Below is the data.
(ii) The ANOVA table of the regression output helps us determine the overall significance of
the regression model. The value of significance F (p-value) is 0.0036, which is less than 0.05.
This provides enough statistical evidence to reject the null hypothesis, suggesting that the
regression model is significant, which means our regression model has at least one IV that
is a significant predictor of the dependent variable.
(iii) In multiple regression, R-square may give a distorted measure of the explanatory
power of the model. Adjusted R-square, however, takes into account the additional
information each new IV brings in explaining the variations in the DV by adjusting R-square
for degrees of freedom. In multiple regression, adjusted R-square is therefore the better
measure of the explanatory power of the model.
(iv) The explanatory power of the present regression model is 47.8%, which suggests that only 47.8%
of the variation in Sales per Week is explained by our regression model.
(v) Significant independent variables: the t-tests for the regression coefficients suggest that only the variable
Work Ex is significant, with a p-value of 0.0051, which is less than the level of significance 0.05.
All the other IVs are insignificant. This suggests that the model should be changed to exclude the IVs
IQ score and Extroversion, and should then be checked again for significance and explanatory
power.
$$d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} = \frac{2623394}{1271789} = 2.06$$
d = 2: No auto-correlation
d < 2: Positive auto-correlation
d > 2: Negative auto-correlation
Since the observed d is close to 2, we conclude that autocorrelation is absent.
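The Durbin-Watson statistic used here is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The sketch below shows that calculation; the function name and the short residual list are hypothetical illustrations (the model's actual residuals are not reproduced in this solution), while the final line simply re-divides the two sums quoted above.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals, for illustration only (not the data from Question 6)
example_residuals = [1, -1, 1, -1]
example_d = durbin_watson(example_residuals)  # alternating signs push d above 2

# Using the two sums quoted in the solution
d_from_solution = 2623394 / 1271789

print(f"Example d = {example_d:.2f}")
print(f"d from the solution's sums = {d_from_solution:.2f}")
```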
[Figure: three scatter plots with a vertical axis from -400 to 400, plotted against IQ Score, Extroversion, and Work Ex]
e. Multicollinearity: The condition in which IVs are highly correlated with one another. VIF
(Variance Inflation Factor) is a widely used measure of multicollinearity. If the VIF for an
independent variable is greater than 10, multicollinearity is said to be present (some
also use 5 as the cut-off). In the present case, all the IVs have
VIF less than 10 (and less than 5). This suggests the absence of multicollinearity.
Variable VIF
IQ Score 1.058
Extroversion 1.242
Work Ex 1.282
(viii) Sales per week of a sales rep who has an IQ of 109, an extroversion score of 17 and work
experience of 15 months.
Sales per week = 1212.78 + 7.14*IQ Score + 14.91*Extroversion + 24.99*Work Ex
Sales per week = 1212.78 + 7.14*109 + 14.91*17 + 24.99*15
= 2619.36
Our regression model predicts weekly sales of 2619.36 units by a sales rep having an IQ of
109, extroversion score of 17 and work experience of 15 months.
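The prediction is a direct evaluation of the fitted equation. A small sketch follows, with the coefficients copied from the equation above; the function name is illustrative.

```python
def predict_sales_per_week(iq_score, extroversion, work_ex):
    """Evaluate the fitted regression equation from the solution."""
    intercept = 1212.78
    return intercept + 7.14 * iq_score + 14.91 * extroversion + 24.99 * work_ex

prediction = predict_sales_per_week(iq_score=109, extroversion=17, work_ex=15)
print(f"Predicted sales per week: {prediction:.2f}")
```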
Question 7. Below is the data of monthly orders of a manufacturing firm. Use this data to develop these
two forecasting models: a 3-period moving average and exponential smoothing with α = 0.4.
Month Orders
Jan. 120
Feb. 90
Mar. 100
Apr. 75
May. 110
Jun. 50
Jul. 75
Aug. 130
Sep. 110
Oct. 90
Nov.
Which one would you prefer and why? (7)
We have excluded the forecast for November because the actual demand for November is not
known. Hence, it cannot contribute to determining the forecast accuracy.
The lower the values of MAD and MSE, the better the forecast. Hence, exponential smoothing with
α = 0.4 is a better forecast than the 3-period moving average.
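Both forecasting models and their error measures can be computed from the order data. The sketch below is one way to do it, under two stated assumptions: the exponential smoothing series is started by using the January actual as the February forecast, and each model's MAD and MSE are averaged over all months for which that model has a forecast (April to October for the moving average, February to October for exponential smoothing). The original working may have used a different starting value or averaging window, so exact figures can differ.

```python
# Monthly orders, January through October (from Question 7)
orders = [120, 90, 100, 75, 110, 50, 75, 130, 110, 90]

def moving_average_forecasts(actuals, window=3):
    """Forecast for each period = mean of the previous `window` actuals.
    The first `window` periods have no forecast (None)."""
    forecasts = [None] * window
    for t in range(window, len(actuals)):
        forecasts.append(sum(actuals[t - window:t]) / window)
    return forecasts

def exponential_smoothing_forecasts(actuals, alpha=0.4):
    """F(t) = F(t-1) + alpha * (A(t-1) - F(t-1)).
    Assumption: the first forecast (period 2) equals the first actual."""
    forecasts = [None, actuals[0]]
    for t in range(2, len(actuals)):
        previous = forecasts[t - 1]
        forecasts.append(previous + alpha * (actuals[t - 1] - previous))
    return forecasts

def mad_and_mse(actuals, forecasts):
    """Mean absolute deviation and mean squared error over periods
    that have a forecast."""
    errors = [a - f for a, f in zip(actuals, forecasts) if f is not None]
    mad = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e ** 2 for e in errors) / len(errors)
    return mad, mse

ma_mad, ma_mse = mad_and_mse(orders, moving_average_forecasts(orders))
es_mad, es_mse = mad_and_mse(orders, exponential_smoothing_forecasts(orders))

print(f"3-period moving average: MAD = {ma_mad:.2f}, MSE = {ma_mse:.2f}")
print(f"Exponential smoothing:   MAD = {es_mad:.2f}, MSE = {es_mse:.2f}")
```

Under these assumptions exponential smoothing has the lower MAD and MSE, which agrees with the preference stated above.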