BS-Assignment 2 Solution

The document contains questions and solutions related to business statistics. Question 1 involves testing whether the average downtime for a sample of mini computers matches the estimated population average using a t-test. Question 2 involves comparing the prices of branded and store-branded soups using a paired t-test and confidence interval. Question 3 involves testing whether average grooming expenditures differ between males and females using an independent samples t-test. Question 4 uses ANOVA to compare job satisfaction levels across departments. Question 5 develops a regression model to predict employees using number of beds in a hospital.

Business Statistics: Assignment 2

Question 1. A computer manufacturer estimates that its line of mini computers has, on average, 8.4 days
of downtime per year. To test this claim, a researcher contacts seven companies that own one of these
computers and is allowed to access company computer records. It is determined that, for the sample, the
average number of downtime is 5.6, with a sample standard deviation of 1.3 days. Assuming that number
of downtime days is normally distributed, test to determine whether these minicomputers actually average
8.4 days of downtime in the entire population. Consider 1% level of significance. (2)
Solution:
α = .01, α/2 = .005, n = 7, df = 7 – 1 = 6, s = 1.3, x̄ = 5.6
H0: µ = 8.4
Ha: µ ≠ 8.4

Critical values: t.005,6 = ±3.707

t = (x̄ − µ) / (s / √n) = (5.6 − 8.4) / (1.3 / √7) = −5.70

Since the observed t = −5.70 < −t.005,6 = −3.707, the decision is to reject the null hypothesis.
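
The calculation above can be checked with a few lines of Python. This is a minimal sketch working only from the summary statistics in the question; the function name one_sample_t is an illustrative choice, and the critical value is taken from the t-table as in the solution rather than computed.

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """Return the t statistic for a one-sample t-test from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Values from Question 1.
t_stat = one_sample_t(sample_mean=5.6, hypothesized_mean=8.4, sample_sd=1.3, n=7)

# Two-tailed critical value t(.005, df = 6), read from a t-table (not computed here).
t_critical = 3.707
reject_h0 = abs(t_stat) > t_critical

print(round(t_stat, 2), reject_h0)
```

Comparing the absolute value of t with the positive critical value is equivalent to the two-sided comparison against ±3.707.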

Question 2. In order to test whether the branded soups are of same price as the store-branded soups, an
analyst randomly samples eight stores. Each store sells its own brand and a branded soup brand. The
prices of a can of branded tomato soup and a can of the store-brand tomato soup are as follows:

Branded Soup Store-branded Soup


54 49
55 50
59 52
53 51
54 50
61 56
51 47
53 49

a. Develop and test the hypotheses at 10% level of significance. (1)


b. Construct a 90% confidence interval to estimate the average difference between the prices of
branded soup and store-branded soup. Assume that the difference in prices of tomato soup are
normally distributed in the population. (1)
Solution: This is a case for the t-test for dependent samples (paired-sample t-test), because the prices of
the branded and store-branded soups are collected from the same store. For example, one data point could
be the prices of Knorr soup and Royal BB soup at Big Basket (assuming Big Basket has launched its own brand
of soup).

Branded Soup Store-branded Soup d


54 49 5
55 50 5
59 52 7
53 51 2
54 50 4
61 56 5
51 47 4
53 49 4
n = 8, d̄ = 4.5, sd = 1.414, df = 8 − 1 = 7

a) Mean difference (d̄) = 4.5

H0: μB − μS = 0

Ha: μB − μS ≠ 0

t Stat = (d̄ − 0) / (sd / √n) = (4.5 − 0) / (1.414 / √8) = 9.001

For a 10% level of significance, α/2 = .05 and t.05,7 = 1.895.
Since t Stat = 9.001 > t.05,7 = 1.895, by the critical value method we reject the null hypothesis. Hence,
we can say that the prices of branded and store-branded soups are significantly different from one
another.
b) For a 90% confidence level, α/2 = .05 and t.05,7 = 1.895.

d̄ ± t × sd / √n = 4.5 ± 1.895 × 1.414 / √8 = 4.5 ± .947

3.553 < D < 5.447
Hence, with 90% confidence, the average difference between the prices of branded and store-branded
soups lies between 3.553 and 5.447.
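
The paired calculation can be reproduced from the raw prices. The sketch below uses only the Python standard library; the variable names are illustrative, and the critical value 1.895 is taken from the t-table as in the solution.

```python
import math
import statistics

# Prices from the table in Question 2, one pair per store.
branded = [54, 55, 59, 53, 54, 61, 51, 53]
store_brand = [49, 50, 52, 51, 50, 56, 47, 49]

# Paired differences d = branded - store brand.
diffs = [b - s for b, s in zip(branded, store_brand)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)      # sample standard deviation (divides by n - 1)
std_err = sd_diff / math.sqrt(n)

# Test statistic for H0: mean difference = 0.
t_stat = mean_diff / std_err

# 90% confidence interval, using t(.05, df = 7) from a t-table.
t_critical = 1.895
margin = t_critical * std_err
ci = (mean_diff - margin, mean_diff + margin)

print(round(t_stat, 3), tuple(round(v, 3) for v in ci))
```

With unrounded intermediate values the statistic comes out very slightly below the hand-computed 9.001, because the hand calculation rounds sd to 1.414.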

Question 3. Based on a recent conversation with the MD of a unisex grooming service chain, a statistician
intends to verify the MD's claim that present-day men match their female counterparts in expenditure on
grooming and self-care. To verify this, she randomly inquired about the past
month's grooming expenditure of some of the students in her class and ran an independent-samples t-test
with unequal variances on the data collected from 5 females and 6 males. Develop and test the hypotheses
and write the interpretations from the output below. (2)
t-Test: Two-Sample Assuming Unequal Variances
  Female Male
Mean 5580 4566.667
Variance 8557000 5962667
Observations 5 6
Hypothesized Mean Difference 0
df 8
t Stat 0.616105
P(T<=t) one-tail 0.277473
t Critical one-tail 1.859548
P(T<=t) two-tail 0.554946
t Critical two-tail 2.306004  

Solution: The statistician finds it difficult to accept the claim made by the MD. Hence, she wants to test whether the
average spending by men on grooming and self-care is equal to the average spending by women.
Therefore, the hypotheses are

H0: μMen = μWomen
Ha: μMen ≠ μWomen
This means that it should be a two-tailed test. To test this, she has chosen the t-test for two
populations assuming unequal variances.
Decision and Conclusion: From the output, it can be observed that t Stat = 0.6161 lies between the two-tailed
critical values, i.e., ±2.306. Hence, by the critical value method we fail to reject H0.
Similarly, let us assume α = 5%. The two-tailed p-value from the output, 0.5549, is greater than 0.05. Hence,
by the p-value method we fail to reject H0.
Thus, the observed difference between the sample means (5580 – 4566.67) is not convincing enough to say
that the average amount spent by females on grooming and self-care differs significantly from the average
amount spent by males. Hence, the statistical evidence does not contradict the MD's claim. Note that failing
to reject H0 is not proof that the two means are equal.
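
The t Stat and df lines of the output can be reproduced from the reported means, variances and sample sizes. This is a sketch under the assumption that the output uses Welch's statistic with the Welch-Satterthwaite degrees of freedom rounded to the nearest integer; the function name welch_t is illustrative.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from summary statistics."""
    v1, v2 = var1 / n1, var2 / n2          # squared standard errors of each mean
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df

# Summary statistics from the output table (Female first, then Male).
t_stat, df = welch_t(5580, 8557000, 5, 4566.667, 5962667, 6)

# Two-tailed critical value as reported in the output.
t_critical = 2.306004
fail_to_reject = abs(t_stat) < t_critical

print(round(t_stat, 4), round(df), fail_to_reject)
```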

Question 4. The personnel manager of a firm wants to compare the job satisfaction level of the
employees among the firm’s Finance, Purchase and Sales departments. A questionnaire was administered
to randomly selected employees from each of the three departments resulting in the following job
satisfaction level scores. Is there a significant difference in job satisfaction level among the employees
from the three different departments? (7)

Finance Purchase Sales


14 18 10
12 19 12
13 20 17
12 18 11
11 16 13
Solution: We want to compare the satisfaction levels of employees in three departments: Finance,
Purchase and Sales. These three departments form three populations, so one-way ANOVA should be used to
compare the mean satisfaction levels.

H0: μFin = μPur = μSal

Ha: At least one of the means is not equal to the others

Finance Purchase Sales


14 18 10
12 19 12
13 20 17
12 18 11
11 16 13
Mean 12.4 18.2 12.6

Overall mean = 14.4


Sum of Squares for Treatment (SST) or Sum of Squares Between = 108.4
= 5*(12.4-14.4)^2 + 5*(18.2-14.4)^2 + 5*(12.6-14.4)^2
=5*(-2)^2 + 5*(3.8)^2 + 5*(-1.8)^2
=5*4 + 5*14.44 + 5*3.24
=20 + 72.2 + 16.2
=108.4
Sum of Squares for Errors (SSE) or Sum of Squares Within = 43.2
= (14-12.4)^2 + (12-12.4)^2 + (13-12.4)^2 + (12-12.4)^2 + (11-12.4)^2
+ (18-18.2)^2 + (19-18.2)^2 + (20-18.2)^2 + (18-18.2)^2 + (16-18.2)^2
+ (10-12.6)^2 + (12-12.6)^2 + (17-12.6)^2 + (11-12.6)^2 + (13-12.6)^2
= 5.2 + 8.8 + 29.2
= 43.2
Degree of Freedom- Between (df Between) = 3-1 = 2

Degree of Freedom- Within (df Within ) = 15-3 =12

MST = SST / df Between = 108.4 / 2 = 54.2

MSE = SSE / df Within = 43.2 / 12 = 3.6

F = MST / MSE = 54.2 / 3.6 = 15.056

From the F-table for α = 0.05, we observe the value of F-critical:
F0.05,2,12 = 3.89
ANOVA
Source of Variation SS df MS F F crit
Between Groups 108.4 2 54.2 15.056 3.89
Within Groups 43.2 12 3.6

Total 151.6 14      

Comparing F observed with F critical (F0.05,2,12), 15.056 > 3.89. Thus, we have found statistical evidence
to reject the null hypothesis, which means at least one of the departments has a different mean level of job satisfaction.
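
The sums of squares and the F ratio can be recomputed directly from the scores. The following is a minimal standard-library sketch; the dictionary layout and variable names are illustrative, and the critical value 3.89 is taken from the F-table as in the solution.

```python
# Job satisfaction scores from Question 4.
groups = {
    "Finance": [14, 12, 13, 12, 11],
    "Purchase": [18, 19, 20, 18, 16],
    "Sales": [10, 12, 17, 11, 13],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

ss_between = 0.0   # sum of squares for treatment (between groups)
ss_within = 0.0    # sum of squares for error (within groups)
for scores in groups.values():
    group_mean = sum(scores) / len(scores)
    ss_between += len(scores) * (group_mean - grand_mean) ** 2
    ss_within += sum((x - group_mean) ** 2 for x in scores)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

f_critical = 3.89  # F(0.05; 2, 12) from an F-table
reject_h0 = f_stat > f_critical

print(round(ss_between, 1), round(ss_within, 1), round(f_stat, 3), reject_h0)
```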

Question 5. It is a commonly accepted notion in the healthcare industry that the number of full-time
employees (FTEs) in a hospital can be estimated from the number of beds in the hospital (a
common measure of hospital size). A healthcare business researcher wants to develop a
regression model to predict the number of FTEs. He surveyed 9 hospitals and collected the following data.

No. of Beds   23   29   29   35   42   46   50   54   64
FTEs          69   95  102  118  126  125  138  178  156

Without using any software, develop the regression equation, and compute all the desired assessment
parameters of the regression model. (9)
Solution: No. of beds is an independent variable (x), whereas number of full time employees is our
dependent variable (y) for this simple linear regression model.

No. of Beds (x)   FTEs (y)   (x − x̄)   (x − x̄)²   (y − ȳ)   (x − x̄)(y − ȳ)

23                69         -18.33     336.11      -54       990.00
29                95         -12.33     152.11      -28       345.33
29                102        -12.33     152.11      -21       259.00
35                118        -6.33      40.11       -5        31.67
42                126        0.67       0.44        3         2.00
46                125        4.67       21.78       2         9.33
50                138        8.67       75.11       15        130.00
54                178        12.67      160.44      55        696.67
64                156        22.67      513.78      33        748.00
x̄ = 41.33        ȳ = 123               Sum = 1452            Sum = 3212.00

Cov(x, y) = ∑(x − x̄)(y − ȳ) / (n − 1) = 3212 / 8 = 401.5

Variance of x = ∑(x − x̄)² / (n − 1) = 1452 / 8 = 181.5

b1 = Cov(x, y) / Variance of x = 401.5 / 181.5 = 2.212

b0 = ȳ − b1 x̄ = 123 – 2.212*41.33 = 31.58
Regression Equation: ŷ = 31.58 + 2.212*x
The regression coefficient for the independent variable, number of beds, is 2.212. This suggests that each
additional bed is associated with an average increase of about 2.212 full-time employees.
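
The slope and intercept can be recomputed from the nine observations. This sketch uses only built-in Python; variable names such as s_xx and s_xy are illustrative labels for the two column sums in the table above.

```python
# Data from Question 5.
beds = [23, 29, 29, 35, 42, 46, 50, 54, 64]
ftes = [69, 95, 102, 118, 126, 125, 138, 178, 156]

n = len(beds)
x_bar = sum(beds) / n
y_bar = sum(ftes) / n

# Sum of squared deviations of x, and sum of cross-products of deviations.
s_xx = sum((x - x_bar) ** 2 for x in beds)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beds, ftes))

# Dividing both sums by (n - 1) gives the covariance and variance; the ratio is unchanged.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

print(round(b1, 3), round(b0, 2))
```

At full precision the intercept comes out slightly below the hand-computed 31.58, because the hand calculation uses the rounded values 2.212 and 41.33.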

No. of Beds (x)   FTEs (y)   Predicted y (ŷ)   Error (y − ŷ)   Error Squared   (ŷ − ȳ)²

23                69         82.456            -13.456         181.06          1643.82
29                95         95.728            -0.728          0.53            743.76
29                102        95.728            6.272           39.34           743.76
35                118        109               9               81              196
42                126        124.484           1.516           2.3             2.20
46                125        133.332           -8.332          69.42           106.75
50                138        142.18            -4.18           17.47           367.87
54                178        151.028           26.972          727.49          785.57
64                156        173.148           -17.148         294.05          2514.82
Sum of Squares                                                 1412.66         7104.55

For the first row: ŷ = 31.58 + 2.212*23 = 82.456, error = 69 − 82.456 = −13.456, and (ŷ − ȳ)² = (82.456 − 123)^2 = 1643.82.

SSE = 1412.66

Standard Error of Estimate: se = √(SSE / (n − 2)) = √(1412.66 / (9 − 2)) = √201.81 = 14.205

t-test for slope:

H0: β1 = 0

Ha: β1 ≠ 0

sb1 = se / √((n − 1) s²x) = 14.205 / √(8*181.5) = 0.3727

t = (b1 − 0) / sb1 = (2.212 − 0) / 0.3727 = 5.94

df for t-test for slope = n − 2 = 9 − 2 = 7

t-critical = t α/2, 7 = t 0.025, 7 = 2.365

t-observed = 5.94 > t-critical = 2.365. Hence, we reject the null hypothesis, which suggests the existence of
a linear relationship between the number of beds and the number of full-time employees.
Overall Significance of the model:
ANOVA
Sum of Squares Regression (SSR) = sum of the last column of the table in the solution = 7104.55
df Reg = no. of independent variables = 1
Sum of Squares Error (SSE) = 1412.66
df Err = no. of observations − no. of independent variables − 1 = 9 − 1 − 1 = 7

F critical = F0.05,1,7 = 5.59

df SS MS F F-Critical
Regression 1 7104.55 7104.55 35.204 5.59
Residual 7 1412.66 201.81
Total 8 8517.21

F-observed = 35.204 > F-critical = 5.59


Hence, we reject the null hypothesis, which means at least one of the independent variables is a significant
predictor of the dependent variable. Thus, we can say that the overall regression model is significant.
Explanatory Power of the model:

R² = 1 − SSE / SS Total = 1 − 1412.66 / 8517.21 = 1 − 0.1658 = 0.834

The value of R-square suggests that about 83.4% of the variation in the number of full-time employees of a
hospital can be explained by our regression model.
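
The assessment parameters for Question 5 (SSE, SSR, standard error of estimate, t for the slope, F and R-square) can be recomputed in one pass. This is a self-contained sketch using only the standard library, with illustrative variable names; because it keeps full precision, its results differ from the hand calculation in the last digit or two.

```python
import math

# Data from Question 5.
beds = [23, 29, 29, 35, 42, 46, 50, 54, 64]
ftes = [69, 95, 102, 118, 126, 125, 138, 178, 156]

n = len(beds)
x_bar = sum(beds) / n
y_bar = sum(ftes) / n
s_xx = sum((x - x_bar) ** 2 for x in beds)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beds, ftes))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted values, then the error and regression sums of squares.
predicted = [b0 + b1 * x for x in beds]
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ftes, predicted))
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
ss_total = sse + ssr

se = math.sqrt(sse / (n - 2))          # standard error of estimate
sb1 = se / math.sqrt(s_xx)             # standard error of the slope; s_xx = (n - 1) * var(x)
t_slope = b1 / sb1                     # t statistic for H0: slope = 0
f_stat = (ssr / 1) / (sse / (n - 2))   # one independent variable, so df Reg = 1
r_squared = 1 - sse / ss_total

print(round(se, 3), round(t_slope, 2), round(f_stat, 2), round(r_squared, 3))
```

In simple linear regression the F statistic equals the square of the slope's t statistic, which gives a quick consistency check between the two tests.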

Question 6. A startup has recently hired many sales professionals. In order to understand what quality of
the sales professional is actually impacting the performance of sales person (measured as sales per week),
they collected data on IQ scores (assessed during post-recruitment training), extroversion score (assessed
during post-recruitment training), total work experience and sales per week. Below is the data.

Sales Person IQ Score Extroversion Work Ex Sales Per Week


1 89 21 22 2625
2 93 24 24 2700
3 91 21 35 3100
4 122 23 40 3150
5 115 27 38 3175
6 100 18 12 3100
7 98 19 17 2700
8 105 20 19 2475
9 112 23 41 3625
10 109 28 38 3525
11 130 20 22 3225
12 104 25 34 3450
13 98 20 22 2425
14 100 26 21 3025
15 97 28 40 3625
16 115 29 21 2750
17 113 25 32 3150
18 88 23 28 2600
19 108 19 22 2525
20 101 26 12 2650

Use excel to obtain multiple regression output. Answer the following:


(i) Write the regression equation and interpretation of partial regression coefficient. (1)
(ii) Overall significance of the model. (1)
(iii) What is the difference between R-square and Adjusted R-square? (1)
(iv) Explanatory power of the model. (1)
(v) Significant independent variables and their impact. Would you like to modify your regression
equation? (Just write the modified equation, don’t re-run the model) (2)
(vi) Conduct Durbin Watson test in the present model and explain the results. (1)
(vii) Conduct the diagnostic test for assumptions in the present model and explain the results. (3)
(viii) Determine the sales per week of a sales rep who has the IQ of 109, extroversion score of 17
and work experience of 15 months. (1)
Solution:
(i) Regression equation
Sales per week = 1212.78 + 7.14*IQ Score + 14.91*Extroversion + 24.99*Work Ex

The values of the partial regression coefficients suggest a positive influence of the independent
variables on sales per week (the dependent variable). Keeping all the other variables constant,
a one-unit increase in IQ score is associated with a 7.14-unit increase in sales per week. Similarly, a
one-unit increase in extroversion score is associated with a 14.91-unit increase in sales per week when all the
other independent variables are held constant. Lastly, a one-month increase in work experience is associated
with a 24.99-unit increase in sales per week when all the other IVs are held constant.

(ii) The ANOVA table of the regression output helps us determine the overall significance of
the regression model. The value of Significance F (p-value) is 0.0036, which is less than 0.05.
This provides enough statistical evidence to reject the null hypothesis, suggesting that the
regression model is significant; that is, the model has at least one IV which
is a significant predictor of the dependent variable.

(iii) In multiple regression, R-square may give a distorted measure of the explanatory
power of the model, because it never decreases when another IV is added, even an irrelevant one.
Adjusted R-square takes into account the additional
information each new IV brings in explaining the variation in the DV by adjusting R-square
for degrees of freedom. In multiple regression, adjusted R-square is therefore the better
measure of the explanatory power of the model.

(iv) The explanatory power of the present regression model is 47.8%, which suggests that only 47.8%
of the variation in Sales per Week is explained by our regression model.

(v) Significant independent variables: the t-tests for the regression coefficients suggest that only
Work Ex is significant, with a p-value of 0.0051, which is less than the 0.05 level of significance.
All the other IVs are insignificant. This suggests that the model should be changed to exclude the IVs
IQ Score and Extroversion, and should then be checked again for significance and explanatory
power. The modified equation has the form Sales per week = b0 + b1*Work Ex, where b0 and b1
must be re-estimated when the model is re-run.

(vi) Durbin Watson test

d = ∑(i=2 to n) (e_i − e_(i−1))² / ∑(i=1 to n) e_i² = 2623394 / 1271789 = 2.06

d ≈ 2: No auto-correlation
d well below 2 (towards 0): Positive auto-correlation
d well above 2 (towards 4): Negative auto-correlation
Since the observed d is close to 2, we conclude absence of autocorrelation.

(vii) Five assumptions have to be tested for multiple regression


a. Normality
b. Linearity
c. Independence: Durbin Watson is already tested and no autocorrelation (independence) is
verified
d. Homoscedasticity: Residual plots of the independent variables do not show funnel shaped
distribution of residuals, except one positive distant data-point (which can be checked for
an outlier). Hence, we can say that no heteroscedasticity is observed.

[Figure: three residual plots (IQ Score Residual Plot, Extroversion Residual Plot, Work Ex Residual Plot), each showing Residuals on the vertical axis against the named independent variable on the horizontal axis.]
e. Multicollinearity: The condition when IVs are highly correlated with one another. VIF
(Variance Inflation Factor) is a widely used measure of multicollinearity. If VIF for an
independent variable is greater than 10 multicollinearity is said to be present (However,
some also consider the value of 5 as the cut-off.). For the present case, all the IVs have
VIF less than 10 (as well as 5). This suggests absence of multicollinearity.
Variable VIF
IQ Score 1.058
Extroversion 1.242
Work Ex 1.282
(viii) Sales per week of a sales rep who has the IQ of 109, extroversion score of 17 and work
experience of 15 months.
Sales per week = 1212.78 + 7.14*IQ Score + 14.91*Extroversion + 24.99*Work Ex
Sales per week = 1212.78 + 7.14*109 + 14.91*17 + 24.99*15
= 2619.36
Our regression model predicts weekly sales of 2619.36 units by a sales rep having an IQ of
109, extroversion score of 17 and work experience of 15 months.
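
Two of the calculations in the Question 6 solution are simple enough to sketch in code: the Durbin-Watson statistic from part (vi) and the prediction from part (viii). The function names are illustrative. The regression residuals are not listed in the solution, so the residual list below is a hypothetical toy input used only to exercise the function; the reported d is reproduced from the two sums given in part (vi).

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

def predict_sales(iq, extroversion, work_ex):
    """Predicted sales per week from the fitted equation in part (i)."""
    return 1212.78 + 7.14 * iq + 14.91 * extroversion + 24.99 * work_ex

# Hypothetical residuals (NOT from the assignment): a strictly alternating pattern,
# which pushes d above 2, the direction associated with negative autocorrelation.
toy_residuals = [1.0, -1.0, 1.0, -1.0]
toy_d = durbin_watson(toy_residuals)

# The two sums reported in part (vi) reproduce the stated d.
d_reported = 2623394 / 1271789

# Part (viii): IQ 109, extroversion 17, work experience 15 months.
predicted = predict_sales(iq=109, extroversion=17, work_ex=15)

print(toy_d, round(d_reported, 2), round(predicted, 2))
```

The prediction extrapolates slightly on extroversion, since 17 is below the smallest extroversion score in the sample (18).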

Question 7. Below is the data on monthly orders of a manufacturing firm. Use this data to develop these
two forecasting models: 3-period moving average and exponential smoothing with α = 0.4.

Month Orders
Jan. 120
Feb. 90
Mar. 100
Apr. 75
May. 110
Jun. 50
Jul. 75
Aug. 130
Sep. 110
Oct. 90
Nov.  
Which one would you prefer and Why? (7)

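Solution: We have to develop two forecasting models:

(i) 3-period Moving Average

Forecast of ith period, F_i = (D_(i−1) + D_(i−2) + D_(i−3)) / 3

(ii) Exponential Smoothing α = 0.4

Forecast of ith period, F_i = α D_(i−1) + (1 − α) F_(i−1)

Here D_i is the actual demand (orders) in period i. The exponential smoothing forecast for January is initialized to the January actual, 120.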

Month   Orders   3-period Moving Average           Exponential Smoothing α = 0.4
                 Forecast   Abs Dev   Sq Err        Forecast   Abs Dev   Sq Err
Jan.    120                                         120
Feb.    90                                          120        30        900
Mar.    100                                         108        8         64
Apr.    75       103.33     28.33     802.78        104.8      29.8      888.04
May.    110      88.33      21.67     469.44        92.88      17.12     293.09
Jun.    50       95         45        2025          99.728     49.73     2472.87
Jul.    75       78.33      3.33      11.11         79.84      4.84      23.39
Aug.    130      78.33      51.67     2669.44       77.90      52.10     2714.19
Sep.    110      85         25        625           98.74      11.25     126.76
Oct.    90       105        15        225           103.24     13.24     175.42
Nov.             110                                97.94
TOTAL                       190       6827.77                  216.08    7657.76

Sample calculations:
Exponential smoothing, Feb.: 0.4*120 + 0.6*120 = 120; Abs Dev = |90 − 120| = 30; Sq Err = 900
3-period moving average, Apr.: (120 + 90 + 100) / 3 = 103.33; Abs Dev = |75 − 103.33| = 28.33; Sq Err = (75 − 103.33)^2 = 802.78
Forecast Accuracy
(i) 3-period Moving Average

Mean Absolute Deviation (MAD) = 190 / No. of forecasts = 190 / 7 = 27.14

Mean Squared Error (MSE) = 6827.77 / 7 = 975.39

We have excluded the forecast for November because the actual demand for November is not
known. Hence, it cannot make any contribution towards determining the forecast accuracy.

(ii) Exponential Smoothing α = 0.4

Mean Absolute Deviation (MAD) = 216.08 / No. of forecasts = 216.08 / 9 = 24.00

Mean Squared Error (MSE) = 7657.76 / 9 = 850.86

The lower the values of MAD and MSE, the better the forecast. Hence, exponential smoothing with
α = 0.4 is a better forecast than the 3-period moving average.
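
Both forecasting models and their accuracy measures can be reproduced with a short script. This is a sketch under the same conventions as the solution: the exponential smoothing forecast for January is initialized to the January actual, and accuracy is measured only over months with both a forecast and an actual. Function and variable names are illustrative.

```python
orders = [120, 90, 100, 75, 110, 50, 75, 130, 110, 90]  # Jan. to Oct.

def moving_average_forecasts(demand, window=3):
    """Forecasts for periods window..len(demand); the last one is the next period."""
    return [sum(demand[i - window:i]) / window for i in range(window, len(demand) + 1)]

def exp_smoothing_forecasts(demand, alpha):
    """F[0] = first actual; F[i] = alpha * D[i-1] + (1 - alpha) * F[i-1].
    Returns len(demand) + 1 forecasts; the last one is the next period."""
    forecasts = [float(demand[0])]
    for actual in demand:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts

def mad_mse(actuals, forecasts):
    """Mean absolute deviation and mean squared error over paired values."""
    errors = [a - f for a, f in zip(actuals, forecasts)]
    mad = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e ** 2 for e in errors) / len(errors)
    return mad, mse

ma = moving_average_forecasts(orders, window=3)
es = exp_smoothing_forecasts(orders, alpha=0.4)

ma_mad, ma_mse = mad_mse(orders[3:], ma[:-1])    # Apr. to Oct. (7 forecasts)
es_mad, es_mse = mad_mse(orders[1:], es[1:-1])   # Feb. to Oct. (9 forecasts)
ma_next, es_next = ma[-1], es[-1]                # November forecasts

print(round(ma_mad, 2), round(ma_mse, 2), round(es_mad, 2), round(es_mse, 2))
```

As in the solution, the two methods are scored over different months (April to October versus February to October), which is worth keeping in mind when comparing their MAD and MSE.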
