NMIMS Global Access
School for Continuing Education (NGA-SCE)
Course: Business Statistics
Internal Assignment Applicable for December 2018 Examination
Assignment Marks: 30
Assets | Expense Ratio | Return 2006 | 3-Year Return | 5-Year Return
904.8 1.51 4.6 10.7 8.1
675.9 1.28 8.5 11.9 7.3
909.7 0.80 13.1 10.4 6.3
52.2 1.50 11.6 10.3 6.4
8411.5 0.63 10.9 12.4 8.0
282.3 1.22 7.1 10.2 8.0
9870.7 0.86 12.3 15.0 7.7
424.8 1.13 12.3 11.0 6.2
15422.9 0.72 14.0 10.2 6.2
497.9 1.36 8.6 12.0 7.3
547.3 1.09 7.5 12.8 7.2
5527.1 0.41 11.2 10.2 6.5
22592.9 0.46 12.3 13.0 8.4
240.8 1.42 4.4 10.3 6.6
2403.4 0.93 8.0 10.1 4.3
233.3 1.33 6.5 9.4 5.4
71.2 0.15 15.4 6.6 5.0
506.9 1.15 11.2 9.3 4.5
221.6 1.12 13.2 8.9 4.7
434.9 1.19 14.2 12.3 7.1
7834.2 0.56 13.7 9.6 5.5
152.1 1.34 12.4 9.6 4.6
815.4 0.73 13.0 8.9 4.5
85.7 0.45 13.2 9.6 4.0
166.1 1.41 3.3 7.8 5.3
47.2 0.74 8.1 10.8 5.7
6955.2 0.87 7.8 10.7 5.8
135.4 1.25 14.6 8.2 5.8
142.0 1.18 9.2 9.7 5.6
601.8 1.00 9.7 7.9 3.8
1. For the data on 30 mutual funds given above, conduct the following analysis:
i. Determine the measures of central tendency and of dispersion for the five variables.
Solution:
i. Measures of Central Tendency
Measures of central tendency describe the center, or typical value, of a data set.
The mean is the arithmetic average of all the values.
The mode is the value that appears most often in the data set.
The median is the middle value of the data set when it is arranged in ascending order.
Mean
The mean is the sum of all the values divided by the number of values, N:
$\bar{x} = \frac{\sum x}{N}$
Mean for Assets
$\bar{x} = \frac{87167.2}{30} = 2905.57$
Mean for Expense Ratio
$\bar{x} = \frac{29.79}{30} = 0.993$
Mean for Return 2006
$\bar{x} = \frac{311.9}{30} = 10.397$
Mean for 3-Year Return
$\bar{x} = \frac{309.8}{30} = 10.327$
Mean for 5-Year Return
$\bar{x} = \frac{181.8}{30} = 6.06$
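The sums and means above can be cross-checked with a short Python sketch using the standard-library `statistics` module. The list names (`assets`, `expense_ratio`, `return_2006`, `return_3yr`, `return_5yr`) are illustrative; the values are the 30 rows of the data table in their original order.

```python
from statistics import mean

# Values from the data table, in row order (list names are illustrative)
assets = [904.8, 675.9, 909.7, 52.2, 8411.5, 282.3, 9870.7, 424.8, 15422.9, 497.9,
          547.3, 5527.1, 22592.9, 240.8, 2403.4, 233.3, 71.2, 506.9, 221.6, 434.9,
          7834.2, 152.1, 815.4, 85.7, 166.1, 47.2, 6955.2, 135.4, 142.0, 601.8]
expense_ratio = [1.51, 1.28, 0.80, 1.50, 0.63, 1.22, 0.86, 1.13, 0.72, 1.36,
                 1.09, 0.41, 0.46, 1.42, 0.93, 1.33, 0.15, 1.15, 1.12, 1.19,
                 0.56, 1.34, 0.73, 0.45, 1.41, 0.74, 0.87, 1.25, 1.18, 1.00]
return_2006 = [4.6, 8.5, 13.1, 11.6, 10.9, 7.1, 12.3, 12.3, 14.0, 8.6,
               7.5, 11.2, 12.3, 4.4, 8.0, 6.5, 15.4, 11.2, 13.2, 14.2,
               13.7, 12.4, 13.0, 13.2, 3.3, 8.1, 7.8, 14.6, 9.2, 9.7]
return_3yr = [10.7, 11.9, 10.4, 10.3, 12.4, 10.2, 15.0, 11.0, 10.2, 12.0,
              12.8, 10.2, 13.0, 10.3, 10.1, 9.4, 6.6, 9.3, 8.9, 12.3,
              9.6, 9.6, 8.9, 9.6, 7.8, 10.8, 10.7, 8.2, 9.7, 7.9]
return_5yr = [8.1, 7.3, 6.3, 6.4, 8.0, 8.0, 7.7, 6.2, 6.2, 7.3,
              7.2, 6.5, 8.4, 6.6, 4.3, 5.4, 5.0, 4.5, 4.7, 7.1,
              5.5, 4.6, 4.5, 4.0, 5.3, 5.7, 5.8, 5.8, 5.6, 3.8]

columns = {
    "Assets": assets,
    "Expense Ratio": expense_ratio,
    "Return 2006": return_2006,
    "3-Year Return": return_3yr,
    "5-Year Return": return_5yr,
}

# Mean = (sum of the values) / N
for name, values in columns.items():
    print(f"{name}: sum = {sum(values):.2f}, N = {len(values)}, mean = {mean(values):.3f}")
```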
Mode: the mode is the value that occurs most often in the data set.
Mode for Assets
No value is repeated in Assets, so the asset data have no mode.
Mode for Expense Ratio
No value is repeated in Expense Ratio either, so the expense-ratio data have no mode.
Mode for Return 2006
12.3 occurs three times, more often than any other value (11.2 and 13.2 occur only twice each), so the mode for Return 2006 is 12.3.
Mode for 3-Year Return
9.6 and 10.2 each occur three times, more often than any other value, so the 3-Year Return data are bimodal, with modes 9.6 and 10.2.
Mode for 5-Year Return
4.5, 5.8, 6.2, 7.3 and 8.0 each occur twice and no value occurs more often, so there are five modes: 4.5, 5.8, 6.2, 7.3 and 8.0.
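A short sketch for checking the modes, using the same illustrative list names. `statistics.multimode` (Python 3.8+) returns every value tied for the highest count, and `collections.Counter` shows what that count is. When every value occurs exactly once, `multimode` returns the whole list, so the sketch reports that case as "no mode".

```python
from collections import Counter
from statistics import multimode

# Values from the data table, in row order (list names are illustrative)
assets = [904.8, 675.9, 909.7, 52.2, 8411.5, 282.3, 9870.7, 424.8, 15422.9, 497.9,
          547.3, 5527.1, 22592.9, 240.8, 2403.4, 233.3, 71.2, 506.9, 221.6, 434.9,
          7834.2, 152.1, 815.4, 85.7, 166.1, 47.2, 6955.2, 135.4, 142.0, 601.8]
expense_ratio = [1.51, 1.28, 0.80, 1.50, 0.63, 1.22, 0.86, 1.13, 0.72, 1.36,
                 1.09, 0.41, 0.46, 1.42, 0.93, 1.33, 0.15, 1.15, 1.12, 1.19,
                 0.56, 1.34, 0.73, 0.45, 1.41, 0.74, 0.87, 1.25, 1.18, 1.00]
return_2006 = [4.6, 8.5, 13.1, 11.6, 10.9, 7.1, 12.3, 12.3, 14.0, 8.6,
               7.5, 11.2, 12.3, 4.4, 8.0, 6.5, 15.4, 11.2, 13.2, 14.2,
               13.7, 12.4, 13.0, 13.2, 3.3, 8.1, 7.8, 14.6, 9.2, 9.7]
return_3yr = [10.7, 11.9, 10.4, 10.3, 12.4, 10.2, 15.0, 11.0, 10.2, 12.0,
              12.8, 10.2, 13.0, 10.3, 10.1, 9.4, 6.6, 9.3, 8.9, 12.3,
              9.6, 9.6, 8.9, 9.6, 7.8, 10.8, 10.7, 8.2, 9.7, 7.9]
return_5yr = [8.1, 7.3, 6.3, 6.4, 8.0, 8.0, 7.7, 6.2, 6.2, 7.3,
              7.2, 6.5, 8.4, 6.6, 4.3, 5.4, 5.0, 4.5, 4.7, 7.1,
              5.5, 4.6, 4.5, 4.0, 5.3, 5.7, 5.8, 5.8, 5.6, 3.8]

columns = {
    "Assets": assets,
    "Expense Ratio": expense_ratio,
    "Return 2006": return_2006,
    "3-Year Return": return_3yr,
    "5-Year Return": return_5yr,
}

for name, values in columns.items():
    modes = multimode(values)              # all values tied for the highest count
    top_count = Counter(values)[modes[0]]  # how many times each mode occurs
    if top_count == 1:
        print(f"{name}: no mode (no value is repeated)")
    else:
        print(f"{name}: mode(s) {sorted(modes)}, each occurring {top_count} times")
```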
Median: the median is the middle value of the ordered data set. To find it, arrange the data
from least to greatest. If the data set has an even number of items, the median is the mean
(average) of the two middle values.
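Median for Assets
With N = 30 (an even number), the median is the mean of the 15th and 16th ordered values:
$\frac{497.9 + 506.9}{2} = 502.4$
Median for Expense Ratio
$\frac{1.09 + 1.12}{2} = 1.105$
Median for Return 2006
$\frac{11.2 + 11.2}{2} = 11.2$
Median for 3-Year Return
$\frac{10.2 + 10.2}{2} = 10.2$
Median for 5-Year Return
$\frac{5.8 + 6.2}{2} = 6.0$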
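A minimal check of the medians with `statistics.median`, which averages the two middle values when N is even (here the 15th and 16th of 30 values). List names are illustrative.

```python
from statistics import median

# Values from the data table, in row order (list names are illustrative)
assets = [904.8, 675.9, 909.7, 52.2, 8411.5, 282.3, 9870.7, 424.8, 15422.9, 497.9,
          547.3, 5527.1, 22592.9, 240.8, 2403.4, 233.3, 71.2, 506.9, 221.6, 434.9,
          7834.2, 152.1, 815.4, 85.7, 166.1, 47.2, 6955.2, 135.4, 142.0, 601.8]
expense_ratio = [1.51, 1.28, 0.80, 1.50, 0.63, 1.22, 0.86, 1.13, 0.72, 1.36,
                 1.09, 0.41, 0.46, 1.42, 0.93, 1.33, 0.15, 1.15, 1.12, 1.19,
                 0.56, 1.34, 0.73, 0.45, 1.41, 0.74, 0.87, 1.25, 1.18, 1.00]
return_2006 = [4.6, 8.5, 13.1, 11.6, 10.9, 7.1, 12.3, 12.3, 14.0, 8.6,
               7.5, 11.2, 12.3, 4.4, 8.0, 6.5, 15.4, 11.2, 13.2, 14.2,
               13.7, 12.4, 13.0, 13.2, 3.3, 8.1, 7.8, 14.6, 9.2, 9.7]
return_3yr = [10.7, 11.9, 10.4, 10.3, 12.4, 10.2, 15.0, 11.0, 10.2, 12.0,
              12.8, 10.2, 13.0, 10.3, 10.1, 9.4, 6.6, 9.3, 8.9, 12.3,
              9.6, 9.6, 8.9, 9.6, 7.8, 10.8, 10.7, 8.2, 9.7, 7.9]
return_5yr = [8.1, 7.3, 6.3, 6.4, 8.0, 8.0, 7.7, 6.2, 6.2, 7.3,
              7.2, 6.5, 8.4, 6.6, 4.3, 5.4, 5.0, 4.5, 4.7, 7.1,
              5.5, 4.6, 4.5, 4.0, 5.3, 5.7, 5.8, 5.8, 5.6, 3.8]

columns = {
    "Assets": assets,
    "Expense Ratio": expense_ratio,
    "Return 2006": return_2006,
    "3-Year Return": return_3yr,
    "5-Year Return": return_5yr,
}

for name, values in columns.items():
    ordered = sorted(values)
    # N = 30 is even, so the median is the mean of the 15th and 16th ordered values
    print(f"{name}: 15th = {ordered[14]}, 16th = {ordered[15]}, median = {median(values):.3f}")
```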
Measures of dispersion
In statistics, measures of dispersion describe how widely the data are spread around the
center. Three common measures are:
Variance - the average of the squared deviations of each data item from the mean (for sample data, the sum of squared deviations is divided by N - 1).
Standard deviation - the square root of the variance.
Range - the difference between the highest and lowest values in the data.
Variance
To find the sample variance we use the formula $s^2 = \frac{\sum (x - \bar{x})^2}{N - 1}$.
Variance for Assets
The mean of the asset data is 2905.57 (rounded to 2905.6 in the table).
Now we will create the following table:
Assets (x) | x - Mean (Mean ≈ 2905.6) | (x - Mean)²
904.8 -2000.8 4003200.64
675.9 -2229.7 4971562.09
909.7 -1995.9 3983616.81
52.2 -2853.4 8141891.56
8411.5 5505.9 30314934.8
282.3 -2623.3 6881702.89
9870.7 6965.1 48512618
424.8 -2480.8 6154368.64
15422.9 12517.3 156682799
497.9 -2407.7 5797019.29
547.3 -2358.3 5561578.89
5527.1 2621.5 6872262.25
22592.9 19687.3 387589781
240.8 -2664.8 7101159.04
2403.4 -502.2 252204.84
233.3 -2672.3 7141187.29
71.2 -2834.4 8033823.36
506.9 -2398.7 5753761.69
221.6 -2684 7203856
434.9 -2470.7 6104358.49
7834.2 4928.6 24291098
152.1 -2753.5 7581762.25
815.4 -2090.2 4368936.04
85.7 -2819.9 7951836.01
166.1 -2739.5 7504860.25
47.2 -2858.4 8170450.56
6955.2 4049.6 16399260.2
135.4 -2770.2 7674008.04
142 -2763.6 7637484.96
601.8 -2303.8 5307494.44
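Total (N = 30) -0.8 813944878
(The deviations sum to -0.8 rather than 0 only because the mean was rounded to 2905.6.)
$s^2 = \frac{813944878}{30 - 1} = 28067064.8$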
Variance for Expense Ratio
Expense Ratio (x) | x - Mean (Mean = 0.993) | (x - Mean)²
1.51 0.517 0.267289
1.28 0.287 0.082369
0.8 -0.193 0.037249
1.5 0.507 0.257049
0.63 -0.363 0.131769
1.22 0.227 0.051529
0.86 -0.133 0.017689
1.13 0.137 0.018769
0.72 -0.273 0.074529
1.36 0.367 0.134689
1.09 0.097 0.009409
0.41 -0.583 0.339889
0.46 -0.533 0.284089
1.42 0.427 0.182329
0.93 -0.063 0.003969
1.33 0.337 0.113569
0.15 -0.843 0.710649
1.15 0.157 0.024649
1.12 0.127 0.016129
1.19 0.197 0.038809
0.56 -0.433 0.187489
1.34 0.347 0.120409
0.73 -0.263 0.069169
0.45 -0.543 0.294849
1.41 0.417 0.173889
0.74 -0.253 0.064009
0.87 -0.123 0.015129
1.25 0.257 0.066049
1.18 0.187 0.034969
1 0.007 0.000049
Total (N = 30) 3.82243
$s^2 = \frac{3.82243}{30 - 1} = 0.1318$
Variance for Return 2006
Return 2006 (x) | x - Mean (Mean ≈ 10.39) | (x - Mean)²
4.6 -5.79 33.5241
8.5 -1.89 3.5721
13.1 2.71 7.3441
11.6 1.21 1.4641
10.9 0.51 0.2601
7.1 -3.29 10.8241
12.3 1.91 3.6481
12.3 1.91 3.6481
14 3.61 13.0321
8.6 -1.79 3.2041
7.5 -2.89 8.3521
11.2 0.81 0.6561
12.3 1.91 3.6481
4.4 -5.99 35.8801
8 -2.39 5.7121
6.5 -3.89 15.1321
15.4 5.01 25.1001
11.2 0.81 0.6561
13.2 2.81 7.8961
14.2 3.81 14.5161
13.7 3.31 10.9561
12.4 2.01 4.0401
13 2.61 6.8121
13.2 2.81 7.8961
3.3 -7.09 50.2681
8.1 -2.29 5.2441
7.8 -2.59 6.7081
14.6 4.21 17.7241
9.2 -1.19 1.4161
9.7 -0.69 0.4761
Total (N = 30) 309.611
$s^2 = \frac{309.611}{30 - 1} = 10.676$
Variance for 3 – year Return
3-Year Return (x) | x - Mean (Mean ≈ 10.32) | (x - Mean)²
10.7 0.38 0.1444
11.9 1.58 2.4964
10.4 0.08 0.0064
10.3 -0.02 0.0004
12.4 2.08 4.3264
10.2 -0.12 0.0144
15 4.68 21.9024
11 0.68 0.4624
10.2 -0.12 0.0144
12 1.68 2.8224
12.8 2.48 6.1504
10.2 -0.12 0.0144
13 2.68 7.1824
10.3 -0.02 0.0004
10.1 -0.22 0.0484
9.4 -0.92 0.8464
6.6 -3.72 13.8384
9.3 -1.02 1.0404
8.9 -1.42 2.0164
12.3 1.98 3.9204
9.6 -0.72 0.5184
9.6 -0.72 0.5184
8.9 -1.42 2.0164
9.6 -0.72 0.5184
7.8 -2.52 6.3504
10.8 0.48 0.2304
10.7 0.38 0.1444
8.2 -2.12 4.4944
9.7 -0.62 0.3844
7.9 -2.42 5.8564
Total (N = 30) 88.28
$s^2 = \frac{88.28}{30 - 1} = 3.044$
Variance for 5 – year return
5-Year Return (x) | x - Mean (Mean = 6.06) | (x - Mean)²
8.1 2.04 4.1616
7.3 1.24 1.5376
6.3 0.24 0.0576
6.4 0.34 0.1156
8 1.94 3.7636
8 1.94 3.7636
7.7 1.64 2.6896
6.2 0.14 0.0196
6.2 0.14 0.0196
7.3 1.24 1.5376
7.2 1.14 1.2996
6.5 0.44 0.1936
8.4 2.34 5.4756
6.6 0.54 0.2916
4.3 -1.76 3.0976
5.4 -0.66 0.4356
5 -1.06 1.1236
4.5 -1.56 2.4336
4.7 -1.36 1.8496
7.1 1.04 1.0816
5.5 -0.56 0.3136
4.6 -1.46 2.1316
4.5 -1.56 2.4336
4 -2.06 4.2436
5.3 -0.76 0.5776
5.7 -0.36 0.1296
5.8 -0.26 0.0676
5.8 -0.26 0.0676
5.6 -0.46 0.2116
3.8 -2.26 5.1076
Total (N = 30) 50.232
$s^2 = \frac{50.232}{30 - 1} = 1.732$
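The tabular variance calculations can be reproduced with the Python sketch below. It computes the sum of squared deviations from the unrounded mean, divides by N - 1, and prints `statistics.variance` (which also uses the N - 1 divisor) alongside for comparison. Because the tables round the mean, the last digits may differ slightly from the hand calculation. List and dictionary names are illustrative.

```python
from statistics import mean, variance

# Values from the data table, in row order (list names are illustrative)
assets = [904.8, 675.9, 909.7, 52.2, 8411.5, 282.3, 9870.7, 424.8, 15422.9, 497.9,
          547.3, 5527.1, 22592.9, 240.8, 2403.4, 233.3, 71.2, 506.9, 221.6, 434.9,
          7834.2, 152.1, 815.4, 85.7, 166.1, 47.2, 6955.2, 135.4, 142.0, 601.8]
expense_ratio = [1.51, 1.28, 0.80, 1.50, 0.63, 1.22, 0.86, 1.13, 0.72, 1.36,
                 1.09, 0.41, 0.46, 1.42, 0.93, 1.33, 0.15, 1.15, 1.12, 1.19,
                 0.56, 1.34, 0.73, 0.45, 1.41, 0.74, 0.87, 1.25, 1.18, 1.00]
return_2006 = [4.6, 8.5, 13.1, 11.6, 10.9, 7.1, 12.3, 12.3, 14.0, 8.6,
               7.5, 11.2, 12.3, 4.4, 8.0, 6.5, 15.4, 11.2, 13.2, 14.2,
               13.7, 12.4, 13.0, 13.2, 3.3, 8.1, 7.8, 14.6, 9.2, 9.7]
return_3yr = [10.7, 11.9, 10.4, 10.3, 12.4, 10.2, 15.0, 11.0, 10.2, 12.0,
              12.8, 10.2, 13.0, 10.3, 10.1, 9.4, 6.6, 9.3, 8.9, 12.3,
              9.6, 9.6, 8.9, 9.6, 7.8, 10.8, 10.7, 8.2, 9.7, 7.9]
return_5yr = [8.1, 7.3, 6.3, 6.4, 8.0, 8.0, 7.7, 6.2, 6.2, 7.3,
              7.2, 6.5, 8.4, 6.6, 4.3, 5.4, 5.0, 4.5, 4.7, 7.1,
              5.5, 4.6, 4.5, 4.0, 5.3, 5.7, 5.8, 5.8, 5.6, 3.8]

columns = {
    "Assets": assets,
    "Expense Ratio": expense_ratio,
    "Return 2006": return_2006,
    "3-Year Return": return_3yr,
    "5-Year Return": return_5yr,
}

sample_variances = {}
for name, values in columns.items():
    xbar = mean(values)
    ss = sum((x - xbar) ** 2 for x in values)  # sum of squared deviations
    s2 = ss / (len(values) - 1)                # sample variance, divisor N - 1
    sample_variances[name] = s2
    print(f"{name}: SS = {ss:.4f}, s^2 = {s2:.4f}, statistics.variance = {variance(values):.4f}")
```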
Standard deviation: the standard deviation is the square root of the variance, $s = \sqrt{s^2}$.
Standard deviation for Assets
$s = \sqrt{28067064.75} = 5297.84$
Standard deviation for Expense Ratio
$s = \sqrt{0.13180793} = 0.3631$
Standard deviation for Return 2006
$s = \sqrt{10.676} = 3.267$
Standard deviation for 3-Year Return
$s = \sqrt{3.04413} = 1.745$
Standard deviation for 5-Year Return
$s = \sqrt{1.73213} = 1.316$
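Similarly, `statistics.stdev` returns the square root of the sample variance (N - 1 divisor), so it can be used to check the standard deviations. The sketch also prints `sqrt(variance(...))` to show the two agree. List and dictionary names are illustrative.

```python
from math import sqrt
from statistics import stdev, variance

# Values from the data table, in row order (list names are illustrative)
assets = [904.8, 675.9, 909.7, 52.2, 8411.5, 282.3, 9870.7, 424.8, 15422.9, 497.9,
          547.3, 5527.1, 22592.9, 240.8, 2403.4, 233.3, 71.2, 506.9, 221.6, 434.9,
          7834.2, 152.1, 815.4, 85.7, 166.1, 47.2, 6955.2, 135.4, 142.0, 601.8]
expense_ratio = [1.51, 1.28, 0.80, 1.50, 0.63, 1.22, 0.86, 1.13, 0.72, 1.36,
                 1.09, 0.41, 0.46, 1.42, 0.93, 1.33, 0.15, 1.15, 1.12, 1.19,
                 0.56, 1.34, 0.73, 0.45, 1.41, 0.74, 0.87, 1.25, 1.18, 1.00]
return_2006 = [4.6, 8.5, 13.1, 11.6, 10.9, 7.1, 12.3, 12.3, 14.0, 8.6,
               7.5, 11.2, 12.3, 4.4, 8.0, 6.5, 15.4, 11.2, 13.2, 14.2,
               13.7, 12.4, 13.0, 13.2, 3.3, 8.1, 7.8, 14.6, 9.2, 9.7]
return_3yr = [10.7, 11.9, 10.4, 10.3, 12.4, 10.2, 15.0, 11.0, 10.2, 12.0,
              12.8, 10.2, 13.0, 10.3, 10.1, 9.4, 6.6, 9.3, 8.9, 12.3,
              9.6, 9.6, 8.9, 9.6, 7.8, 10.8, 10.7, 8.2, 9.7, 7.9]
return_5yr = [8.1, 7.3, 6.3, 6.4, 8.0, 8.0, 7.7, 6.2, 6.2, 7.3,
              7.2, 6.5, 8.4, 6.6, 4.3, 5.4, 5.0, 4.5, 4.7, 7.1,
              5.5, 4.6, 4.5, 4.0, 5.3, 5.7, 5.8, 5.8, 5.6, 3.8]

columns = {
    "Assets": assets,
    "Expense Ratio": expense_ratio,
    "Return 2006": return_2006,
    "3-Year Return": return_3yr,
    "5-Year Return": return_5yr,
}

std_devs = {}
for name, values in columns.items():
    s = stdev(values)  # sample standard deviation (N - 1 divisor)
    std_devs[name] = s
    print(f"{name}: s = {s:.4f} (sqrt of variance = {sqrt(variance(values)):.4f})")
```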
Range: The range of a set of data is the difference between the highest and lowest
values in the set. To find the range, first order the data from least to greatest. Then
subtract the smallest value from the largest value in the set.
Range for Assets
$22592.9 - 47.2 = 22545.7$
Range for Expense Ratio
$1.51 - 0.15 = 1.36$
Range for Return 2006
$15.4 - 3.3 = 12.1$
Range for 3-Year Return
$15.0 - 6.6 = 8.4$
Range for 5-Year Return
$8.4 - 3.8 = 4.6$
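The ranges can be checked as maximum minus minimum. In code, no sorting is needed, since `max` and `min` scan the list directly; the result is rounded to two decimals only to hide floating-point noise. List and dictionary names are illustrative.

```python
# Values from the data table, in row order (list names are illustrative)
assets = [904.8, 675.9, 909.7, 52.2, 8411.5, 282.3, 9870.7, 424.8, 15422.9, 497.9,
          547.3, 5527.1, 22592.9, 240.8, 2403.4, 233.3, 71.2, 506.9, 221.6, 434.9,
          7834.2, 152.1, 815.4, 85.7, 166.1, 47.2, 6955.2, 135.4, 142.0, 601.8]
expense_ratio = [1.51, 1.28, 0.80, 1.50, 0.63, 1.22, 0.86, 1.13, 0.72, 1.36,
                 1.09, 0.41, 0.46, 1.42, 0.93, 1.33, 0.15, 1.15, 1.12, 1.19,
                 0.56, 1.34, 0.73, 0.45, 1.41, 0.74, 0.87, 1.25, 1.18, 1.00]
return_2006 = [4.6, 8.5, 13.1, 11.6, 10.9, 7.1, 12.3, 12.3, 14.0, 8.6,
               7.5, 11.2, 12.3, 4.4, 8.0, 6.5, 15.4, 11.2, 13.2, 14.2,
               13.7, 12.4, 13.0, 13.2, 3.3, 8.1, 7.8, 14.6, 9.2, 9.7]
return_3yr = [10.7, 11.9, 10.4, 10.3, 12.4, 10.2, 15.0, 11.0, 10.2, 12.0,
              12.8, 10.2, 13.0, 10.3, 10.1, 9.4, 6.6, 9.3, 8.9, 12.3,
              9.6, 9.6, 8.9, 9.6, 7.8, 10.8, 10.7, 8.2, 9.7, 7.9]
return_5yr = [8.1, 7.3, 6.3, 6.4, 8.0, 8.0, 7.7, 6.2, 6.2, 7.3,
              7.2, 6.5, 8.4, 6.6, 4.3, 5.4, 5.0, 4.5, 4.7, 7.1,
              5.5, 4.6, 4.5, 4.0, 5.3, 5.7, 5.8, 5.8, 5.6, 3.8]

columns = {
    "Assets": assets,
    "Expense Ratio": expense_ratio,
    "Return 2006": return_2006,
    "3-Year Return": return_3yr,
    "5-Year Return": return_5yr,
}

ranges = {}
for name, values in columns.items():
    # Range = largest value - smallest value
    ranges[name] = round(max(values) - min(values), 2)
    print(f"{name}: max = {max(values)}, min = {min(values)}, range = {ranges[name]}")
```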
ii. Provide the five-number summary i.e. the minimum, 1st quartile, median, 3rd quartile
and maximum value for asset size.
Interpret the above results and comment on how the data is distributed. (10 Marks)
Solution: five-number summary i.e. the minimum, 1st quartile, median, 3rd quartile and
maximum value for asset size.
Minimum: the minimum value of a data set is the least value in the set, so the minimum asset size is 47.2.
The first quartile (or lower quartile or 25th percentile) is the median of the bottom half of
the numbers. So, to find the first quartile, we need to place the numbers in value order and
find the bottom half.
47.2 52.2 71.2 85.7 135.4 142 152.1 166.1 221.6 233.3 240.8 282.3 424.8 434.9 497.9
506.9 547.3 601.8 675.9 815.4 904.8 909.7 2403.4 5527.1 6955.2 7834.2 8411.5 9870.7
15422.9 22592.9
So, the bottom half is
47.2 52.2 71.2 85.7 135.4 142 152.1 166.1 221.6 233.3 240.8 282.3 424.8 434.9 497.9
The median of these numbers is 166.1.
Median: with N = 30 values, the median is the mean of the 15th and 16th ordered values:
$\frac{497.9 + 506.9}{2} = 502.4$
3rd quartile
The third quartile (or upper quartile or 75th percentile) is the median of the upper half of
the numbers. So, to find the third quartile, we need to place the numbers in value order and
find the upper half.
So, the upper half is
506.9 547.3 601.8 675.9 815.4 904.8 909.7 2403.4 5527.1 6955.2 7834.2 8411.5 9870.7
15422.9 22592.9
The median of these numbers is 2403.4.
Maximum value
The maximum value of a data set is the greatest value in the set, so the maximum asset size
is 22592.9.
Five-number summary for Assets: minimum 47.2, Q1 166.1, median 502.4, Q3 2403.4, maximum 22592.9.
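A minimal Python sketch of the median-of-halves quartile method used in this solution. The function name `five_number_summary` is hypothetical, and the handling of an odd N (leaving the middle value out of both halves) is an assumption, since the assignment only needs the even case. Library quartile functions such as `statistics.quantiles` interpolate differently and can give slightly different Q1 and Q3.

```python
from statistics import median

# Asset sizes from the data table (list name is illustrative)
assets = [904.8, 675.9, 909.7, 52.2, 8411.5, 282.3, 9870.7, 424.8, 15422.9, 497.9,
          547.3, 5527.1, 22592.9, 240.8, 2403.4, 233.3, 71.2, 506.9, 221.6, 434.9,
          7834.2, 152.1, 815.4, 85.7, 166.1, 47.2, 6955.2, 135.4, 142.0, 601.8]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Q1 and Q3 are the medians of the lower and upper halves of the ordered data.
    For odd N this sketch leaves the middle value out of both halves (an assumption).
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # bottom half
    upper = ordered[half + n % 2:]  # top half (skips the middle value when n is odd)
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


labels = ("Minimum", "Q1", "Median", "Q3", "Maximum")
for label, value in zip(labels, five_number_summary(assets)):
    print(f"{label}: {value:.1f}")
```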
The distribution of a statistical data set (or a population) is a listing or function showing all
the possible values (or intervals) of the data and how often they occur. When a distribution
of categorical data is organized, you see the number or percentage of individuals in each
group. When a distribution of numerical data is organized, the values are often ordered from
smallest to largest, broken into reasonably sized groups (if appropriate), and then put into
graphs and charts to examine the shape, center, and amount of variability in the data. One
of the most well-known distributions is called the normal distribution, also known as the
bell-shaped curve. The normal distribution is based on numerical data that is continuous;
its possible values lie on the entire real number line. Its overall shape, when the data are
organized in graph form, is a symmetric bell shape. About 68% of the values lie within one
standard deviation of the mean (the middle part of the bell), and as you move farther out on
either side of the mean, you find fewer and fewer values (the downward-sloping sides of the
bell).
Due to symmetry, the mean and the median lie at the same point, directly in the center of
the normal distribution. The standard deviation is measured by the distance from the mean
to the inflection point (where the curvature of the bell changes from concave up to concave
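down). Because every distinct population of data has a different mean and standard
deviation, an infinite number of normal distributions exist, each with its own mean and its
own standard deviation to characterize it.
The asset data, however, are not normally distributed; they are strongly skewed to the right.
The mean (2905.57) is almost six times the median (502.4); the distance from the median to Q3
(2403.4 - 502.4 = 1901.0) is far larger than the distance from Q1 to the median
(502.4 - 166.1 = 336.3); and the maximum (22592.9) lies far above Q3. Half of the funds hold
less than 502.4 in assets, while a few very large funds pull the mean upward, so the median is
a better measure of a typical fund's asset size than the mean.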
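To put a number on the skew, the sketch below uses Pearson's second (median) skewness coefficient, 3(mean - median)/s. This measure is not part of the original assignment and is added only as an illustration: values well above 0 indicate right skew. The sketch also compares the gaps Q3 - median and median - Q1, taking the quartiles as the medians of the lower and upper 15 ordered values, as in the five-number summary.

```python
from statistics import mean, median, stdev

# Asset sizes from the data table (list name is illustrative)
assets = [904.8, 675.9, 909.7, 52.2, 8411.5, 282.3, 9870.7, 424.8, 15422.9, 497.9,
          547.3, 5527.1, 22592.9, 240.8, 2403.4, 233.3, 71.2, 506.9, 221.6, 434.9,
          7834.2, 152.1, 815.4, 85.7, 166.1, 47.2, 6955.2, 135.4, 142.0, 601.8]

ordered = sorted(assets)
q1 = median(ordered[:15])  # median of the lower 15 values
q3 = median(ordered[15:])  # median of the upper 15 values
med = median(assets)
xbar = mean(assets)
s = stdev(assets)

# Pearson's second (median) skewness coefficient: 3 * (mean - median) / s
pearson_skew = 3 * (xbar - med) / s

print(f"mean = {xbar:.2f}, median = {med:.2f}, mean/median = {xbar / med:.2f}")
print(f"median - Q1 = {med - q1:.1f}, Q3 - median = {q3 - med:.1f}")
print(f"Pearson median skewness = {pearson_skew:.2f}")
```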
2. For the same data on mutual funds given above:
i. Is there a strong association between asset size and expense ratio?
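Solution: The correlation coefficient r measures the strength and direction of the linear
association between two variables.
Formula to calculate the correlation coefficient:
$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}$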
Assets (X) | Expense Ratio (Y) | XY | X² | Y²
904.8 1.51 1366.248 818663.04 2.2801
675.9 1.28 865.152 456840.81 1.6384
909.7 0.8 727.76 827554.09 0.64
52.2 1.5 78.3 2724.84 2.25
8411.5 0.63 5299.245 70753332.3 0.3969
282.3 1.22 344.406 79693.29 1.4884
9870.7 0.86 8488.802 97430718.5 0.7396
424.8 1.13 480.024 180455.04 1.2769
15422.9 0.72 11104.488 237865844 0.5184
497.9 1.36 677.144 247904.41 1.8496
547.3 1.09 596.557 299537.29 1.1881
5527.1 0.41 2266.111 30548834.4 0.1681
22592.9 0.46 10392.734 510439130 0.2116
240.8 1.42 341.936 57984.64 2.0164
2403.4 0.93 2235.162 5776331.56 0.8649
233.3 1.33 310.289 54428.89 1.7689
71.2 0.15 10.68 5069.44 0.0225
506.9 1.15 582.935 256947.61 1.3225
221.6 1.12 248.192 49106.56 1.2544
434.9 1.19 517.531 189138.01 1.4161
7834.2 0.56 4387.152 61374689.6 0.3136
152.1 1.34 203.814 23134.41 1.7956
815.4 0.73 595.242 664877.16 0.5329
85.7 0.45 38.565 7344.49 0.2025
166.1 1.41 234.201 27589.21 1.9881
47.2 0.74 34.928 2227.84 0.5476
6955.2 0.87 6051.024 48374807 0.7569
135.4 1.25 169.25 18333.16 1.5625
142 1.18 167.56 20164 1.3924
601.8 1 601.8 362163.24 1
Totals: ΣX = 87167.2 | ΣY = 29.79 | ΣXY = 59417.232 | ΣX² = 1067215570 | ΣY² = 33.4039
$r = \frac{30 \times 59417.232 - 87167.2 \times 29.79}{\sqrt{[30 \times 1067215570 - 87167.2^2][30 \times 33.4039 - 29.79^2]}}$
$= \frac{1782516.96 - 2596710.89}{\sqrt{[32016467100 - 7598120756][1002.117 - 887.4441]}}$
$= \frac{-814193.93}{\sqrt{24418346344 \times 114.6729}}$
$= \frac{-814193.93}{1673356.7} \approx -0.487$
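The relationship between two variables is generally considered strong when the absolute value
of r is larger than about 0.7.
Here r ≈ -0.49, so the association is negative and of moderate strength: funds with larger
assets tend to have lower expense ratios, but the relationship is not strong (|r| < 0.7). The
negative sign shows the direction of the relationship, not its strength.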
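The correlation can be checked with a short Python sketch that applies the same sums-of-products formula as the table (ΣX, ΣY, ΣXY, ΣX², ΣY²). The function name `pearson_r` and the list names are illustrative.

```python
from math import sqrt

# Asset sizes and expense ratios from the data table (list names are illustrative)
assets = [904.8, 675.9, 909.7, 52.2, 8411.5, 282.3, 9870.7, 424.8, 15422.9, 497.9,
          547.3, 5527.1, 22592.9, 240.8, 2403.4, 233.3, 71.2, 506.9, 221.6, 434.9,
          7834.2, 152.1, 815.4, 85.7, 166.1, 47.2, 6955.2, 135.4, 142.0, 601.8]
expense_ratio = [1.51, 1.28, 0.80, 1.50, 0.63, 1.22, 0.86, 1.13, 0.72, 1.36,
                 1.09, 0.41, 0.46, 1.42, 0.93, 1.33, 0.15, 1.15, 1.12, 1.19,
                 0.56, 1.34, 0.73, 0.45, 1.41, 0.74, 0.87, 1.25, 1.18, 1.00]


def pearson_r(x, y):
    """Pearson correlation coefficient via the sums-of-products formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


r = pearson_r(assets, expense_ratio)
print(f"r = {r:.3f}")
```

On Python 3.10 or later, `statistics.correlation(assets, expense_ratio)` should return the same Pearson coefficient.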
ii. Create a scatterplot diagram depicting the association between the two variables.
A scatter plot (or scatter diagram) is a two-dimensional graphical representation of a set of
paired data. Each (x, y) pair is plotted on the graph as a dot or a cross. This type of chart
can be used to visually describe the relationship (correlation) between two numerical
variables or to represent distributions.
[Figure: scatter plot of Assets and Expense Ratio]
iii. Using the regression equation, predict the 5-year return of a fund whose 3-year return
was 8%. (10 Marks)
Solution: The regression equation of Y on X is $Y = a + bX$, where X is the 3-year return and
Y is the 5-year return. The sums needed for a and b are computed in the following table.
3-Year Return (X) | 5-Year Return (Y) | XY | X² | Y²
10.7 8.1 86.67 114.49 65.61
11.9 7.3 86.87 141.61 53.29
10.4 6.3 65.52 108.16 39.69
10.3 6.4 65.92 106.09 40.96
12.4 8 99.2 153.76 64
10.2 8 81.6 104.04 64
15 7.7 115.5 225 59.29
11 6.2 68.2 121 38.44
10.2 6.2 63.24 104.04 38.44
12 7.3 87.6 144 53.29
12.8 7.2 92.16 163.84 51.84
10.2 6.5 66.3 104.04 42.25
13 8.4 109.2 169 70.56
10.3 6.6 67.98 106.09 43.56
10.1 4.3 43.43 102.01 18.49
9.4 5.4 50.76 88.36 29.16
6.6 5 33 43.56 25
9.3 4.5 41.85 86.49 20.25
8.9 4.7 41.83 79.21 22.09
12.3 7.1 87.33 151.29 50.41
9.6 5.5 52.8 92.16 30.25
9.6 4.6 44.16 92.16 21.16
8.9 4.5 40.05 79.21 20.25
9.6 4 38.4 92.16 16
7.8 5.3 41.34 60.84 28.09
10.8 5.7 61.56 116.64 32.49
10.7 5.8 62.06 114.49 33.64
8.2 5.8 47.56 67.24 33.64
9.7 5.6 54.32 94.09 31.36
7.9 3.8 30.02 62.41 14.44
Totals: ΣX = 309.8 | ΣY = 181.8 | ΣXY = 1926.43 | ΣX² = 3287.48 | ΣY² = 1151.94
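$a = \frac{\sum Y \sum X^2 - \sum X \sum XY}{N\sum X^2 - (\sum X)^2} = \frac{181.8 \times 3287.48 - 309.8 \times 1926.43}{30 \times 3287.48 - 309.8^2}$
$= \frac{597663.864 - 596808.014}{98624.4 - 95976.04} = \frac{855.85}{2648.36} \approx 0.323$
$b = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2} = \frac{30 \times 1926.43 - 309.8 \times 181.8}{30 \times 3287.48 - 309.8^2}$
$= \frac{57792.9 - 56321.64}{98624.4 - 95976.04} = \frac{1471.26}{2648.36} \approx 0.556$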
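Substitute a and b into the regression equation:
$Y = 0.323 + 0.556X$
The returns in the data are recorded in percent, so a 3-year return of 8% corresponds to
X = 8 (not 0.08):
$Y = 0.323 + 0.556(8) = 0.323 + 4.448 = 4.771$
Hence the predicted 5-year return is about 4.77%.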
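The least-squares coefficients and the prediction can be reproduced with the Python sketch below, using the same sums as the table. Because the returns in the table are recorded in percent, the sketch enters the 8% return as 8.0. The function name `least_squares` and the list names are illustrative.

```python
# 3-year (X) and 5-year (Y) returns from the data table (list names are illustrative)
return_3yr = [10.7, 11.9, 10.4, 10.3, 12.4, 10.2, 15.0, 11.0, 10.2, 12.0,
              12.8, 10.2, 13.0, 10.3, 10.1, 9.4, 6.6, 9.3, 8.9, 12.3,
              9.6, 9.6, 8.9, 9.6, 7.8, 10.8, 10.7, 8.2, 9.7, 7.9]
return_5yr = [8.1, 7.3, 6.3, 6.4, 8.0, 8.0, 7.7, 6.2, 6.2, 7.3,
              7.2, 6.5, 8.4, 6.6, 4.3, 5.4, 5.0, 4.5, 4.7, 7.1,
              5.5, 4.6, 4.5, 4.0, 5.3, 5.7, 5.8, 5.8, 5.6, 3.8]


def least_squares(x, y):
    """Intercept a and slope b of the least-squares line y = a + b * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(u * v for u, v in zip(x, y))
    sum_xx = sum(u * u for u in x)
    denom = n * sum_xx - sum_x ** 2
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = (sum_y * sum_xx - sum_x * sum_xy) / denom
    return a, b


a, b = least_squares(return_3yr, return_5yr)
x_new = 8.0  # 3-year return of 8%, in the same percent units as the data
y_hat = a + b * x_new
print(f"a = {a:.4f}, b = {b:.4f}, predicted 5-year return = {y_hat:.2f}%")
```

On Python 3.10 or later, `statistics.linear_regression(return_3yr, return_5yr)` returns the same slope and intercept.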
3. Assume there are 400 athletes in a training camp, who are required to attend the
morning drill starting at 4 am. The attendance in morning drills is 70%, i.e. on an average,
280 athletes are present. Fifty new athletes are admitted in this batch.
a. What is the probability of attendance being at least 70% among the new athletes, thus
ensuring the overall attendance does not fall below 70%? (5 Marks)
Solution:
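Total athletes = 400
Attendance in morning drills = 70% of 400 = 280
New athletes admitted = 50
Total athletes after admission = 400 + 50 = 450
70% of 450 = 450 × 70/100 = 315
For the overall attendance not to fall below 70%, at least 315 athletes must be present. Since
on average 280 of the existing athletes attend, at least 315 - 280 = 35 of the 50 new athletes
(that is, at least 70% of them) must attend.
Assuming each new athlete attends independently with probability 0.7, the number of new
athletes present, X, is binomial with n = 50 and p = 0.7:
Mean = np = 50 × 0.7 = 35
Standard deviation = $\sqrt{np(1-p)} = \sqrt{50 \times 0.7 \times 0.3} = \sqrt{10.5} = 3.240$
Using the normal approximation with a continuity correction:
$P(X \ge 35) \approx P\left(Z \ge \frac{34.5 - 35}{3.240}\right) = P(Z \ge -0.154) \approx 0.561$
Hence the probability that attendance among the new athletes is at least 70% is about 0.56.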
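One way to model part (a) is to treat the number of the 50 new athletes who attend as binomial with p = 0.7 (an assumption: each athlete attends independently) and ask for P(X ≥ 35). The sketch below compares a normal approximation with continuity correction against the exact binomial sum. The helper names `binomial_at_least` and `normal_at_least` are hypothetical; `math.comb` requires Python 3.8 or later.

```python
from math import comb, erf, sqrt


def binomial_at_least(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def normal_at_least(n, p, k):
    """Normal approximation to P(X >= k), with a continuity correction of 0.5."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    z = (k - 0.5 - mu) / sigma
    # P(Z >= z) = 1 - Phi(z), where Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1 - erf(z / sqrt(2)))


n_new, p_attend = 50, 0.7  # assumed: each new athlete attends independently with p = 0.7
k_needed = 35              # at least 70% of the 50 new athletes

print(f"normal approximation: {normal_at_least(n_new, p_attend, k_needed):.4f}")
print(f"exact binomial:       {binomial_at_least(n_new, p_attend, k_needed):.4f}")
```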
b. The training coach thinks that this probability will increase, if the new batch size is 40
instead of 50 students. Is he right in assuming so? (5 Marks)
Solution:
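With a batch of 40, total athletes = 400 + 40 = 440, and 70% of 440 = 308. Since 280 existing
athletes attend on average, at least 308 - 280 = 28 of the 40 new athletes (70% of them) must
attend. Assuming again that each new athlete attends independently with probability 0.7:
Mean = 40 × 0.7 = 28
Standard deviation = $\sqrt{40 \times 0.7 \times 0.3} = \sqrt{8.4} = 2.898$
$P(X \ge 28) \approx P\left(Z \ge \frac{27.5 - 28}{2.898}\right) = P(Z \ge -0.173) \approx 0.568$
For a batch of 50 the same calculation gives
$P(X \ge 35) \approx P\left(Z \ge \frac{34.5 - 35}{3.240}\right) = P(Z \ge -0.154) \approx 0.561$
So the coach is right that the probability increases with a batch of 40, but the gain is
marginal. Within the normal approximation it comes only from the continuity correction (half
an athlete is a larger share of the smaller standard deviation); without the correction both
probabilities would be 0.5.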
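A sketch comparing the two batch sizes, both exactly and with the normal approximation, under the same assumption that each new athlete attends independently with probability 0.7. The helper names are hypothetical and are redefined here so the block runs on its own.

```python
from math import comb, erf, sqrt


def binomial_at_least(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def normal_at_least(n, p, k):
    """Normal approximation to P(X >= k), with a continuity correction of 0.5."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    z = (k - 0.5 - mu) / sigma
    return 0.5 * (1 - erf(z / sqrt(2)))  # 1 - Phi(z)


# (batch size, minimum number of new athletes needed for 70% attendance)
for n_new, k_needed in ((50, 35), (40, 28)):
    approx = normal_at_least(n_new, 0.7, k_needed)
    exact = binomial_at_least(n_new, 0.7, k_needed)
    print(f"batch of {n_new}: need >= {k_needed}, normal approx = {approx:.4f}, exact = {exact:.4f}")
```

Under these assumptions, both methods should show the smaller batch giving a slightly higher probability, with only a small gap between the two batch sizes.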