Name: Hassan Raza
Roll:20-10755
Math-107-Project
Section: C
Data Set:3
1: Introduction: The data provided to me has 51 entries and records the gross and
budget of different Bollywood movies released. The highest value in gross is
500.75 (Baahubali), while the highest value in budget is 130. The data has no outliers and
is continuous.
2: Methodology:
Finding number of classes
No. of classes: 2^k > n
2^k > 50
2^6 = 64 > 50
k = 6
No. of classes (for both variables) = 6
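The 2^k > n rule can be checked with a short Python sketch. The function name `number_of_classes` is an illustrative assumption and is not part of the project itself.

```python
def number_of_classes(n: int) -> int:
    """Return the smallest k such that 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


print(number_of_classes(50))  # 6, because 2**6 = 64 > 50
```

The same call with n = 51 also gives 6, so the choice of six classes does not depend on which count is used.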
Finding class interval: h = (highest value - lowest value) / number of classes
For Data A: (500.75 - 2) / 6 = 83.13      For Data B: (130 - 6) / 6 = 20.67
For Data A class interval = 85 (rounded up)      For Data B class interval = 22 (rounded up)
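A minimal sketch of the class-interval step, assuming the range is divided by the number of classes (k = 6), which is what gives raw widths close to the 85 and 22 used in this report. The helper name `class_width` is hypothetical.

```python
def class_width(highest: float, lowest: float, k: int) -> float:
    """Raw class width: the range divided by the number of classes."""
    return (highest - lowest) / k


raw_a = class_width(500.75, 2, 6)   # Data A (gross)
raw_b = class_width(130, 6, 6)      # Data B (budget)
print(round(raw_a, 1), round(raw_b, 1))  # 83.1 20.7
```

The raw widths are then rounded up to a convenient whole number so that six classes cover every value.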
1, Frequency Distribution
[Figure: frequency polygons for both variables. Data A: frequency (F) axis from 0 to 40, class midpoints 44, 129, 214, 299, 384, 469. Data B: frequency (F) axis from 0 to 16, class midpoints 16.5, 38.5, 60.5, 82.5, 104.5, 126.5.]
Interpretation: The two variables have very different frequency polygons. The
curve of Data A starts high and then flattens, while the curve of Data B has a zigzag shape.
Key: The median is indicated with a blue line and the mode with a red line in both
graphs.
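The frequency distribution behind such a polygon can be sketched as follows. The class limits follow the widths used in this report (85 for Data A, 22 for Data B); the `sample` list is a hypothetical handful of values, not the real data set, and the helper names are assumptions.

```python
def class_limits(lowest: int, width: int, k: int):
    """Inclusive (lower, upper) limits for k classes of equal width."""
    return [(lowest + i * width, lowest + (i + 1) * width - 1) for i in range(k)]


def tally(values, limits):
    """Count the values in each class, using class boundaries of +/- 0.5."""
    counts = [0] * len(limits)
    for v in values:
        for i, (lo, hi) in enumerate(limits):
            if lo - 0.5 <= v < hi + 0.5:
                counts[i] += 1
                break
    return counts


limits_a = class_limits(2, 85, 6)                     # (2, 86), (87, 171), ...
midpoints_a = [(lo + hi) / 2 for lo, hi in limits_a]  # x-axis of the polygon
sample = [500.75, 2, 44.0, 90.5]                      # hypothetical values only
print(midpoints_a)
print(tally(sample, limits_a))
```

Plotting the counts against the midpoints gives the frequency polygon.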
Data: A
Mean = ∑Fx / ∑F
Mean = 3859/51 = 75.66
Median = L + (h/f)(n/2 - c)
Median class = n/2 = 51/2 = 25.5 (lies in 1st class, 2-86)
Median = 1.5 + (85/36)(25.5 - 0)
Median = 1.5 + 60.21
Median = 61.71
Mode = L + ((fm - f1) / ((fm - f1) + (fm - f2))) * h
Modal class = highest frequency class (f = 36) = 2-86
Mode = 1.5 + ((36 - 0) / ((36 - 0) + (36 - 14))) * 85
Mode = 1.5 + 52.76
Mode = 54.26
The data is positively skewed, as the mean is higher than the median, which is higher than the
mode, i.e. 75.66 > 61.71 > 54.26.
The mean is the most appropriate measure of central tendency, as this data is continuous and
has no outliers.
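The grouped-data formulas for the mean, median and mode can be sketched in Python. The frequencies for Data A are an assumption: 36 and 14 are stated in this report, and a single value in the last class is inferred from ∑Fx = 3859 and ∑F = 51. In the median formula, c is the cumulative frequency before the median class, which is 0 when the median falls in the first class. All function names are hypothetical.

```python
def grouped_mean(mid, freq):
    """Mean = sum(f * x) / sum(f), with x the class midpoints."""
    return sum(f * x for f, x in zip(freq, mid)) / sum(freq)


def grouped_median(lower_bounds, freq, h):
    """Median = L + (h / f) * (n / 2 - c)."""
    n = sum(freq)
    c = 0  # cumulative frequency before the current class
    for L, f in zip(lower_bounds, freq):
        if c + f >= n / 2:
            return L + (h / f) * (n / 2 - c)
        c += f


def grouped_mode(lower_bounds, freq, h):
    """Mode = L + (fm - f1) / ((fm - f1) + (fm - f2)) * h."""
    i = freq.index(max(freq))
    fm = freq[i]
    f1 = freq[i - 1] if i > 0 else 0              # class before the modal class
    f2 = freq[i + 1] if i < len(freq) - 1 else 0  # class after the modal class
    return lower_bounds[i] + (fm - f1) / ((fm - f1) + (fm - f2)) * h


# Data A: midpoints and lower boundaries from the report, frequencies inferred.
mid_a = [44, 129, 214, 299, 384, 469]
lower_a = [1.5, 86.5, 171.5, 256.5, 341.5, 426.5]
freq_a = [36, 14, 0, 0, 0, 1]

print(round(grouped_mean(mid_a, freq_a), 2))
print(round(grouped_median(lower_a, freq_a, 85), 2))
print(round(grouped_mode(lower_a, freq_a, 85), 2))
```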
Measures of Dispersion:
1: Range = largest value - lowest value
Range = 500.75 - 2
Range = 498.75
Coefficient of range = (L - S) / (L + S)
Coefficient of range = (500.75 - 2) / (500.75 + 2)
Coefficient of range = 498.75 / 502.75
Coefficient of range = 0.992
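The range and its coefficient are simple enough to verify directly. This is a small sketch; the function name `coefficient_of_range` is an assumption.

```python
def coefficient_of_range(largest: float, smallest: float) -> float:
    """(L - S) / (L + S), a unit-free measure of spread."""
    return (largest - smallest) / (largest + smallest)


print(500.75 - 2)                               # range of Data A: 498.75
print(round(coefficient_of_range(500.75, 2), 3))  # 0.992
```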
Quartile deviation
Quartile deviation = (Q3 - Q1) / 2
Q3 = L + (h/f)(3n/4 - C)
Q3 class: 3n/4 = 3(51/4) = 38.25 (lies in 2nd class, 87-171)
Q3 = 86.5 + (85/14)(38.25 - 36)
Q3 = 86.5 + 6.07(38.25 - 36)
Q3 = 100.15
Q1 = L + (h/f)(n/4 - C)
Q1 class: n/4 = 12.5 (lies in 1st class, 2-86)
Q1 = 1.5 + (85/36)(12.5 - 0)
Q1 = 31.01
Q.D = (100.15 - 31.01) / 2
Q.D = 34.57
Coefficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1)
= (100.15 - 31.01) / (100.15 + 31.01)
= 69.14 / 131.16
= 0.527
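As a check on the two formulas, the quartile deviation and its coefficient can be computed from Q1 and Q3. This is a sketch using the Data A quartiles from this report (Q1 = 31.01, Q3 = 100.15); the function names are assumptions. Note that the coefficient uses Q3 - Q1 in the numerator, not the quartile deviation itself.

```python
def quartile_deviation(q1: float, q3: float) -> float:
    """Half of the interquartile range: (Q3 - Q1) / 2."""
    return (q3 - q1) / 2


def coefficient_of_qd(q1: float, q3: float) -> float:
    """(Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)


q1_a, q3_a = 31.01, 100.15
print(round(quartile_deviation(q1_a, q3_a), 2))  # 34.57
print(round(coefficient_of_qd(q1_a, q3_a), 3))   # 0.527
```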
Mean deviation (about mean) = ∑F|X - mean| / ∑F = 2280/51 = 44.70
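The mean deviation uses absolute deviations from the mean, weighted by class frequency. The sketch below assumes the Data A frequencies 36, 14, 0, 0, 0, 1; only 36 and 14 are stated in this report, and the rest are inferred from ∑Fx = 3859 and ∑F = 51. The function name is hypothetical.

```python
def mean_deviation(mid, freq):
    """sum(f * |x - mean|) / sum(f) for grouped data."""
    n = sum(freq)
    mean = sum(f * x for f, x in zip(freq, mid)) / n
    return sum(f * abs(x - mean) for f, x in zip(freq, mid)) / n


mid_points = [44, 129, 214, 299, 384, 469]  # class midpoints of Data A
freqs = [36, 14, 0, 0, 0, 1]                # inferred, see the note above
md_a = mean_deviation(mid_points, freqs)
print(round(md_a, 2))
```

With these frequencies the numerator comes out as 2280, the total used in the report.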
By calculating the first four moments about the mean, the data is shown to be
positively skewed, and the results agree with the mean, median and mode
of the data.
b1 = m3^2 / m2^3 = 26665.14^2 / 69231.20^3 = 711029691.21 / 331820389639663 ≈ 0
b2 = m4 / m2^2 = 10481160254 / 69231.20^2 = 10481160254 / 4792959053.4 = 2.186
Data: B
Mean = ∑Fx / ∑F
Mean = 2535.5/51 = 49.71 Crores
Median class = n/2 = 51/2 = 25.5 (lies in 2nd class, 28-49)
Median = 27.5 + (22/15)(25.5 - 15)
Median = 42.90 Crores
Mode = L + ((fm - f1) / ((fm - f1) + (fm - f2))) * h
Modal class = highest frequency class (f = 15) = 28-49
Measures of Dispersion:
Range = largest value - lowest value
Range = 130 - 6
Range = 124
Coefficient of range = (L - S) / (L + S)
Coefficient of range = (130 - 6) / (130 + 6)
Coefficient of range = 124 / 136
Coefficient of range = 0.911
Quartile deviation = (Q3 - Q1) / 2
Q3 = L + (h/f)(3n/4 - C)
Q3 class: 3n/4 = 3(50/4) = 37.5 (lies in 4th class, 72-93)
Q3 = 71.5 + (22/12)(3(12.5) - 6)
Q3 = 71.5 + 1.83(37.5 - 6)
Q3 = 129.145
Q1 = L + (h/f)(n/4 - C)
Q1 = 5.5 + (22/15)(12.5 - 0)
Q1 = 23.833
Q.D = (129.145 - 23.833) / 2
Q.D = 52.65
Coefficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1)
Coefficient of Q.D = (129.145 - 23.833) / (129.145 + 23.833)
Coefficient of Q.D = 105.312 / 152.978
Coefficient of Q.D = 0.688
Mean deviation (about mean) = ∑F|X - mean| / ∑F = 1735.05/51 = 34.02
Coefficient of mean deviation = mean deviation / mean = 34.02 / 49.715 = 0.6843
Standard deviation = √(∑F(X - mean)^2 / ∑F) = √(246819.36 / 51) = 69.57
Coefficient of variation = (S / mean) * 100 = (69.57 / 49.715) * 100 = 139.94
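A short sketch of the standard deviation and coefficient of variation, assuming the sum of squared deviations is divided by ∑F = 51 as the formula is written. The inputs 246819.36 and 49.715 are the figures given in this report; the function names are assumptions.

```python
import math


def std_dev(sum_sq_dev: float, n: int) -> float:
    """Square root of sum(f * (x - mean)**2) / sum(f)."""
    return math.sqrt(sum_sq_dev / n)


def coefficient_of_variation(sd: float, mean: float) -> float:
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100


sd_b = std_dev(246819.36, 51)
cv_b = coefficient_of_variation(sd_b, 49.715)
print(round(sd_b, 2), round(cv_b, 1))
```

Dividing by n - 1 instead of n would give a slightly larger standard deviation, so the divisor should be stated alongside the result.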
Skewness
M1 = ∑F(X - mean) / ∑F = 0/51 = 0
M2 = ∑F(X - mean)^2 / ∑F = 47136.15/51 = 924.238
M3 = ∑F(X - mean)^3 / ∑F = 939920.33/51 = 18429.81
M4 = ∑F(X - mean)^4 / ∑F = 111199011.81/51 = 2180372.7
By calculating the first four moments about the mean, the data is shown to be
positively skewed, and the results agree with the mean, median and mode of the
data.
b1 = m3^2 / m2^3 = 18429.81^2 / 924.238^3 = 339657896.63 / 789498777.094 = 0.43
b2 = m4 / m2^2 = 2180372.7 / 924.238^2 = 2180372.7 / 854215.88 = 2.55
As b2 < 3, the distribution is flat-topped and the curve is platykurtic.
Variable B is closer to a normal distribution, as its b2 value is closer to 3, which is the
value for a normal distribution.
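The moment ratios b1 and b2 can be recomputed from the moments about the mean quoted in this report. This is a minimal sketch; the function names `beta1` and `beta2` are assumptions, and the moment values are taken as given rather than rederived from the raw data.

```python
def beta1(m2: float, m3: float) -> float:
    """Skewness ratio b1 = m3**2 / m2**3."""
    return m3 ** 2 / m2 ** 3


def beta2(m2: float, m4: float) -> float:
    """Kurtosis ratio b2 = m4 / m2**2 (equal to 3 for a normal curve)."""
    return m4 / m2 ** 2


# Moments about the mean as quoted in the report.
b1_b = beta1(924.238, 18429.81)       # Data B
b2_b = beta2(924.238, 2180372.7)      # Data B
b2_a = beta2(69231.20, 10481160254)   # Data A

print(round(b1_b, 2), round(b2_b, 2), round(b2_a, 2))
```

Comparing each b2 with 3 shows which variable lies closer to the normal curve.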
Conclusion: I studied the data and applied various measures, such as measures of central tendency
(mean, median, mode) and measures of dispersion (quartile deviation, etc.). The
skewness was then calculated and found to be positive for both Set A and Set B. The two data sets are
related to each other and give broadly similar results, but Data A differs slightly due to one
large value of 500.75 Crores.