2: Methodology:: Name: Hassan Raza Roll:20-10755 Math-107-Project Section: C

The data set contains information on 51 Bollywood movies, including their gross revenue and budget. It finds that the highest grossing movie was Baahubali at 500.75, while the highest budget was 130. It divides the data into 6 equal class intervals and constructs frequency distributions and polygons for both variables. The mean, median, and mode are calculated for the gross revenue data to assess the central tendency. The mean of 75.66 is the best measure of central tendency as the data is continuous with no outliers. Measures of dispersion like range, coefficient of range, quartile deviation, and coefficient of quartile deviation are also calculated to analyze the spread of the data.

Uploaded by

UMAIR AHMED

Name: Hassan Raza

Roll:20-10755
Math-107-Project
Section: C
Data Set:3
1: Introduction: The data set provided to me covers 51 Bollywood movies and records the
gross and the budget of each release, in crores. The highest gross is 500.75 (Baahubali),
while the highest budget is 130. The data is taken to be continuous and to have no outliers.

2: Methodology:
Finding Number of classes

No of classes: 2^k > n
2^k > 51
2^6 = 64 > 51
k = 6
No of classes (for both the variables) = 6
Finding class interval: h = (highest value – lowest value) / k
For Data A: h = (500.75 – 2) / 6 = 83.1        For Data B: h = (130 – 6) / 6 = 20.7
For Data A class interval = 85 (rounded up)    For Data B class interval = 22 (rounded up)
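The class count and class width can be checked with a few lines of Python. This is only a minimal sketch: the function names are illustrative, and it assumes the width is the range divided by the number of classes k, which is then rounded up by hand to a convenient value (85 and 22 here).

```python
def number_of_classes(n: int) -> int:
    """Smallest k with 2**k > n (the 2^k rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def raw_class_width(highest: float, lowest: float, k: int) -> float:
    """Range divided by the number of classes, before any rounding."""
    return (highest - lowest) / k


k = number_of_classes(51)
print("classes:", k)
print("raw width, Data A:", round(raw_class_width(500.75, 2, k), 1))
print("raw width, Data B:", round(raw_class_width(130, 6, k), 1))
```

The raw widths are not whole numbers, so a slightly larger round width is chosen so that six classes cover the full range.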

1: Frequency Distribution

For Data A
Class Intervals   Midpoint (x)   Frequency (F)
2-86              44             36
87-171            129            14
172-256           214            0
257-341           299            0
342-426           384            0
427-511           469            1
                                 ∑F = 51

For Data B
Class Intervals   Midpoint (x)   Frequency (F)
6-27              16.5           15
28-49             38.5           15
50-71             60.5           6
72-93             82.5           12
94-115            104.5          1
116-137           126.5          2
                                 ∑F = 51
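A frequency distribution of this kind can be built by counting values into equal-width classes. The sketch below is an illustration only: the function name is hypothetical, and apart from the lowest and highest gross (2 and 500.75) the sample values are made up, because the raw data is not reproduced in this report.

```python
def frequency_table(values, lowest, width, k):
    """Count values into k classes with limits lowest..lowest+width-1, and so on.

    Each class is extended by half a unit on both sides (class boundaries),
    so that values falling between two stated limits are still counted.
    """
    rows = []
    for i in range(k):
        lower = lowest + i * width
        upper = lower + width - 1
        count = sum(1 for v in values if lower - 0.5 <= v < upper + 0.5)
        rows.append((lower, upper, (lower + upper) / 2, count))
    return rows


# Hypothetical sample, NOT the project's data set.
sample_gross = [2, 45.5, 86.2, 90, 170, 500.75]
table = frequency_table(sample_gross, lowest=2, width=85, k=6)
for lower, upper, midpoint, count in table:
    print(f"{lower}-{upper}  midpoint={midpoint}  F={count}")
```

With lowest=2 and width=85 the class limits and midpoints match the Data A table; with lowest=6 and width=22 they match Data B.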
Frequency Polygons:

[Figure: frequency polygons for Data A and Data B, plotting Frequency (F) against the class
midpoints (44 to 469 for Data A, 16.5 to 126.5 for Data B).]

Interpretation: The two variables have very different frequency polygons. The curve of
Data A starts high and then stays flat, while the curve of Data B has a zigzag shape.
Key: The median is indicated with the blue line and the mode with the red line in both
graphs.

2: Measures of Central Tendency

For Data A (deviations taken about mean = 75.66)

Class      Midpoint  Frequency  Fx     Class        C.F  |X – mean|  F|X – mean|  (X – mean)²  F(X – mean)²
Intervals  (x)       (F)               boundaries
2-86       44        36         1584   1.5-86.5     36   31.66       1139.76      1002.355     36084.78
87-171     129       14         1806   86.5-171.5   50   53.34       746.76       2845.155     39832
172-256    214       0          0      171.5-256.5  50   138.34      0            19137.95     0
257-341    299       0          0      256.5-341.5  50   223.34      0            49880        0
342-426    384       0          0      341.5-426.5  50   308.34      0            95073        0
427-511    469       1          469    426.5-511.5  51   393.34      393.34       154716       154716

Totals: ∑F = 51, ∑Fx = 3859, ∑F|X – mean| = 2280, ∑F(X – mean)² = 230,632.95, ∑x = 1539
Mean, Median and Mode
For Data A

Mean = ∑Fx / ∑F
Mean = 3859/51 = 75.66
Median = L + (h/f)(n/2 – C)
Median class: n/2 = 51/2 = 25.5 (lies in the 1st class, 2-86, so C = 0)
Median = 1.5 + (85/36)(25.5 – 0)
Median = 1.5 + 60.21
Median = 61.71
Mode = L + [(fm – f1) / ((fm – f1) + (fm – f2))] * h
Modal class = highest frequency class = 36 = 2-86
Mode = 1.5 + [(36 – 0) / ((36 – 0) + (36 – 14))] * 85
Mode = 1.5 + 52.76
Mode = 54.26

The data is positively skewed, as the mean is greater than the median, which is greater than
the mode, i.e. (75.66 > 61.71 > 54.26).
The mean is the most appropriate measure of central tendency, as this data is continuous and
has no outliers.
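The grouped-data formulas for mean, median and mode can be sketched in Python as follows. The function names are illustrative, the inputs are the Data A class boundaries and frequencies from the frequency distribution, and C is taken as the cumulative frequency before the median class. When two classes tie for the highest frequency, this sketch uses the first one, which is an assumption rather than a rule stated in this report.

```python
def grouped_mean(midpoints, freq):
    """Mean = sum(F * x) / sum(F)."""
    return sum(f * x for x, f in zip(midpoints, freq)) / sum(freq)


def grouped_median(lower_bounds, freq, h):
    """Median = L + (h / f) * (n / 2 - C), C = cumulative frequency before the class."""
    n = sum(freq)
    cum = 0
    for lower, f in zip(lower_bounds, freq):
        if f > 0 and cum + f >= n / 2:
            return lower + (h / f) * (n / 2 - cum)
        cum += f
    raise ValueError("no median class found")


def grouped_mode(lower_bounds, freq, h):
    """Mode = L + (fm - f1) / ((fm - f1) + (fm - f2)) * h for the modal class."""
    i = max(range(len(freq)), key=lambda j: freq[j])  # first class with highest F
    fm = freq[i]
    f1 = freq[i - 1] if i > 0 else 0
    f2 = freq[i + 1] if i < len(freq) - 1 else 0
    return lower_bounds[i] + (fm - f1) / ((fm - f1) + (fm - f2)) * h


midpoints = [44, 129, 214, 299, 384, 469]
lower_bounds = [1.5, 86.5, 171.5, 256.5, 341.5, 426.5]
freq = [36, 14, 0, 0, 0, 1]

mean_a = grouped_mean(midpoints, freq)
median_a = grouped_median(lower_bounds, freq, h=85)
mode_a = grouped_mode(lower_bounds, freq, h=85)
print(round(mean_a, 2), round(median_a, 2), round(mode_a, 2))
```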

Measures of Dispersion:
1: Range: largest value – lowest value
Range = 500.75 – 2
Range = 498.75
Coefficient of Range = (L – S) / (L + S)
Coefficient of range = (500.75 – 2) / (500.75 + 2)
Coefficient of range = 498.75 / 502.75
Coefficient of range = 0.992
Quartile deviation
Quartile Deviation = (Q3 – Q1) / 2
Q3 = L + (h/f)(3n/4 – C)
Q3 class: 3n/4 = 3(51)/4 = 38.25 (lies in the 2nd class, 87-171, so C = 36)
Q3 = 86.5 + (85/14)(38.25 – 36)
Q3 = 86.5 + 13.66
Q3 = 100.16
Q1 = L + (h/f)(n/4 – C)
Q1 class: n/4 = 51/4 = 12.75 (lies in the 1st class, 2-86, so C = 0)
Q1 = 1.5 + (85/36)(12.75 – 0)
Q1 = 31.60
Quartile Deviation = (Q3 – Q1) / 2
Q.D = (100.16 – 31.60) / 2
Q.D = 34.28
Coefficient of Quartile deviation = (Q3 – Q1) / (Q3 + Q1)
= (100.16 – 31.60) / (100.16 + 31.60)
= 68.56 / 131.76
= 0.520
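The quartile formula is the same interpolation as the median, with n/4 or 3n/4 in place of n/2. A minimal Python sketch under that assumption (the function name is illustrative, the inputs are the Data A class boundaries and frequencies):

```python
def grouped_quantile(lower_bounds, freq, h, p):
    """Q = L + (h / f) * (p * n - C), where C is the cumulative frequency
    before the class that contains the (p * n)-th observation."""
    n = sum(freq)
    target = p * n
    cum = 0
    for lower, f in zip(lower_bounds, freq):
        if f > 0 and cum + f >= target:
            return lower + (h / f) * (target - cum)
        cum += f
    raise ValueError("p must be between 0 and 1")


lower_bounds = [1.5, 86.5, 171.5, 256.5, 341.5, 426.5]
freq = [36, 14, 0, 0, 0, 1]

q1 = grouped_quantile(lower_bounds, freq, h=85, p=0.25)
q3 = grouped_quantile(lower_bounds, freq, h=85, p=0.75)
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)
print(round(q1, 2), round(q3, 2))
print(round(quartile_deviation, 2), round(coefficient_qd, 3))
```

A quick sanity check is that each quartile must lie inside the boundaries of its own class.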
Mean Deviation (about mean) = ∑F|X – mean| / ∑F = 2280/51 = 44.71

Coefficient of Mean deviation = mean deviation / mean = 44.71/75.66 = 0.59

Standard Deviation = √(∑F(X – mean)² / ∑F) = √(230632.95/51) = 67.25
Coefficient of Variation = S/Mean * 100 = 67.25/75.66 * 100 = 88.88

Skewness
sk = (mean – mode) / standard deviation = (75.66 – 54.26) / 67.25 = 0.318

The value of skewness indicates that the data is positively skewed.

Standard deviation is the best measure of dispersion for this data, as it gives a clear view
of the spread of the data.
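The remaining dispersion measures can be recomputed from the class midpoints and frequencies. The Python sketch below is illustrative only: the function names are not from this report, the mean is carried at full precision rather than rounded, and the mode is obtained from the grouped-mode formula with fm = 36, f1 = 0, f2 = 14 and h = 85.

```python
import math

midpoints = [44, 129, 214, 299, 384, 469]
freq = [36, 14, 0, 0, 0, 1]


def grouped_mean(mid, freq):
    return sum(f * x for x, f in zip(mid, freq)) / sum(freq)


def mean_deviation(mid, freq):
    """Mean of the absolute deviations from the mean, weighted by frequency."""
    m = grouped_mean(mid, freq)
    return sum(f * abs(x - m) for x, f in zip(mid, freq)) / sum(freq)


def std_dev(mid, freq):
    """Square root of the frequency-weighted mean squared deviation."""
    m = grouped_mean(mid, freq)
    return math.sqrt(sum(f * (x - m) ** 2 for x, f in zip(mid, freq)) / sum(freq))


def pearson_skewness(mean, mode, sd):
    """sk = (mean - mode) / standard deviation."""
    return (mean - mode) / sd


mean_a = grouped_mean(midpoints, freq)
md_a = mean_deviation(midpoints, freq)
sd_a = std_dev(midpoints, freq)
cv_a = sd_a / mean_a * 100
mode_a = 1.5 + (36 - 0) / ((36 - 0) + (36 - 14)) * 85
sk_a = pearson_skewness(mean_a, mode_a, sd_a)
print(round(md_a, 2), round(sd_a, 2), round(cv_a, 2), round(sk_a, 3))
```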
Moments
X – Mean   F(X – Mean)   F(X – Mean)²   F(X – Mean)³    F(X – Mean)⁴
-31.66     -1139.76      36084.80       -1142444.76     36169801.10
53.34      746.76        39832.17       2124647.94      113328721.11
138.34     0             0              0               0
223.34     0             0              0               0
308.34     0             0              0               0
393.34     393.34        154716.36      60856131.31     23937150690.15

Totals: ∑F(X – mean) ≈ 0, ∑F(X – mean)² = 230633.33, ∑F(X – mean)³ = 61838334.49,
∑F(X – mean)⁴ = 24086649212.36

M1 = ∑F(X – mean) / ∑F ≈ 0/51 = 0
M2 = ∑F(X – mean)² / ∑F = 230633.33/51 = 4522.22
M3 = ∑F(X – mean)³ / ∑F = 61838334.49/51 = 1212516.36
M4 = ∑F(X – mean)⁴ / ∑F = 24086649212.36/51 = 472287239.46

By calculating the first four moments about the mean, it shows that the data is
positively skewed (M3 > 0), and the results agree with the mean, median and mode
of the data.
b1 = M3² / M2³ = (1212516.36)² / (4522.22)³ = 15.90
b2 = M4 / M2² = 472287239.46 / (4522.22)² = 472287239.46 / 20450473.73 = 23.09

As b2 > 3, the distribution is said to be peaked and the curve is leptokurtic.

Data: B

Deviations taken about mean = 49.715

Class      Midpoint  Frequency  Fx      Class        C.F  |X – Mean|  F|X – Mean|  (X – Mean)²  F(X – Mean)²
Intervals  (x)       (F)                boundaries
6-27       16.5      15         247.5   5.5-27.5     15   33.215      498.225      1103.24      16548.54
28-49      38.5      15         577.5   27.5-49.5    30   11.215      168.225      125.78       1886.64
50-71      60.5      6          363     49.5-71.5    36   10.785      64.71        116.32       697.90
72-93      82.5      12         990     71.5-93.5    48   32.785      393.42       1074.86      12898.27
94-115     104.5     1          104.5   93.5-115.5   49   54.785      54.785       3001.40      3001.40
116-137    126.5     2          253     115.5-137.5  51   76.785      153.57       5895.94      11791.87

Totals: ∑F = 51, ∑Fx = 2535.5, ∑F|X – Mean| = 1332.935, ∑F(X – Mean)² = 46824.62

Mean = ∑Fx / ∑F
Mean = 2535.5/51 = 49.715 Crores
Median = L + (h/f)(n/2 – C)
Median class: n/2 = 51/2 = 25.5 (lies in the 2nd class, 28-49, so C = 15)
Median = 27.5 + (22/15)(25.5 – 15)
Median = 42.90 Crores
Mode = L + [(fm – f1) / ((fm – f1) + (fm – f2))] * h
Modal class = highest frequency class = 15 = 28-49 (the classes 6-27 and 28-49 tie at 15;
28-49 is used)
Mode = 27.5 + [(15 – 15) / ((15 – 15) + (15 – 6))] * 22
Mode = 27.5 + 0
Mode = 27.5 Crores
The data is skewed to the right (positively skewed), as the mean is greater than the median,
which is greater than the mode, i.e. (49.715 > 42.90 > 27.5).
The mean is the most appropriate measure of central tendency, as this data is continuous and
has no outliers.

Measures of Dispersion:
Range: largest value – lowest value
Range = 130 – 6
Range = 124
Coefficient of Range = (L – S) / (L + S)
Coefficient of range = (130 – 6) / (130 + 6)
Coefficient of range = 124 / 136
Coefficient of range = 0.912

Quartile Deviation = (Q3 – Q1) / 2
Q3 = L + (h/f)(3n/4 – C)
Q3 class: 3n/4 = 3(51)/4 = 38.25 (lies in the 4th class, 72-93, so C = 36)
Q3 = 71.5 + (22/12)(38.25 – 36)
Q3 = 71.5 + 4.125
Q3 = 75.625
Q1 = L + (h/f)(n/4 – C)
Q1 class: n/4 = 51/4 = 12.75 (lies in the 1st class, 6-27, so C = 0)
Q1 = 5.5 + (22/15)(12.75 – 0)
Q1 = 24.20
Quartile Deviation = (Q3 – Q1) / 2
Q.D = (75.625 – 24.20) / 2
Q.D = 25.71
Coefficient of Quartile deviation = (Q3 – Q1) / (Q3 + Q1)
Coefficient of Q.D = (75.625 – 24.20) / (75.625 + 24.20)
Coefficient of Q.D = 0.515
Mean Deviation (about mean) = ∑F|X – mean| / ∑F = 1332.935/51 = 26.14
Coefficient of mean deviation = mean deviation / mean = 26.14/49.715 = 0.526
Standard Deviation = √(∑F(X – Mean)² / ∑F) = √(46824.62/51) = 30.30
Coefficient of Variation = S/Mean * 100 = 30.30/49.715 * 100 = 60.95

Skewness

sk = (mean – mode) / standard deviation = (49.715 – 27.5) / 30.30 = 0.733

The value of skewness indicates that the data is positively skewed.
Standard deviation is the best measure of dispersion for this data, as it gives a clear view
of the spread of the data.
Moments
X – Mean   F(X – Mean)   F(X – Mean)²   F(X – Mean)³   F(X – Mean)⁴
-33.215    -498.225      16548.54       -549659.87     18256952.55
-11.215    -168.225      1886.64        -21158.71      237294.88
10.785     64.71         697.90         7526.82        81176.78
32.785     393.42        12898.27       422869.94      13863790.85
54.785     54.785        3001.40        164431.49      9008379.30
76.785     153.57        11791.87       905438.93      69524127.94

Totals: ∑F(X – mean) ≈ 0, ∑F(X – mean)² = 46824.62, ∑F(X – mean)³ = 929448.60,
∑F(X – mean)⁴ = 110971722.30

M1 = ∑F(X – mean) / ∑F ≈ 0/51 = 0
M2 = ∑F(X – mean)² / ∑F = 46824.62/51 = 918.13
M3 = ∑F(X – mean)³ / ∑F = 929448.60/51 = 18224.48
M4 = ∑F(X – mean)⁴ / ∑F = 110971722.30/51 = 2175916.12

By calculating the first four moments about the mean, it shows that the data is
positively skewed (M3 > 0), and the results agree with the mean, median and mode of the
data.
b1 = M3² / M2³ = (18224.48)² / (918.13)³ = 332131671 / 773949344 = 0.43
b2 = M4 / M2² = 2175916.12 / (918.13)² = 2175916.12 / 842962.70 = 2.58

As b2 < 3, the distribution is said to be flat topped and the curve is platykurtic.
Variable B is closer to a normal distribution, as its b2 value is closer to 3, and a normal
distribution has b2 = 3.
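The moments about the mean and the ratios b1 and b2 can be recomputed for both variables directly from the class midpoints and frequencies. This Python sketch is illustrative only: the function names are not from this report, and the mean is carried at full precision, so the last digits can differ slightly from hand calculations that use a rounded mean.

```python
def central_moments(midpoints, freq):
    """Return (M1, M2, M3, M4): frequency-weighted moments about the mean."""
    n = sum(freq)
    mean = sum(f * x for x, f in zip(midpoints, freq)) / n
    return tuple(
        sum(f * (x - mean) ** r for x, f in zip(midpoints, freq)) / n
        for r in (1, 2, 3, 4)
    )


def beta_coefficients(m2, m3, m4):
    """b1 = M3^2 / M2^3 (skewness), b2 = M4 / M2^2 (kurtosis; 3 for a normal curve)."""
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2


data_a = ([44, 129, 214, 299, 384, 469], [36, 14, 0, 0, 0, 1])
data_b = ([16.5, 38.5, 60.5, 82.5, 104.5, 126.5], [15, 15, 6, 12, 1, 2])

results = {}
for name, (mid, freq) in (("A", data_a), ("B", data_b)):
    m1, m2, m3, m4 = central_moments(mid, freq)
    b1, b2 = beta_coefficients(m2, m3, m4)
    results[name] = (b1, b2)
    shape = "leptokurtic" if b2 > 3 else "platykurtic" if b2 < 3 else "mesokurtic"
    print(f"Data {name}: b1={b1:.2f}  b2={b2:.2f}  {shape}")
```

Comparing each b2 with 3 shows which variable is closer to the normal curve.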
Conclusion: I studied the data and applied various measures, such as measures of central
tendency (mean, median, mode) and measures of dispersion (quartile deviation, etc.). The
skewness was then calculated and was found to be positive for both Set A and Set B. The two
data sets are related to each other and give broadly similar results, but Data A differs
slightly because of one large value of 500.75 Crores.
