
Numerical Descriptive Measures

- Numerical descriptive measures include central tendency (typical values), variation (dispersion of data), and shape/skewness (distribution pattern).
- Measures of central tendency include the mean (average), median, and mode. The mean is the sum of all values divided by the total number of values and can be affected by outliers. The median is the middle value when the data are arranged in order and is not affected by outliers.
- Other measures include the weighted mean, geometric mean, and harmonic mean, which are used in specific situations depending on the type of data. Disadvantages of the mean include being affected by outliers and not being computable for open-ended classes or categorical data.

Uploaded by

ANKUR ARYA
Copyright
© All Rights Reserved

Numerical Descriptive Measures

➢ Central tendency is the extent to which the data values group around a typical or central value.
➢ Variation is the amount of dispersion or scattering of the data points.
➢ Shape (skewness) is the pattern of the distribution of values from the lowest value to the highest value.
Measure of Central Tendency: The Mean
➢ The arithmetic mean (AM), often just called the mean or the average, is the most common measure of central tendency.
➢ For a sample (or population) of size n with elements $X_1, X_2, \ldots, X_n$, the AM, denoted by $\bar{X}$ (read as "X-bar"), is given by

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$

where $\sum$ is read as "sigma", $X_i$ is the i-th observation, and n is the sample size.
➢ AM = sum of the values divided by the number of values
Measure of Central Tendency: The Mean(continued)
➢ Disadvantage: the AM is affected by extreme values (outliers)
➢ Data 1: 1, 2, 3, 4, 5
Mean = $\bar{X} = \frac{1+2+3+4+5}{5} = 3$
➢ Data 2: 1, 2, 3, 4, 20
Mean = $\bar{X} = \frac{1+2+3+4+20}{5} = 6$
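The effect of the single extreme value in Data 2 can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is an illustrative choice, not something from the slides.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data_1 = [1, 2, 3, 4, 5]
data_2 = [1, 2, 3, 4, 20]  # same as data_1 except for one extreme value

mean_1 = arithmetic_mean(data_1)
mean_2 = arithmetic_mean(data_2)

# Replacing a single value (5 -> 20) doubles the mean.
print(mean_1, mean_2)
```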

❖ Mean for grouped data

• The mean for grouped data is given by $\bar{X} = \frac{\sum fX}{n}$
• where $\bar{X}$ = mean
• $\sum fX$ = sum of the products of the frequency f of each class with the midpoint X of that class
• n = total number of observations = $\sum f$ = total frequency
➢ Solution
Lower   Upper   Midpoint (x)   Frequency (f)   fx
0       1       0.5            1               0.5
1       2       1.5            4               6
2       3       2.5            8               20
3       4       3.5            7               24.5
4       5       4.5            3               13.5
5       6       5.5            2               11
Total                          25              75.5

Mean = 75.5 / 25 = 3.02
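The grouped-data calculation in the table above can be sketched in Python as follows. The function name `grouped_mean` and the `(lower, upper, frequency)` tuple layout are assumptions made for illustration; each class is represented by its midpoint, as in the table.

```python
def grouped_mean(classes):
    """Mean of grouped data.

    classes: list of (lower, upper, frequency) tuples.
    Each class is represented by its midpoint (lower + upper) / 2.
    """
    total_f = sum(f for _, _, f in classes)
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f


# Frequency table from the worked solution above.
table = [(0, 1, 1), (1, 2, 4), (2, 3, 8), (3, 4, 7), (4, 5, 3), (5, 6, 2)]

mean_grouped = grouped_mean(table)
print(round(mean_grouped, 2))
```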
Example of Mean

Class interval
Lower   Upper   f     x (midpoint)   fx
0       50      78    25             1950
50      100     123   75             9225
100     150     187   125            23375
150     200     82    175            14350
200     250     51    225            11475
250     300     47    275            12925
300     350     13    325            4225
350     400     9     375            3375
400     450     6     425            2550
450     500     4     475            1900
Total           600                  85350

Mean = 85350 / 600 = 142.25
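Mean using coding
• Mean = $x_0 + w \cdot \frac{\sum uf}{n}$
• where $x_0$ = midpoint of the class assigned code 0
• w = width of the class interval
• u = code value
• f = frequency
• n = total frequency
• The table below codes the same data three ways (u, u1, u2), each assigning code 0 to a different class. The mean is the same in every case; for example, with code 0 on class 24-32, Mean = 28 + 8(-15)/20 = 22.
Class   f    u    u*f   u1   u1*f   u2   u2*f
0-8     2    -3   -6    -2   -4     -4   -8
8-16    6    -2   -12   -1   -6     -3   -18
16-24   3    -1   -3    0    0      -2   -6
24-32   5    0    0     1    5      -1   -5
32-40   2    1    2     2    4      0    0
40-48   2    2    4     3    6      1    2
Total   20        -15        5           -35

Mean              22         22          22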
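The coding method can be checked numerically. The sketch below assumes equal class widths and derives each code as u = (x - x0) / w from the class midpoints; the function name `coded_mean` is illustrative only.

```python
def coded_mean(midpoints, freqs, x0, w):
    """Mean by the coding method: x0 + w * sum(u * f) / n.

    x0 is the midpoint of the class assigned code 0 and w is the
    (common) class width, so each code is u = (x - x0) / w.
    """
    n = sum(freqs)
    codes = [(x - x0) / w for x in midpoints]
    return x0 + w * sum(u * f for u, f in zip(codes, freqs)) / n


# Classes 0-8, 8-16, ..., 40-48 from the table above.
midpoints = [4, 12, 20, 28, 36, 44]
freqs = [2, 6, 3, 5, 2, 2]

# Code 0 placed on class 24-32, 16-24 and 32-40 respectively.
results = [coded_mean(midpoints, freqs, x0, 8) for x0 in (28, 20, 36)]
print(results)
```

The choice of the code-0 class only shifts the arithmetic; it does not change the result, which is why a class near the centre is usually picked to keep the codes small.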
Weighted Mean
• A weighted mean is a kind of average in which, instead of each data point contributing equally to the final mean, some data points carry more weight than others.
• It is used to calculate an average that takes into account the importance of each value to the overall total.
• Let $x_1, x_2, \ldots, x_n$ be the data points with weights $w_1, w_2, \ldots, w_n$. Then the weighted mean is given by
$\frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$

• Find the average cost of labour per hour for each of the products
Grade of labour   Hourly wage   Labour hrs, Product 1   Labour hrs, Product 2
Unskilled         4             2                       4
Semiskilled       6             3                       3
Skilled           8             5                       2
Weighted Mean (continued)

• Simple arithmetic mean = (4+6+8)/3 = 6. Using this, the labour cost of 1 unit of product 1 would be 6(2+3+5) = 60 and that of product 2 would be 6(4+3+2) = 54.
• Weighted average cost of labour per hour for product 1 = (4*2+6*3+8*5)/10 = 6.6
• Weighted average cost of labour per hour for product 2 = (4*4+6*3+8*2)/9 = 5.56
• Labour cost per unit: product 1 = 66 and product 2 = 50
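The labour-cost example can be reproduced with a short Python sketch. It assumes the weights are the labour hours per product, so the divisor is the sum of the weights (10 and 9 in the example); the names `weighted_mean`, `wages`, `hours_p1` and `hours_p2` are illustrative.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


wages = [4, 6, 8]     # hourly wage: unskilled, semiskilled, skilled
hours_p1 = [2, 3, 5]  # labour hours per unit of product 1
hours_p2 = [4, 3, 2]  # labour hours per unit of product 2

rate_p1 = weighted_mean(wages, hours_p1)
rate_p2 = weighted_mean(wages, hours_p2)

# Labour cost per unit = weighted hourly rate * total hours.
cost_p1 = rate_p1 * sum(hours_p1)
cost_p2 = rate_p2 * sum(hours_p2)
print(round(rate_p1, 2), round(rate_p2, 2), round(cost_p1), round(cost_p2))
```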
Geometric Mean
• Sometimes, when dealing with quantities that change over a period of time, we need to know an average rate of change. In such cases the AM is inappropriate.
• The GM for quantities $x_1, x_2, \ldots, x_n$ is given by
$\sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$
➢ Example: Rs 100 deposited in a savings account

Year   Interest rate (%)   Return at the end of year
1      7                   107
2      8                   115.56
3      10                  127.12
4      12                  142.37
5      18                  168
GM     10.388              163.91 (return with agg rate)

GM = (7 * 8 * 10 * 12 * 18)^(1/5) = 10.388

➢ It cannot be computed for grouped data
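The savings-account example can be sketched in Python. The first part reproduces the slide's figures (GM of the percentage rates, and the balance obtained by compounding at that single rate). The last part is an addition not in the slides: taking the GM of the growth factors (1 + rate/100) instead gives a single rate that reproduces the actual year-5 balance. The function name `geometric_mean` is illustrative.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values (computed via logs)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


rates = [7, 8, 10, 12, 18]  # yearly interest rates in percent

# Actual balance after compounding Rs 100 year by year.
balance = 100.0
for r in rates:
    balance *= 1 + r / 100

# Slide's calculation: GM of the percentage rates, then compound at it.
gm_rates = geometric_mean(rates)
balance_gm_rates = 100 * (1 + gm_rates / 100) ** len(rates)

# Not in the slides: GM of the growth factors reproduces the balance.
gm_factor = geometric_mean([1 + r / 100 for r in rates])
balance_gm_factor = 100 * gm_factor ** len(rates)

print(round(gm_rates, 3), round(balance, 2), round(balance_gm_rates, 2))
```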


Harmonic Mean
• The HM for quantities $x_1, x_2, \ldots, x_n$ is given by
$\left(\frac{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}{n}\right)^{-1} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$
➢ HM for 2 and 3 is $\frac{2}{\frac{1}{2} + \frac{1}{3}} = \frac{12}{5} = 2.4$
➢ HM for two quantities $x_1, x_2$ is given by $\frac{2 x_1 x_2}{x_1 + x_2}$
➢ HM is used when data are more dispersed.

➢ It cannot be computed for grouped data
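A minimal Python sketch of the harmonic mean, checking the general formula against the two-value shortcut for 2 and 3. The function name `harmonic_mean` is an illustrative choice, and positive inputs are assumed.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the harmonic mean needs positive values")
    return len(values) / sum(1 / v for v in values)


hm_general = harmonic_mean([2, 3])

# Two-value shortcut: 2 * x1 * x2 / (x1 + x2)
x1, x2 = 2, 3
hm_shortcut = 2 * x1 * x2 / (x1 + x2)

print(round(hm_general, 4), hm_shortcut)
```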


Disadvantages of Mean
➢ It may be affected by extreme values
➢ It is tedious to compute for large data sets
➢ It cannot be computed when there is an open-ended class
➢ It cannot be computed for categorical data

➢ Relation between the three means (for positive values):

$AM \geq GM \geq HM$
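The ordering of the three means can be illustrated with Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). The data set below is the earlier Data 2 example, reused here as an assumption for illustration; the relation holds for positive values, with equality only when all values are the same.

```python
import statistics

data = [1, 2, 3, 4, 20]  # positive values only

am = statistics.mean(data)
gm = statistics.geometric_mean(data)
hm = statistics.harmonic_mean(data)

print(am, round(gm, 4), round(hm, 4))

# When every value is identical, the three means coincide.
same = [5, 5, 5]
am_same = statistics.mean(same)
gm_same = statistics.geometric_mean(same)
hm_same = statistics.harmonic_mean(same)
```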
Measure of Central Tendency: The Median
➢ In an ordered array, the middle number is the median, i.e., 50% of the data are above it and 50% are below it
➢ Data 1: 1, 2, 3, 4, 5   Median = 3
➢ Data 2: 1, 2, 3, 4, 20   Median = 3
➢ It is not affected by outliers
➢ Arrange the numbers in ascending order; the middle entry is the median
➢ If the number of values n is odd, median = the $\frac{n+1}{2}$-th entry
➢ If n is even, median = average of the $\frac{n}{2}$-th entry and the $\left(\frac{n}{2} + 1\right)$-th entry
➢ Note that these expressions give the position of the median, not the median itself
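The position rules for odd and even n translate directly into code. This is a minimal sketch; the function name `median` is illustrative, and the 1-based positions in the rules become 0-based indices in Python.

```python
def median(values):
    """Middle entry of the sorted data (average of the two middle
    entries when the number of values is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # (n + 1) / 2-th entry in 1-based counting is index n // 2
        return ordered[mid]
    # n/2-th and (n/2 + 1)-th entries are indices mid - 1 and mid
    return (ordered[mid - 1] + ordered[mid]) / 2


median_1 = median([1, 2, 3, 4, 5])
median_2 = median([1, 2, 3, 4, 20])  # the outlier does not move the median
median_even = median([4, 1, 3, 2])
print(median_1, median_2, median_even)
```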
Median for grouped data
➢ The median for grouped data is given by

$\text{Median} = L + \frac{\frac{n}{2} - m}{f} \cdot c$

➢ L = lower limit of the median class
➢ n = total number of observations = $\sum f$
➢ m = cumulative frequency preceding the median class
➢ f = frequency of the median class
➢ c = class interval (width) of the median class
Median for grouped data
➢ Find the median for the following continuous frequency distribution

Class     Frequency   CF
0-100     5           5
100-200   8           13
200-300   4           17
300-400   6           23
400-500   2           25
500-600   11          36
600-700   5           41
Total     41

Median position: 21st position, so the median class is 300-400

➢ Median = $300 + \frac{\frac{41}{2} - 17}{6} \cdot 100 = 358.3333$
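The grouped-median formula can be sketched in Python as below. The function name `grouped_median` and the `(lower, upper, frequency)` layout are illustrative assumptions; the median class is taken as the first class whose cumulative frequency reaches n/2.

```python
def grouped_median(classes):
    """Median of grouped data: L + ((n/2 - m) / f) * c.

    classes: list of (lower, upper, frequency) tuples in ascending order.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    cumulative = 0  # m: cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:
            width = upper - lower  # c: class interval of the median class
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


table = [
    (0, 100, 5), (100, 200, 8), (200, 300, 4), (300, 400, 6),
    (400, 500, 2), (500, 600, 11), (600, 700, 5),
]

median_grouped = grouped_median(table)
print(round(median_grouped, 4))
```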
Median
➢Advantages
• Not affected by extreme values
• Can be computed in case of open class
• Can be computed in case of categorical data
➢Disadvantages
• Arranging the data in order is time consuming
• The mean is easier to use when estimating a population parameter
Measure of Central Tendency: The Mode
➢Value that occurs most often
• Not affected by extreme values
• Can be computed in case of categorical data or numerical data
➢There may be no mode
➢There may be several modes
Mode for grouped data
$\text{Mode} = L + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \cdot c$
where L = lower limit of the modal class
$f_1$ = frequency of the modal class
$f_0$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class succeeding the modal class
c = class interval of the modal class

Class     Frequency
0-100     5
100-200   8
200-300   4
300-400   6
400-500   2
500-600   11
600-700   5

Mode = $500 + \frac{11 - 2}{2 \cdot 11 - 2 - 5} \cdot 100 = 560$
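The grouped-mode formula can be sketched in Python as below. Two choices here are assumptions not stated in the slides: if the modal class is the first or last class, the missing neighbouring frequency is treated as 0, and if several classes tie for the highest frequency, the first one is used. The function name `grouped_mode` is illustrative.

```python
def grouped_mode(classes):
    """Mode of grouped data: L + (f1 - f0) / (2*f1 - f0 - f2) * c.

    classes: list of (lower, upper, frequency) tuples in ascending order.
    """
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))  # modal class (first one on ties)
    lower, upper, f1 = classes[i]
    f0 = freqs[i - 1] if i > 0 else 0               # preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # succeeding class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined when f0 = f1 = f2")
    return lower + (f1 - f0) / denominator * (upper - lower)


table = [
    (0, 100, 5), (100, 200, 8), (200, 300, 4), (300, 400, 6),
    (400, 500, 2), (500, 600, 11), (600, 700, 5),
]

mode_grouped = grouped_mode(table)
print(round(mode_grouped, 2))
```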
Measure of Central Tendency: Which measure to use?
• The mean is generally used unless extreme values exist
• The median is used when there are outliers in the data
• In some situations it makes sense to use both the mean and the median
Formulae: (Mean - Mode)/SD, (Mean - Median)/SD, $\frac{\sum (X_i - \bar{X})^3}{(n-1) S^3}$
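The three expressions can be evaluated with Python's standard `statistics` module. This sketch rests on assumptions the slides do not spell out: that the expressions are meant as measures of shape (skewness), and that SD and S both denote the sample standard deviation (n - 1 divisor). The function name `shape_measures` and both data sets are made up for illustration.

```python
import statistics


def shape_measures(data):
    """Return the three expressions as a tuple:
    (mean - mode) / SD, (mean - median) / SD,
    sum((x - mean)**3) / ((n - 1) * S**3).
    """
    n = len(data)
    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)
    s = statistics.stdev(data)  # assumption: sample SD, n - 1 divisor
    third_moment = sum((x - mean) ** 3 for x in data) / ((n - 1) * s ** 3)
    return (mean - mode) / s, (mean - median) / s, third_moment


right_skewed = [1, 2, 2, 3, 4, 5, 11]  # long tail to the right
symmetric = [1, 2, 3, 3, 4, 5]

skewed_result = shape_measures(right_skewed)
symmetric_result = shape_measures(symmetric)
print(skewed_result, symmetric_result)
```

For the right-skewed data all three values come out positive, and for the symmetric data all three are zero, which is consistent with reading them as skewness measures.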
Locating Extreme Outliers: Z-score
Example-
