Numerical Descriptive Measures
➢The central tendency is the extent to which the data values group around a typical or
central value.
➢The variation is the amount of dispersion or scattering of the data points.
➢The shape (skewness) is the pattern of the distribution of values from the lowest value to the
highest value.
Measure of Central Tendency: The Mean
➢ The arithmetic mean (AM), often just called the mean or the average, is the most
common measure of central tendency
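➢ For a sample (or population) of size n with elements $X_1, X_2, \ldots, X_n$
➢ The AM, denoted by $\bar{X}$ (read as "X-bar"), is given by
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
where $\sum$ is read as "sigma", $X_i$ is the i-th observation, and $n$ is the sample size.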
➢ AM = sum of the values divided by the number of values
Measure of Central Tendency: The Mean(continued)
➢ Disadvantage: the AM is affected by extreme values (outliers)
➢ Data 1: 1, 2, 3, 4, 5
Mean $= \bar{X} = \frac{1+2+3+4+5}{5} = 3$
➢ Data 2: 1, 2, 3, 4, 20
Mean $= \bar{X} = \frac{1+2+3+4+20}{5} = 6$
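The effect of the single extreme value can be checked with a short Python sketch. The variable names are illustrative and not part of the slides.

```python
from statistics import mean

data1 = [1, 2, 3, 4, 5]
data2 = [1, 2, 3, 4, 20]  # same values, with 5 replaced by the outlier 20

mean1 = mean(data1)  # (1 + 2 + 3 + 4 + 5) / 5
mean2 = mean(data2)  # (1 + 2 + 3 + 4 + 20) / 5

print("Mean of Data 1:", mean1)
print("Mean of Data 2:", mean2)
```

Changing one value out of five doubles the mean, which is why the AM is said to be sensitive to outliers.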
• For grouped data, Mean $= \frac{\Sigma fX}{n}$
• $\Sigma fX$ = sum of the products of the frequency f of each class with the midpoint X of that class
• n = total number of observations = $\Sigma f$ = total frequency
➢ Solution
Lower   Upper   Midpoint (x)   Frequency (f)   fx
0       1       0.5            1               0.5
1       2       1.5            4               6
2       3       2.5            8               20
3       4       3.5            7               24.5
4       5       4.5            3               13.5
5       6       5.5            2               11
Total                          25              75.5
Mean = 75.5 / 25 = 3.02
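The grouped-data calculation in the table can be reproduced with a minimal Python sketch. The class limits and frequencies are copied from the table; the list names are illustrative.

```python
# Class limits and frequencies copied from the solution table.
lower = [0, 1, 2, 3, 4, 5]
upper = [1, 2, 3, 4, 5, 6]
freq = [1, 4, 8, 7, 3, 2]

# Midpoint of each class, then the sum of f * x over all classes.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]
sum_fx = sum(f * x for f, x in zip(freq, midpoints))
n = sum(freq)  # total frequency

grouped_mean = sum_fx / n
print("n =", n, " sum(fx) =", sum_fx, " mean =", round(grouped_mean, 2))
```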
Example of Mean
Mean 142.25
Mean using coding
• Mean = x0 + w × Σ(u × f) / n
• where x0 = value of the midpoint assigned code 0
• w = width of the class interval
• u = code value
• f = frequency
• n = total frequency
Class   f    code(u)   u*f    u1    u1*f   u2    u2*f
0-8     2    -3        -6     -2    -4     -4    -8
8-16    6    -2        -12    -1    -6     -3    -18
16-24   3    -1        -3     0     0      -2    -6
24-32   5    0         0      1     5      -1    -5
32-40   2    1         2      2     4      0     0
40-48   2    2         4      3     6      1     2
Total   20             -15          5            -35
Mean                   22           22           22
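The table suggests that the result does not depend on which class is given code 0. A small Python sketch of the coding formula checks this for all three codings. The function name `coded_mean` is hypothetical, and the x0 values 28, 20 and 36 are the midpoints of the classes 24-32, 16-24 and 32-40, worked out from the class limits.

```python
def coded_mean(x0, w, codes, freqs):
    """Mean = x0 + w * sum(u * f) / n, for classes of equal width w."""
    n = sum(freqs)
    sum_uf = sum(u * f for u, f in zip(codes, freqs))
    return x0 + w * sum_uf / n

freqs = [2, 6, 3, 5, 2, 2]  # frequencies of classes 0-8, 8-16, ..., 40-48
w = 8                       # class width

mean_u = coded_mean(28, w, [-3, -2, -1, 0, 1, 2], freqs)   # code 0 at 24-32
mean_u1 = coded_mean(20, w, [-2, -1, 0, 1, 2, 3], freqs)   # code 0 at 16-24
mean_u2 = coded_mean(36, w, [-4, -3, -2, -1, 0, 1], freqs) # code 0 at 32-40

print(mean_u, mean_u1, mean_u2)
```

All three codings give the same mean, so the origin can be chosen to keep the arithmetic small.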
Weighted Mean
• A weighted mean is a kind of average in which, instead of each data point contributing equally to the
final mean, some data points carry more weight than others.
• It is used to calculate an average that takes into account the importance of each value to the overall result.
• Let $x_1, x_2, \ldots, x_n$ be the data points with weights $w_1, w_2, \ldots, w_n$. Then the weighted
mean is given by
$$\frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$$
• Find the average cost of labour per hour for each of the products
Grade of labour   Hourly wage   Labour hrs (Product 1)   Labour hrs (Product 2)
Unskilled         4             2                        4
Semiskilled       6             3                        3
Skilled           8             5                        2
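One way to work the labour-cost exercise is sketched below in Python. It assumes the table columns are the hourly wage and the labour hours used by each product, and it normalises by the sum of the weights, which is the usual definition of a weighted mean. The function and variable names are illustrative.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the total weight."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

wages = [4, 6, 8]          # unskilled, semiskilled, skilled
hours_p1 = [2, 3, 5]       # labour hours per grade, Product 1
hours_p2 = [4, 3, 2]       # labour hours per grade, Product 2

cost_p1 = weighted_mean(wages, hours_p1)  # (4*2 + 6*3 + 8*5) / 10
cost_p2 = weighted_mean(wages, hours_p2)  # (4*4 + 6*3 + 8*2) / 9

print("Product 1:", round(cost_p1, 2))
print("Product 2:", round(cost_p2, 2))
```

Product 1 uses more skilled hours, so its average hourly labour cost comes out higher than that of Product 2 even though the wage rates are identical.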
Weighted Mean
GM = (7 × 8 × 10 × 12 × 18)^(1/5) = 10.388
Year   Interest rate (%)   Return at the end of year
1      7                   107
2      8                   115.56
3      10                  127.12
4      12                  142.37
5      18                  168
GM     10.388              163.91 (return with agg rate)
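The figures in the table can be reproduced with a short Python sketch. It assumes an initial amount of 100, which is implied by the year-1 return of 107 but not stated in the slides, and it follows the slide in taking the geometric mean of the percentage rates themselves.

```python
import math

rates = [7, 8, 10, 12, 18]  # yearly interest rates in percent

# Geometric mean of the rates: fifth root of their product.
gm = math.prod(rates) ** (1 / len(rates))

# Year-by-year compounding of an assumed initial amount of 100.
actual = 100.0
for r in rates:
    actual *= 1 + r / 100

# Return if the GM rate were applied in every one of the five years.
gm_return = 100 * (1 + gm / 100) ** len(rates)

print(round(gm, 3), round(actual, 2), round(gm_return, 2))
```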
➢Relation
𝐴𝑀 ≥ 𝐺𝑀 ≥ 𝐻𝑀
Measure of Central Tendency: The Median
➢In an ordered array, the median is the middle number, i.e., 50% of the data lie above it and 50% below
➢ Data 1: 1, 2, 3, 4, 5 Median = 3
➢ Data 2: 1, 2, 3, 4, 20 Median = 3
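A minimal Python sketch of the "middle of the ordered array" rule follows. The function name `median_of` is hypothetical, and averaging the two middle values for an even number of observations is the usual convention rather than something stated in the slides.

```python
def median_of(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

median1 = median_of([1, 2, 3, 4, 5])
median2 = median_of([1, 2, 3, 4, 20])  # the outlier 20 does not move the middle value
print(median1, median2)
```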
➢ Median $= 300 + \frac{\frac{41}{2} - 17}{6} \times 100 = 358.3333$
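The calculation above has the shape of the usual grouped-data median formula, sketched here in Python. The parameter names are assumptions, since the slides show only the numbers: `L` is the lower limit of the median class, `n` the total frequency, `cf` the cumulative frequency before the median class, `f` the frequency of the median class, and `c` the class width.

```python
def grouped_median(L, n, cf, f, c):
    """Median = L + ((n / 2 - cf) / f) * c for grouped data."""
    return L + (n / 2 - cf) / f * c

# Numbers taken from the worked example.
median_value = grouped_median(L=300, n=41, cf=17, f=6, c=100)
print(round(median_value, 4))
```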
Median
➢Advantages
• Not affected by extreme values
• Can be computed for open-ended classes
• Can be computed for categorical data
➢Disadvantages
• Arranging the data in order is time consuming
• The mean is easier to use for estimating a population parameter
Measure of Central Tendency: The Mode
➢Value that occurs most often
• Not affected by extreme values
• Can be computed for categorical or numerical data
➢There may be no mode
• There may be several modes
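The three cases (one mode, several modes, no mode) can be illustrated with a small Python sketch. The function name `modes` is hypothetical, and treating data in which every value occurs exactly once as having no mode is an assumption about the convention intended here.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values; an empty list means there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # assumption: every value occurs once, so no mode
    return [v for v, c in counts.items() if c == top]

print(modes([1, 2, 2, 3]))            # one mode
print(modes([1, 1, 2, 2, 3]))         # several modes
print(modes([1, 2, 3]))               # no mode
print(modes(["red", "blue", "red"]))  # categorical data
```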
Mode for grouped data
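$$\text{Mode} = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c$$
where $L$ = lower limit of the modal class
$f_1$ = frequency of the modal class
$f_0$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class succeeding the modal class
$c$ = class interval (width) of the modal class
Example: $\text{Mode} = 500 + \frac{11 - 2}{2 \times 11 - 2 - 5} \times 100 = 560$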
Class Frequency
0-100 5
100-200 8
200-300 4
300-400 6
400-500 2
500-600 11
600-700 5
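The grouped-mode formula can be applied to this frequency table with a short Python sketch. The variable names are illustrative, and using a frequency of 0 for a missing neighbour when the modal class is the first or last class is an assumption not covered by the slides.

```python
# Lower class limits and frequencies copied from the table; class width is 100.
lower = [0, 100, 200, 300, 400, 500, 600]
freq = [5, 8, 4, 6, 2, 11, 5]
c = 100

i = freq.index(max(freq))                      # position of the modal class
f1 = freq[i]                                   # frequency of the modal class
f0 = freq[i - 1] if i > 0 else 0               # preceding class (assumed 0 if none)
f2 = freq[i + 1] if i < len(freq) - 1 else 0   # succeeding class (assumed 0 if none)

grouped_mode = lower[i] + (f1 - f0) / (2 * f1 - f0 - f2) * c
print("Modal class starts at", lower[i], "and the mode is about", round(grouped_mode, 2))
```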
Measure of Central Tendency: Which measure to use?
• The mean is generally used unless extreme values exist
• The median is used when there are outliers in the data
• In some situations it makes sense to use both the mean and the median
Formulae: $\frac{\text{Mean} - \text{Mode}}{SD}$, $\frac{\text{Mean} - \text{Median}}{SD}$, $\frac{\sum (X_i - \bar{X})^3}{(n-1)\,S^3}$
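Two of these formulae can be tried on the earlier data set 1, 2, 3, 4, 20 with a minimal Python sketch. It assumes that SD and S both mean the sample standard deviation (divisor n − 1). The mode-based formula is left out because no value repeats in this data set.

```python
from statistics import mean, median, stdev

data = [1, 2, 3, 4, 20]
n = len(data)
m = mean(data)
s = stdev(data)  # sample standard deviation, divisor n - 1

# (Mean - Median) / SD
skew_median = (m - median(data)) / s

# sum((X_i - X_bar)^3) / ((n - 1) * S^3)
skew_moment = sum((x - m) ** 3 for x in data) / ((n - 1) * s ** 3)

print(round(skew_median, 3), round(skew_moment, 3))
```

Both values come out positive, which matches the long right tail created by the value 20.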
Locating Extreme Outliers: Z-score
Example-