MEASURES OF CENTRAL TENDENCY AND DISPERSION
Measure of Central Tendency
The methods used to locate the central position of a set of data are called measures of central tendency or measures of location. These measures include the mean, median, mode, geometric mean, and harmonic mean. A measure of central tendency is also called an average.
An average is a single value that represents the whole set of values. Because this value tends to lie near the center of the data, it is called a measure of central tendency.
Properties of a Good Average
i. It should be easy to calculate.
ii. It should be simple to understand.
iii. It should be well defined, preferably by a mathematical formula.
iv. It should be based on all the observations.
v. It should not be unduly affected by extreme observations.
vi. It should be least affected by fluctuations of sampling.
Arithmetic Mean: It is defined as the sum of all the observations divided by the number of observations. Examples include calculating the average monthly rainfall in a region over several years and calculating the average score of students in a class.
$\bar{X} = \frac{\sum X}{n}$ (for ungrouped data)
$\bar{X} = \frac{\sum fX}{\sum f}$ (for grouped data)
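The two mean formulas can be checked with a minimal Python sketch. The function names `mean_ungrouped` and `mean_grouped` are illustrative. For class intervals, X is assumed to be the class midpoint.

```python
def mean_ungrouped(xs):
    """Arithmetic mean of raw observations: sum(X) / n."""
    if not xs:
        raise ValueError("need at least one observation")
    return sum(xs) / len(xs)

def mean_grouped(xs, fs):
    """Arithmetic mean of a frequency table: sum(f * X) / sum(f).

    For class intervals, X is taken to be the class midpoint.
    """
    total_f = sum(fs)
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(xs, fs)) / total_f

print(mean_ungrouped([2, 4, 6, 8]))          # 5.0
print(mean_grouped([10, 20, 30], [1, 2, 1]))  # 20.0
```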
Merits of Mean:
i. It is easy to calculate.
ii. It is based on all observations.
iii. It is least affected by fluctuations of sampling.
Demerits of Mean:
i. It is highly affected by extreme values.
ii. It cannot be computed if any observation is missing.
iii. It is not appropriate for highly skewed distributions.
iv. It is affected by changes of origin and scale.
Weighted Arithmetic Mean: It is used when the observations in a dataset do not all have equal importance. It is denoted by $\bar{X}_w$. For example, it gives the mean exam score in a class where different exams carry different weights.
$\bar{X}_w = \frac{\sum WX}{\sum W}$
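The following is a minimal sketch of the weighted mean. The exam scores and weights in the example are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(W * X) / sum(W)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_w = sum(weights)
    if total_w == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_w

# Hypothetical: midterm score 80 with weight 1, final score 90 with weight 3.
print(weighted_mean([80, 90], [1, 3]))  # 87.5
```

With equal weights, the weighted mean reduces to the ordinary arithmetic mean.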
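Combined Mean: This is the mean of two or more groups taken together. It is computed from each group's size n and mean $\bar{X}$:
$\bar{X}_c = \frac{\sum n\bar{X}}{\sum n} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \cdots}{n_1 + n_2 + \cdots}$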
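The following sketch computes the combined mean from group sizes and group means, and checks it against the mean of the pooled raw data. The function name and example groups are hypothetical.

```python
def combined_mean(sizes, means):
    """Combined mean = sum(n * group_mean) / sum(n) over all groups."""
    total = sum(sizes)
    if total == 0:
        raise ValueError("total group size must be positive")
    return sum(n * m for n, m in zip(sizes, means)) / total

# Hypothetical groups: 10 students averaging 50 and 20 students averaging 65.
print(combined_mean([10, 20], [50, 65]))  # 60.0

# Cross-check against pooling the raw data of two small groups.
a, b = [1, 2, 3], [10, 20]
pooled = a + b
print(combined_mean([len(a), len(b)], [sum(a) / len(a), sum(b) / len(b)]),
      sum(pooled) / len(pooled))
```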
Median: The median is the middle item when the items are arranged in ascending or descending order. Examples include determining the median age of employees in a company and finding the median income in a neighborhood. The median income describes the typical income in the area regardless of any extremely high or low incomes.
$\tilde{X} = \text{value of the } \left(\frac{n+1}{2}\right)\text{th item}$ (for ungrouped data)
$\tilde{X} = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$ (for grouped data)
where l is the lower limit of the median class, h is the width of the class interval, f is the frequency of the median class, n is the sum of the frequencies, and c is the cumulative frequency of the class preceding the median class.
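Both median formulas are shown in the sketch below. For grouped data it assumes the classes are contiguous and listed in order, each given by its lower limit l and a common width h. The median class is taken as the first class whose cumulative frequency reaches n/2. The function names and the example table are hypothetical.

```python
import statistics

def median_ungrouped(xs):
    """Value of the (n+1)/2-th item; the mean of the two middle items when n is even."""
    s = sorted(xs)
    n = len(s)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def median_grouped(lower_limits, h, freqs):
    """l + (h / f) * (n/2 - c) for the first class whose cumulative frequency reaches n/2."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    c = 0  # cumulative frequency of the classes before the current one
    for l, f in zip(lower_limits, freqs):
        if c + f >= half:
            return l + (h / f) * (half - c)
        c += f
    raise ValueError("empty frequency table")

print(median_ungrouped([7, 1, 3]), median_ungrouped([4, 1, 3, 2]))  # 3 2.5
print(statistics.median([4, 1, 3, 2]))                              # 2.5
# Hypothetical table: classes 10-20, 20-30, 30-40 with frequencies 3, 5, 2.
print(median_grouped([10, 20, 30], 10, [3, 5, 2]))                  # 24.0
```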
Merits of Median:
i. It is easy to understand.
ii. It is not affected by extreme values.
iii. It is an appropriate average in a highly skewed distribution.
Demerits of Median:
i. The values must be arranged in order before the median can be found.
ii. The median is more affected by sampling fluctuations than the arithmetic mean.
iii. It is not capable of further mathematical treatment. For example, if we know the
medians of two sets of data, we cannot find the combined median for both sets.
iv. It is affected by changes of origin and scale.
Quantiles: When the number of observations is large, the idea of dividing an ordered data set into two equal parts can be extended to any number of divisions. Quartiles, deciles, percentiles, and other values obtained by equal subdivision of a data set are collectively called quantiles, or sometimes fractiles.
i. There are three quartiles: the first quartile (Q1), the second quartile (Q2), and the third quartile (Q3). Together they divide the set of observations into four equal parts. The second quartile equals the median. The third quartile is also called the upper quartile, and the first quartile the lower quartile.
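$Q_1 = \text{value of the } \left(\frac{n+1}{4}\right)\text{th item}$ (ungrouped), $\quad Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right)$ (grouped)
$Q_2 = \text{value of the } \left(\frac{2(n+1)}{4}\right)\text{th item}$ (ungrouped), $\quad Q_2 = l + \frac{h}{f}\left(\frac{2n}{4} - c\right)$ (grouped)
$Q_3 = \text{value of the } \left(\frac{3(n+1)}{4}\right)\text{th item}$ (ungrouped), $\quad Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right)$ (grouped)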
ii. The deciles are the partition values that divide the set of observations into ten equal
parts. There are nine deciles, namely D1, D2, D3, …, D9.
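$D_1 = \text{value of the } \left(\frac{n+1}{10}\right)\text{th item}$ (ungrouped), $\quad D_1 = l + \frac{h}{f}\left(\frac{n}{10} - c\right)$ (grouped)
$D_2 = \text{value of the } \left(\frac{2(n+1)}{10}\right)\text{th item}$ (ungrouped), $\quad D_2 = l + \frac{h}{f}\left(\frac{2n}{10} - c\right)$ (grouped)
$\vdots$
$D_9 = \text{value of the } \left(\frac{9(n+1)}{10}\right)\text{th item}$ (ungrouped), $\quad D_9 = l + \frac{h}{f}\left(\frac{9n}{10} - c\right)$ (grouped)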
iii. The percentiles are the points that divide the set of observations into one hundred equal parts. They are denoted by P1, P2, P3, …, P99. Percentiles are usually computed when the number of observations is very large, for example the workers in factories or the population of provinces or countries.
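$P_1 = \text{value of the } \left(\frac{n+1}{100}\right)\text{th item}$ (ungrouped), $\quad P_1 = l + \frac{h}{f}\left(\frac{n}{100} - c\right)$ (grouped)
$P_2 = \text{value of the } \left(\frac{2(n+1)}{100}\right)\text{th item}$ (ungrouped), $\quad P_2 = l + \frac{h}{f}\left(\frac{2n}{100} - c\right)$ (grouped)
$\vdots$
$P_{99} = \text{value of the } \left(\frac{99(n+1)}{100}\right)\text{th item}$ (ungrouped), $\quad P_{99} = l + \frac{h}{f}\left(\frac{99n}{100} - c\right)$ (grouped)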
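All the quartile, decile and percentile formulas share one pattern. The k-th of `parts` divisions uses position k(n+1)/parts for ungrouped data and k·n/parts for grouped data. The sketch below makes three assumptions:

* Fractional positions in ungrouped data are linearly interpolated between neighbouring items.
* Positions outside 1..n are clamped to the extreme items.
* Grouped classes are contiguous with a common width h.

For interior positions, the ungrouped rule agrees with Python's `statistics.quantiles` under its default `method='exclusive'`. The function names and example data are hypothetical.

```python
import math
import statistics

def positional_quantile(xs, k, parts):
    """Ungrouped: value of the k(n+1)/parts-th item (parts=4, 10, 100 for Q, D, P)."""
    s = sorted(xs)
    n = len(s)
    if n == 0:
        raise ValueError("need at least one observation")
    pos = k * (n + 1) / parts
    if pos <= 1:
        return s[0]
    if pos >= n:
        return s[-1]
    i = math.floor(pos)   # 1-based position of the item just below pos
    frac = pos - i        # interpolate toward the next item (assumed convention)
    return s[i - 1] + frac * (s[i] - s[i - 1])

def grouped_quantile(lower_limits, h, freqs, k, parts):
    """Grouped: l + (h / f) * (k*n/parts - c) for the class containing position k*n/parts."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    target = k * n / parts
    c = 0  # cumulative frequency before the current class
    for l, f in zip(lower_limits, freqs):
        if f > 0 and c + f >= target:
            return l + (h / f) * (target - c)
        c += f
    raise ValueError("k/parts must not exceed 1")

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]
print([positional_quantile(data, k, 4) for k in (1, 2, 3)])  # [6.0, 12.0, 16.0]
print(statistics.quantiles(data, n=4))                        # [6.0, 12.0, 16.0]

# Hypothetical table: classes 10-20, 20-30, 30-40 with frequencies 3, 5, 2.
lows, h, fs = [10, 20, 30], 10, [3, 5, 2]
print(grouped_quantile(lows, h, fs, 3, 4))     # Q3  = 29.0
print(grouped_quantile(lows, h, fs, 5, 10))    # D5  = 24.0, the median
print(grouped_quantile(lows, h, fs, 90, 100))  # P90 = 35.0
```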
Mode: The mode is the observation that occurs the maximum number of times in a set of data. One example is the most common mode of transportation (car, bus, bike, etc.) that employees use to commute to work in a city. Another is the most common mode of payment (credit card, cash, mobile wallet) used by customers at a retail store. The mode identifies the most common choice.
$\text{Mode} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h = l + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h$
where l is the lower limit of the modal class, $f_m$ is the maximum frequency (the frequency of the modal class), $f_1$ is the frequency of the class preceding the modal class, $f_2$ is the frequency of the class following the modal class, and h is the width of the class interval.
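The sketch below covers the ungrouped mode, including multimodal data, and the grouped mode formula. It makes two assumptions:

* If the modal class is the first or last class, the missing neighbour is treated as having frequency 0.
* Ties in the maximum frequency select the first modal class.

The function names and example table are hypothetical.

```python
from collections import Counter

def modes_ungrouped(xs):
    """All values occurring the maximum number of times (more than one if multimodal)."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(x for x, c in counts.items() if c == top)

def mode_grouped(lower_limits, h, freqs):
    """l + (fm - f1) / (2fm - f1 - f2) * h, using the class with the largest frequency."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])  # first class with max frequency
    fm = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0                   # assumed 0 if no preceding class
    f2 = freqs[m + 1] if m + 1 < len(freqs) else 0      # assumed 0 if no following class
    denom = 2 * fm - f1 - f2
    if denom == 0:
        raise ValueError("formula undefined: 2fm - f1 - f2 = 0")
    return lower_limits[m] + (fm - f1) * h / denom

print(modes_ungrouped([2, 3, 3, 5, 5, 7]))        # [3, 5]
print(mode_grouped([10, 20, 30], 10, [3, 5, 2]))  # 24.0
```

The first call shows a bimodal data set, the situation that the demerits of the mode refer to.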
Merits of Mode:
i. It is easily located.
ii. It is not affected by the extreme values.
iii. It is a suitable average for qualitative data.
Demerits of Mode:
i. It is not based on all the observations.
ii. A distribution may have more than one mode. In that case, the mode should not be used as the average.
iii. It is affected by changes of origin and scale.
Measure of Dispersion
The degree to which numerical data tend to spread about an average value is called the variation or dispersion of the data. A measure of dispersion quantifies this spread. There are two types of measures of dispersion.
i. Absolute Measure of Dispersion: An absolute measure of dispersion measures the variation among the observations in the units of the variable. Its value is therefore in the same units as the original observations. For example, if the observations are in kilograms, the absolute measure of dispersion is also in kilograms. The commonly used absolute measures of dispersion are:
a)- Range
b)- Quartile Deviation
c)- Mean Deviation
d)- Standard Deviation or Variance
ii. Relative Measure of Dispersion: A relative measure of dispersion measures the variation among the observations relative to their average. It is used to compare the variation in two or more sets of data. These measures are free of the units in which the original data are measured. For example, if the original data are in rupees or kilometers, the relative measure of dispersion carries no unit. Each absolute measure of dispersion can be converted into a corresponding relative measure. The commonly used relative measures of dispersion are:
a)- Coefficient of Range
b)- Coefficient of Quartile Deviation
c)- Coefficient of Mean Deviation
d)- Coefficient of Variation
Range: It is defined as the difference between the maximum and the minimum observations of the given dataset. If $X_m$ denotes the maximum observation and $X_0$ the minimum observation, the range is defined as:
$R = X_m - X_0$
For grouped data, the range is the difference between the upper boundary of the highest class and the lower boundary of the lowest class. It is sometimes calculated instead as the difference between the midpoints of the highest and lowest classes.
Coefficient of Range: It is a relative measure of dispersion and is based on the value of range.
It is also called the range coefficient of dispersion. It is defined as:
$\text{Coefficient of Range} = \frac{X_m - X_0}{X_m + X_0}$
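The range and its coefficient are shown in the minimal sketch below. The grouped version uses the class-boundary definition given above. The function names and example values are hypothetical.

```python
def value_range(xs):
    """R = Xm - X0 for ungrouped data."""
    return max(xs) - min(xs)

def coefficient_of_range(xs):
    """(Xm - X0) / (Xm + X0), a unit-free relative measure."""
    xm, x0 = max(xs), min(xs)
    if xm + x0 == 0:
        raise ValueError("undefined when Xm + X0 = 0")
    return (xm - x0) / (xm + x0)

def grouped_range(lower_boundaries, upper_boundaries):
    """Upper boundary of the highest class minus lower boundary of the lowest class."""
    return max(upper_boundaries) - min(lower_boundaries)

data = [4, 9, 6, 12]
print(value_range(data), coefficient_of_range(data))         # 8 0.5
print(grouped_range([9.5, 19.5, 29.5], [19.5, 29.5, 39.5]))  # 30.0
```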
Mean Deviation: The mean deviation (M.D.), or average deviation, is defined as the mean of the absolute deviations of the observations from a suitable average. That average may be the arithmetic mean, the median, or the mode. The mean deviation is based on all the observations. A serious drawback of the mean deviation is that it cannot be used in statistical inference.
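Mean deviation from mean: $\text{M.D.} = \frac{\sum |X - \bar{X}|}{n}$ (ungrouped data), $\quad \text{M.D.} = \frac{\sum f|X - \bar{X}|}{\sum f}$ (grouped data)
Mean deviation from median: $\text{M.D.} = \frac{\sum |X - \text{Median}|}{n}$ (ungrouped data), $\quad \text{M.D.} = \frac{\sum f|X - \text{Median}|}{\sum f}$ (grouped data)
Mean deviation from mode: $\text{M.D.} = \frac{\sum |X - \text{Mode}|}{n}$ (ungrouped data), $\quad \text{M.D.} = \frac{\sum f|X - \text{Mode}|}{\sum f}$ (grouped data)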
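The mean deviation formulas are illustrated by the minimal sketch below. The chosen average (mean, median or mode) is passed in as `center`, and grouped values are assumed to be class midpoints. The function names and example data are hypothetical.

```python
from statistics import mean, median

def mean_deviation(xs, center):
    """Ungrouped M.D. = sum(|X - center|) / n."""
    return sum(abs(x - center) for x in xs) / len(xs)

def mean_deviation_grouped(xs, fs, center):
    """Grouped M.D. = sum(f * |X - center|) / sum(f)."""
    return sum(f * abs(x - center) for x, f in zip(xs, fs)) / sum(fs)

data = [1, 2, 3, 4, 10]
print(mean_deviation(data, mean(data)))    # 2.4 (about the mean, 4)
print(mean_deviation(data, median(data)))  # 2.2 (about the median, 3)

# Grouped example: hypothetical class midpoints and frequencies.
mids, freqs = [15, 25, 35], [3, 5, 2]
grouped_mean = sum(f * x for x, f in zip(mids, freqs)) / sum(freqs)
print(mean_deviation_grouped(mids, freqs, grouped_mean))  # about 5.4
```

In this example the deviation about the median is smaller than the deviation about the mean. This holds generally, because the median minimizes the sum of absolute deviations.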
Coefficient of Mean Deviation: The relative measure of dispersion based on the mean deviation is called the coefficient of mean deviation, or the mean coefficient of dispersion. It is defined as the ratio of the mean deviation to the average used in calculating that mean deviation.
$\text{Coefficient of Mean Deviation from Mean} = \frac{\text{Mean Deviation from Mean}}{\text{Mean}}$
$\text{Coefficient of Mean Deviation from Median} = \frac{\text{Mean Deviation from Median}}{\text{Median}}$
$\text{Coefficient of Mean Deviation from Mode} = \frac{\text{Mean Deviation from Mode}}{\text{Mode}}$
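The coefficient divides each mean deviation by the average it was measured from. A minimal sketch follows; the function names and data are hypothetical.

```python
from statistics import mean, median

def mean_deviation(xs, center):
    """Ungrouped M.D. about `center`."""
    return sum(abs(x - center) for x in xs) / len(xs)

def coefficient_of_mean_deviation(xs, center):
    """M.D. about `center` divided by `center` itself."""
    if center == 0:
        raise ValueError("undefined when the average is zero")
    return mean_deviation(xs, center) / center

data = [1, 2, 3, 4, 10]
print(coefficient_of_mean_deviation(data, mean(data)))    # about 0.6
print(coefficient_of_mean_deviation(data, median(data)))  # about 0.733
```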
Variance: It is defined as the mean of the squares of deviations of observations taken from the
mean of the observations.
Standard Deviation: It is defined as the positive square root of the mean of the squared deviations of the observations from their mean.
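Formulas (variance and standard deviation)
Ungrouped data: $S^2 = \frac{\sum (X - \bar{X})^2}{n}$, $\quad S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$
Grouped data: $S^2 = \frac{\sum f(X - \bar{X})^2}{\sum f}$, $\quad S = \sqrt{\frac{\sum f(X - \bar{X})^2}{\sum f}}$
Sample (divisor n − 1): $S^2 = \frac{\sum (X - \bar{X})^2}{n-1}$, $\quad S = \sqrt{\frac{\sum (X - \bar{X})^2}{n-1}}$
Population of size N with mean μ: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$, $\quad \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$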
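The sketch below implements the divisor-n and divisor-(n − 1) variances and the grouped variance, and checks them against Python's `statistics.pvariance` (divisor n) and `statistics.variance` (divisor n − 1). The function names and the `ddof` parameter are illustrative choices, not from the text.

```python
import math
import statistics

def variance(xs, ddof=0):
    """Mean squared deviation from the mean with divisor n - ddof (ddof=1 gives n - 1)."""
    n = len(xs)
    if n - ddof <= 0:
        raise ValueError("not enough observations for this divisor")
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - ddof)

def grouped_variance(xs, fs):
    """sum(f * (X - mean)**2) / sum(f), with X the class values or midpoints."""
    total = sum(fs)
    m = sum(f * x for x, f in zip(xs, fs)) / total
    return sum(f * (x - m) ** 2 for x, f in zip(xs, fs)) / total

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data), math.sqrt(variance(data)))  # 4.0 2.0
print(variance(data, ddof=1))                     # 32/7, about 4.571
print(grouped_variance([15, 25, 35], [3, 5, 2]))  # 49.0
print(statistics.pvariance(data), statistics.variance(data))
```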
Properties of Variance and Standard Deviation
i. The variance and the S.D. are non-negative quantities.
ii. The variance is expressed in the square of the units of the observations. The S.D. is
expressed in the same units as the units of observations.
iii. The variance and the S.D. are zero if all the observations have the same constant value.
If “C” is constant:
Var (C) = 0 and S.D. (C) = 0.
iv. The variance and the S.D. are unaffected by a change of origin.
Var (X+A) = Var (X) and S.D.(X+A) = S.D. (X).
Var (X-A) = Var (X) and S.D.(X-A) = S.D. (X).
v. The variance and the S.D. are affected by the change of scale.
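$\text{Var}(bX) = b^2\,\text{Var}(X)$ and $\text{S.D.}(bX) = |b|\,\text{S.D.}(X)$.
$\text{Var}\left(\frac{X}{b}\right) = \frac{1}{b^2}\,\text{Var}(X)$ and $\text{S.D.}\left(\frac{X}{b}\right) = \frac{1}{|b|}\,\text{S.D.}(X)$.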
vi. If X and Y are independent random variables, then:
Var (X+Y) = Var (X) + Var (Y) and Var (X-Y) = Var (X) + Var (Y).
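Properties iii to v can be checked numerically on any data set. The sketch uses Python's `statistics.pvariance` and `statistics.pstdev` (divisor n). The data, shift A and scale b are arbitrary illustrative values. Property vi concerns independent random variables, so it is not checked here.

```python
import math
from statistics import pvariance, pstdev

x = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
A, b = 10, -3                 # arbitrary shift (origin) and scale

shifted = [xi + A for xi in x]
scaled = [b * xi for xi in x]
divided = [xi / b for xi in x]

# iii. A constant has zero variance.
print(pvariance([7, 7, 7]) == 0)
# iv. A change of origin leaves the variance and S.D. unchanged.
print(math.isclose(pvariance(shifted), pvariance(x)))
# v. A change of scale multiplies the variance by b**2 and the S.D. by |b|.
print(math.isclose(pvariance(scaled), b ** 2 * pvariance(x)))
print(math.isclose(pstdev(scaled), abs(b) * pstdev(x)))
print(math.isclose(pvariance(divided), pvariance(x) / b ** 2))
```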
Coefficient of Variation (C.V.): The most important of the relative measures of dispersion is the coefficient of variation. Note that the word is variation, not variance. It is defined as:
$\text{C.V.} = \frac{S}{\bar{X}} \times 100$
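A minimal sketch of the C.V. as a percentage follows. The `sample` flag is an illustrative choice: False uses the divisor n (`statistics.pstdev`) and True uses n − 1 (`statistics.stdev`). The example data are hypothetical.

```python
from statistics import mean, pstdev, stdev

def coefficient_of_variation(xs, sample=False):
    """C.V. = S / mean * 100, in percent."""
    m = mean(xs)
    if m == 0:
        raise ValueError("C.V. is undefined when the mean is zero")
    s = stdev(xs) if sample else pstdev(xs)
    return 100 * s / m

print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))  # 40.0
# Rescaling the units (e.g. kilograms to grams) leaves the C.V. unchanged.
print(coefficient_of_variation([20, 22, 24]), coefficient_of_variation([20000, 22000, 24000]))
```

The second line shows why the C.V. suits comparisons across data sets: it carries no unit.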
Standard Error of the Mean (SEM): It is a statistical measure that quantifies the precision of the sample mean $\bar{X}$ as an estimate of the true population mean μ. It indicates how much the sample mean is expected to fluctuate around the population mean because of random sampling variation. A smaller SEM indicates a more precise estimate of the population mean.
If the population standard deviation σ is known:
$\text{S.E.}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
If the population standard deviation is unknown, use the sample standard deviation S:
$\text{S.E.}(\bar{X}) = \frac{S}{\sqrt{n}}$
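The sketch below covers both cases. When σ is unknown, it assumes the sample standard deviation uses the divisor n − 1 (`statistics.stdev`), which is the usual convention. The values σ = 10 and n = 25 are hypothetical.

```python
import math
from statistics import stdev

def standard_error(sd, n):
    """S.E.(X-bar) = sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sd / math.sqrt(n)

# Known population S.D. (hypothetical): sigma = 10, n = 25.
print(standard_error(10, 25))  # 2.0

# Unknown population S.D.: substitute the sample S.D. (divisor n - 1).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_error(stdev(sample), len(sample)))
```

Because of the square root, quadrupling n only halves the standard error.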
Question: Construct a frequency distribution for the following dataset. For both the grouped and the ungrouped data, find the mean, median, mode, standard deviation, variance, range, mean deviation, coefficient of range, coefficient of mean deviation, and coefficient of variation.
51 66 68 65 67 66 53 63 52 56
71 75 78 70 80 85 89 81 87 83
73 77 74 73 78 85 82 88 83 85
74 78 77 77 72 83 80 80 85 82
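The frequency distribution can be built from the data above with a short sketch. The question does not fix the classes, so this sketch assumes a width of 5 with classes 50-54, 55-59, ..., 85-89. It also compares the ungrouped mean with the grouped mean computed from class midpoints; the two differ slightly because grouping loses information.

```python
# Data from the question (40 observations).
data = [
    51, 66, 68, 65, 67, 66, 53, 63, 52, 56,
    71, 75, 78, 70, 80, 85, 89, 81, 87, 83,
    73, 77, 74, 73, 78, 85, 82, 88, 83, 85,
    74, 78, 77, 77, 72, 83, 80, 80, 85, 82,
]

# Assumed class scheme: width 5 starting at 50 (50-54, 55-59, ..., 85-89).
START, WIDTH, N_CLASSES = 50, 5, 8

def frequency_table(xs, start, width, n_classes):
    """Return (lower_limit, upper_limit, frequency) rows for integer data."""
    counts = [0] * n_classes
    for x in xs:
        k = (x - start) // width  # index of the class containing x
        if not 0 <= k < n_classes:
            raise ValueError(f"{x} falls outside the chosen classes")
        counts[k] += 1
    return [(start + i * width, start + i * width + width - 1, f)
            for i, f in enumerate(counts)]

table = frequency_table(data, START, WIDTH, N_CLASSES)
for lo, hi, f in table:
    print(f"{lo}-{hi}: {f}")

# Ungrouped mean from the raw data; grouped mean from class midpoints.
mean_raw = sum(data) / len(data)
midpoints = [(lo + hi) / 2 for lo, hi, _ in table]
freqs = [f for _, _, f in table]
mean_grouped = sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)
print(mean_raw, mean_grouped)
```

With this class scheme, the frequencies are 3, 1, 1, 5, 7, 7, 9 and 7, for a total of 40. The grouped statistics asked for in the question can then be computed from these rows, using midpoints for X.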