L1 - Central Tendency - Corrected
We've learned how to condense and summarize data into frequency distribution tables
and represent it using various diagrams and graphs. While these visual aids are powerful
for presenting statistical data, they have limitations when it comes to in-depth data
analysis.
Next, we'll explore arithmetic procedures for analyzing and interpreting data. These
procedures revolve around the characteristics of data, such as measures of (i) location or
central tendency, (ii) dispersion, (iii) skewness, and (iv) kurtosis.
In this chapter, we'll focus on measuring the first characteristic: central tendency.
Central Tendency:
Central tendency refers to the tendency of data points in a dataset to cluster around a
central value. It provides a single value that best represents the entire dataset. The three
most commonly used measures of central tendency are:
(i) Mean: The average value of a set of numbers, obtained by dividing the sum of all values
by the total count of values.
(ii) Median: The middle value when the data points are arranged in ascending or
descending order.
(iii) Mode: The value that appears most frequently in the dataset.
The mean itself is commonly computed in three forms:
(i) Arithmetic mean: The sum of all values in the dataset divided by the number of values.
(ii) Geometric mean: The nth root of the product of all values in the dataset, where n is the
number of values.
(iii) Harmonic mean: The reciprocal of the arithmetic mean of the reciprocals of all values
in the dataset.
It's important to note that all these measures have the same unit as that of the variable
being measured. For instance, if the variable "height" is measured in centimeters, then the
mean, median, or mode will also be in centimeters.
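The measures listed above can be tried out with Python's standard `statistics` module. This is only a minimal sketch: the height values are hypothetical and are not taken from these notes.

```python
import statistics

# Hypothetical heights in centimeters (illustrative values only).
heights_cm = [150, 160, 160, 170, 180]

mean_cm = statistics.mean(heights_cm)                  # arithmetic mean
median_cm = statistics.median(heights_cm)              # middle value of the sorted data
mode_cm = statistics.mode(heights_cm)                  # most frequent value
geometric_cm = statistics.geometric_mean(heights_cm)   # nth root of the product
harmonic_cm = statistics.harmonic_mean(heights_cm)     # reciprocal of the mean of reciprocals

print(mean_cm, median_cm, mode_cm)
print(round(geometric_cm, 2), round(harmonic_cm, 2))
```

Every result is in centimeters, the unit of the variable itself.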
Q2. What are the characteristics of a good measure of location or central tendency?
According to Yule and Kendall, a good measure of central tendency should have the
following characteristics:
iii) Rigorous Definition: The measure should have a precise and unambiguous definition
to ensure consistency in its application and interpretation.
iv) Inclusivity: It should take into account all observations in the dataset to provide a
representative summary of the data.
v) Algebraic Treatment: The measure should lend itself well to further mathematical
analysis and manipulation, facilitating its use in statistical calculations and modeling.
vi) Sampling Stability: It should remain relatively stable across different samples drawn
from the same population, providing consistent results even with varying data subsets.
vii) Robustness to Extreme Values: The measure should not be unduly influenced by
outliers or extreme values in the dataset, as they may skew the central tendency estimate.
Arithmetic Mean for ungrouped data: If $x_1, x_2, \ldots, x_n$ are n observations of a sample,
then the sample mean or sample arithmetic mean is a statistic defined by
$$\bar{x} = \frac{\sum x_i}{n}.$$
Here n is the sample size. A statistic is usually represented by ordinary letters of the
English alphabet. A sample mean is denoted by $\bar{x}$.
Arithmetic Mean for grouped data: Suppose $x_1, x_2, \ldots, x_k$ are the k mid-points of k
classes with their corresponding frequencies $f_1, f_2, \ldots, f_k$, then the arithmetic mean is
defined as:
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i x_i}{n}, \quad \text{where } n = \sum f_i.$$
Here mid-point of each class is taken as the representative value of the class.
Q4. The data relate to the sales in lakh Tk. of a large engineering company for the last
20 days: 22, 32, 7, 46, 27, 30, 11, 42, 16, 5, 26, 9, 31, 8, 23, 36, 12, 28, 14, 8 .
i) Construct a frequency table and calculate the average sales per day from the
frequency distribution.
ii) Calculate the average sales per day using ungrouped raw data and observe the
difference, if any.
Solution. (i)
Range (R) = 41
Class interval    Frequency
1–11              5
11–21             4
21–31             6
31–41             3
41–51             2
Total             20
Calculation table for the average sales
Class interval    Mid-point (xi)    Frequency (fi)    fi xi
1 – 11            5.5               5                 27.5
11 – 21           15.5              4                 62.0
21 – 31           25.5              6                 153.0
31 – 41           35.5              3                 106.5
41 – 51           45.5              2                 91.0
Total                               n = 20            Σ fi xi = 440.0
$$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{440.0}{20} = 22.$$
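(ii) From the ungrouped raw data, $\sum x = 433$, so $\bar{x} = \frac{433}{20} = 21.65$.
Thus, the average sales in the two cases (22 and 21.65 lakh Tk.) are close but not exactly
the same. This discrepancy occurs because grouping replaces every observation in a class
by the class mid-point, which is only an approximation.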
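The comparison in Q4 can be checked with a short script. This is a sketch, not part of the original solution: the mid-points and frequencies are copied from the calculation table as given, and the helper name `grouped_mean` is an assumption introduced here.

```python
# Raw daily sales (lakh Tk.) from Q4.
sales = [22, 32, 7, 46, 27, 30, 11, 42, 16, 5,
         26, 9, 31, 8, 23, 36, 12, 28, 14, 8]

# Mid-points and frequencies copied from the calculation table as given.
midpoints = [5.5, 15.5, 25.5, 35.5, 45.5]
frequencies = [5, 4, 6, 3, 2]


def grouped_mean(midpoints, frequencies):
    """Return sum(f_i * x_i) / sum(f_i) for a frequency distribution."""
    total = sum(f * x for f, x in zip(frequencies, midpoints))
    return total / sum(frequencies)


raw_mean = sum(sales) / len(sales)                  # mean of the ungrouped data
table_mean = grouped_mean(midpoints, frequencies)   # mean from the frequency table
print(raw_mean, table_mean)
```

The two values differ slightly because the grouped calculation treats every observation in a class as if it were equal to the class mid-point.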
Q5. The following data refers to the marks obtained by a number of students in a class:
Marks                 9–11   11–13   13–15   15–17   17–19   19–21
Number of students    3      7       ?       20      8       5
The average marks obtained by the students is 15.38. Find the number of students in
the class interval 13-15.
Solution:
Let f1 be the number of students in the class interval 13-15.
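Computation of Missing Frequency
Mid-point (x)    Frequency (f)    fx
10               3                30
12               7                84
14               f1               14f1
16               20               320
18               8                144
20               5                100
Total            43 + f1          678 + 14f1
Here, $\bar{x} = \frac{\sum fx}{n}$
or, $15.38 = \frac{678 + 14 f_1}{43 + f_1}$
or, $661.34 + 15.38 f_1 = 678 + 14 f_1$
or, $1.38 f_1 = 16.66$
or, $f_1 = \frac{16.66}{1.38} = 12.07 \approx 12$, since a frequency cannot take a fractional value.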
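The same algebra can be written as a small function. This is a minimal sketch; the function name `missing_frequency` and its argument layout are assumptions made for illustration.

```python
def missing_frequency(midpoints, frequencies, missing_index, target_mean):
    """Solve target_mean = (S + x_m * f) / (N + f) for the unknown frequency f.

    S and N are the sum of f*x and the sum of f over the known classes,
    and x_m is the mid-point of the class whose frequency is missing.
    """
    known = [(x, f) for i, (x, f) in enumerate(zip(midpoints, frequencies))
             if i != missing_index]
    s = sum(x * f for x, f in known)
    n = sum(f for _, f in known)
    x_m = midpoints[missing_index]
    return (s - target_mean * n) / (target_mean - x_m)


# Data from Q5; None marks the unknown frequency of the class 13-15.
f1 = missing_frequency([10, 12, 14, 16, 18, 20], [3, 7, None, 20, 8, 5], 2, 15.38)
print(round(f1, 2), round(f1))
```

The result is rounded to a whole number at the end because a frequency is a count.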
Q6. Show that AM depends on the change of origin and scale of measurement.
Statement: Arithmetic mean depends on the shift of origin and change of scale.
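Proof. Let $x_1, x_2, \ldots, x_n$ be n values of a variable x.
By definition, $\bar{x} = \frac{\sum x}{n}$.
Let $u = \frac{x - A}{k}$, where A is the new origin and k is the scale factor, so that
$x = ku + A$. Summing over the n observations gives $\sum x = k \sum u + nA$. Dividing by n,
$$\frac{\sum x}{n} = \frac{k \sum u}{n} + \frac{nA}{n}.$$
Hence, $\bar{x} = k\bar{u} + A$.
Here it is seen that the arithmetic mean $\bar{x}$ depends on both k and A. This means the
arithmetic mean depends on the shift of origin and change of scale.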
Remarks: This formula is used for finding arithmetic mean by short cut method.
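A quick numerical check of the short cut method is sketched below. The observations, the origin A = 130 and the scale k = 10 are hypothetical values chosen only to illustrate the relation $\bar{x} = k\bar{u} + A$.

```python
def shortcut_mean(values, origin, scale):
    """Compute the mean as scale * mean(u) + origin, with u = (x - origin) / scale."""
    u = [(x - origin) / scale for x in values]
    u_bar = sum(u) / len(u)
    return scale * u_bar + origin


values = [110, 120, 130, 140, 150]   # hypothetical observations
direct = sum(values) / len(values)   # mean computed directly
via_shortcut = shortcut_mean(values, origin=130, scale=10)
print(direct, via_shortcut)
```

Any choice of origin and non-zero scale gives back the same mean. A convenient choice simply makes the transformed values u small and easy to add by hand.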
Q7. Show with a numeric example that, the algebraic sum of the deviations of all the
observations about the arithmetic mean is always zero.
Statement: The algebraic sum of the deviations of all the observations about the
arithmetic mean is always zero. Symbolically, if $x_1, x_2, \ldots, x_n$ are n observations of a
set of data and if $\bar{x}$ is the arithmetic mean, then $\sum (x_i - \bar{x}) = 0$.
Example: Consider the five observations 3, 4, 5, 6 and 7, for which $\bar{x} = \frac{25}{5} = 5$.
x            (x − x̄)
3            3 – 5 = –2
4            4 – 5 = –1
5            5 – 5 = 0
6            6 – 5 = 1
7            7 – 5 = 2
Σx = 25      Σ(x − x̄) = 0
This property of the arithmetic mean enables one to check its accuracy. If this property holds,
the computation of the arithmetic mean is considered to be accurate, otherwise inaccurate.
It also implies that the arithmetic mean is amenable to further algebraic treatment.
Q8. Show with a numeric example that, the sum of the squared deviations of all the
observations from the arithmetic mean is the minimum.
Statement: The sum of the squared deviations of all the observations from the arithmetic
mean is minimum. Symbolically, $\sum (x - \bar{x})^2 < \sum (x - a)^2$, where a is any arbitrary value other
than $\bar{x}$.
Proof: Let the five values of a variable be 4, 6, 8, 7, and 10. We have to show
that the sum of the squared deviations from the arithmetic mean is minimum.
The arithmetic mean of the five values is $\bar{x} = \frac{4 + 6 + 8 + 7 + 10}{5} = \frac{35}{5} = 7$. Let us take two more
values, 6 and 8, for finding the deviations. The squared deviations of the observations about
7, 6 and 8 are computed and presented in the following table:
x        (x − 7)²    (x − 6)²    (x − 8)²
4        9           4           16
6        1           0           4
8        1           4           0
7        0           1           1
10       9           16          4
Total    20          25          25
It is easily seen that $\sum (x - 7)^2 = 20$ is smaller than both $\sum (x - 6)^2 = 25$ and $\sum (x - 8)^2 = 25$. Hence the proof.
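The two properties from Q7 and Q8 can be verified numerically. The sketch below uses the five values stated in Q8 (4, 6, 8, 7, 10); the function name `sum_squared_deviations` is introduced here for illustration only.

```python
values = [4, 6, 8, 7, 10]            # the five values stated in Q8
mean = sum(values) / len(values)     # arithmetic mean


def sum_squared_deviations(data, a):
    """Return the sum of (x - a)^2 over all x in data."""
    return sum((x - a) ** 2 for x in data)


# Squared deviations about the mean and about two other trial values.
for a in (6, mean, 8):
    print(a, sum_squared_deviations(values, a))

# The plain deviations about the mean add up to zero (the property from Q7).
deviation_total = sum(x - mean for x in values)
print(deviation_total)
```

Trying further values of a on either side of the mean always gives a larger sum of squares than the one obtained at the mean itself.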
Q9. Describe the concept of Combined AM. Suppose on a farm, labor is classified into
three groups: adult males, adult females, and children. The average daily wage for
adult males is Tk. 200, for adult females it is Tk. 150, and for children it is Tk. 100.
Given the presence of 25 adult male workers, 50 adult female workers, and 15 child
workers on the farm, calculate the average daily wage for all workers.
Combined AM:
If $\bar{x}_1$ and $\bar{x}_2$ are the arithmetic means of two related sets of observations, and $n_1$ and $n_2$
are the corresponding numbers of observations, then the combined arithmetic mean of
the two sets of observations is
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}.$$
If there are k such groups with means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ and numbers of observations
$n_1, n_2, \ldots, n_k$ respectively, then the combined mean of the k groups is
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum n_i \bar{x}_i}{\sum n_i}.$$
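Solution: Here $n_1 = 25$, $n_2 = 50$, $n_3 = 15$ and $\bar{x}_1 = 200$, $\bar{x}_2 = 150$, $\bar{x}_3 = 100$. Hence the combined mean is
$$\bar{x} = \frac{25 \times 200 + 50 \times 150 + 15 \times 100}{25 + 50 + 15} = \frac{14000}{90} \approx \text{Tk. } 155.56 \text{ per day}.$$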
Weighted mean: Suppose $x_1, x_2, \ldots, x_k$ are k values of a variable x whose relative
importance is measured by the weights $w_1, w_2, \ldots, w_k$ respectively, then the weighted
arithmetic mean is computed by the following formula:
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}.$$
Here the numbers of skilled, semi-skilled, and trainee workers are different. The appropriate
mean is the weighted mean. It is calculated as follows:
Worker     Wage per day (x)    No. of workers (w)    wx
Skilled    500                 20                    10000
Total                          60                    21000
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{21000}{60} = \text{Tk. } 350 \text{ per day}.$$
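The weighted mean formula is short enough to sketch in code. The function name `weighted_mean` is an assumption made here. For illustration it is applied to the farm data of Q9, where the numbers of workers act as the weights, so the weighted mean coincides with the combined mean.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Farm data from Q9: average daily wage (Tk.) and number of workers per group.
wages = [200, 150, 100]
workers = [25, 50, 15]
farm_average = weighted_mean(wages, workers)
print(round(farm_average, 2))
```

Groups with more workers pull the average toward their own wage, which is why the simple average of the three wages would not be appropriate here.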
Practice (HW): Suppose a student's grade points in five courses are [3.41,
3.67, 3.55, 3.75, 3.77], with course credits [2, 2, 3, 3, 1]. Calculate the student's
Cumulative Grade Point Average (CGPA) using the appropriate mean.
Q11. Explain the concept of geometric mean and describe the situations where we need
to apply geometric mean. Find the appropriate mean from the following set of
observations:
Answer:
In agricultural engineering, the concept of geometric mean finds application in situations
where rates of change over time need to be analyzed. This measure is particularly useful
when dealing with percentage changes, growth rates, or other variables that exhibit
multiplicative relationships.
$$\text{G.M.} = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n} = \left[\prod_{i=1}^{n} x_i\right]^{1/n} = \text{Antilog}\left[\frac{1}{n} \sum \log x_i\right].$$
That is, for two positive observations, we take the square root of their product as a
geometric mean. If there are three observations, we take the cube root, and so on.
The geometric mean is necessarily zero if any value is zero, and may become imaginary
if an odd number of negative values occurs.
Here, n = 6 and $\sum \log x = 5.4697$.
$$\text{G.M.} = \text{Antilog}\left(\frac{\sum \log x}{n}\right) = \text{Antilog}\left(\frac{5.4697}{6}\right) = \text{Antilog}(0.9116) = 8.159.$$
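The logarithmic form of the geometric mean can be sketched as follows. The growth factors are hypothetical illustrative values, not the data set of Q11, and `geometric_mean` is a name chosen for this example. The last line only repeats the final step of the worked answer from its stated totals.

```python
import math


def geometric_mean(values):
    """Geometric mean via base-10 logarithms: antilog of the mean of log10(x)."""
    if any(x <= 0 for x in values):
        raise ValueError("the logarithmic form needs strictly positive values")
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log


# Hypothetical yearly growth factors (illustrative only).
growth_factors = [1.10, 1.20, 1.05]
average_factor = geometric_mean(growth_factors)
print(round(average_factor, 4))

# Final step of the worked answer: antilog of (sum of logs) / n.
worked_answer = 10 ** (5.4697 / 6)
print(round(worked_answer, 3))
```

Working with logarithms avoids forming a very large product when n is big, which is also why the tabular method sums log x.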
Q12. Explain the concept of harmonic mean and describe the situations where we need
to apply harmonic mean. In an engineering project, a team of 4 workers has been
assigned to complete an order of 1,400 units of Sprayer. The productivity rates of the
four workers are given below:
Workers Productive rates
A 4 minutes per sprayer
B 6 minutes per sprayer
C 10 minutes per sprayer
D 15 minutes per sprayer
Calculate the average productivity rate of the workers per unit if the same amount of
time is assigned to each worker.
Answer:
Harmonic Mean:
Compared with other measures of central tendency, such as the arithmetic mean or median,
the harmonic mean is less commonly used in analyzing agricultural engineering data.
The harmonic mean serves as a valuable tool when analyzing rates or ratios of various
components that contribute to overall efficiency or performance. Its application is
particularly relevant in scenarios where different factors play a role in determining a
combined rate or efficiency, such as speed, flow rates, or utilization rates. Additionally,
it can be employed to calculate the average price at which agricultural products have
been sold over a certain period.
One significant example of the harmonic mean's application in agricultural engineering
is in the assessment of irrigation systems. When evaluating the efficiency of irrigation
systems, factors like flow rates or water use efficiency play crucial roles. By employing
the harmonic mean, agricultural engineers can effectively compute the combined
efficiency or performance of these systems, taking into account the contributions of
various components.
Definition. The harmonic mean is defined as the reciprocal of the arithmetic mean of the
reciprocals of the individual observations. Suppose $x_1, x_2, \ldots, x_n$ are n non-zero
observations of a data set, then it is computed by the formula:
$$\text{H.M.} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum \frac{1}{x}}; \text{ for ungrouped data,}$$
$$\text{or, } \frac{1}{\text{H.M.}} = \frac{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}{n} = \frac{\sum \frac{1}{x}}{n}.$$
For grouped data,
$$\text{H.M.} = \frac{n}{\sum f \cdot \frac{1}{x}}.$$
Here x is the value of the variable in case of discrete data or the mid-point in case of continuous
data, and the f's are the frequencies of the x's. Note that the values of x must be non-zero
in computing the harmonic mean.
In actual practice, the harmonic mean is most frequently used in averaging speeds for
various distances covered where the distances remain constant, and also in finding the
average cost of some commodity, such as mutual funds, when several different purchases
are made by investing the same amount of money each time.
Solution of the problem:
Here the harmonic mean is the appropriate average. Hence,
$$\text{H.M.} = \frac{4}{\frac{1}{4} + \frac{1}{6} + \frac{1}{10} + \frac{1}{15}} = \frac{4 \times 60}{35} = \frac{48}{7} = 6\tfrac{6}{7} \text{ minutes per sprayer.}$$
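The calculation can be reproduced exactly with rational arithmetic. This is a minimal sketch; `harmonic_mean` is a helper written for this example and, as written, expects integer inputs.

```python
from fractions import Fraction


def harmonic_mean(values):
    """Return n / sum(1/x) as an exact fraction (integer inputs assumed)."""
    if any(x == 0 for x in values):
        raise ValueError("the harmonic mean is undefined when a value is zero")
    return len(values) / sum(Fraction(1, x) for x in values)


minutes_per_sprayer = [4, 6, 10, 15]   # workers A, B, C and D from Q12
hm = harmonic_mean(minutes_per_sprayer)
print(hm, round(float(hm), 2))
```

For comparison, the arithmetic mean of the four rates is 8.75 minutes per sprayer, which would overstate the time needed per unit when every worker is given the same amount of time.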
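Statement: For two positive quantities, A.M. ≥ G.M. ≥ H.M.
Here, A.M. = Arithmetic mean, G.M. = Geometric mean and H.M. = Harmonic mean.
Proof. Suppose $x_1$ and $x_2$ are two positive and non-zero quantities. Then,
$$\text{A.M.} = \frac{x_1 + x_2}{2}, \quad \text{G.M.} = \sqrt{x_1 x_2} \quad \text{and} \quad \text{H.M.} = \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}}.$$
Here, $(\sqrt{x_1} - \sqrt{x_2})^2 \ge 0$; since $x_1$ and $x_2$ are positive.
or, $x_1 + x_2 - 2\sqrt{x_1 x_2} \ge 0$
or, $x_1 + x_2 \ge 2\sqrt{x_1 x_2}$
or, $\frac{x_1 + x_2}{2} \ge \sqrt{x_1 x_2}$
Hence, A.M. ≥ G.M. ... (i)
Again, $\left(\frac{1}{\sqrt{x_1}} - \frac{1}{\sqrt{x_2}}\right)^2 \ge 0$
or, $\frac{1}{x_1} + \frac{1}{x_2} - \frac{2}{\sqrt{x_1 x_2}} \ge 0$
or, $\frac{1}{x_1} + \frac{1}{x_2} \ge \frac{2}{\sqrt{x_1 x_2}}$
or, $\sqrt{x_1 x_2} \ge \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}}$
Hence, G.M. ≥ H.M. ... (ii)
From (i) and (ii), A.M. ≥ G.M. ≥ H.M.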
Q14. For two non-zero and positive quantities show that G.M. = $\sqrt{\text{A.M.} \times \text{H.M.}}$.
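One way to show this, sketched here using the two-value definitions of A.M., G.M. and H.M. for positive quantities $x_1$ and $x_2$, is to multiply A.M. by H.M. and simplify:

```latex
\begin{align*}
\text{A.M.} \times \text{H.M.}
  &= \frac{x_1 + x_2}{2} \times \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}} \\
  &= \frac{x_1 + x_2}{2} \times \frac{2 x_1 x_2}{x_1 + x_2} \\
  &= x_1 x_2 = (\text{G.M.})^2 .
\end{align*}
```

Taking the positive square root of both sides gives G.M. = $\sqrt{\text{A.M.} \times \text{H.M.}}$. The relation holds for two values; it does not extend in general to three or more observations.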
Q15. Define median and mode. How do you calculate them for ungrouped raw data?
Median:
The median is the middle value in a dataset when the values are arranged in ascending
or descending order.
If there is an odd number of values, the median is simply the middle value:
$$\text{Me} = \left[\frac{n + 1}{2}\right]^{\text{th}} \text{ value}.$$
In the dataset {3, 5, 6, 7, 10}, the median is 6.
If there is an even number of values, the median is the average of the two middle values:
$$\text{Me} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ value} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ value}}{2}.$$
In the dataset {2, 4, 6, 8}, the median is (4 + 6) / 2 = 5.
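The odd and even rules can be combined in one small function. This is a sketch of the rule described above; the function name `median` is chosen for the example, and the standard library's `statistics.median` gives the same results.

```python
def median(values):
    """Median of ungrouped data: the middle value for odd n,
    the average of the two middle values for even n."""
    ordered = sorted(values)      # the data must be arranged in order first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # the ((n + 1) / 2)-th value, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 5, 6, 7, 10]))
print(median([2, 4, 6, 8]))
```

Sorting is the essential first step: applying the position rule to unsorted data gives a meaningless answer.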
Practice:
Find the median of each data set below:
(i) 2,-1,0,3,-3
(ii) 5,3,6,2
(iii) -1,-4,-3,3,1
Mode: The mode is the value that appears most frequently in a dataset. A dataset can
have one mode (unimodal), two modes (bimodal), or more than two modes (multimodal).
If no value repeats, the dataset is considered to have no mode.
Example:
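As an illustration that is not part of the original notes, the definition can be sketched in code. The data sets are hypothetical, and the function name `modes` is an assumption. It returns a list so that unimodal, bimodal and no-mode cases are all covered.

```python
from collections import Counter


def modes(values):
    """Return every value with the highest count, or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                 # no value repeats, so there is no mode
    return [v for v, c in counts.items() if c == top]


print(modes([2, 3, 3, 5]))        # one mode (unimodal)
print(modes([1, 1, 2, 2, 3]))     # two modes (bimodal)
print(modes([1, 2, 3]))           # no value repeats
```

The same function works for categorical data such as crop names, since it only counts occurrences and never does arithmetic on the values.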
Q16. Describe the process of finding the median and mode for grouped data and how
to locate these graphically.
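For grouped data, the median is computed by the formula
$$\text{Me} = L + \frac{n/2 - F}{f} \times c,$$
where L is the lower limit of the median class, n is the total frequency, F is the cumulative
frequency of the class preceding the median class, f is the frequency of the median class,
and c is the class width.
Locating Median graphically.
The median can be located from ogives in either of two ways: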
i. Draw two ogives- one by ‘less than’ method and the other by ‘more than’ method.
Draw a perpendicular on the x-axis from the point where the two curves intersect
each other. The point where this perpendicular touches the x-axis gives the value
of the median.
ii. Draw one ogive, usually by the ‘less than’ method: plot the upper class limits
on the x-axis and the cumulative frequencies on the y-axis. Locate the point n/2 on
the y-axis and from this point draw a horizontal line, parallel to the x-axis, to the
cumulative frequency curve. From the point where this line meets the ogive, draw a
perpendicular to the x-axis. The point at which the perpendicular cuts the x-axis is
the median.
Locating Mode graphically.
i. First draw a histogram of the frequency distribution.
ii. Then locate the modal class, and the rectangle over this class, by inspecting the
highest frequency.
iii. Then draw two lines diagonally inside the modal class rectangle, each starting
from an upper corner of the rectangle and ending at the upper corner of the adjacent
rectangle.
iv. Draw a perpendicular line from the intersection of the two diagonal lines to the
x-axis. The point where it meets the x-axis gives the modal value.
Q17. The following frequency distribution refers to the number of hours studied per
month by 50 students of Agricultural engineering:
Hours Spent Studying per Month    30–55   55–80   80–105   105–130   130–155   155–180   180–205
Number of Students                3       4       6        9         12        11        5
Solution (i): Cumulative frequency table
Class interval    Frequency    Cumulative frequency
30 – 55           3            3
55 – 80           4            7
80 – 105          6            13
105 – 130         9            22
130 – 155         12           34
155 – 180         11           45
180 – 205         5            50
Here n = 50, so the $\frac{n}{2} = \frac{50}{2}$ = 25th observation is needed. The 25th observation falls
within cumulative frequency 34, and the corresponding class is 130–155. Hence the median
class is 130–155. Here, L = 130, n/2 = 25, F = 22, f = 12 and c = 25. Hence
$$\text{Me} = L + \frac{n/2 - F}{f} \times c = 130 + \frac{25 - 22}{12} \times 25 = 130 + 6.25 = 136.25 \text{ hours per month}.$$
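The steps of this calculation (find the median class from the cumulative frequencies, then interpolate inside it) can be sketched as a function. The name `grouped_median` and the assumption of equal class widths are choices made for this example.

```python
def grouped_median(lower_limits, width, frequencies):
    """Median of a grouped distribution with equal class width:
    Me = L + (n/2 - F) / f * c."""
    n = sum(frequencies)
    cumulative = 0                            # F: cumulative frequency before the class
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and cumulative + f >= n / 2:  # first class reaching n/2 is the median class
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one positive count")


# Study-hours distribution from Q17.
lower_limits = [30, 55, 80, 105, 130, 155, 180]
frequencies = [3, 4, 6, 9, 12, 11, 5]
median_hours = grouped_median(lower_limits, 25, frequencies)
print(median_hours)
```

The interpolation assumes that the observations are spread evenly within the median class, which is the same assumption the formula makes.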
Solution (ii): First we construct ‘less than’ and ‘more than’ cumulative frequency tables.
Now plot the class limits on the x-axis and the cumulative frequencies on the y-axis.
(i) Draw two ogives, one by the ‘less than’ method and one by the ‘more than’ method, on the
same graph paper. Now draw a perpendicular from the point of intersection A to the x-axis.
The point at which the perpendicular cuts the x-axis is the median.
It is observed that the two ogives intersect at A, and the perpendicular from A cuts the
x-axis at about 136. So the median is about 136.
(ii) Now plot the class limits on the x-axis and the ‘less than’ cumulative frequencies on the
y-axis. Plot points above the upper class limits according to their cumulative frequencies.
Join the points freehand to get the required ogive. Locate the point n/2 = 50/2 = 25 on the
y-axis and from this point draw a line parallel to the x-axis to the ogive. Now draw a
perpendicular to the x-axis from the point at which the line meets the ogive. The point
at which the perpendicular cuts the x-axis is the median. It is observed that this
perpendicular cuts the x-axis at about 136. So the median is again about 136.
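Solution (iii). It is obvious from the frequency table that the class 130–155 contains the
highest frequency. Hence the modal class is 130–155. The formula for finding the mode is
$$\text{Mo} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i,$$
where L is the lower limit of the modal class, $\Delta_1$ is the difference between the frequency
of the modal class and that of the preceding class, $\Delta_2$ is the difference between the
frequency of the modal class and that of the following class, and i is the class width.
Here L = 130, $\Delta_1 = 12 - 9 = 3$, $\Delta_2 = 12 - 11 = 1$ and i = 25, so
$$\text{Mo} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i = 130 + \frac{3}{3 + 1} \times 25 = 148.75.$$
Hence the modal studying time is 148.75 hours per month. That is, the most common
studying time among the students is about 148.75 hours per month.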
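The mode formula can be sketched in the same way. The function name `grouped_mode` is an assumption, equal class widths are assumed, and the function refuses to answer when the modal class is the first or last class, since the formula needs a neighbour on both sides.

```python
def grouped_mode(lower_limits, width, frequencies):
    """Mode of a grouped distribution: Mo = L + d1 / (d1 + d2) * i,
    with d1 = f_modal - f_previous and d2 = f_modal - f_next."""
    k = max(range(len(frequencies)), key=frequencies.__getitem__)  # modal class index
    if k == 0 or k == len(frequencies) - 1:
        raise ValueError("modal class is the first or last class")
    d1 = frequencies[k] - frequencies[k - 1]
    d2 = frequencies[k] - frequencies[k + 1]
    if d1 + d2 == 0:
        raise ValueError("neighbouring classes have the same frequency as the modal class")
    return lower_limits[k] + d1 / (d1 + d2) * width


# Study-hours distribution from Q17.
lower_limits = [30, 55, 80, 105, 130, 155, 180]
frequencies = [3, 4, 6, 9, 12, 11, 5]
modal_hours = grouped_mode(lower_limits, 25, frequencies)
print(modal_hours)
```

Because the class after the modal class (frequency 11) is much closer in frequency than the class before it (frequency 9), the mode is pulled toward the upper end of the modal class.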
Q18. Distinguish between mean, median, and mode. In which situation which measure
should be applied?
In which situation to apply each measure:
Mean: Use the mean when dealing with symmetrically distributed data without
significant outliers. For example, when calculating the average crop yield per acre in a
field with uniform growth conditions.
Median: Use the median when dealing with skewed distributions or datasets with
outliers. For instance, when determining the typical household income in an agricultural
community with a few high-income outliers.
Mode: Use the mode when dealing with categorical data or when identifying the most
common value in a dataset. For example, when determining the most prevalent crop type
grown in a region.
Q19. What are the merits and demerits of mean, median, and mode?
The merits and demerits of mean, median, and mode are provided below:
Arithmetic Mean:
Merits:
i) It is easy to understand,
ii) It is easy to calculate,
iii) It is based on all the observations,
iv) It is rigidly defined,
v) It is capable of further algebraic treatment,
vi) It is less affected by sampling fluctuation.
It is the best measure of average among all the averages. However, it has some limitations
too.
Demerits or limitations:
i) It is affected by extreme values,
ii) It cannot be computed in case of open–ended class interval of a frequency
distribution,
iii) It is not a good measure of central tendency in case of highly skewed distribution,
iv) It cannot be calculated for qualitative data,
v) It cannot be found graphically.
Median:
Merits:
i) It is easy to understand.
ii) It is easy to calculate.
iii) It is not affected by extreme values. That is, it is unaffected by outliers.
iv) It can be computed in open-end frequency distribution.
v) It can be obtained from ogive. That means it can be found graphically.
vi) It is a suitable measure of location in case of very skewed distribution.
vii) The position of median can be easily located when a qualitative variable is
measured in ordinal scale.
viii) It is a unique value for a set of data like arithmetic mean.
Demerits or limitations:
i) It is not based on all the observations.
ii) It is not capable of algebraic treatment.
iii) It is more affected by sampling fluctuations.
iv) It cannot be calculated for nominal data.
Mode:
Merits:
i) It is easy to understand,
ii) It is easy to calculate,
iii) It is not affected by extreme values,
iv) It can be calculated for open-ended class interval,
v) It can be calculated graphically,
vi) It can be calculated both for qualitative and quantitative data.
Demerits:
i) It is not based on all the observations,
ii) Mode is not a rigidly defined measure as there are several formulae for finding
mode, all of which usually give somewhat different answers.
iii) It is not clearly defined in case of bimodal or multimodal distribution,
iv) Mode cannot be defined if each value of the variable occurs only once in a set of
data,
v) It is affected by sampling fluctuation,
vi) It is not suitable for further algebraic treatment,
vii) Mode cannot be calculated if the highest frequency lies in the first or last class in
a frequency distribution.
Among all the measures of location, the arithmetic mean is the best, since it satisfies most of
the criteria of a good measure of central tendency. However, it is highly affected by extreme
values. The median or the mode is a better measure of location in the presence of extreme
values, although neither may be rigidly defined. The mode is the only measure of location for
categorical data measured on a nominal scale.