LECTURE 3 MEASURES OF CENTRAL TENDENCY (Repaired)
The Mean (x̄)
The arithmetic mean (simply the mean) of a set of observations is the sum of all
the observations divided by the number of the observations, i.e. adding up all
the scores in the distribution and dividing the result by the number of scores.
The mean is computed with the formula
x̄ = ∑Xi / N
which means that one should add all the scores (i.e. Xi) and divide by N, the number of scores.
In other words,
x̄ = (X1 + X2 + ... + XN) / N
Ungrouped data
Weight (kg)   x1   x2   x3   x4   x5
              2    3    6    8    9
x̄ = (2 + 3 + 6 + 8 + 9) / 5 = 28 / 5 = 5.6 kg
Example 2
Calculate the mean weight of a sample of 8 patients with the following weights (kg):
84, 92, 37, 50, 50, 84, 40 and 98.
x̄ = (84 + 92 + 37 + 50 + 50 + 84 + 40 + 98) / 8 = 535 / 8 = 66.88 kg
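It can also be observed that some of the values occur more than once. These are
84 and 50. We can now rewrite the mathematical expression for the mean by
grouping the values as:
x̄ = (2(84) + 92 + 37 + 2(50) + 40 + 98) / 8 = 535 / 8 = 66.88 kg
In general, when each value X occurs with frequency F, x̄ = ∑FX / ∑F.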
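The two ways of writing the mean in Example 2 can be checked with a short Python sketch. The variable names are illustrative and not part of the lecture.

```python
from collections import Counter

weights = [84, 92, 37, 50, 50, 84, 40, 98]  # weights (kg) from Example 2

# Direct form: add all scores and divide by the number of scores.
mean_direct = sum(weights) / len(weights)

# Grouped form: multiply each distinct value by its frequency first.
freq = Counter(weights)  # 84 and 50 each occur twice
mean_grouped = sum(x * f for x, f in freq.items()) / sum(freq.values())

print(f"{mean_direct:.2f} {mean_grouped:.2f}")  # 66.88 66.88
```

Both forms give the same result because grouping only changes the order in which the same scores are added.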
X       F    FX
6       2    12
5       5    25
4       7    28
3       6    18
2       5    10
Total   25   93
Applying the formula x̄ = ∑FX / ∑F to the hypothetical data given in the table:
x̄ = 93 / 25 = 3.72
This follows the same approach as for grouped data, except that the X values are single
scores and not class intervals.
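The frequency-table calculation can be sketched in Python as follows. The dictionary layout (score mapped to frequency) is simply one convenient way to hold the table, not something prescribed by the lecture.

```python
# Frequency table from the lecture: score X -> frequency F
table = {6: 2, 5: 5, 4: 7, 3: 6, 2: 5}

total_f = sum(table.values())                     # sum of F
total_fx = sum(x * f for x, f in table.items())   # sum of F times X
mean = total_fx / total_f

print(total_f, total_fx, mean)  # 25 93 3.72
```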
In grouped data, we do not have the values of individual scores; hence, it is not
immediately straightforward to obtain the sum of all the values. To overcome
this problem, we assume that the middle score of the class interval represents
the other scores in the interval. Consequently, we use the middle score, or
mid-point, to compute the mean (x̄).
We must thus define the class boundaries by subtracting 0.5 from the lower
limit of each class and adding 0.5 to its upper limit (for example, the class 45-49
has boundaries 44.5 and 49.5 and mid-point 47), as shown in the table below:
This method involves the handling of large figures. Beginners who are
uneasy with figures may become jittery using it, since it
entails tedious calculations. To avoid this, the assumed-mean method is often
used, especially when the distribution involves large scores.
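General formula: x̄ = A + (∑Fd / ∑F)
where A is the assumed mean and d = X − A is the deviation of each class mid-point X from A.
The steps to be used in the assumed-mean method begin with choosing the assumed mean: the mid-point score of
the class interval with the highest frequency is often recommended.
x̄ = 47 + 1.5
= 48.5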
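A minimal sketch of the assumed-mean idea, x̄ = A + ∑Fd / ∑F with d = X − A. It reuses the single-score frequency table given earlier (X = 6, 5, 4, 3, 2) rather than the age distribution, whose class frequencies are not reproduced here. The function name assumed_mean_method is hypothetical.

```python
def assumed_mean_method(table, assumed):
    """Mean via A + (sum of F*d) / (sum of F), where d = X - A."""
    n = sum(table.values())
    sum_fd = sum(f * (x - assumed) for x, f in table.items())
    return assumed + sum_fd / n

table = {6: 2, 5: 5, 4: 7, 3: 6, 2: 5}  # score X -> frequency F
A = max(table, key=table.get)           # score with the highest frequency

direct = sum(x * f for x, f in table.items()) / sum(table.values())
print(A, assumed_mean_method(table, A), direct)
```

The deviations from A are small whole numbers, which is why the method eases hand calculation. The result agrees with the direct method whichever value is chosen for A.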
The Median
The Median, symbolized Md, is the point that divides the distribution into two
parts such that an equal number of scores fall above and below that point.
Alternatively, the median is that value in a data set which divides the ranked or
ordered values into two equal-sized groups. One of these groups consists of
values equal to or smaller than the median, the other consists of values equal to
or larger than the median. It is the most appropriate measure for judging relative standing in the
distribution.
There are variations in the computation of the median. Such
variations depend on whether there is an odd or even number of scores in the
distribution and whether there is a duplication of score values near the median
point.
When the number of observations is odd, and the observations are
arranged in ascending order, the median is the ½(n + 1)th observation, or
simply the middle value. For example, 2, 3, 6, 5, 6 arranged in order is 2, 3, 5, 6, 6, so the median is 5,
whereas in 3, 5, 6, 7, 10 the median score is 6.
When there is an even number of
scores in a distribution and there is no duplication near the median, the average
of the middle two scores is taken as the median. In the distribution 3, 4, 5, 6, 7, 8 the
median is (5 + 6) / 2 = 5.5. Again, in the distribution 3, 5, 5, 6, 9, 17 the median is
5.5, since the two middle scores are again 5 and 6.
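The odd and even cases can be sketched in Python as below. This is a minimal illustration using the lecture's own examples. It takes the middle value or the average of the two middle values and does not attempt any special treatment of duplicated scores near the median.

```python
def median(scores):
    """Median of ungrouped scores: middle value, or mean of the two middle values."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the (n + 1)/2-th observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the middle two

print(median([2, 3, 6, 5, 6]))      # 5
print(median([3, 5, 6, 7, 10]))     # 6
print(median([3, 4, 5, 6, 7, 8]))   # 5.5
print(median([3, 5, 5, 6, 9, 17]))  # 5.5
```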
The median is a more representative measure of central tendency than the mean
in those data sets that are skewed in one direction or the other. Skewed data sets
are those containing one or more extreme values at one end but not at the other. The
median is less influenced by extreme values and thus provides the best measure
of central tendency in the case of skewed data.
The position of the median in relation to the mean
The direction of skewness is given by the position of the mean relative to the
median. If the mean lies to the left of the median, the distribution is
skewed to the left; if it lies to the right, the distribution is skewed to the right.
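A small illustration of how an extreme value pulls the mean away from the median. The first list is a distribution used in the lecture. The second is hypothetical, with the largest score replaced by an extreme value.

```python
from statistics import mean, median

scores = [3, 5, 6, 7, 10]    # distribution used in the lecture
skewed = [3, 5, 6, 7, 100]   # hypothetical: one extreme value at the upper end

for data in (scores, skewed):
    m, md = mean(data), median(data)
    # A positive difference means the mean lies to the right of the median.
    print(f"mean = {m}, median = {md}, mean - median = {m - md:.1f}")
```

The median stays at 6 in both lists, while the mean moves well to the right of it in the second list, indicating skewness to the right.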
Procedures in determining the median from grouped data
General Rank = (n + 1) / 2
6. Add this share to the real lower limit of the median class (the interval that contains the general rank)
to get the median.
Let us use the earlier example of the ages of 100 subjects to calculate the
median.
General rank = (100 + 1) / 2 = 50.5
Share = 152.5 / 38
= 4.01 years
Median = share + the real lower limit of the median class
= 4.01 + 44.5 (years)
= 48.51 years
The median for grouped data can also be calculated by making use of the following
formula:
Median = rllc + [(Pn − ∑Fb) / fc] × i
Where:
rllc = real lower limit of the median class
Pn = general rank = (n + 1) / 2
∑Fb = the sum of frequencies below the median class
fc = frequency of the median class
i = the width of the class interval.
Note: This formula can also be used to compute percentile, quartile and
decile ranks.
Apply the formula using all these values you have obtained:
Md = 44.5 + [(50.5 − 20) / 38] × 5
= 44.5 + (152.5 / 38)
= 44.5 + 4.01 = 48.51
Md ≈ 48.5 years
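The grouped-data formula, Median = rllc + [(Pn − ∑Fb) / fc] × i, can be sketched as a small function. The function name grouped_median is hypothetical. The real lower limit 44.5, n = 100 and fc = 38 come from the worked example, while ∑Fb = 20 and i = 5 are inferred from the step 152.5 / 38 and should be read as assumptions.

```python
def grouped_median(rllc, n, sum_fb, fc, i):
    """Median = rllc + ((Pn - sum_fb) / fc) * i, with general rank Pn = (n + 1) / 2."""
    pn = (n + 1) / 2
    return rllc + (pn - sum_fb) / fc * i

# sum_fb=20 and i=5 are assumed values, inferred from (50.5 - 20) * 5 = 152.5.
md = grouped_median(rllc=44.5, n=100, sum_fb=20, fc=38, i=5)
print(round(md, 2))  # 48.51
```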
The Mode
The Mode, symbolized Mo, is the most frequently occurring score. It is the
score(s) that occur(s) most often or the point at which the largest number of
scores fall. Roughly, it indicates the centre of concentration of a distribution. In
the distribution 1 1 2 3 3 3 4 5, the mode is 3. This is a uni-modal distribution
because the distribution has one mode only. Sometimes a distribution may have
two modes, e.g. the distribution 6 6 7 7 7 7 8 8 8 9 9 9 9 10 10 has 7 and 9 as its
modes. Such a distribution is called a bi-modal distribution. If the distribution has
more than two modes, it is called a multi-modal distribution. While it is easy to
pick out the mode from ungrouped data, it is not so easy with grouped data.
It requires the use of a mathematical formula to determine it. The formula to
use is:
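For ungrouped data, the modes of the two distributions above can be found with Python's statistics.multimode. For grouped data, the sketch below assumes one commonly taught interpolation formula, Mo = L + [d1 / (d1 + d2)] × i, where L is the real lower limit of the modal class, d1 and d2 are the differences between the modal frequency and the frequencies of the classes before and after it, and i is the class width. This may not be the formula the lecture intends, and the function name and the grouped frequencies are hypothetical.

```python
from statistics import multimode

print(multimode([1, 1, 2, 3, 3, 3, 4, 5]))                          # [3]
print(multimode([6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 9, 9, 10, 10]))   # [7, 9]

def grouped_mode(lower, f_modal, f_before, f_after, width):
    """Assumed formula: Mo = L + d1 / (d1 + d2) * i."""
    d1 = f_modal - f_before   # modal frequency minus the frequency before it
    d2 = f_modal - f_after    # modal frequency minus the frequency after it
    return lower + d1 / (d1 + d2) * width

# Hypothetical modal class 19.5-29.5 with frequency 12; neighbours 8 and 6.
print(grouped_mode(lower=19.5, f_modal=12, f_before=8, f_after=6, width=10))  # 23.5
```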
The fundamental difference between the mean and the median lies in the fact
that the mean reflects the value of each score in the distribution, whereas the
median is based largely on where the mid-point falls, without regard for the
particular value of many of the scores, especially when extreme results occur in
the distribution. Because the mean considers the value of every score, any
change in any score instantly affects the value of the mean. The mode reflects
only the most frequently occurring score. It is mostly useful in describing the central
tendency of nominal data, even though it can still be used in describing other
data. Generally, while the mode fluctuates widely, the median is found to
remain stable but non-reflective of changes in extreme scores. The mean, on the
other hand, being sensitive to the numerical size of every score, changes as any
score in the distribution changes in size. This is why the mean is regarded as the
best measure of central tendency for any distribution that is homogeneous,
whereas the median is the best measure of central tendency when the
distribution is heterogeneous.
Measures of Variability
Another important attribute of any given distribution of scores is the
knowledge of its variability. Variability refers to the extent to which the scores
in a distribution differ from their central tendency. It concerns how each score
in the distribution disperses or moves away from the centre score/point. That is,
it expresses quantitatively the extent to which the scores in a distribution
disperse or cluster together. It is the summary description of the spread of
performance. The range, mean deviation, semi-inter-quartile range, standard
deviation, and variance are measures of variability.
Range
The range is the numerical distance covered by a given distribution. Simply put, it is the
difference between the highest score and the lowest score in the distribution.
Technically, the range is defined as the real upper limit
of the largest score minus the real lower limit of the smallest score. The
weakness of the range is that it ignores the nature of the other
scores lying between the extreme scores. Its relevance lies in the fact that it is
used as a first step in collating data for analysis. It cannot
determine whether the distribution is heterogeneous or not. This is why it is
said to provide a crude measure of variability.
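Both definitions of the range can be illustrated with the patient weights from the mean example. The half-unit adjustment below assumes the scores are recorded to the nearest whole number.

```python
weights = [84, 92, 37, 50, 50, 84, 40, 98]  # weights (kg) from the mean example

# Simple range: highest score minus lowest score.
simple_range = max(weights) - min(weights)                 # 98 - 37

# Range using real limits, which extend half a unit beyond each whole-number score.
real_range = (max(weights) + 0.5) - (min(weights) - 0.5)   # 98.5 - 36.5

print(simple_range, real_range)  # 61 62.0
```

Neither figure changes if the six scores between 37 and 98 are altered, which is the weakness described above.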
Mean Deviation
It is the mean of the absolute values of all the deviations from the mean. It is
obtained by ignoring the signs of the deviations and regarding all of them as
positive. The formula used for its computation is
Mean Deviation = ∑|X − x̄| / N
X       X − x̄    (X − x̄)²
6       −2       4
8       0        0
8       0        0
10      2        4
Total   32                8
∑X = 32, N = 4, x̄ = 32 / 4 = 8
Mean Deviation = (2 + 0 + 0 + 2) / 4 = 4 / 4 = 1
However, the mean deviation does not have good mathematical properties. It is
not used regularly in statistics because it is clumsy to compute manually. It remains a
good measure of variability, especially where the computation is automated.
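As a minimal sketch of the automated computation, the mean deviation of the four scores in the table (6, 8, 8, 10) can be obtained as follows. The variable names are illustrative.

```python
scores = [6, 8, 8, 10]

mean = sum(scores) / len(scores)              # 32 / 4 = 8.0
abs_devs = [abs(x - mean) for x in scores]    # signs of the deviations are ignored
mean_deviation = sum(abs_devs) / len(scores)

print(mean, abs_devs, mean_deviation)  # 8.0 [2.0, 0.0, 0.0, 2.0] 1.0
```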
Semi-Inter-quartile Range, Q
Another way of measuring the variability of any given set of test scores is
the semi-inter-quartile range. According to Nanty (1985), this is half of the
difference between the third (Q3) and the first (Q1) quartiles. The formula for
its computation is:
Q = (Q3 − Q1) / 2
Where, Q1 = point in the distribution below which 25% of the scores lie
Q3 = point in the distribution below which 75% of the scores lie.
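A minimal sketch of Q = (Q3 − Q1) / 2 using Python's statistics.quantiles. The data are hypothetical, and the quartile values depend on the interpolation rule used. The library default is assumed here, so hand methods may give slightly different quartiles.

```python
from statistics import quantiles

def semi_interquartile_range(scores):
    """Q = (Q3 - Q1) / 2, with quartiles from statistics.quantiles (default rule)."""
    q1, _q2, q3 = quantiles(scores, n=4)
    return (q3 - q1) / 2

scores = [1, 2, 3, 4, 5, 6, 7]            # hypothetical scores
print(quantiles(scores, n=4))             # [2.0, 4.0, 6.0]
print(semi_interquartile_range(scores))   # 2.0
```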
Standard Deviation
The most commonly used measure of variability in test scores is the standard
deviation. Like the mean (x̄ ), the standard deviation takes into account the
numerical size of every score. It takes account of the deviation of every score
from the mean (x̄ ). This is to say that every score contributes its influence on the
standard deviation. The standard deviation is fundamentally an average of the deviations
of all the scores from the mean. Since the sum of these deviations is zero, they
are first of all squared before they are added up, and then the sum is divided by
the number of scores. The square root of the resulting number is then taken. In
symbols, the standard deviation is defined as:
S = √[∑(X − x̄)² / N]
where N is the number of scores. When the scores are a sample used to estimate the
variability of a population, N − 1 is used in place of N.
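Example 1:
Calculate the standard deviation of the data on incubation period for the sample of 10
typhoid patients: 15, 12, 16, 14, 10, 12, 13, 10, 9, and 14.
Solution:
∑xi = 15 + 12 + 16 + 14 + 10 + 12 + 13 + 10 + 9 + 14 = 125
n = 10
x̄ = 125 / 10 = 12.5
∑(xi − x̄)² = 48.5
Hence, using the sample formula with n − 1 in the denominator,
S = √(48.5 / 9) = √5.389
S = 2.321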
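The sample standard deviation of the incubation data can be checked with the sketch below. It divides by n − 1, as statistics.stdev does, and the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

days = [15, 12, 16, 14, 10, 12, 13, 10, 9, 14]  # incubation periods

n = len(days)
mean = sum(days) / n                               # 125 / 10 = 12.5
sum_sq_dev = sum((x - mean) ** 2 for x in days)    # sum of squared deviations
s = sqrt(sum_sq_dev / (n - 1))                     # sample standard deviation

print(sum_sq_dev, round(s, 3), round(stdev(days), 3))  # 48.5 2.321 2.321
```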
X       F    FX    X²    FX²
10      2    20    100   200
9       2    18    81    162
8       3    24    64    192
7       4    28    49    196
6       7    42    36    252
5       5    25    25    125
4       3    12    16    48
3       3    9     9     27
2       4    8     4     16
1       2    2     1     2
Total   35   188         1220
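Using the raw-score formula S = √[∑FX² / N − (∑FX / N)²], where N = ∑F = 35:
S = √[1220 / 35 − (188 / 35)²] = √(34.857 − 28.852) = √6.005 = 2.45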
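The column totals of the frequency table and the resulting standard deviation can be sketched as below. Since the divisor is not stated for this table, both the N and the N − 1 versions are shown, and which one is intended is an assumption left open.

```python
from math import sqrt

# Frequency table: score X -> frequency F
table = {10: 2, 9: 2, 8: 3, 7: 4, 6: 7, 5: 5, 4: 3, 3: 3, 2: 4, 1: 2}

n = sum(table.values())                              # sum of F
sum_fx = sum(f * x for x, f in table.items())        # sum of F*X
sum_fx2 = sum(f * x * x for x, f in table.items())   # sum of F*X squared

sd_n = sqrt(sum_fx2 / n - (sum_fx / n) ** 2)          # divisor N
sd_n1 = sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))   # divisor N - 1

print(n, sum_fx, sum_fx2, round(sd_n, 2), round(sd_n1, 2))  # 35 188 1220 2.45 2.49
```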
The Variance
This is defined as the average of the squared deviations of the observations from
their arithmetic mean. The variance is another measure of the dispersion of a
frequency distribution. With it, each difference between the score and the mean
(x̄) is squared. These squared differences are summed up and divided by the
number of scores. It can be calculated using any of the formulae for the standard
deviation, omitting the final square root. While the variance is perfectly adequate, mathematically, for
describing dispersion, its drawback lies in the fact that it is not in the same
units as the raw data. This is because squaring was used in
changing the raw scores to the variance estimate. Consequently, to get back to
the original scale, the square root of the final variance estimate is usually
taken. This is the standard deviation. Here we can distinguish two
variances, namely:
Population variance, which is denoted by σ²:
σ² = ∑(xi − μ)² / N, where μ = population mean and N = population size
Sample variance, denoted by S²:
S² = ∑(xi − x̄)² / (n − 1), where x̄ = sample mean and
n = sample size
Example 1:
Calculate the variance of the data on incubation period for the sample of 10
typhoid patients: 15, 12, 16, 14, 10, 12, 13, 10, 9, and 14.
Solution:
∑xi = 15 + 12 + 16 + 14 + 10 + 12 + 13 + 10 + 9 + 14 = 125
n = 10
x̄ = 125 / 10 = 12.5
∑(xi − x̄)² = 48.5
Hence,
S² = 48.5 / (10 − 1) = 48.5 / 9 = 5.389
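The distinction between the sample and population variance can be illustrated on the incubation data with Python's statistics module, where variance divides by n − 1 and pvariance divides by N.

```python
from statistics import pvariance, variance

days = [15, 12, 16, 14, 10, 12, 13, 10, 9, 14]  # incubation periods

s2 = variance(days)        # sample variance: divides by n - 1
sigma2 = pvariance(days)   # population variance: divides by N

print(round(s2, 3), round(sigma2, 3))  # 5.389 4.85
```

Treating the 10 patients as a sample gives the larger value, because the same sum of squared deviations (48.5) is divided by 9 rather than 10.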
Coefficient of variation (CV)
This is a unit-free measure of dispersion expressed as a percentage. It is used to
compare variability in two or more variables, or in different populations of data on
the same measurement, since in such cases comparing the standard deviations as a
measure of variability may lead to fallacious results. The coefficient
of variation expresses relative variation rather than absolute variation. It
is the standard deviation multiplied by 100% and divided by the mean:
CV = (S / x̄) × 100%
Apart from its use in comparing variation in different sets of data, it is also useful
in comparing the results obtained by different persons making the same
measurements.
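A minimal sketch of CV = (S / x̄) × 100%, applied to two data sets from the lecture that are measured in different units (days and kilograms). The function name is hypothetical, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """CV = (S / mean) * 100, in percent, using the sample standard deviation."""
    return stdev(data) / mean(data) * 100

days = [15, 12, 16, 14, 10, 12, 13, 10, 9, 14]   # incubation periods (days)
weights = [84, 92, 37, 50, 50, 84, 40, 98]       # patient weights (kg)

print(round(coefficient_of_variation(days), 1))  # 18.6
print(round(coefficient_of_variation(weights), 1))
```

Because the units cancel, the two percentages can be compared directly, whereas the two standard deviations cannot.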
The standard deviation is the most accurate and reliable measure of variability
and it is very useful in describing test data. Along with the mean (x̄) it provides
a reliable description of most distributions. It follows that a set of data is
considered adequately described if its mean (x̄) and standard deviation (std) are
given. The median and the semi-inter-quartile range are also often used together to
describe distributions, especially those that are skewed. Considering
the mean (x̄) and the standard deviation together helps us to compare
different distributions of scores. We may decide, for instance, to compare the
mean (x̄) score and variability of male and female scores in one class, or of
students' scores on one test and on another. The performance of a student
could be compared to that of a class by determining how far away from the
group mean (x̄) his score stands. You can also describe a student's
score mathematically by indicating how many standard deviations away from
the class mean (x̄) the score lies. The mean and standard deviation enable a
relative interpretation of individual and group performances.
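The idea of stating how many standard deviations a score lies from the class mean can be sketched as below. The function name z_score is hypothetical, the class scores are the small distribution used in the mean-deviation example, and the standard deviation with divisor N (pstdev) is assumed.

```python
from statistics import mean, pstdev

def z_score(score, scores):
    """Number of standard deviations by which `score` lies from the group mean."""
    return (score - mean(scores)) / pstdev(scores)

class_scores = [6, 8, 8, 10]   # mean 8, standard deviation about 1.41
print(round(z_score(10, class_scores), 2))  # 1.41
print(round(z_score(6, class_scores), 2))   # -1.41
```

A positive value places the student above the class mean and a negative value below it.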