
LECTURE 3 MEASURES OF CENTRAL TENDENCY (Repaired)

This document discusses different measures of central tendency including the mean, median and mode. It provides detailed explanations and examples of calculating each measure from both ungrouped and grouped data. Formulas and step-by-step processes are given for finding the mean, median and mode of data sets.

Uploaded by

kwameasilevi
Copyright
© © All Rights Reserved

MEASURES OF CENTRAL TENDENCY

Beyond visual summaries and descriptions of data, the nature of a
distribution can be expressed through mathematical summaries, which allow us
to make inferences about the group more precisely and concisely than graphs
do. The central tendency of a distribution is one such summary. It describes
the degree to which the scores within the distribution cluster around a
central point. Generally, it means using a single number to represent a
feature of a set of data. The advantages of such a single number are that it
facilitates comparison and, when well chosen, very often conveys the same
impression as all the numbers in the data. The three most commonly used
measures of central tendency (also called measures of location, averages, or
representative values) are the Mean, Median and Mode.

The Mean (x̄)

The arithmetic mean (simply the mean) of a set of observations is the sum of all
the observations divided by the number of the observations, i.e. adding up all
the scores in the distribution and dividing the result by the number of scores.
The mean is computed with the formula

$$\bar{x} = \frac{\sum X_i}{N}$$

which means that one should add all the scores (i.e. the X_i) and divide by N.

In other words,

$$\bar{x} = \frac{X_1 + X_2 + \dots + X_N}{N}$$

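Ungrouped data

| Observation | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|
| Weight (kg) | 2 | 3 | 6 | 8 | 9 |

$$\bar{x} = \frac{2 + 3 + 6 + 8 + 9}{5} = \frac{28}{5} = 5.6 \text{ kg}$$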
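Example 2

Calculate the mean weight of a sample of 8 patients with the following weights
(in kg): 84, 92, 37, 50, 50, 84, 40 and 98.

$$\bar{x} = \frac{84 + 92 + 37 + 50 + 50 + 84 + 40 + 98}{8} = \frac{535}{8} = 66.88 \text{ kg}$$

It can also be observed that some of the values occur more than once, namely
84 and 50. We can therefore rewrite the expression for the mean by grouping
the repeated values:

$$\bar{x} = \frac{2(84) + 92 + 37 + 2(50) + 40 + 98}{8} = \frac{535}{8} = 66.88 \text{ kg}$$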
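The two ways of writing the mean in Example 2 can be checked with a short sketch. The variable names are illustrative only, and `Counter` is used here simply as a convenient way to count repeated values.

```python
from collections import Counter

weights = [84, 92, 37, 50, 50, 84, 40, 98]  # kg, from Example 2

# Direct definition: sum of the observations divided by their number
mean_direct = sum(weights) / len(weights)

# Same mean via frequencies: each distinct value x is weighted by its count f
freq = Counter(weights)
mean_by_freq = sum(x * f for x, f in freq.items()) / sum(freq.values())

print(mean_direct, mean_by_freq)
```

Both expressions give the same value because grouping repeated scores only reorders the terms of the sum.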

Frequency distribution table of ungrouped data with frequencies

| X | f | fx |
|---|---|---|
| 6 | 2 | 12 |
| 5 | 5 | 25 |
| 4 | 7 | 28 |
| 3 | 6 | 18 |
| 2 | 5 | 10 |
| Total | 25 | 93 |

Applying the formula to the hypothetical data in the table gives

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{93}{25} = 3.72$$

This follows the same approach as for grouped data, except that the x values
are single scores and not class intervals.

Mean for grouped data with frequencies

In grouped data, we do not have the values of the individual scores; hence, it
is not immediately straightforward to obtain the sum of all the values. To
overcome this problem, we assume that the middle score of each class interval
represents all the scores in that interval. Consequently, we use the middle
score, or midpoint, to compute the mean (x̄).

We must first define the class boundaries by subtracting 0.5 from the lower
class limit and adding 0.5 to the upper class limit, as shown in the table
below:

| Age (years) | Freq (fi) | Class boundary | Class midpoint (x) | fx |
|---|---|---|---|---|
| 35 – 39 | 5 | 34.5 – 39.5 | 37.0 | 185 |
| 40 – 44 | 15 | 39.5 – 44.5 | 42.0 | 630 |
| 45 – 49 | 38 | 44.5 – 49.5 | 47.0 | 1786 |
| 50 – 54 | 29 | 49.5 – 54.5 | 52.0 | 1508 |
| 55 – 59 | 13 | 54.5 – 59.5 | 57.0 | 741 |
| Total | ∑fi = 100 | | | ∑fx = 4850 |
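As a minimal sketch of the midpoint method, the fx column and the grouped mean for the age table can be computed as follows. The tuple layout and variable names are illustrative choices, not part of the lecture.

```python
# (lower limit, upper limit, frequency) for each age class, from the table above
classes = [(35, 39, 5), (40, 44, 15), (45, 49, 38), (50, 54, 29), (55, 59, 13)]

total_f = 0
total_fx = 0.0
for lower, upper, f in classes:
    midpoint = (lower + upper) / 2   # class midpoint x
    total_f += f
    total_fx += f * midpoint         # one entry of the fx column

grouped_mean = total_fx / total_f
print(total_f, total_fx, grouped_mean)
```

The result is an estimate of the mean, since every score in a class is treated as if it were equal to the class midpoint.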
When the distribution has some scores occurring more than once, the resulting
table is a frequency distribution. The mean is determined in the same way as
for ungrouped data with frequencies, except that the class midpoint takes the
place of the X value:
• For the purpose of calculating the mean from grouped data, an fx column
is added to the frequency table.
• For data classified into intervals, each frequency is multiplied by its
class midpoint to get fx. The class midpoint is obtained by adding the two
class limits and dividing by 2. The class midpoint becomes xi.
• For example, the age distribution tabulated above is that of a sample of
100 mothers interviewed in a survey, and its mean is
∑fx / ∑f = 4850 / 100 = 48.5 years.

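Frequency distribution table of grouped data

| Class interval | Midpoint (x) | Frequency (f) | fx |
|---|---|---|---|
| 24 – 26 | 25 | 3 | 75 |
| 21 – 23 | 22 | 4 | 88 |
| 18 – 20 | 19 | 7 | 133 |
| 15 – 17 | 16 | 6 | 96 |
| 12 – 14 | 13 | 5 | 65 |
| 9 – 11 | 10 | 3 | 30 |
| Total | | 28 | 487 |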

Assumed mean Method

The method above involves handling large figures. Beginners who are uneasy
with figures find it daunting, since it entails tedious calculations. To avoid
this, the assumed-mean method is often used, especially when the distribution
involves large scores.

In the assumed-mean method (also called coding), you can reasonably choose
any number within the distribution as the assumed mean. Any of the midpoints
can be chosen: even though the midpoint of the class interval with the highest
frequency is recommended, any midpoint will yield the same result. After
establishing the assumed mean, code each of the other intervals serially
according to its position relative to the interval used as the assumed mean. A
negative code results if the midpoint of the interval is less than the assumed
mean, while a positive code is given if it is bigger. Computation is done
using the formula

$$\bar{x} = a + c \left( \frac{\sum f x'}{n} \right)$$

• Where a = assumed mean
• c = constant, i.e. the class width (the difference between the class
boundaries)
• n = sum total of the frequencies
• x′ = the difference between the x value and the assumed mean, divided by
the constant, i.e. x′ = (x − a) / c
• Remember, the recommended choice of assumed mean is the midpoint (x value)
that corresponds to the highest frequency. In the example below, let us
choose an assumed mean of 47.0.

The steps of the assumed-mean method are illustrated in the table below,
taking the midpoint of the class interval with the highest frequency as the
assumed mean, as recommended.

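| Age (years) | Freq (f) | Class boundary | Class midpoint (x) | fx | x − a | x′ = (x − a)/c | fx′ | CF |
|---|---|---|---|---|---|---|---|---|
| 35 – 39 | 5 | 34.5 – 39.5 | 37.0 | 185 | 37 − 47 = −10 | −2 | −10 | 5 |
| 40 – 44 | 15 | 39.5 – 44.5 | 42.0 | 630 | 42 − 47 = −5 | −1 | −15 | 20 |
| 45 – 49 | 38 | 44.5 – 49.5 | 47.0 | 1786 | 47 − 47 = 0 | 0 | 0 | 58 |
| 50 – 54 | 29 | 49.5 – 54.5 | 52.0 | 1508 | 52 − 47 = 5 | 1 | 29 | 87 |
| 55 – 59 | 13 | 54.5 – 59.5 | 57.0 | 741 | 57 − 47 = 10 | 2 | 26 | 100 |
| Total | ∑f = 100 | | | ∑fx = 4850 | | | ∑fx′ = 30 | |

$$\bar{x} = a + c \left( \frac{\sum f x'}{n} \right) = 47 + 5 \left( \frac{30}{100} \right)$$

• = 47 + 1.5
• = 48.5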
The Median

The Median, symbolized Md, is the point that divides the distribution into two
parts such that an equal number of scores fall above and below that point.
Alternatively, the median is the value in a data set that divides the ranked
(ordered) values into two equal-sized groups: one consists of values equal to
or smaller than the median, the other of values equal to or larger than the
median. It is the most appropriate measure for judging relative standing in
the distribution.

There are variations in the computation of the median. They depend on whether
there is an odd or even number of scores in the distribution and whether score
values are duplicated near the median point. When the number of observations n
is odd and the observations are arranged in ascending order, the median is the
½(n + 1)th observation, or simply the middle value. For example, the scores
2, 3, 6, 5, 6 rank as 2, 3, 5, 6, 6, so the median is 5, whereas in
3, 5, 6, 7, 10 the median is 6. When there is an even number of scores and no
duplication near the median, the average of the middle two scores is taken as
the median: in the distribution 3, 4, 5, 6, 7, 8 the median is
(5 + 6) / 2 = 5.5. Again, in the distribution 3, 5, 5, 6, 9, 17 the median is
(5 + 6) / 2 = 5.5, the average of the two middle scores.
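The odd and even cases can be sketched in a few lines. The function name `median` is chosen here for illustration; it implements only the simple rule of averaging the two middle scores and does not apply any special interpolation for duplicated scores.

```python
def median(scores):
    """Median of ungrouped scores: the middle value, or the mean of the two middle values."""
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                      # the (n + 1)/2-th observation, counting from 1
    return (ranked[mid - 1] + ranked[mid]) / 2  # average of the two middle scores

print(median([2, 3, 6, 5, 6]))
print(median([3, 4, 5, 6, 7, 8]))
```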
The median is a more representative measure of central tendency than the mean
for data sets that are skewed in one direction or the other. Skewed data sets
are those containing one or more extreme values at one end of the distribution
but not the other. The median is less influenced by extreme values and is
therefore usually the better measure of central tendency for skewed data.
The position of the median in relation to the mean

[Figure: two skewed distributions. A: positively skewed (skewed to the right), with mode, median and mean in that order from left to right. B: negatively skewed (skewed to the left), with mean, median and mode in that order from left to right.]

The direction of skewness is given by the position of the mean relative to the
median: if the mean lies to the left of the median, the distribution is skewed
to the left; if it lies to the right, the distribution is skewed to the right.
Procedures in determining the median from grouped data

1. Rank-order the data. Rank ordering may be done in either ascending or
descending order. Then find the general rank:

$$\text{General Rank} = \frac{n + 1}{2}$$

2. For grouped data, construct the ascending cumulative frequency column.
3. Identify the class interval containing the general rank when adding the
frequencies cumulatively. This is the median class (in the example that
follows it is also the interval with the highest frequency).
4. Identify the special rank.
Special rank = General Rank – Ascending Cumulative Frequency (ACF) directly
preceding the median class
5. Calculate the share:

$$\text{Share} = \frac{\text{Special rank}}{\text{frequency of the median class}} \times \text{class width}$$

6. Add this share to the real lower limit of the median class to get the
median.

Let us use the earlier example of the ages of 100 subjects to calculate the
median

| Mothers' age (years) | Class midpoint (x) | Class boundaries | Freq. (f) | fx | Cum. freq. (cf) |
|---|---|---|---|---|---|
| 35 – 39 | 37 | 34.5 – 39.5 | 5 | 185 | 5 |
| 40 – 44 | 42 | 39.5 – 44.5 | 15 | 630 | 20 |
| 45 – 49 | 47 | 44.5 – 49.5 | 38 | 1786 | 58 |
| 50 – 54 | 52 | 49.5 – 54.5 | 29 | 1508 | 87 |
| 55 – 59 | 57 | 54.5 – 59.5 | 13 | 741 | 100 |
| Total | | | ∑f = 100 | ∑fx = 4850 | |

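$$\text{Mean} = \frac{\sum fx}{\sum f} = \frac{4850}{100} = 48.5 \text{ years}$$

$$\text{General rank} = \frac{n + 1}{2} = \frac{100 + 1}{2} = 50.5$$

The general rank falls in the interval 45 – 49, whose cumulative frequency
(58) is the first to reach 50.5. This is the median class.

Special rank = General Rank – Ascending Cumulative Frequency directly
preceding the median class
= GR – ACF
= 50.5 – 20 = 30.5

$$\text{Share} = \frac{30.5}{38} \times 5 = 4.01 \text{ years}$$

Median = share + the real lower limit of the median class
= 4.01 + 44.5 (years)
= 48.51 years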
The median for grouped data can also be calculated using the following
formula:

$$\text{Median} = \mathrm{rll} + \left( \frac{P_n - \sum F_b}{f_c} \right) i$$

Where:
rll = real lower limit of the median class
Pn = General rank = (n + 1) / 2
∑Fb = the sum of the frequencies below the median class
fc = frequency of the median class
i = the width of the class interval.
Note: With the appropriate rank in place of Pn, this formula can also be used
to compute percentiles, quartiles and deciles.
• Apply the formula using the values obtained above:

$$Md = 44.5 + \left( \frac{50.5 - 20}{38} \right) \times 5$$

= 44.5 + (152.5 / 38)
= 44.5 + 4.01 = 48.51
Md ≈ 48.5 years
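The grouped-median procedure can be sketched as a single pass over the classes. The names below are illustrative. The rank (n + 1)/2 follows this lecture; some textbooks use n/2 instead, which gives a slightly different value.

```python
# (real lower limit, frequency) for each class, in ascending order
classes = [(34.5, 5), (39.5, 15), (44.5, 38), (49.5, 29), (54.5, 13)]
width = 5.0                              # width of each class interval

n = sum(f for _, f in classes)
general_rank = (n + 1) / 2               # Pn, as defined in the text

cum_below = 0                            # cumulative frequency below the current class
for lower_limit, f in classes:
    if cum_below + f >= general_rank:    # first class whose cumulative frequency reaches the rank
        special_rank = general_rank - cum_below
        share = special_rank / f * width
        grouped_median = lower_limit + share
        break
    cum_below += f

print(general_rank, special_rank, round(share, 2), round(grouped_median, 2))
```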
The Mode

The Mode, symbolized Mo, is the most frequently occurring score. It is the
score(s) that occur(s) most often, or the point at which the largest number of
scores fall. Roughly, it indicates the centre of concentration of a
distribution. In the distribution 1, 1, 2, 3, 3, 3, 4, 5 the mode is 3. This
is a uni-modal distribution because it has one mode only. Sometimes a
distribution may have two modes, e.g. the distribution
6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 9, 9, 10, 10 has 7 and 9 as its modes. Such a
distribution is called a bi-modal distribution. If the distribution has more
than two modes, it is called a multi-modal distribution. While it is easy to
pick out the mode from ungrouped data, it is not so easy with grouped data,
where a mathematical formula is required to determine it. The formula uses the
following quantities:

a = Lower limit of the modal class

f = Frequency of the modal class

b = Upper limit of the modal class

fa = Frequency of the class directly above the modal class in the table

fb = Frequency of the class directly below the modal class in the table


With respect to the table below, the mode is computed thus:

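| Class interval | Tally | Freq | Mid-point | Cum. freq. | CFP |
|---|---|---|---|---|---|
| 65 – 69 | III | 3 | 67 | 60 | 1.00 |
| 60 – 64 | IIII | 5 | 62 | 57 | 0.95 |
| 55 – 59 | IIII I | 6 | 57 | 52 | 0.87 |
| 50 – 54 | IIII II | 7 | 52 | 46 | 0.77 |
| 45 – 49 | IIII IIII | 10 | 47 | 39 | 0.65 |
| 40 – 44 | IIII III | 8 | 42 | 29 | 0.48 |
| 35 – 39 | IIII I | 6 | 37 | 21 | 0.35 |
| 30 – 34 | IIII | 5 | 32 | 15 | 0.25 |
| 25 – 29 | IIII | 5 | 27 | 10 | 0.17 |
| 20 – 24 | IIII | 5 | 22 | 5 | 0.08 |
| Total | | 60 | | | |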
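The worked calculation for this table is not shown in the text. As a sketch only, the block below applies one widely used interpolation formula for the mode of grouped data, Mo = L + (d1 / (d1 + d2)) × w, where L is the real lower limit of the modal class, w the class width, d1 the modal frequency minus the frequency of the next lower class, and d2 the modal frequency minus the frequency of the next higher class. This formula and the symbols L, w, d1 and d2 are assumptions made here; the lecture's own formula in terms of a, b, f, fa and fb may differ.

```python
# (lower limit, upper limit, frequency), listed from the lowest class upward
classes = [
    (20, 24, 5), (25, 29, 5), (30, 34, 5), (35, 39, 6), (40, 44, 8),
    (45, 49, 10), (50, 54, 7), (55, 59, 6), (60, 64, 5), (65, 69, 3),
]

freqs = [f for _, _, f in classes]
k = freqs.index(max(freqs))                 # position of the modal class
lower, upper, f_modal = classes[k]

# Neighbouring frequencies; taken as 0 if the modal class is at either end
f_lower = freqs[k - 1] if k > 0 else 0
f_higher = freqs[k + 1] if k < len(freqs) - 1 else 0

real_lower = lower - 0.5                    # real lower limit of the modal class
width = (upper + 0.5) - real_lower          # class width from the real limits

d1 = f_modal - f_lower                      # assumed formula: excess over the lower neighbour
d2 = f_modal - f_higher                     # assumed formula: excess over the higher neighbour
mode_estimate = real_lower + d1 / (d1 + d2) * width
print((lower, upper), mode_estimate)
```

Under these assumptions the modal class is 45 – 49 and the estimate falls inside it, closer to the neighbouring class with the larger frequency.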
The mode is a crude indicator of the central tendency of a distribution. It
ignores the size of every other score in the distribution and is easily
affected by changes in the number of scores and, in the case of grouped data,
by changes in the width of the class interval. However, it is the most
appropriate measure of central tendency where the distribution has both
qualitative and quantitative characteristics.

Comparison of the Mean, Median and Mode

The fundamental difference between the mean and the median is that the mean
reflects the value of every score in the distribution, whereas the median is
based largely on where the midpoint falls, without regard for the particular
values of many of the scores, especially extreme ones. Because the mean
considers the value of every score, any change in any score immediately
affects it. The mode reflects only the most frequently occurring score. It is
mostly useful in describing the central tendency of nominal data, even though
it can still be used in describing other data. Generally, while the mode
fluctuates widely, the median remains stable but does not reflect changes in
extreme scores. The mean, on the other hand, being sensitive to the numerical
size of every score, changes whenever any score in the distribution changes.
This is why the mean is regarded as the best measure of central tendency for a
distribution that is homogeneous, whereas the median is the best measure of
central tendency when the distribution is heterogeneous.
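The different sensitivity of the mean and the median to an extreme score can be illustrated with a small sketch. The second list is hypothetical: it is the distribution 3, 5, 6, 7, 10 used earlier, with the largest score replaced by 100.

```python
import statistics

scores = [3, 5, 6, 7, 10]          # distribution used earlier in the text
with_extreme = [3, 5, 6, 7, 100]   # hypothetical: largest score replaced by an extreme value

mean_before = statistics.mean(scores)
mean_after = statistics.mean(with_extreme)
median_before = statistics.median(scores)
median_after = statistics.median(with_extreme)

print(mean_before, mean_after)       # the mean shifts with the extreme score
print(median_before, median_after)   # the median stays where it was
```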

Measures of Variability
Another important attribute of any given distribution of scores is its
variability. Variability refers to the extent to which the scores in a
distribution differ from their central tendency, that is, how far each score
disperses from the centre point. It expresses quantitatively the extent to
which the scores in a distribution spread out or cluster together, and so
gives a summary description of the spread of performance. The range, mean
deviation, semi-inter-quartile range, standard deviation, and variance are
all measures of variability.

Range

The range is the numerical distance spanned by a distribution. Simply put, it
is the difference between the highest score and the lowest score in the
distribution. Technically, the range is defined as the real upper limit of the
largest score minus the real lower limit of the smallest score. The weakness
of the range is that it ignores the scores lying between the two extremes. Its
relevance lies in the fact that it is used as a first step in collating data
for analysis. It cannot show whether the distribution is heterogeneous or not,
which is why it is said to provide only a crude measure of variability.

Mean Deviation

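It is the mean of the absolute values of all the deviations from the mean. It
is obtained by ignoring the signs of the deviations and regarding all of them
as positive. The formula used for its computation is

$$\text{Mean Deviation} = \frac{\sum |X - \bar{x}|}{N}$$

| X | X − x̄ | (X − x̄)² |
|---|---|---|
| 6 | −2 | 4 |
| 8 | 0 | 0 |
| 8 | 0 | 0 |
| 10 | 2 | 4 |
| ∑X = 32 | | 8 |

Here x̄ = 32 / 4 = 8 and the absolute deviations are 2, 0, 0 and 2, so

$$\text{Mean Deviation} = \frac{2 + 0 + 0 + 2}{4} = 1$$

However, the mean deviation does not have good mathematical properties. It is
not used regularly in statistics because it is clumsy to compute manually,
though it is a good measure of variability where the computation is automated.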
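Since the text notes that the mean deviation is easier to obtain by automated means, here is a minimal sketch for the four scores in the table. The variable names are illustrative.

```python
scores = [6, 8, 8, 10]

mean = sum(scores) / len(scores)
abs_deviations = [abs(x - mean) for x in scores]   # signs of the deviations are ignored
mean_deviation = sum(abs_deviations) / len(scores)

print(mean, abs_deviations, mean_deviation)
```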

Semi-Inter-quartile Range, Q

Another way of measuring the variability of any given set of test scores is
the semi-inter-quartile range. According to Nanty (1985), this is half of the
difference between the third (Q3) and the first (Q1) quartiles. The formula
for its computation is:

$$Q = \frac{Q_3 - Q_1}{2}$$

Where, Q1 = point in the distribution below which 25% of the scores lie

Q3 = point in the distribution below which 75% of the scores lie.

The advantage of the semi-inter-quartile range over the range is that it is
free of the influence of extreme scores in the distribution. However, like the
range, it is determined by only two points in the distribution.
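A sketch of the computation using Python's standard library is given below. The data are the incubation periods from the standard deviation example later in this lecture. Note that several conventions exist for locating quartiles; `statistics.quantiles` with its default settings is only one of them, so hand calculations by another rule may differ slightly.

```python
import statistics

# Incubation periods from the standard deviation example in this lecture
scores = [15, 12, 16, 14, 10, 12, 13, 10, 9, 14]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median) and Q3
q1, q2, q3 = statistics.quantiles(scores, n=4)

semi_iqr = (q3 - q1) / 2   # Q = (Q3 - Q1) / 2
print(q1, q3, semi_iqr)
```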

Standard Deviation

The most commonly used measure of variability in test scores is the standard
deviation. Like the mean (x̄), the standard deviation takes into account the
numerical size of every score: it takes account of the deviation of every
score from the mean (x̄), so every score contributes to it. The standard
deviation is fundamentally an average of the deviations of all the scores from
the mean. Since the sum of these deviations is zero, they are first squared
before being added up, and the sum is then divided by the number of scores.
The square root of the resulting number is then taken. In symbols, the
standard deviation is defined as:

$$S = \sqrt{\frac{\sum (X - \bar{x})^2}{N}}$$

When the scores are a sample, the divisor is n − 1 instead of N.

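Example 1:

Calculate the variance and standard deviation of the data on incubation period
for the sample of 10 typhoid patients: 15, 12, 16, 14, 10, 12, 13, 10, 9, and
14.

Solution:

∑xi = 15 + 12 + 16 + 14 + 10 + 12 + 13 + 10 + 9 + 14 = 125

∑xi² = 225 + 144 + 256 + 196 + 100 + 144 + 169 + 100 + 81 + 196 = 1611

n = 10

Hence,

$$S^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = \frac{1611 - \frac{125^2}{10}}{9} = \frac{48.5}{9} = 5.389$$

$$S = \sqrt{5.389} = 2.321$$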
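The arithmetic of this example can be checked with the short sketch below, which uses the raw-score form with the sample divisor n − 1. The variable names are illustrative.

```python
import math
import statistics

incubation = [15, 12, 16, 14, 10, 12, 13, 10, 9, 14]
n = len(incubation)

sum_x = sum(incubation)                    # sum of the observations
sum_x2 = sum(x * x for x in incubation)    # sum of the squared observations

# Raw-score (computational) form with the sample divisor n - 1
sample_variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(sum_x, sum_x2, round(sample_variance, 3), round(sample_sd, 3))
```

The same standard deviation is returned by `statistics.stdev(incubation)`, which also divides by n − 1.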

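Standard deviation computational process using raw score method

| X | F | FX | X² | FX² |
|---|---|---|---|---|
| 10 | 2 | 20 | 100 | 200 |
| 9 | 2 | 18 | 81 | 162 |
| 8 | 3 | 24 | 64 | 192 |
| 7 | 4 | 28 | 49 | 196 |
| 6 | 7 | 42 | 36 | 252 |
| 5 | 5 | 25 | 25 | 125 |
| 4 | 3 | 12 | 16 | 48 |
| 3 | 3 | 9 | 9 | 27 |
| 2 | 4 | 8 | 4 | 16 |
| 1 | 2 | 2 | 1 | 2 |
| Total | 35 | 188 | 385 | 1220 |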

Using formula
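The result for this table is not shown in the text, so the sketch below computes the standard deviation from the column totals under both divisors. Which divisor the lecture intended here is an assumption left open: N gives the value for the scores treated as a whole group, n − 1 the value for a sample.

```python
import math

# (score X, frequency F) from the table
table = [(10, 2), (9, 2), (8, 3), (7, 4), (6, 7), (5, 5), (4, 3), (3, 3), (2, 4), (1, 2)]

n = sum(f for _, f in table)
sum_fx = sum(f * x for x, f in table)
sum_fx2 = sum(f * x * x for x, f in table)

# Sum of squared deviations from the mean, in raw-score form
sum_sq_dev = sum_fx2 - sum_fx ** 2 / n

sd_population = math.sqrt(sum_sq_dev / n)      # divisor N
sd_sample = math.sqrt(sum_sq_dev / (n - 1))    # divisor n - 1

print(n, sum_fx, sum_fx2, round(sd_population, 3), round(sd_sample, 3))
```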

The Variance

This is defined as the average of the squared deviations of the observations
from their arithmetic mean. The variance is another measure of the dispersion
of a frequency distribution. Each difference between a score and the mean
(x̄) is squared; these squared differences are summed up and divided by the
number of scores. It can be calculated using any of the formulae for the
standard deviation, omitting the final square root. While the variance is
perfectly adequate, mathematically, for describing dispersion, its drawback is
that it is not on the same scale as the original units of the raw data,
because the deviations were squared. Consequently, to get it back to the
original scale, the final variance estimate is usually square-rooted. This is
the standard deviation. Here we can distinguish two variances, namely:
Population variance, denoted by σ²
Sample variance, denoted by S²

$$S^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$$

Where ∑xi = the sum of the observations

∑xi² = the sum of the squared observations

n = sample size

The variance is expressed in squared units and is therefore not an appropriate
measure of dispersion when we want to express this concept in terms of the
original units.

Example: A sample has six children whose ages are 6, 8, 10, 12, 14 and 16. To
find the variance of the ages, find the deviation of each age from the mean
(x̄ = 66 / 6 = 11), square it, and sum: 25 + 9 + 1 + 1 + 9 + 25 = 70. Divide
by n – 1 (6 – 1 = 5) to get the variance: S² = 70 / 5 = 14.

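Example 1:
Calculate the variance of the data on incubation period for the sample of 10
typhoid patients: 15, 12, 16, 14, 10, 12, 13, 10, 9, and 14.

Solution:
∑xi = 15 + 12 + 16 + 14 + 10 + 12 + 13 + 10 + 9 + 14 = 125

∑xi² = 225 + 144 + 256 + 196 + 100 + 144 + 169 + 100 + 81 + 196 = 1611

n = 10

Hence,

$$S^2 = \frac{1611 - \frac{125^2}{10}}{10 - 1} = \frac{48.5}{9} = 5.389$$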

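Calculation of variance for grouped data

| Oestrogen values | Freq (f) | Class mid (x) | x² | fx² | fx | Cum. f |
|---|---|---|---|---|---|---|
| 49.5 – 54.5 | 2 | 52 | 2704 | 5408 | 104 | 2 |
| 54.5 – 59.5 | 4 | 57 | 3249 | 12996 | 228 | 6 |
| 59.5 – 64.5 | 3 | 62 | 3844 | 11532 | 186 | 9 |
| 64.5 – 69.5 | 1 | 67 | 4489 | 4489 | 67 | 10 |
| 69.5 – 74.5 | 1 | 72 | 5184 | 5184 | 72 | 11 |
| 74.5 – 79.5 | 1 | 77 | 5929 | 5929 | 77 | 12 |
| 79.5 – 84.5 | 5 | 82 | 6724 | 33620 | 410 | 17 |
| 84.5 – 89.5 | 0 | 87 | 7569 | 0 | 0 | 17 |
| 89.5 – 94.5 | 3 | 92 | 8464 | 25392 | 276 | 20 |
| Total | ∑f = 20 | | ∑x² = 48156 | ∑fx² = 104,550 | ∑fx = 1420 | |

$$\text{Variance} = S^2 = \frac{\sum f x^2 - \frac{(\sum f x)^2}{n}}{n - 1} = \frac{104550 - \frac{1420^2}{20}}{19} = \frac{3730}{19}$$

S² = 196.3157895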
Coefficient of variation (CV)
This is a unit-free measure of dispersion expressed as a percentage. It is
used to compare variability between two or more variables, or between
different populations of data on the same measurement, since in these cases
comparing the standard deviations directly may lead to fallacious results. The
coefficient of variation expresses relative variation rather than absolute
variation: it is the standard deviation divided by the mean and multiplied by
100%.

$$CV = \frac{S}{\bar{x}} \times 100\%$$

Apart from its use in comparing variation in different sets of data, it is
also useful in comparing the results obtained by different persons making the
same measurements.

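Standard deviation (S) = √196.3157895

S = 14.011
S ≈ 14

Coefficient of Variation (CV), with x̄ = ∑fx / ∑f = 1420 / 20 = 71:

$$CV = \frac{S}{\bar{x}} \times 100\% = \frac{14.011}{71} \times 100\% \approx 19.7\%$$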
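The variance, standard deviation and coefficient of variation for the oestrogen table can be reproduced with the sketch below. It uses the class midpoints and frequencies from the table and the sample divisor n − 1; the variable names are illustrative.

```python
import math

# (class midpoint x, frequency f) from the oestrogen table
table = [(52, 2), (57, 4), (62, 3), (67, 1), (72, 1), (77, 1), (82, 5), (87, 0), (92, 3)]

n = sum(f for _, f in table)
sum_fx = sum(f * x for x, f in table)
sum_fx2 = sum(f * x * x for x, f in table)

mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)   # sample variance, divisor n - 1
sd = math.sqrt(variance)
cv_percent = sd / mean * 100                       # coefficient of variation in percent

print(n, sum_fx, sum_fx2, mean, round(variance, 4), round(sd, 3), round(cv_percent, 2))
```

Because the CV divides the standard deviation by the mean, it has no units and can be compared across data sets measured on different scales.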
Summary

The standard deviation is the most accurate and reliable measure of
variability and is very useful in describing test data. Along with the mean
(x̄), it provides a reliable description of most distributions, so a set of
data is considered adequately described if its mean (x̄) and standard
deviation (S) are given. The median and the semi-inter-quartile range are
likewise used together, especially to describe distributions that are skewed.
Considering the mean (x̄) and the standard deviation together helps us to
compare different distributions of scores. We may decide, for instance, to
compare the mean (x̄) score and variability of male and female scores in one
class, or of students' scores on one test with their scores on another. The
performance of a student can be compared with that of a class by determining
how far from the group mean (x̄) his or her score stands: the score can be
described mathematically by indicating how many standard deviations away from
the class mean (x̄) it lies. The mean and standard deviation thus enable a
relative interpretation of individual and group performances.
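Describing a score by the number of standard deviations it lies from the class mean can be sketched as follows. The class scores and the student's score are hypothetical values chosen for illustration, and the sample standard deviation (divisor n − 1) is assumed.

```python
import statistics

class_scores = [50, 60, 70, 80, 90]   # hypothetical class scores
student_score = 90                    # hypothetical student

class_mean = statistics.mean(class_scores)
class_sd = statistics.stdev(class_scores)   # sample standard deviation, divisor n - 1

# Number of standard deviations the student's score lies from the class mean
sd_units = (student_score - class_mean) / class_sd
print(class_mean, round(class_sd, 3), round(sd_units, 3))
```

A positive value means the score is above the class mean and a negative value means it is below.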
