measure of central tendency-intro

Chapter 1 discusses measures of central tendency, focusing on the arithmetic mean, median, and mode as key statistical averages. It provides definitions, methods for calculating these measures for both ungrouped and grouped data, and includes various tasks for practical application. The chapter also outlines the merits and demerits of each measure, emphasizing their importance in statistical analysis.

Uploaded by

tum chris
Copyright
© © All Rights Reserved

CHAPTER 1: ANALYSIS AND INTERPRETATION OF DATA

Measures of Central Tendency


1.1 Introduction
In the previous chapter, we studied how to collect raw data and how to classify and tabulate it
in a useful form, which helps in solving many problems of statistical concern. Yet this is not
sufficient: for practical purposes, further condensation is needed, particularly when we want to
compare two or more different distributions. We may reduce the entire distribution to one number
which represents the distribution.
A single value which can be considered as typical or representative of a set of observations, and
around which the observations can be considered as centred, is called an 'average' (or average
value) or a centre of location. Since such typical values tend to lie centrally within a set of
observations when arranged according to magnitude, averages are called measures of central
tendency.
In many distributions the observations are more or less symmetrically distributed about a typical
value (the average). This is of great importance, both theoretically and practically.

Dr. A.L. Bowley correctly stated, "Statistics may rightly be called the science of averages."

The word average is commonly used in day-to-day conversation. For example, we may say that
Kanga is an average boy in my class; we may talk of an average Kenyan, average income, etc.
When it is said, "Kanga is an average student," it means that he is neither very good nor very
bad, but a mediocre student. However, in statistics the term average has a different meaning.

The fundamental measures of central tendency are:


(1) Arithmetic mean
(2) Median
(3) Mode
(4) Geometric mean
(5) Harmonic mean
(6) Weighted mean

However, the most common measures of central tendency (or location) are the arithmetic mean,
median and mode.

1.2 Arithmetic Mean


This is the most commonly used average. Here are definitions given by two great masters of
statistics.
Arithmetic mean is the amount secured by dividing the sum of values of the items in a series by
their number.
Or
The arithmetic average may be defined as the sum or aggregate of a series of items divided by
their number.
Thus, you add all the observations (values of all items) together and divide this sum by the
number of observations (or items).
1.2.1 Ungrouped Data
Suppose we have n observations (or measures) $x_1, x_2, x_3, \dots, x_n$. We shall use the symbol
$\bar{x}$ (pronounced "x bar") to denote the arithmetic mean. Since we have to write the sum of
observations very frequently, we use the usual symbol $\sum$ (pronounced "sigma") to denote the
sum. The symbol $x_i$ will be used to denote, in general, the i-th observation.
Then the sum $x_1 + x_2 + x_3 + \dots + x_n$ will be represented by

\[ \sum_{i=1}^{n} x_i \]

Therefore, the arithmetic mean of the set $x_1, x_2, x_3, \dots, x_n$ is given by

\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \]

This method is known as the "Direct Method".

TASK 1
Calculate the arithmetic mean of a variable that takes the values 110, 117, 129, 195,
95, 100, 100, 175, 250 and 750.

Indirect Method (Assumed Mean Method, Short-cut Method)


Sometimes the values of X are very large; in such a case the short-cut method is used to simplify
the calculation. First, assume a mean (called the assumed mean); let it be A. Next, find the
deviations of all the values of X from A, which gives a new variable $u_i = x_i - A$. Finally,
find the mean of the deviations and add it to A (the assumed mean):

\[ \bar{x} = A + \frac{1}{n}\sum_{i=1}^{n} u_i, \qquad u_i = x_i - A \]

Example:
Use the Task 1 data to calculate the mean by the assumed mean method.

Example
Mr. Sonko’s earnings for the past week were:
Monday $450
Tuesday $375
Wednesday $500
Thursday $350
Friday $270
Find his average earnings per day.
Solution:
Total earnings = 450 + 375 + 500 + 350 + 270 = $1945, so the average is 1945 / 5 = $389 per day.
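
As a cross-check of the direct and assumed-mean methods, here is a minimal Python sketch using Mr. Sonko's earnings. The assumed mean A = 400 is an illustrative choice that does not come from the text; any other value of A should give the same result.

```python
# Mr. Sonko's daily earnings in dollars (Monday to Friday), from the example.
earnings = [450, 375, 500, 350, 270]
n = len(earnings)

# Direct method: sum of the observations divided by their number.
direct_mean = sum(earnings) / n

# Assumed-mean (short-cut) method: average the deviations u_i = x_i - A
# and add that average back to A. A = 400 is an assumption for illustration.
A = 400
deviations = [x - A for x in earnings]
shortcut_mean = A + sum(deviations) / n

print(direct_mean, shortcut_mean)  # 389.0 389.0
```

Both methods agree because the deviations only shift every value by the same constant A.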

TASK 2
The expenditure of ten families in dollars is given below:

Family A B C D E F G H I J
Expenditure 300 700 100 750 500 80 120 250 100 370
Taking $500 as the assumed mean, calculate the arithmetic mean.

1.2.2 Grouped data


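There is a difference in the methods for finding the arithmetic mean of an individual series and
of a discrete series. In a discrete series, every term (i.e., value of x) is multiplied by its
corresponding frequency (f). The arithmetic mean is then obtained by

\[ \bar{x} = \frac{\sum f_i x_i}{\sum f_i} \]

The formulae for the arithmetic mean by the direct method and by the short-cut method are as follows:

\[ \text{Direct method:}\quad \bar{x} = \frac{\sum f_i x_i}{\sum f_i} \]

\[ \text{Short-cut method:}\quad \bar{x} = A + \frac{\sum f_i u_i}{\sum f_i}, \qquad u_i = x_i - A \]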
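
The two grouped-data formulas can be sketched in Python as follows. The values, frequencies and the assumed mean A = 2 are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical discrete series: values x_i with frequencies f_i.
x = [1, 2, 3]
f = [2, 3, 5]
N = sum(f)  # total frequency, i.e. the sum of the f_i

# Direct method: sum(f_i * x_i) / sum(f_i).
direct_mean = sum(fi * xi for fi, xi in zip(f, x)) / N

# Short-cut method: A + sum(f_i * (x_i - A)) / sum(f_i), with A assumed.
A = 2
shortcut_mean = A + sum(fi * (xi - A) for fi, xi in zip(f, x)) / N

print(round(direct_mean, 4), round(shortcut_mean, 4))  # 2.3 2.3
```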

TASK 3
Find the mean of the following 50 observations.
19, 19, 20, 20, 20, 19, 20, 18, 21, 19,
20, 20, 19, 19, 20, 19, 21, 19, 19, 21,
18, 20, 18, 18, 17, 20, 20, 22, 20, 20,
20, 20, 20, 21, 20, 17, 23, 18, 17, 21,
20, 21, 20, 20, 20, 18, 21, 19, 20, 19

TASK 4
Nine coins were tossed together and the number of times they fell on the side of heads was
observed. The activity was performed 256 times and the frequency obtained for different values
of x, (the number of times it fell on heads) is shown in the following table.
x 0 1 2 3 4 5 6 7 8
f 1 9 26 59 72 52 29 1 1
Calculate the mean by:
i) Direct method
ii) Short-cut method

TASK 5
Find the arithmetic mean for the following:

Marks below      10  20  30  40  50  60   70   80
No. of students  15  35  60  84  96  127  198  250

Step-Deviation Method
Here all class intervals are of the same width, say c. This method is employed in place of the
short-cut method. We measure all the class marks (mid-values) from some convenient value, say
A, which generally should be taken as the class mark of the class of maximum frequency or of the
middle class. The deviations of the class marks from A are all multiples of c, since all class
intervals are equal. We treat the class frequencies as if they were centred at the corresponding
class marks.
Theorem: If $x_1, x_2, x_3, \dots, x_n$ are n values of the class marks with frequencies
$f_1, f_2, f_3, \dots, f_n$ respectively, and if each $x_i$ is expressed in terms of the new
variable $u_i$ by the relation

\[ x_i = A + c\,u_i, \quad \text{i.e.} \quad u_i = \frac{x_i - A}{c}, \]

then

\[ \bar{x} = A + c\,\frac{\sum f_i u_i}{\sum f_i} \]

This method is also known as the "Coding method."

Example
Calculate the arithmetic mean from the following data:
Age (years) below  25  30  35  40  45   50   55   60
No. of employees   8   23  51  81  103  113  117  120

Solution:
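
One way to work this example is sketched below in Python, using the step-deviation method. It assumes the first class is 20-25 (the table gives only "below" limits) and takes A = 37.5, the class mark of the class with the largest frequency; both choices are assumptions made for illustration.

```python
# Upper class limits and "less than" cumulative frequencies from the example.
upper_limits = [25, 30, 35, 40, 45, 50, 55, 60]
cumulative = [8, 23, 51, 81, 103, 113, 117, 120]
c = 5  # common class width

# Recover the class frequencies by differencing the cumulative counts.
freqs = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

# Class marks (mid-values); this assumes the first class is 20-25.
marks = [u - c / 2 for u in upper_limits]

# Step-deviation method: code each class mark as u_i = (x_i - A) / c.
A = 37.5  # assumed mean: class mark of the 35-40 class
u = [(m - A) / c for m in marks]
N = sum(freqs)
mean_age = A + c * sum(fi * ui for fi, ui in zip(freqs, u)) / N

print(freqs)               # [8, 15, 28, 30, 22, 10, 4, 3]
print(round(mean_age, 2))  # 36.83
```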
TASK 6
The following data were used to calculate the arithmetic mean. Find the missing item.
Wages (in dollars)  110  112  113  117  ?   125  129  130
No. of workers      25   17   13   15   14  8    7    2

Mean wage: $115.86

1.2.3 Properties of Arithmetic Mean


1. The sum of the deviations, of all the values of x, from their arithmetic mean, is zero.
2. The product of the arithmetic mean and the number of items gives the total of all items.
3. If $\bar{x}_1$ and $\bar{x}_2$ are the arithmetic means of two samples of sizes $n_1$ and $n_2$
respectively, then the arithmetic mean of the distribution combining the two can be calculated as

\[ \bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} \]
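
The combined-mean property extends to any number of groups. A minimal Python sketch, with hypothetical group sizes and means and a helper name (`combined_mean`) introduced only for illustration:

```python
def combined_mean(sizes, means):
    """Combined arithmetic mean of several groups from their sizes and means."""
    # n_i * mean_i recovers each group's total (property 2 above).
    total = sum(n * m for n, m in zip(sizes, means))
    return total / sum(sizes)

# Hypothetical groups: 40 items with mean 50 and 60 items with mean 60.
print(combined_mean([40, 60], [50, 60]))  # 56.0
```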

TASK 7
The average marks of three batches of students having 70, 50 and 30 students respectively are
50, 55 and 45. Find the average marks of all the 150 students, taken together.

TASK 8
The mean of a certain number of observations is 40. If two more items with values 50 and 64
are added to this data, the mean rises to 42. Find the number of items in the original data.

TASK 9
The sum of deviations of a certain number of observations measured from 4 is 72 and the sum of
deviations of observations measured from 7 is -3. Find the number of observations and their
mean.

TASK 10
The mean weight of 98 students is found to be 50 kg. It is later discovered that the frequency of
the class interval (30- 40) was wrongly taken as 8 instead of 10. Calculate the correct mean.

1.2.4 Merits of the Mean


1. It is rigidly defined. Its value is always definite.
2. It is easy to calculate and easy to understand. Hence it is very popular.
3. It is based on all the observations; so that it becomes a good representative.
4. It can be easily used for comparison.
5. It is capable of further algebraic treatment such as finding the sum of the values of the
observations, if the mean and the total number of the observations are given; finding the
combined arithmetic mean when different groups are given etc.
6. It is not affected much by sampling fluctuations.

1.2.5 Demerits of the Arithmetic Mean


1. It is affected by outliers or extreme values. For example, the mean of 10, 15, 25 and 500 is
137.5. Due to the outlier 500, the mean of the four numbers is raised to 137.5. In such a case
the mean is not a good representative of the given data.
2. It is a value which may not be present in the given data.
3. At times it gives absurd results, like 4.4 children per family.
4. It is not suitable for averaging ratios and percentages.
5. It cannot be calculated exactly when open-ended class intervals are present in the data.

1.3 Median
It is the value of the central item of the arranged data (data arranged in ascending or
descending order). Thus, it is the value of the middle item, and it divides the series into two
equal parts.
In Connor’s words: "The median is that value of the variable which divides the group into two
equal parts, one part comprising all values greater and the other all values lesser than the
median."
Example
The daily wages of 7 workers are 5, 7, 9, 11, 12, 14 and 15 dollars. This series contains 7 terms.
The fourth term, i.e., $11, is the median.
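
1.3.1 Median in an Individual Series (Ungrouped Data)


1. Arrange the individual series in either ascending (increasing) or descending (decreasing)
order of the size of its items or observations.
2. If the total number of observations is n, then

\[ \text{Median} = \text{value of the } \left(\frac{n+1}{2}\right)\text{th item} \]

When n is even, this position falls between two items, and the median is taken as the mean of
the $(n/2)$th and $(n/2 + 1)$th items.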
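
The two steps can be sketched in Python as follows. The first call uses the daily wages example from Section 1.3; the second list is a hypothetical even-sized series.

```python
def median(values):
    """Median of an individual series by the positional rule."""
    ordered = sorted(values)  # step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th item (index mid when counting from 0).
        return ordered[mid]
    # Even n: mean of the (n/2)-th and (n/2 + 1)-th items.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 7, 9, 11, 12, 14, 15]))  # 11
print(median([4, 1, 3, 2]))               # 2.5
```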
TASK 11
The following figures represent the number of books issued at the counter of a Statistics library
on 11 different days: 96, 180, 98, 75, 270, 80, 102, 100, 94, 75 and 200.
Calculate the median.

TASK 12
The populations (in thousands) of 36 metropolitan cities are as follows: 2468, 591, 437, 20, 213,
143, 1490, 407, 284, 176, 263, 19, 181, 777, 387, 302, 213, 204, 153, 733, 391, 176, 178, 122,
532, 360, 65, 260, 193, 92, 672, 258, 239, 160, 147, 151. Calculate the median.

1.3.2 Median in Discrete Series


Steps:
1. Arrange the data in ascending or descending order of magnitude.
2. Find the cumulative frequencies.
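3. Apply the formula:

\[ \text{Median} = \text{size of the } \left(\frac{N+1}{2}\right)\text{th item}, \qquad N = \sum f \]

The median is the size whose cumulative frequency first reaches or exceeds this position.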
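
A minimal Python sketch of these steps, using a hypothetical distribution. The function name `discrete_median` is illustrative, and the sketch returns the first size whose cumulative frequency reaches position (N + 1) / 2, which is one common convention.

```python
from itertools import accumulate

def discrete_median(sizes, freqs):
    """Median of a discrete series located through cumulative frequencies."""
    pairs = sorted(zip(sizes, freqs))       # step 1: ascending order of size
    N = sum(freqs)
    position = (N + 1) / 2                  # step 3: the ((N + 1) / 2)-th item
    running = accumulate(f for _, f in pairs)  # step 2: cumulative frequencies
    for (size, _), cf in zip(pairs, running):
        if cf >= position:                  # first class reaching the position
            return size

# Hypothetical data: cumulative frequencies are 2, 5, 9, 10 and N = 10.
print(discrete_median([1, 2, 3, 4], [2, 3, 4, 1]))  # 3
```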
TASK 13
Locate the median in the following distribution.
Size 8 10 12 14 16 18 20
frequency 7 7 12 28 10 9 6

1.3.3 Median in Continuous Series (grouped Data)


Steps:
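1. Determine the particular class in which the value of the median lies, i.e., the class whose
cumulative frequency first reaches or exceeds N/2.
2. After ascertaining the class in which the median lies, the following formula is used for
determining the exact value of the median:

\[ \text{Median} = l + \frac{N/2 - cf}{f} \times c \]

where l is the lower boundary of the median class, N is the total frequency, cf is the cumulative
frequency of the class preceding the median class, f is the frequency of the median class and c
is the class width.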
TASK 14
Calculate the median for the following and verify it graphically.

Ages (years)    20-24  25-29  30-34  35-39  40-44
No. of persons  70     80     180    150    20

Sometimes the series is given in descending order of magnitude. In this situation, either convert
the series to ascending order of magnitude and use the regular formula, or keep the series in
descending order of magnitude and use an alternative formula to calculate the median.

Example
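Marks            40-50  30-40  20-30  10-20  0-10
No. of students  10     12     40     30     8

Solution:
Here N = 100, so N/2 = 50.
By interpolation (series rearranged in ascending order): the cumulative frequencies are 8, 38,
78, 90 and 100, so the median class is 20-30, with lower boundary l = 20, preceding cumulative
frequency cf = 38, frequency f = 40 and class width c = 10.

\[ \text{Median} = l + \frac{N/2 - cf}{f} \times c = 20 + \frac{50 - 38}{40} \times 10 = 23 \]

Alternative formula (series kept in descending order): the cumulative frequencies from the top
are 10, 22 and 62, so the median class is again 20-30, with upper boundary U = 30, preceding
cumulative frequency cf = 22, f = 40 and c = 10.

\[ \text{Median} = U - \frac{N/2 - cf}{f} \times c = 30 - \frac{50 - 22}{40} \times 10 = 23 \]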
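
The marks example can be checked with a short Python sketch. It implements interpolation from the lower boundary (ascending order) and, as the alternative, interpolation from the upper boundary (descending order); the function names are illustrative.

```python
# Marks example, classes in ascending order: (lower, upper, frequency).
classes = [(0, 10, 8), (10, 20, 30), (20, 30, 40), (30, 40, 12), (40, 50, 10)]
N = sum(f for _, _, f in classes)

def median_ascending(classes, N):
    """Median = l + (N/2 - cf) * c / f, accumulating from the lowest class."""
    cf = 0  # cumulative frequency of the classes below the current one
    for lower, upper, f in classes:
        if cf + f >= N / 2:
            return lower + (N / 2 - cf) * (upper - lower) / f
        cf += f

def median_descending(classes, N):
    """Median = U - (N/2 - cf) * c / f, accumulating from the highest class."""
    cf = 0  # cumulative frequency of the classes above the current one
    for lower, upper, f in reversed(classes):
        if cf + f >= N / 2:
            return upper - (N / 2 - cf) * (upper - lower) / f
        cf += f

print(median_ascending(classes, N), median_descending(classes, N))  # 23.0 23.0
```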

Note that, while calculating the median of a series, it must be put in the 'exclusive class-interval'
form. If the original series is in inclusive type, first convert it into the exclusive type and then
find its median.

TASK 15
The following distribution represents the number of minutes spent by a group of teenagers in
watching movies. What is the median?

Minutes/week      0-99  100-199  200-299  300-399  400-499  500-599  600 and more
No. of teenagers  27    32       65       78       58       32       8

1.3.4 Merits of Median


1. It is rigidly defined.
2. It is easy to calculate and understand.
3. Unlike the arithmetic mean, it is not affected by extreme values. For example, 5 persons have
incomes of $2000, $2500, $2600, $3000 and $5000. The median would be $2600 while the
arithmetic mean would be $3020.
4. It can be found by mere inspection.
5. It is fully representative and can be computed easily.
6. It can be used for qualitative studies.
7. Even if the extreme values are unknown, median can be calculated if one knows the number of
items.
8. It can be obtained graphically.
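
The effect of extreme values on the mean and the median can be illustrated with Python's standard `statistics` module. The data are the income example above and the outlier example from the demerits of the arithmetic mean.

```python
from statistics import mean, median

# Income example from merit 3: the value 5000 pulls the mean upwards.
incomes = [2000, 2500, 2600, 3000, 5000]
income_mean = mean(incomes)      # 3020
income_median = median(incomes)  # 2600

# Outlier example from the demerits of the mean: 10, 15, 25 and 500.
values = [10, 15, 25, 500]
values_mean = mean(values)       # 137.5
values_median = median(values)   # 20.0, the mean of the middle items 15 and 25

print(income_mean, income_median, values_mean, values_median)
```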

1.3.5 Demerits of Median


1. It may not be representative if the distribution is irregular and abnormal.
2. It is not capable of further algebraic treatment.
3. It is not based on all observations.
4. It is affected by sampling fluctuations.
5. The arrangement of the data in the order of magnitude is absolutely necessary.

1.4 Mode
It is the size of the item which possesses the maximum frequency. According to Professors
Kenney and Keeping, the value of the variable which occurs most frequently in a distribution is
called the mode.
It is the most common value, that is, the point of maximum density.

1.4.1 Ungrouped Data


Individual series: The mode of this series can be obtained by mere inspection. The number
which occurs most often is the mode.
TASK 16
Locate mode in the data 7, 12, 8, 5, 9, 6, 10, 9, 4, 9, 9

Note that if two or more numbers in a series share the maximum frequency, the mode is not
unique and is difficult to determine. Such series are called bi-modal, tri-modal or multi-modal
series.
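
Locating the mode by inspection amounts to counting how often each value occurs. A minimal Python sketch with hypothetical data, using `statistics.multimode` (available from Python 3.8), which returns every value that shares the maximum frequency:

```python
from statistics import multimode

# Hypothetical series with a single mode: 4 occurs twice, everything else once.
single = multimode([1, 4, 4, 6])

# Hypothetical bi-modal series: 3 and 5 both occur twice.
double = multimode([2, 3, 3, 5, 5, 7])

print(single, double)  # [4] [3, 5]
```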
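
1.4.2 Grouped Data


Steps:
1. Determine the modal class, which has the maximum frequency.
2. By interpolation, the value of the mode can be calculated as

\[ \text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c \]

where l is the lower boundary of the modal class, $f_1$ is its frequency, $f_0$ and $f_2$ are the
frequencies of the classes preceding and following it, and c is the class width.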
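
A minimal Python sketch of the interpolation step, assuming the standard formula Mode = l + (f1 - f0) / (2 f1 - f0 - f2) × c. The function name and the three-class distribution are hypothetical.

```python
def grouped_mode(lower, width, f0, f1, f2):
    """Mode by interpolation within the modal class.

    lower: lower boundary of the modal class
    width: class width
    f1:    frequency of the modal class
    f0:    frequency of the preceding class
    f2:    frequency of the following class
    """
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * width

# Hypothetical distribution: 0-10 (5), 10-20 (12), 20-30 (8); modal class 10-20.
print(round(grouped_mode(lower=10, width=10, f0=5, f1=12, f2=8), 2))  # 16.36
```

The mode lies closer to the neighbouring class with the larger frequency, here the class that follows the modal class.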
TASK 17
Calculate the modal wage by interpolation.

Daily wages (dollars)  20-25  25-30  30-35  35-40  40-45  45-50
No. of workers         1      3      8      12     7      5

Verify it graphically.

1.4.3 Merits of Mode


1. It is simple to calculate.
2. In individual or discrete distribution it can be located by mere inspection.
3. It is easy to understand. Everyone is used to the idea of average size of a garment, an average
Kenyan etc.
4. It is not isolated like the median as it is the most common item.
5. Unlike the arithmetic mean, it is always a value which can be found in the series.
6. It is not necessary to know all the items. What we need is the point of maximum frequency
density.
7. It is not affected by sampling fluctuations.

1.4.4 Demerits of Mode


1. It is ill defined.
2. It is not based on all observations.
3. It is not capable of further algebraic treatment.
4. It is not a good representative of the data.
5. Sometimes there is more than one value of the mode.

1.5 Empirical Relation Between Mean, Median and Mode


A distribution in which the values of the mean, median and mode coincide (i.e., mean = median =
mode) is known as a symmetrical distribution. Conversely, when the values of the mean, median and
mode are not equal, the distribution is known as an asymmetrical or skewed distribution. In a
moderately skewed or asymmetrical distribution, a very important relationship exists among
these three measures of central tendency. In such distributions the distance between the mean
and the median is about one-third of the distance between the mean and the mode, as will be clear
from diagrams 1 and 2. Karl Pearson expressed this relationship as:

\[ \text{Mean} - \text{Mode} = 3(\text{Mean} - \text{Median}), \quad \text{i.e.} \quad \text{Mode} = 3\,\text{Median} - 2\,\text{Mean} \]

Knowing any two values, the third can be computed.
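
Rearranging the relation Mode = 3 Median - 2 Mean gives each measure in terms of the other two. A minimal Python sketch with hypothetical values; the function names are illustrative, and the results are only approximate for real, moderately skewed data.

```python
def mode_from(mean, median):
    """Mode = 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean

def mean_from(median, mode):
    """Mean = (3 * Median - Mode) / 2."""
    return (3 * median - mode) / 2

def median_from(mean, mode):
    """Median = (2 * Mean + Mode) / 3."""
    return (2 * mean + mode) / 3

# Hypothetical moderately skewed distribution with mean 25 and median 24.
print(mode_from(25, 24))    # 22
print(mean_from(24, 22))    # 25.0
print(median_from(25, 22))  # 24.0
```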

TASK 18
Given median = 20.6, mode = 26
Find mean.

1.6 Weighted mean


The weighted mean is a mean in which individual data values contribute unequally. Each data
value ($X_i$) has a weight assigned to it ($W_i$). Data values with larger weights contribute
more to the weighted mean and data values with smaller weights contribute less. The formula is

\[ \bar{X}_w = \frac{\sum W_i X_i}{\sum W_i} \]

There are several reasons why you might want to use a weighted mean.

1. Each individual data value might actually represent a value that is used by multiple people in
your sample. The weight, then, is the number of people associated with that particular value.
2. Your sample might deliberately overrepresent or underrepresent certain segments of the
population. To restore balance, you would place less weight on the overrepresented segments of
the population and greater weight on the underrepresented segments of the population.
3. Some values in your data sample might be known to be more variable (less precise) than other
values. You would place greater weight on those data values known to have greater precision.
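
A minimal Python sketch of the weighted mean, assuming the usual formula (sum of weight × value, divided by the sum of the weights). The values and weights are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of W_i * X_i divided by the sum of the W_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical data: a value of 80 with weight 3 and a value of 90 with weight 7.
print(weighted_mean([80, 90], [3, 7]))  # 87.0

# With equal weights the weighted mean reduces to the ordinary arithmetic mean.
print(weighted_mean([80, 90], [1, 1]))  # 85.0
```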

TASK 19
Joan gets quiz grades of 79, 82, and 69. She gets a 65 on her final exam. Find the weighted mean
if the quizzes each count for 10% and the final exam counts for 70% of the final grade.

1.7 Geometric mean


The geometric mean is an average calculated by multiplying a set of n numbers together and taking
the nth root of the product.
A common example of when the geometric mean is used is the averaging of growth rates.
The formula for the geometric mean is

\[ GM = \sqrt[n]{X_1 \times X_2 \times \dots \times X_n} \]

where n is the number of observations made of the variable X and $X_1, X_2, \dots, X_n$ are the
values of these observations.

Example
Find the geometric mean of the numbers 3, 25 and 45.
There are three observations, thus n = 3 and

\[ GM = \sqrt[3]{3 \times 25 \times 45} = \sqrt[3]{3375} = 15 \]

The geometric mean cannot be calculated if we have negative or zero observations. The
geometric mean of a set of readings is always less than the arithmetic mean (unless all readings
are identical) and is less influenced by very large values or items.
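
The example and the comparison with the arithmetic mean can be checked with a short Python sketch. The function name is illustrative, and `math.prod` needs Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.prod(values) ** (1 / len(values))

data = [3, 25, 45]
gm = geometric_mean(data)   # cube root of 3375
am = sum(data) / len(data)  # arithmetic mean of the same numbers

print(round(gm, 6), round(am, 2))  # 15.0 24.33
```

The result is rounded before printing because a fractional power is computed in floating point and can differ from 15 in the last digits.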

TASK 20
a. Calculate the arithmetic and geometric means of the following salaries, in thousands of
shillings per month: 6, 8, 10, 10, 10, 12, 16.

b. Given the following salaries (in thousands of Ksh) in a company per annum (p.a.): 6,
8, 10, 10, 10, 12, 48.

The geometric mean is useful when only a few items in a distribution are changing: in such
circumstances it is more stable than the arithmetic mean. It is useful in the calculation of share
indices and also in calculations where the data grow in geometric progression, e.g., the
population of a country.

EXAMPLE
The population of a city was 300,000 in 1980 and 400,000 in 1990. Suppose we want to estimate
the population in 1985. The arithmetic mean gives

(300,000 + 400,000) / 2 = 350,000

Here, we are assuming that the population grows by the same number each year, which is not
correct. The same applies to money growing at a compound rate. The geometric mean for 1985
would be:

\[ \sqrt{300{,}000 \times 400{,}000} \approx 346{,}410 \]
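
The two mid-period estimates can be compared directly with a short Python sketch; the variable names are illustrative.

```python
import math

p1980, p1990 = 300_000, 400_000

# Arithmetic mean: assumes the same absolute increase every year.
arithmetic_estimate = (p1980 + p1990) / 2

# Geometric mean: assumes the same growth rate every year.
geometric_estimate = math.sqrt(p1980 * p1990)

print(arithmetic_estimate, round(geometric_estimate))  # 350000.0 346410
```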

1.8 Harmonic mean


The harmonic mean is another measure of central tendency. Like the arithmetic mean and the
geometric mean, it has a mathematical basis and is useful for quantitative data. The harmonic
mean is defined as follows:

The harmonic mean is the quotient of the “number of the given values” and the “sum of the
reciprocals of the given values”.

In mathematical terms, the harmonic mean is defined as follows:

\[ HM = \frac{n}{\sum_{i=1}^{n} \dfrac{1}{x_i}} \]
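
A minimal Python sketch of the definition. The frequency version (sum of frequencies divided by the sum of f / x) is the usual extension to frequency distributions and is an assumption here, since the text states only the ungrouped form; all data and function names are hypothetical.

```python
def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)

def harmonic_mean_freq(values, freqs):
    """Frequency version: sum of f_i divided by the sum of f_i / x_i."""
    return sum(freqs) / sum(f / v for v, f in zip(values, freqs))

# Hypothetical values: the reciprocals 1, 0.5 and 0.25 add up to 1.75.
print(round(harmonic_mean([1, 2, 4]), 4))      # 1.7143

# Hypothetical frequencies: the value 2 once and the value 4 twice.
print(harmonic_mean_freq([2, 4], [1, 2]))      # 3.0
```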

TASK 21
Calculate the harmonic mean of the numbers: 13.5, 14.5, 14.8, 15.2 and 16.1

TASK 22
Given the following frequency distribution of first year students of a particular college.
Calculate the Harmonic Mean.
Age (years)         13  14  15  16  17
Number of students  2   5   13  7   3

TASK 23
Calculate the harmonic mean for the data given below:
Marks 30-39 40-49 50-59 60-69 70-79 80-89 90-99
frequency 2 3 11 20 32 25 7

1.9 Characteristics of a Good Measure of Central Tendency


1. It should be rigidly defined
2. It should be easy to understand and calculate
3. It should be based on all observations
4. It should be amenable to further algebraic manipulation.
5. It should not be affected much by extreme values
6. It should be least affected by fluctuations in sampling.

ASSIGNMENT 1
1. The mean of the ten numbers listed below is 5.5.
4, 3, a, 8, 7, 3, 9, 5, 8, 3
(a) Find the value of a.
(b) Find the median of these numbers.

2. In the following ordered data, the mean is 6 and the median is 5.


2, b, 3, a, 6, 9, 10, 12
Find each of the following
(a) the value of a;
(b) the value of b.

3. For the set {8, 4, 2, 10, 2, 5, 9, 12, 2, 6}:


(a) calculate the mean;
(b) find the mode;
(c) find the median.

4. David looked at a passage from a book. He recorded the number of words in each sentence
as shown in the following frequency table.
Class interval (number of words) Frequency
1-5 16
6-10 28
11-15 26
16-20 14
21-25 10
26-30 3
31-35 1
36-40 0
41-45 2

(a) Find the class interval in which the median lies.


(b) Estimate, correct to the nearest whole number, the mean number of words in a sentence.

5. Twenty students are asked how many detentions they received during the previous week at
school. The results are summarized in the frequency distribution table below.

Number of detentions Number of students


x f
0 6
1 3
2 10
3 1
total 20
(a) What is the modal number of detentions received?

6. The weights in kilograms of 12 students in a class are as follows.
63 76 99 65 63 51 52 95 63 71 65 83
(a) State the mode.
(b) Calculate the mean weight;
When one student leaves the class, the mean weight of the remaining 11 students
becomes 70 kg.
(c) Find the weight of the student who left.

7. An atlas gives the following information about the approximate population of some cities in
the year 2000. The population of Nairobi has accidentally been left out.

City Population in Millions

Melbourne 3.2
Bangkok 7.2
Nairobi
Paris 9.6
São Paulo 17.7
Tokyo 28.0
Seattle 2.1

The atlas tells us that the mean population for this group of cities is 10.01 million.
(a) Calculate the population of Nairobi.
(b) Which city has the median population value?
8. The number of hours that a professional footballer trains each day in the month of June is
represented in the following histogram.
(a) Write down the modal number of hours trained each day.
(b) Calculate the mean number of hours he trains each day.

9. The numbers of games played in each set of a tennis tournament were


9, 7, 8, 11, 9, 6, 10, 8, 12, 6, 8, 13, 7, 9, 10, 9, 10, 11, 12, 8, 7, 13, 10, 7, 7.

(a) Organize these data in a frequency table.


(b) Write down the value of n.
(c) Calculate the mean number of games played per set.
(d) What percentage of the sets had more than 10 games?
(e) What is the modal number of games?
