Chapter 3 - CT & Dispersion

Uploaded by minhajulobia
Copyright © All Rights Reserved
CHAPTER 3

DESCRIBING DATA: NUMERICAL MEASURES


3.1 MEASURES OF LOCATION
According to Professor Bowley, averages or measures of central tendency are “Statistical constants
which enable us to comprehend in a single effort the significance of the whole.” They give us an
idea of the concentration of the values in the central part of the distribution. The tendency of the
observations of a distribution to concentrate around a central value is known as central tendency,
and a measure of that central value is known as a measure of central tendency. Various types of
measures of central tendency are discussed below:

Measures of Central Tendency
i) Mean: Arithmetic Mean, Geometric Mean, Harmonic Mean
ii) Median
iii) Mode

Requisites for an ideal measure of central tendency:

According to Professor Yule, an ideal measure of central tendency should satisfy the following
characteristics. The measure should be:
i) Rigidly defined.
ii) Readily comprehensible and easy to calculate.
iii) Based on all observations.
iv) Suitable for further mathematical treatment.
v) Little affected by sampling fluctuations.
vi) Not affected by extreme values.
The purpose of a measure of location is to pinpoint the center of a set of observations.

Measure of location: A single value that summarizes a set of data. It locates the
center of the values.

3.1.1 ARITHMETIC MEAN

The arithmetic mean, or simply mean, is the most widely used measure of location.

Arithmetic mean: The sum of the observations divided by the total number of observations.

The mean is calculated as follows:

$$\text{Population mean} = \frac{\text{Sum of all values in the population}}{\text{Number of values in the population}}$$

In terms of symbols, the formula for the arithmetic mean of a population is:

$$\mu = \frac{\sum X}{N} \qquad \text{[3-1]}$$

Where:
µ is the population mean.
N is the number of items in the population.
X is a particular value.
∑ indicates the operation of adding all the values. It is pronounced “sigma.”

∑X is the sum of the X values. It is pronounced “sigma X.”


[3-1] indicates the formula number from the text.

Any measurable characteristic of a population is called a parameter.

Parameter: A characteristic of a population. In other words, it is an unknown constant of a
population.

The Sample Mean


As explained in Chapter 1, we frequently select a sample from the population to find out something
about a specific characteristic of the population.

The mean of a sample and the mean of a population are computed in the same way, but the
shorthand notation is different.

In terms of symbols, the formula for the mean of a sample is:

$$\bar{X} = \frac{\sum X}{n} \qquad \text{[3-2]}$$

Where:
X̄ is the sample mean; it is read “X bar.”
n is the number of values in the sample.
X is a particular value.
∑ indicates the operation of adding all the values.

∑X is the sum of the X values.


[3-2] is the formula number from the text.

The mean of a sample, or any other measure based on sample data, is called a statistic.

Statistic: A characteristic of a sample; in other words, any function of the sample observations.

“The mean weight of a sample of laptop computers is 15 pounds,” is an example of a statistic.

Note that in both of the above formulas the mean is calculated by summing the observations and
dividing by the total number of observations.

As an example, the Kellogg Company had quarterly earnings per share of $0.89, $0.77, $1.05, $0.79,
and $0.95. The mean is found by:

$$\mu = \frac{\sum X}{N} = \frac{\$0.89 + \$0.77 + \$1.05 + \$0.79 + \$0.95}{5} = \frac{\$4.45}{5} = \$0.89$$

The mean quarterly earnings per share is $0.89. In some situations, however, the mean may not be
representative of the data.

As an example, the annual salaries of five executives are $40,000, $42,000, $44,000, $48,000, and
$300,000. The mean is:

$$\mu = \frac{\sum X}{N} = \frac{\$40{,}000 + \$42{,}000 + \$44{,}000 + \$48{,}000 + \$300{,}000}{5} = \frac{\$474{,}000}{5} = \$94{,}800$$

Notice how the one extreme value ($300,000) pulled the mean upward. Four of the five executives
earned less than the mean, raising the question of whether the arithmetic mean value of $94,800 is
typical of the salaries of the five executives.
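The two calculations above can be checked with a short Python sketch of formulas [3-1] and [3-2]. The helper name `arithmetic_mean` is our own, chosen for illustration; the data are the Kellogg earnings and the executive salaries from the text.

```python
def arithmetic_mean(values):
    """Illustrative helper: sum of the observations / number of observations."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


kellogg_eps = [0.89, 0.77, 1.05, 0.79, 0.95]            # quarterly EPS in $
salaries = [40_000, 42_000, 44_000, 48_000, 300_000]    # executive salaries in $

print(round(arithmetic_mean(kellogg_eps), 2))  # 0.89
print(arithmetic_mean(salaries))               # 94800.0

# Count the executives who earn less than the mean: the single
# extreme salary pulls the mean above four of the five values.
mean_salary = arithmetic_mean(salaries)
print(sum(1 for s in salaries if s < mean_salary))  # 4
```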

Advantages of Arithmetic Mean:


Arithmetic mean is
i) Rigidly defined
ii) Easy to understand and easy to calculate.

iii) Based on all observations.
iv) Suitable for further mathematical treatment
v) Least affected by sampling fluctuations

Disadvantages of Arithmetic Mean:

i) It can be neither determined by inspection nor located graphically.
ii) It cannot be used when dealing with qualitative characteristics that cannot be measured
quantitatively, such as intelligence, honesty, or beauty.
iii) It cannot be obtained if a single observation is missing or lost.
iv) It cannot be calculated if an extreme class is open-ended.
v) It is very much affected by extreme values.

Properties of the Arithmetic Mean


As stated, the mean is a widely used measure of location. It has several important properties.

1. Every set of interval-level and ratio-level data has a mean.
2. All the data values are used in the calculation.
3. A set of data has only one mean, that is, the mean is unique.
4. The mean is a useful measure for comparing two or more populations.
5. The sum of the deviations of each value from the mean is always zero, that is:
$$\sum (X - \bar{X}) = 0$$
6. Mean of a composite series: If X̄1, X̄2, …, X̄k are the means of k component series of sizes
n1, n2, …, nk respectively, then the mean X̄ of the composite series obtained by combining the
component series is given by the formula:
$$\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \cdots + n_k\bar{X}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} n_i\bar{X}_i}{\sum_{i=1}^{k} n_i}$$
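A small Python check of properties 5 and 6, using the class ages that appear later in this chapter. `combined_mean` is a hypothetical helper written for illustration; `statistics.fmean` is part of Python's standard library (Python 3.8+).

```python
from statistics import fmean


def combined_mean(sizes, means):
    """Illustrative helper: mean of a composite series, sum(n_i * mean_i) / sum(n_i)."""
    if len(sizes) != len(means):
        raise ValueError("need exactly one size per component mean")
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Property 5: deviations from the mean sum to zero (PM class ages).
pm_ages = [17, 17, 18, 20, 25, 29]
xbar = fmean(pm_ages)
print(sum(x - xbar for x in pm_ages))  # 0.0

# Property 6: the composite mean equals the mean of the pooled raw data.
am_ages = [18, 20, 21, 21, 23, 23]
third_class = [20, 20, 21, 22, 60]
pooled = combined_mean([len(am_ages), len(third_class)],
                       [fmean(am_ages), fmean(third_class)])
print(abs(pooled - fmean(am_ages + third_class)) < 1e-9)  # True
```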

3.1.2 GEOMETRIC MEAN


The geometric mean is used to determine the mean percent increase from one period to another. It
is also used in finding the average of ratios, indexes, and growth rates.

Geometric mean: The nth root of the product of n values.

The formula for finding the geometric mean is:

$$GM = \sqrt[n]{X_1 \times X_2 \times X_3 \times \cdots \times X_n}$$

Where:
X1, X2, X3, etc. are the data values.
n is the number of values.
ⁿ√ denotes the nth root.

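The geometric mean can be used for averaging percents. Suppose the return on investment for
McDermoll International for the past 4 years is 0.4%, 2.9%, 2.1%, and 12.3%. The geometric mean
rate of return over the period is 4.3 percent per year. Each percent is first converted to a growth
factor (for example, 0.4% becomes 1.004):

$$GM = \sqrt[n]{(X_1)(X_2)(X_3)\cdots(X_n)} = \sqrt[4]{1.004 \times 1.029 \times 1.021 \times 1.123} = \sqrt[4]{1.18455} = 1.043$$

Subtracting 1 from the result gives 0.043, or 4.3 percent.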
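The same calculation as a Python sketch. It assumes the returns are first converted to growth factors (1 + r/100), and it averages logarithms instead of multiplying directly, which gives the same nth root while avoiding overflow for long series. `geometric_mean` is a hypothetical helper name; Python 3.8+ also offers `statistics.geometric_mean` in its standard library.

```python
import math


def geometric_mean(values):
    """Illustrative helper: nth root of the product of n positive values."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("this sketch requires positive values")
    # exp(mean of logs) equals the nth root of the product, without
    # forming a possibly huge or tiny product first.
    return math.exp(sum(math.log(v) for v in values) / len(values))


returns_pct = [0.4, 2.9, 2.1, 12.3]                  # McDermoll returns, in percent
growth_factors = [1 + r / 100 for r in returns_pct]  # 1.004, 1.029, 1.021, 1.123
gm = geometric_mean(growth_factors)

print(round(gm, 3))              # 1.043
print(round((gm - 1) * 100, 1))  # 4.3 (percent per year)
```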

Advantages of Geometric Mean:


i) Rigidly defined
ii) Based on all observations.
iii) Suitable for further mathematical treatment
iv) Less affected by sampling fluctuations and
v) Gives comparatively more weight to small items

Disadvantages of Geometric Mean:


i) It is not easy to understand or to calculate for a non-mathematical student.
ii) If any observation is zero, the geometric mean becomes zero.
iii) If any observation is negative, the GM may become imaginary (for example, an even root of a
negative product).
Uses of GM:
i) To find the rate of population growth and the rate of interest.
ii) In the construction of index numbers.

3.1.3 HARMONIC MEAN


Harmonic mean of a number of observations is the reciprocal of the AM of the reciprocals of the
given values:
$$HM = \frac{n}{\frac{1}{X_1} + \frac{1}{X_2} + \frac{1}{X_3} + \cdots + \frac{1}{X_n}}$$

Advantages of Harmonic Mean:


i) Rigidly defined
ii) Based on all observations.
iii) Suitable for further mathematical treatment
iv) Less affected by sampling fluctuations and
v) Gives greater importance to small items and is useful only when small items
have to be given a very high weight.

Disadvantages of Harmonic Mean:
i) It is not easy to understand and is difficult to calculate.
ii) If any observation is zero, the harmonic mean cannot be calculated, because the reciprocal of
zero is undefined.

Uses of HM:

It is used for calculating the average speed of automobiles, for example when equal distances are
covered at different speeds.
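To see why the harmonic mean suits average speed, consider a hypothetical trip (not from the text): 60 km at 40 km/h takes 1.5 hours and 60 km at 60 km/h takes 1 hour, so the average speed is 120/2.5 = 48 km/h, which is the harmonic mean of 40 and 60. A minimal Python sketch follows; `harmonic_mean` is an illustrative helper name (the standard library's `statistics.harmonic_mean` gives the same result for positive data).

```python
def harmonic_mean(values):
    """Illustrative helper: reciprocal of the arithmetic mean of the reciprocals."""
    values = list(values)
    if not values or any(v == 0 for v in values):
        raise ValueError("the harmonic mean is undefined if any value is zero")
    return len(values) / sum(1 / v for v in values)


# Hypothetical trip: equal distances driven at 40 km/h and at 60 km/h.
speeds = [40, 60]
print(round(harmonic_mean(speeds), 2))  # 48.0
print(sum(speeds) / len(speeds))        # 50.0, which overstates the average speed
```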

Weighted Mean
The weighted mean is a special case of the arithmetic mean. It is often useful when there are several
observations of the same value.

Weighted mean: The value of each observation is multiplied by the number of times it
occurs. The sum of these products is divided by the total number of observations to give
the weighted mean.

In general, the weighted mean of a set of numbers, designated X1, X2, X3, … Xn, with the
corresponding weights w1, w2, w3, …, wn is computed by:

$$\bar{X}_w = \frac{w_1X_1 + w_2X_2 + w_3X_3 + \cdots + w_nX_n}{w_1 + w_2 + w_3 + \cdots + w_n} \qquad \text{[3-3]}$$

The weighted mean is particularly useful when various classes or groups contribute differently to
the total. For example, the coronary care unit of a hospital consists of nurses’ aides who are paid
$12 per hour, nurses’ assistants who earn $15 per hour, and registered nurses who earn $21 per
hour.

It would not be accurate to say the average hourly wage for the coronary care unit is $16 per hour,
found by ($12 + $15 + $21)/3, unless there were the same number of people in each group.

Suppose the coronary care unit has ten employees: two aides who earn $12 per hour, three nurses’
assistants who earn $15 per hour, and five registered nurses who earn $21 per hour. The weighted
mean is:

$$\bar{X}_w = \frac{w_1X_1 + w_2X_2 + w_3X_3 + \cdots + w_nX_n}{w_1 + w_2 + w_3 + \cdots + w_n} = \frac{(2 \times \$12) + (3 \times \$15) + (5 \times \$21)}{2 + 3 + 5} = \frac{\$24 + \$45 + \$105}{10} = \frac{\$174}{10} = \$17.40$$

Thus the weighted mean is $17.40.
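Formula [3-3] as a minimal Python sketch, applied to the coronary care unit wages; `weighted_mean` is a hypothetical helper name.

```python
def weighted_mean(values, weights):
    """Illustrative helper for formula [3-3]: sum(w * X) / sum(w)."""
    values, weights = list(values), list(weights)
    if len(values) != len(weights) or sum(weights) == 0:
        raise ValueError("need one weight per value and a nonzero total weight")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


hourly_rates = [12, 15, 21]  # aides, nurses' assistants, registered nurses ($/hour)
headcounts = [2, 3, 5]       # number of employees at each rate

print(weighted_mean(hourly_rates, headcounts))  # 17.4
print(sum(hourly_rates) / len(hourly_rates))    # 16.0, the unweighted figure
```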


3.1.4 THE MEDIAN


It was pointed out that the arithmetic mean is often not representative of data with extreme values.
The median is a useful measure when we encounter data with an extreme value.

Median: The midpoint of the values after all observations have been ordered from the
smallest to the largest or from the largest to the smallest.

Fifty percent of the observations are above the median and 50 percent are below it. To determine
the median, the values are ordered from low to high, or high to low, and the middle value is
selected. For the executive incomes, the middle value, $44,000, is the median.

$40,000   $42,000   $44,000 (median)   $48,000   $300,000
Obviously, it is a more representative value in this problem than the mean of $94,800.

Note that there was an odd number of executive incomes (5). For an odd number of ungrouped
values, we simply order them and select the middle value. To determine the median of an even
number of ungrouped values, first arrange them from low to high as usual, and then determine the
value halfway between the two middle values.

As an example, the final grades of the six students in Mathematics 126 were 87, 62, 91, 58, 99, and
85. Ordering these from low to high:

58   62   85   87   91   99
The median grade is halfway between the two middle values of 85 and 87. The median grade is 86.
Thus we note that the median (86) may not be one of the values in a set of data.
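The odd/even rule described above can be sketched in Python as follows; `median` is an illustrative helper (Python's `statistics.median` follows the same rule).

```python
def median(values):
    """Illustrative helper: middle value of the ordered data, or the mean of
    the two middle values when the number of observations is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([40_000, 42_000, 44_000, 48_000, 300_000]))  # 44000
print(median([87, 62, 91, 58, 99, 85]))                    # 86.0
```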

The formula for finding the median of grouped data is given below. The median class is the class
that contains the (N/2)th observation.

$$\text{Median} = L_1 + \frac{\frac{N}{2} - Cf}{f_m} \times i$$

Where
L1 is the lower limit of the median class
N is the total number of observations
Cf is the cumulative frequency of the class just preceding the median class
fm is the frequency of the median class
i is the width of the median class
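A sketch of the grouped-median formula in Python, assuming equal-width classes given by their lower limits. The helper name `grouped_median` is hypothetical; the example reuses the plumbers' wage distribution from Problem 4, whose median class is $12 up to $14 (N/2 = 20, Cf = 9, fm = 12).

```python
def grouped_median(lower_limits, frequencies, width):
    """Illustrative helper: Median = L1 + (N/2 - Cf) / fm * i for equal-width classes."""
    half = sum(frequencies) / 2
    cumulative = 0  # Cf: cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= half:
            # This class contains the (N/2)th observation: it is the median class.
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must be non-empty")


# Plumbers' hourly wages from Problem 4: classes $8 up to $10, ..., $18 up to $20.
lowers = [8, 10, 12, 14, 16, 18]
freqs = [3, 6, 12, 10, 7, 2]
print(round(grouped_median(lowers, freqs, width=2), 2))  # 13.83
```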

Advantages of Median:
i) Well defined
ii) Readily comprehensible and easy to calculate.

iii) Not affected by extreme values
iv) Can be calculated for a distribution with an open-ended extreme class.

Disadvantages of Median:
i) Not based on all observations.
ii) Not suitable for further mathematical treatment.
iii) Compared with the AM, it is more affected by sampling fluctuations.
Uses of Median:
i) It is the only average to be used when dealing with qualitative data that cannot be measured
quantitatively but can still be arranged in ascending or descending order of magnitude, e.g., to
find the average intelligence or average honesty among a group of people.
ii) It is used to determine typical values in problems concerning wages, distribution of wealth,
etc.
Properties of the Median
The major properties of the median are:
1. The median is a unique value, that is, like the mean, there is only one median for a set of
data.
2. It is not influenced by extremely large or small values.
3. It can be computed for ratio-level, interval-level, and ordinal-level data.
4. Fifty percent of the observations are greater than the median and fifty percent of the
observations are less than the median.

3.1.5 THE MODE


A third measure of central tendency is the mode.

Mode: The value of the observation that appears most frequently.

The mode is the value that occurs most often in a set of raw data. The dividends per share declared
on five stocks were: $3, $2, $4, $5, and $4. Since $4 occurred twice, which was the most frequent,
the mode is $4.
Below is the formula for calculating the mode from grouped data. The modal class is the class with
the highest frequency.

$$\text{Mode} = L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

Where
L1 is the lower limit of the modal class
Δ1 is the difference between the frequency of the modal class and the frequency of the class just
preceding the modal class
Δ2 is the difference between the frequency of the modal class and the frequency of the class just
succeeding the modal class
i is the width of the modal class
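Both versions of the mode in a short Python sketch. `statistics.multimode` (Python 3.8+) returns every most-frequent value, which also covers bimodal data. `grouped_mode` is a hypothetical helper that applies the formula above, assuming equal-width classes and treating a missing neighboring class as having frequency 0. The grouped example uses the Problem 4 wage distribution.

```python
from statistics import multimode


def grouped_mode(lower_limits, frequencies, width):
    """Illustrative helper: Mode = L1 + d1 / (d1 + d2) * i for equal-width classes."""
    # Modal class: the class with the highest frequency (the first one if tied).
    k = max(range(len(frequencies)), key=lambda j: frequencies[j])
    # Assumption: a missing neighboring class counts as frequency 0.
    f_before = frequencies[k - 1] if k > 0 else 0
    f_after = frequencies[k + 1] if k + 1 < len(frequencies) else 0
    d1 = frequencies[k] - f_before   # Delta_1
    d2 = frequencies[k] - f_after    # Delta_2
    if d1 + d2 == 0:
        raise ValueError("the formula is undefined when Delta_1 + Delta_2 = 0")
    return lower_limits[k] + d1 / (d1 + d2) * width


dividends = [3, 2, 4, 5, 4]
print(multimode(dividends))  # [4]

# Plumbers' hourly wages from Problem 4; the modal class is $12 up to $14.
print(grouped_mode([8, 10, 12, 14, 16, 18], [3, 6, 12, 10, 7, 2], width=2))  # 13.5
```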

Advantages of Mode:
i) Readily comprehensible and easy to calculate.
ii) Not affected by extreme values.

Disadvantages of Mode:
i) Not based on all observations.
ii) Ill-defined: a set of data may have no mode or more than one mode.
iii) Not suitable for further mathematical treatment.
iv) Compared with the AM, it is more affected by sampling fluctuations.

Properties of the Mode


i) The mode can be found for all levels of data (nominal, ordinal, interval, and ratio).
ii) The mode is not affected by extremely high or low values.
iii) A set of data can have more than one mode. If it has two modes, it is said to be bimodal.
iv) A disadvantage is that a set of data may not have a mode because no value appears more than once.

Other Location Measures:


The median and mode are two widely used location measures. In addition to the median and mode,
there are other location measures:
i) Quartiles, ii) Deciles, iii) Percentiles


3.2 MEASURES OF DISPERSION


Why Study Dispersion?
A direct comparison of two sets of data based only on two measures of location such as the mean
and the median can be misleading since an average does not tell us anything about the spread of
the data.

For example, the mean salary paid to baseball players for the New York Yankees is $4,342,365.
However, the range is $14,390,000, with a low of $210,000 and a high of $14,600,000. The Tampa
Devil Rays have a mean salary of $1,227,857. The range is $8,550,000, with a low of $200,000 and a
high of $8,750,000.

As another example, suppose a statistics instructor had two classes, one in the morning and one in
the evening; each with six students. In the morning class (AM) the students’ ages are 18, 20, 21, 21,
23, and 23 years. In the evening class (PM) the ages are 17, 17, 18, 20, 25, and 29 years. Note that for
both classes the mean age is 21 years but there is more variation or dispersion in the ages of the
evening students.

A small value for a measure of dispersion indicates that the data are clustered closely, say, around
the arithmetic mean. Thus the mean is considered representative of the data, that is, it is reliable.
Conversely, a large measure of dispersion indicates that the mean is not reliable and is not
representative of the data.

There are several measures of dispersion. We will consider six: the range, the mean deviation, the
variance, the standard deviation, the interquartile range, and quartile deviation.

3.2.1 RANGE
Perhaps the simplest measure of dispersion is the range.

Range: The difference between the highest and lowest value in a set of data.

The formula for range is:


Range = Highest value – Lowest value [3 – 4]
For example, suppose a statistics instructor had two classes with the ages indicated:
A.M. Class: 18, 20, 21, 21, 23, 23
P.M. Class: 17, 17, 18, 20, 25, 29
The range for the classes is:
A.M. Class: (23 − 18) = 5
P.M. Class: (29 − 17) = 12
Thus, we can say that there is more spread in the ages of the students enrolled in the evening (P.M.)
class compared with the morning (A.M.) class.
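The range calculation in Python, as a minimal sketch; `value_range` is an illustrative name chosen to avoid shadowing Python's built-in `range`.

```python
def value_range(values):
    """Illustrative helper for formula [3-4]: highest value - lowest value."""
    values = list(values)
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


am_class = [18, 20, 21, 21, 23, 23]
pm_class = [17, 17, 18, 20, 25, 29]
print(value_range(am_class), value_range(pm_class))  # 5 12
```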

The characteristics of the range are:


i) Only two values are used in the calculation.
ii) It is influenced by extreme values.
iii) It is easy to compute and understand.

iv) It can be distorted by an extreme value.
The range has two disadvantages. It can be distorted by a single extreme value. Suppose the same
statistics instructor has a third class of five students. The ages of these students are given below.

Ages of Students
20, 20, 21, 22, 60

The range of ages is 40 years, yet four of the five students’ ages are within two years of each other.
The 60-year-old student has distorted the spread. Another disadvantage is that only two values, the
largest and the smallest, are used in its calculation.

3.2.2 MEAN DEVIATION


In contrast to the range, the mean deviation considers all the data.

Mean Deviation: The arithmetic mean of the absolute values of the deviations from the arithmetic
mean.
In terms of symbols, the formula for the mean deviation is:

$$MD = \frac{\sum \lvert X - \bar{X} \rvert}{n} \qquad \text{[3-5]}$$
Where:
X is the value of each observation.
X̄ is the arithmetic mean.
n is the number of observations in the sample.
| | indicates the absolute value.


We disregard the signs of the deviations from the mean because, if we did not, the positive and
negative deviations from the mean would exactly offset each other, and the mean deviation would
always be zero. Such a measure (zero) would be a useless statistic.

The mean deviation is computed by first determining the difference between each observation and
the mean. These differences are then averaged without regard to their signs. For the PM statistics
class the mean deviation is 4.0 years, found as follows:

X − X̄              Absolute Deviation
17 − 21 = −4        4
17 − 21 = −4        4
18 − 21 = −3        3
20 − 21 = −1        1
25 − 21 = +4        4
29 − 21 = +8        8
Total               24

$$MD = \frac{\sum \lvert X - \bar{X} \rvert}{n} = \frac{24}{6} = 4$$

The parallel lines indicate absolute value. To interpret, 4.0 years is the mean amount by which
the ages differ from the arithmetic mean age of 21.0 years for the PM students.
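Formula [3-5] as a Python sketch, reproducing the PM class result; `mean_deviation` is a hypothetical helper name.

```python
def mean_deviation(values):
    """Illustrative helper for formula [3-5]: mean of |X - mean|."""
    values = list(values)
    xbar = sum(values) / len(values)
    return sum(abs(x - xbar) for x in values) / len(values)


pm_class = [17, 17, 18, 20, 25, 29]
am_class = [18, 20, 21, 21, 23, 23]
print(mean_deviation(pm_class))            # 4.0
print(round(mean_deviation(am_class), 2))  # 1.33
```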

3.2.3 VARIANCE AND STANDARD DEVIATION
The disadvantage of the mean deviation is that absolute values are difficult to manipulate
mathematically. Squaring the differences between each value and the mean eliminates the problem
of absolute values. These squared differences are used in the computation of both the variance and
the standard deviation.

Variance: The arithmetic mean of the squared deviations from the mean.

Note that the variance is non-negative and is zero only if all observations are the same.

Standard Deviation: The square root of the variance

Squaring units of measurement, such as dollars or years, makes the variance cumbersome to use
since it yields units like “dollars squared” or “years squared.” However, by calculating the
standard deviation, which is the positive square root of the variance, we can return to the original
units, such as years or dollars. Because the standard deviation is easier to interpret, it is more
widely used than the mean deviation or the variance.

Population Variance
The formulas for the population variance and the sample variance are slightly different. The formula
for the population variance is:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} \qquad \text{[3-6]}$$

Where:
σ2 is the symbol for the population variance.
X is a value of an observation in the population.
µ is the arithmetic mean of the population.
N is the total number of observations in the population.

The major characteristics of the variance are:

i) All the observations are used in the calculations.
ii) It is influenced by extreme observations, because the deviations are squared.
iii) The units are somewhat difficult to work with. (They are the original units squared.)
Population Standard Deviation
The standard deviation is the square root of the variance. The formula for the standard deviation of
a population is:

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \qquad \text{[3-7]}$$
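Formulas [3-6] and [3-7] as a minimal Python sketch for the two statistics classes, treating each class as a population. The helper names are illustrative, not from the text.

```python
import math


def population_variance(values):
    """Illustrative helper for formula [3-6]: sum((X - mu)^2) / N."""
    values = list(values)
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def population_sd(values):
    """Illustrative helper for formula [3-7]: positive square root of the variance."""
    return math.sqrt(population_variance(values))


am_class = [18, 20, 21, 21, 23, 23]
pm_class = [17, 17, 18, 20, 25, 29]
print(population_variance(am_class), round(population_sd(am_class), 2))  # 3.0 1.73
print(round(population_variance(pm_class), 2), round(population_sd(pm_class), 2))  # 20.33 4.51
```

The larger standard deviation for the PM class agrees with the larger range found for that class earlier.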

Sample Variance
The conversion of the population variance formula to the sample variance formula is not as direct
as the change made when we went from the population mean formula to the sample mean formula.
Recall that we replaced µ with X̄ and N with n.
The conversion from population variance to sample variance requires a change in the denominator.
Instead of substituting n, the number in the sample, for N, the number in the population, we
replace N with (n – 1). Thus the formula for the sample variance is:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \qquad \text{[3-8]}$$
Where:
s² is the symbol for the sample variance. It is pronounced “s squared.”
X is the value of each observation in the sample.
X̄ is the mean of the sample.
n is the total number of observations in the sample.

Changing the denominator to (n − 1) may seem insignificant; however, the use of n tends to
underestimate the population variance. The use of (n − 1) in the denominator provides an
appropriate correction factor.
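A short Python illustration of the two denominators, treating the PM class ages as a sample. In Python's standard library, `statistics.variance` divides by n − 1 (as in formula [3-8]) and `statistics.pvariance` divides by n (as in formula [3-6]); the manual lines show the same arithmetic.

```python
from statistics import pvariance, variance

pm_class = [17, 17, 18, 20, 25, 29]   # treated here as a sample
n = len(pm_class)
xbar = sum(pm_class) / n
ss = sum((x - xbar) ** 2 for x in pm_class)   # sum of squared deviations

print(ss)                # 122.0
print(ss / (n - 1))      # 24.4, divisor n - 1 as in formula [3-8]
print(round(ss / n, 2))  # 20.33, divisor n, which is smaller

# The standard library makes the same distinction:
# variance() divides by n - 1, pvariance() divides by n.
print(variance(pm_class), pvariance(pm_class))
```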

Interpretation and Uses of the Standard Deviation


Recall that the standard deviation is used to measure the spread of the data. A small standard
deviation indicates that the data are clustered close to the mean; thus the mean is representative
of the data. A large standard deviation indicates that the data are spread out from the mean and the
mean is not representative of the data.

Variance of the Combined Series

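If σ1², σ2², …, σk² are the variances of k component series of sizes n1, n2, …, nk respectively, and
X̄1, X̄2, …, X̄k are their means, then the variance σ² of the composite series obtained by
combining the component series is given by the formula:

$$\sigma^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2) + \cdots + n_k(\sigma_k^2 + d_k^2)}{n_1 + n_2 + \cdots + n_k}$$

where di = X̄i − X̄ is the difference between the mean of the ith component series and the mean X̄
of the composite series.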
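One way to convince yourself of the combined-variance formula is to compare it with the variance of the pooled raw data. This Python sketch assumes population variances (dividing by n) throughout, as in σ² above; `combined_variance` is a hypothetical helper, and the two groups are the AM class and the third class of five students from this chapter.

```python
from statistics import fmean, pvariance


def combined_variance(sizes, means, variances):
    """Illustrative helper: sum(n_i * (var_i + d_i^2)) / sum(n_i),
    where d_i = mean_i - combined mean."""
    total = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total
    return sum(n * (v + (m - grand_mean) ** 2)
               for n, m, v in zip(sizes, means, variances)) / total


group_a = [18, 20, 21, 21, 23, 23]
group_b = [20, 20, 21, 22, 60]
formula = combined_variance(
    [len(group_a), len(group_b)],
    [fmean(group_a), fmean(group_b)],
    [pvariance(group_a), pvariance(group_b)],
)
# The formula should agree with computing the variance of all observations at once.
print(abs(formula - pvariance(group_a + group_b)) < 1e-9)  # True
```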
Relative Dispersion

Suppose we want to compare the variability of two sets of data that are measured in different units,
such as one in dollars and the other in years. How can this be done? Relative dispersion is the
answer. Below are four relative measures of dispersion:

$$\text{Coefficient of range} = \frac{L - S}{L + S}$$

$$\text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

$$\text{Coefficient of mean deviation} = \frac{MD}{\bar{X}}, \qquad \text{Coefficient of standard deviation} = \frac{\sigma}{\bar{X}}$$

where L and S are the largest and smallest values, and Q1 and Q3 are the first and third quartiles.

The coefficient of variation is another relative measure of dispersion.

Coefficient of variation: The ratio of the standard deviation to the arithmetic mean, expressed as a percent.

The formula for the coefficient of variation for a sample is:

$$CV = \frac{\sigma}{\bar{X}} \times 100 \qquad \text{[3-9]}$$
It is a measure of relative dispersion. To compute the coefficient of variation the standard deviation
is divided by the mean and the result is multiplied by 100. This measure reports the standard
deviation as a percent of the mean.

If, for example, in a study of executives the coefficient of variation for incomes is 29 percent and for their
ages it is 12 percent, we would conclude that there is more relative dispersion in the incomes of the
executives than in their ages.
Characteristics of the coefficient of variation are:
i) It reports the variation relative to the mean.
ii) It is useful for comparing distributions with different units.
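A minimal Python sketch of formula [3-9], comparing the AM and PM classes. It assumes the population standard deviation (dividing by n), matching σ in the formula; using the sample standard deviation would give slightly larger values. `coefficient_of_variation` is an illustrative helper name.

```python
import math


def coefficient_of_variation(values):
    """Illustrative helper for formula [3-9]: SD / mean * 100,
    using the population SD (divisor n)."""
    values = list(values)
    mean = sum(values) / len(values)
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
    return sd / mean * 100


am_class = [18, 20, 21, 21, 23, 23]
pm_class = [17, 17, 18, 20, 25, 29]
print(round(coefficient_of_variation(am_class), 1))  # 8.2
print(round(coefficient_of_variation(pm_class), 1))  # 21.5
```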

CHAPTER PROBLEMS
Problem 1
A comparison shopper employed by a large grocery chain recorded these prices for a 340-gram jar of
Kraft blackberry preserves at a sample of six supermarkets selected at random.

Supermarket   Price X
1             $1.31
2             1.35
3             1.26
4             1.42
5             1.31
6             1.33
Total         $7.98

a. Compute the arithmetic mean.
b. Compute the median.
c. Compute the mode.

Solution:

a. Determine the mean price of this raw data by summing the prices for the six jars and dividing
the total by six. Using the formula for the mean of a sample, we get:

$$\bar{X} = \frac{\sum X}{n} = \frac{\$7.98}{6} = \$1.33$$
b. As noted above the median is defined as the middle value of a set of data, after the data is
arranged from smallest to largest. The prices for the six jars of blackberry preserves have
been ordered from a low of $1.26 up to $1.42. Because this is an even number of prices the
median price is halfway between the third and the fourth price. The median is $1.32.


Prices Arranged from Low to High:

$1.26   $1.31   $1.31   $1.33   $1.35   $1.42

$$\text{Median} = \frac{\$1.31 + \$1.33}{2} = \$1.32$$

Suppose there were an odd number of blackberry preserve prices, such as the five shown below.

$1.31 $1.31 $1.33 $1.35 $1.42


The median is the middle value ($1.33). To find the median, the values must first be ordered
from low to high.

c. The mode is the price that occurs most often. The price of $1.31 occurs twice in the original data
and is the mode.
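The three answers to Problem 1 can be checked with Python's standard `statistics` module (`multimode` needs Python 3.8+):

```python
from statistics import mean, median, multimode

prices = [1.31, 1.35, 1.26, 1.42, 1.31, 1.33]  # Problem 1, dollars per jar

print(round(mean(prices), 2))    # 1.33
print(round(median(prices), 2))  # 1.32
print(multimode(prices))         # [1.31]
```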

Problem 2
A sample of the amounts spent in November for propane gas to heat homes of similar sizes in Duluth
revealed these amounts (to the nearest dollar):
191 212 176 129 106 92 108 109 103 121 175 194
What is the range? Interpret your results.

Solution:

Recall that the range is the difference between the largest value and the smallest value.

Range = Highest Value – Lowest Value = $212 - $92 = $120

This indicates that there is a difference of $120 between the largest and the smallest heating cost.

Problem 3
Using the heating cost data in Problem 2, compute the mean deviation.

Solution:

The mean deviation is the mean of the absolute deviations from the arithmetic mean. For raw, or
ungrouped, data it is computed by first determining the mean. Next, the difference between each
value and the arithmetic mean is determined. Finally, the absolute values of these differences are
totaled and the total is divided by the number of observations.

The table below shows the data values, each data value minus the mean, and the absolute value
of the deviations from the mean. In other words, the signs of the deviations from the mean are
disregarded.

Payment X    X − X̄    Absolute Deviation
$191         +48       $48
212          +69       69
176          +33       33
129          −14       14
106          −37       37
92           −51       51
108          −35       35
109          −34       34
103          −40       40
121          −22       22
175          +32       32
194          +51       51
$1,716                 $466

$$\bar{X} = \frac{\sum X}{n} = \frac{\$1{,}716}{12} = \$143.00$$

$$MD = \frac{\sum \lvert X - \bar{X} \rvert}{n} = \frac{\$466}{12} = \$38.83$$

The mean deviation of $38.83 indicates that the typical heating bill deviates by $38.83 from the mean
of $143.00.
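Problems 2 and 3 can be checked together with a few lines of Python; this sketch uses only built-ins.

```python
heating_costs = [191, 212, 176, 129, 106, 92, 108, 109, 103, 121, 175, 194]

spread = max(heating_costs) - min(heating_costs)                       # Problem 2: range
xbar = sum(heating_costs) / len(heating_costs)                         # mean
md = sum(abs(x - xbar) for x in heating_costs) / len(heating_costs)    # Problem 3: MD

print(spread)        # 120
print(xbar)          # 143.0
print(round(md, 2))  # 38.83
```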

Problem 4
The hourly wages for a sample of plumbers were grouped into the following frequency distribution.
Since the wages have been grouped into classes, we refer to the following distribution as being
grouped data.

Hourly Wages     Number f
$8 up to $10     3
$10 up to $12    6
$12 up to $14    12
$14 up to $16    10
$16 up to $18    7
$18 up to $20    2
Total            40

a. Compute the arithmetic mean.
b. Compute the mode.

Solution:

a. The arithmetic mean of this sample data, grouped into a frequency distribution, is computed by
the formula:

$$\bar{X} = \frac{\sum fX}{n}$$

Where:
X̄ is the designation for the arithmetic mean.
X is the mid-value, or midpoint, of each class.
f is the frequency in each class.
fX is the frequency in each class times the midpoint of the class.
ΣfX is the sum of these products.
n is the total number of frequencies.

It is assumed that the observations in each class are represented by the midpoint of the class. The
midpoint of the first class is $9.00, found by ($8.00 + $10.00)/2. For the next higher class, the
midpoint is $11.00. Using the formula, the arithmetic mean hourly wage is $13.90, found by:

Wage Rate        Frequency f   Class Midpoint X   fX
$8 up to $10     3             $9.00              $27.00
$10 up to $12    6             11.00              66.00
$12 up to $14    12            13.00              156.00
$14 up to $16    10            15.00              150.00
$16 up to $18    7             17.00              119.00
$18 up to $20    2             19.00              38.00
Total            40                               $556.00

$$\bar{X} = \frac{\sum fX}{n} = \frac{\$556.00}{40} = \$13.90$$

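b. The mode is the value that occurs most often. For grouped data, the modal class is the class with
the highest frequency, here $12 up to $14 (f = 12). The preceding class has frequency 6 and the
succeeding class has frequency 10, so Δ1 = 12 − 6 = 6 and Δ2 = 12 − 10 = 2. For data grouped into a
frequency distribution, the mode is:

$$\text{Mode} = L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i = 12 + \frac{6}{6 + 2} \times 2 = 13.5$$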
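The grouped-mean table for part (a) can be reproduced in Python; this sketch assumes equal class widths of $2 and represents each class by its midpoint, as the solution does.

```python
lower_limits = [8, 10, 12, 14, 16, 18]   # $8 up to $10, ..., $18 up to $20
frequencies = [3, 6, 12, 10, 7, 2]
width = 2

# Each class is represented by its midpoint, e.g. (8 + 10) / 2 = 9.
midpoints = [lo + width / 2 for lo in lower_limits]
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
n = sum(frequencies)

print(sum_fx, n)   # 556.0 40
print(sum_fx / n)  # 13.9
```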
Problem 5
Determine the mean and SD of sales of 100 First Food Restaurants in the Eastern Districts (in $’000).

Sales        Number of Restaurants
700-800      4
800-900      7
900-1000     8
1000-1100    10
1100-1200    12
1200-1300    17
1300-1400    13
1400-1500    10
1500-1600    9
1600-1700    7
1700-1800    2
1800-1900    1

Solution:

$$\sum fX = 125{,}000, \qquad \bar{X} = \frac{\sum fX}{n} = \frac{125{,}000}{100} = 1250$$

$$\sum f(X - \bar{X})^2 = 6{,}680{,}000, \qquad \sigma^2 = \frac{6{,}680{,}000}{100} = 66{,}800, \qquad \sigma = \sqrt{66{,}800} \approx 258.5$$
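A Python sketch that reproduces the Problem 5 figures, assuming each class is represented by its midpoint and the SD uses divisor n (as σ² = 66,800 implies).

```python
import math

lower_limits = list(range(700, 1900, 100))            # 700, 800, ..., 1800
frequencies = [4, 7, 8, 10, 12, 17, 13, 10, 9, 7, 2, 1]
midpoints = [lo + 50 for lo in lower_limits]          # 750, 850, ..., 1850

n = sum(frequencies)                                  # 100 restaurants
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)) / n
sd = math.sqrt(variance)

print(n, mean, variance, round(sd, 1))  # 100 1250.0 66800.0 258.5
```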

Exercises: The Measures of Central Tendency
1. The annual exports of 50 medium-sized manufacturers were organized into a frequency
distribution. (Exports are in $ millions).
Exports Frequency
$6 up to $9 2
9 up to 12 8
12 up to 15 20
15 up to 18 14
18 up to 21 6

Compute the: i) mean ii) mode


2. 20% of the workers in a firm employing a total of 2000 earn less than Tk. 20 per hour, 440 earn
from Tk. 20 to Tk. 24 per hour, 24% earn from Tk. 25 to Tk. 29 per hour, 370 earn from Tk. 30 to
Tk. 34 per hour, 12% earn from Tk. 35 to Tk. 39 per hour, and the rest earn Tk. 40 or more per
hour. Set up a frequency table and calculate the modal wage.

3. Following is the distribution of marks (out of 50) obtained by 70 students in Statistics.


Marks more than : 0 5 10 15 20 25 30 35 40 45
No. of students : 70 65 60 55 42 34 24 14 7 2
Calculate mean, median and mode of the distribution of marks.

Finding missing values:


4. The following table shows the distribution of a number of families according to their
expenditure per week. The numbers of families corresponding to the expenditure groups Rs. 10-20 and
Rs. 30-40 are missing from the table. The median and mode are given to be Rs. 25 and Rs. 24
respectively. Calculate the missing frequencies and then the arithmetic mean of the distribution.
Expenditure : 0-10 10-20 20-30 30-40 40-50
No. of families : 14 ? 27 ? 15

5. The arithmetic mean of the following series is 30.5. Find the missing figure.
Values : 10 20 ? 40 50
Frequency : 8 10 20 15 7
Correcting incorrect values:
6. The mean and median of 100 items are 50 and 52 respectively. The value of the largest item is
100. It was later found that it is actually 110. Find the correct mean and median.

7. The mean of 20 observations is 50.1. By mistake, one observation was taken as 70 instead of −70.
Find the correct mean.

Combined Mean:
8. The mean marks obtained in an examination by a group of 100 students were found to be 49.96.
The mean marks obtained in the same examination by another group of 200 students were
52.32. Find the mean of marks obtained by both groups of students taken together. [Ans. 51.53]

9. The mean mark obtained by 300 students in statistics is 45. The mean of the top 100 of them was
found to be 70 and the mean of the last 100 was known to be 20. What is the mean of the remaining
100 students?

10. The mean weekly salary paid to all employees in a company is Tk. 500. The mean weekly salaries
paid to male and female employees are Tk. 520 and Tk. 420 respectively. Determine the percentages
of males and females employed by the company.

Exercises: The Measures of Dispersion


1. Calculate the coefficient of variation (CV) from the following data:
Profits (in crores Tk.)   No. of Companies
Less than 10              8
Less than 20              20
Less than 30              40
Less than 40              70
Less than 50              90
Less than 60              100

2. Ten observations have mean 20 and SD 4. If each of these observations is doubled, what will be
the mean and SD of the new observations? [Ans. 40, 8]
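Multiplying every observation by a constant c multiplies both the mean and the SD by c (the SD by |c|). The sketch checks this on hypothetical data chosen only because it has mean 20 and SD 4 (divisor n); any data with those values would give the same result.

```python
import statistics

# Hypothetical data: ten values with mean 20 and SD 4 (divisor n).
data = [16] * 5 + [24] * 5
doubled = [2 * x for x in data]

print(statistics.mean(data), statistics.pstdev(data))        # mean 20, SD 4
print(statistics.mean(doubled), statistics.pstdev(doubled))  # mean 40, SD 8
```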

3. Runs scored by two cricketers in 10 ODI matches are as follows:


Cricketer – A: 90, 27, 08, 80, 13, 105, 06, 60, 45, 00
Cricketer – B: 25, 50, 65, 43, 75, 56, 16, 67, 49, 37
Find which cricketer may be considered the more consistent player.
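Consistency is usually judged by the coefficient of variation: the smaller the CV, the more consistent the player. A sketch using Python's `statistics` module, assuming the SD with divisor n (`pstdev`):

```python
import statistics

runs_a = [90, 27, 8, 80, 13, 105, 6, 60, 45, 0]
runs_b = [25, 50, 65, 43, 75, 56, 16, 67, 49, 37]


def cv(xs):
    """Coefficient of variation in percent, using the SD with divisor n."""
    return 100 * statistics.pstdev(xs) / statistics.mean(xs)


for name, runs in (("A", runs_a), ("B", runs_b)):
    print(name, statistics.mean(runs), round(cv(runs), 2))
# A 43.4 84.07
# B 48.3 36.72
```

Cricketer B has the much smaller CV and so is the more consistent player, even though A has the higher best score.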
4. The sums and sums of squares of length X (in cm) and weight Y (in g) of 50 tapioca tubers
are given below:
$\sum X = 212, \quad \sum X^2 = 902.8, \quad \sum Y = 261, \quad \sum Y^2 = 1457.6$
Which is more variable, the length or the weight?
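Because length and weight are in different units, their SDs cannot be compared directly; the CVs can. The SD follows from the sums via σ² = ∑X²/n - X̄². A minimal sketch (helper name illustrative, divisor n assumed):

```python
import math


def mean_sd_from_sums(n, total, total_sq):
    """Mean and SD (divisor n) from n, the sum of X and the sum of X squared."""
    mean = total / n
    return mean, math.sqrt(total_sq / n - mean ** 2)


mean_x, sd_x = mean_sd_from_sums(50, 212, 902.8)    # length (cm)
mean_y, sd_y = mean_sd_from_sums(50, 261, 1457.6)   # weight (g)
cv_x = 100 * sd_x / mean_x
cv_y = 100 * sd_y / mean_y
print(round(cv_x, 2), round(cv_y, 2))  # 6.6 26.43
```

Weight, with the larger CV, is the more variable characteristic.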

Correcting incorrect values:


5. The mean and the SD of a sample of size 10 were found to be 9.5 and 2.5 respectively. Later on,
an additional observation, 15.0, became available and was included in the original
sample. Find the mean and SD of the 11 observations. [Ans. Mean = 10, SD ≈ 2.86]
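Adding an observation requires updating both ∑X and ∑X². The second is recovered from ∑X² = n(σ² + X̄²). The sketch assumes the SD is defined with divisor n, as in this chapter's moment formulas:

```python
import math

n, mean, sd = 10, 9.5, 2.5
extra = 15.0

total = n * mean + extra                         # sum of X for the 11 values
sum_sq = n * (sd ** 2 + mean ** 2) + extra ** 2  # sum of X^2, since sigma^2 = sum X^2 / n - mean^2
n_new = n + 1

new_mean = total / n_new
new_sd = math.sqrt(sum_sq / n_new - new_mean ** 2)
print(new_mean, round(new_sd, 2))  # 10.0 2.86
```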

6. For a group of 200 candidates, the mean and SD were found to be 40 and 15 respectively. Later
on, it was discovered that the scores 43 and 35 were misread as 34 and 53 respectively. Find the
corrected mean and SD.
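The correction works on ∑X and ∑X² at the same time: subtract the misread values (and their squares), then add the correct ones. A sketch with a hypothetical helper `corrected_mean_sd`, assuming the SD with divisor n:

```python
import math


def corrected_mean_sd(n, mean, sd, wrong, right):
    """Swap wrongly read values for the right ones in sum X and sum X^2 (SD with divisor n)."""
    total = n * mean - sum(wrong) + sum(right)
    sum_sq = n * (sd ** 2 + mean ** 2) - sum(w * w for w in wrong) + sum(r * r for r in right)
    new_mean = total / n
    return new_mean, math.sqrt(sum_sq / n - new_mean ** 2)


m, s = corrected_mean_sd(200, 40, 15, wrong=[34, 53], right=[43, 35])
print(round(m, 3), round(s, 2))  # 39.955 14.97
```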

Combined Variance:

7. For a group containing 100 observations, the AM and SD are 8 and 10.5 respectively. For 50
observations selected from these 100, the mean and SD are 10 and 2 respectively.
Find the AM and SD of the other 50 observations.
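Here the combined figures are known and one subgroup must be removed. Subtract the subgroup's ∑X and ∑X² (from ∑X² = n(σ² + X̄²)) from the totals, and then compute the remaining group's mean and SD. The sketch uses the figures exactly as stated, with the SD taken with divisor n:

```python
import math

n, mean, sd = 100, 8, 10.5       # whole group
n1, mean1, sd1 = 50, 10, 2       # selected subgroup
n2 = n - n1

mean2 = (n * mean - n1 * mean1) / n2                  # 6.0
sum_sq = n * (sd ** 2 + mean ** 2)                    # sum of X^2 for all 100
sum_sq1 = n1 * (sd1 ** 2 + mean1 ** 2)                # sum of X^2 for the subgroup
sd2 = math.sqrt((sum_sq - sum_sq1) / n2 - mean2 ** 2)
print(mean2, round(sd2, 2))  # 6.0 14.44
```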

8. In two factories A and B engaged in the same industry, the average weekly wages and SDs are
as follows:
Factory   Average weekly wage   SD of wages   No. of wage earners
A         460                   50            100
B         490                   40            80
a) Which factory, A or B, pays the larger total amount in weekly wages?
b) Which factory shows greater variability in the distribution of wages?
c) What are the mean and SD of all workers in the two factories taken together?
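Part (a) compares total wage bills (n × mean), part (b) compares CVs, and part (c) uses the combined variance formula σ² = [n1(σ1² + d1²) + n2(σ2² + d2²)]/(n1 + n2), where d_i is the difference between group i's mean and the combined mean. A sketch under those standard formulas, with the SD taken with divisor n:

```python
import math

# (number of wage earners, mean weekly wage, SD of wages)
factories = {"A": (100, 460, 50), "B": (80, 490, 40)}

for name, (n, mean, sd) in factories.items():
    print(name, "total wages:", n * mean, "CV %:", round(100 * sd / mean, 2))
# A total wages: 46000 CV %: 10.87
# B total wages: 39200 CV %: 8.16

N = sum(n for n, _, _ in factories.values())
combined_mean = sum(n * m for n, m, _ in factories.values()) / N
# sigma^2 = sum n_i (sigma_i^2 + d_i^2) / N, where d_i = mean_i - combined mean
combined_var = sum(n * (s ** 2 + (m - combined_mean) ** 2) for n, m, s in factories.values()) / N
combined_sd = math.sqrt(combined_var)
print(round(combined_mean, 2), round(combined_sd, 2))  # 473.33 48.19
```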

9. The number of employees, the average wage per employee and the variance of the distribution of
wages per employee for two factories are given below:
                      Factory A   Factory B
No. of employees      50          100
Average wage          120         85
Variance of wages     9           16
a) In which factory is there greater variability in the distribution of wages per employee?
b) Suppose that in factory B the wage of one employee was wrongly noted as 120 instead of
100. What would be the correct mean and variance for factory B?
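Part (a) compares CVs computed from the variances (SD = √variance). Part (b) corrects ∑X and ∑X² for the single misrecorded wage. A sketch assuming the variance has divisor n:

```python
import math

# (employees, mean wage, variance of wages)
a = (50, 120, 9)
b = (100, 85, 16)


def cv(mean, variance):
    """Coefficient of variation in percent."""
    return 100 * math.sqrt(variance) / mean


print(round(cv(a[1], a[2]), 2), round(cv(b[1], b[2]), 2))  # 2.5 4.71

# Correct factory B: one wage recorded as 120 instead of 100.
n, mean, var = b
total = n * mean - 120 + 100
sum_sq = n * (var + mean ** 2) - 120 ** 2 + 100 ** 2
new_mean = total / n
new_var = sum_sq / n - new_mean ** 2
print(new_mean, round(new_var, 2))  # 84.8 5.96
```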

10. FundInfo provides information to its subscribers to enable them to evaluate the performance of
mutual funds they are considering as potential investment vehicles. A recent survey of funds
whose stated investment goal was growth and income produced the following data on total
annual rate of return over the past five years:
Annual rate of return   11-12  12-13  13-14  14-15  15-16  16-17  17-18  18-19
Frequency               2      2      8      10     11     8      3      1
Calculate the mean, variance and SD of the annual rate of return for this sample of 45 funds.
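For grouped data each class is represented by its midpoint. Because the question calls the 45 funds a sample, the sketch below assumes the sample variance with divisor n - 1. Replace `n - 1` with `n` to use the divisor-n convention of the moment formulas in this chapter.

```python
import math

edges = list(range(11, 20))               # 11, 12, ..., 19
freqs = [2, 2, 8, 10, 11, 8, 3, 1]
mids = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]

n = sum(freqs)                                              # 45
mean = sum(f * m for f, m in zip(freqs, mids)) / n
ss = sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids))  # sum of squared deviations
sample_var = ss / (n - 1)                                   # sample variance (divisor n - 1)
sample_sd = math.sqrt(sample_var)
print(round(mean, 2), round(sample_var, 2), round(sample_sd, 2))  # 14.99 2.48 1.58
```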


MOMENTS, SKEWNESS & KURTOSIS

r-th raw moment (about an arbitrary value A): $\mu'_r = \frac{1}{N}\sum (X - A)^r$
r-th central moment (about the mean): $\mu_r = \frac{1}{N}\sum (X - \bar{X})^r$
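The two definitions differ only in the point about which deviations are taken. A minimal sketch for ungrouped data; the function names and the small data set are illustrative:

```python
def raw_moment(xs, r, a=0.0):
    """r-th moment about an arbitrary value a: (1/N) * sum (x - a)^r."""
    return sum((x - a) ** r for x in xs) / len(xs)


def central_moment(xs, r):
    """r-th moment about the mean."""
    mean = sum(xs) / len(xs)
    return raw_moment(xs, r, a=mean)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean 5
print(raw_moment(data, 1), central_moment(data, 1), central_moment(data, 2))  # 5.0 0.0 4.0
```

The first raw moment about the origin is the mean, the first central moment is always 0, and the second central moment is the variance.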
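Sheppard's corrections for moments (h = width of the class interval):
$\mu_2\ (\text{corrected}) = \mu_2 - \frac{h^2}{12}$
$\mu_3\ (\text{corrected}) = \mu_3$
$\mu_4\ (\text{corrected}) = \mu_4 - \frac{h^2}{2}\mu_2 + \frac{7h^4}{240}$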
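Sheppard's corrections adjust moments computed from grouped data, where every value is treated as if it sat at its class midpoint. The sketch applies the three formulas above. Note that the μ4 correction uses the uncorrected μ2. The input moments are hypothetical values chosen only for illustration.

```python
def sheppard(mu2, mu3, mu4, h):
    """Sheppard's corrections for moments computed from grouped data with class width h."""
    return (
        mu2 - h ** 2 / 12,
        mu3,                                       # the third moment needs no correction
        mu4 - (h ** 2 / 2) * mu2 + 7 * h ** 4 / 240,
    )


# Hypothetical uncorrected moments from a grouped table with class width 2.
c2, c3, c4 = sheppard(10.0, 5.0, 300.0, h=2)
print(round(c2, 4), c3, round(c4, 4))  # 9.6667 5.0 280.4667
```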
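Relationship between raw and central moments (raw moments $\mu'_r$ taken about any value A):
$\mu_1 = 0, \quad \mu_2 = \mu'_2 - (\mu'_1)^2$
$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3$
$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4$
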
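Pearson's β and γ coefficients:

$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \quad \gamma_1 = \sqrt{\beta_1}$ (taken with the sign of $\mu_3$)
$\beta_2 = \frac{\mu_4}{\mu_2^2}, \quad \gamma_2 = \beta_2 - 3$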
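The raw-to-central relations and the β and γ coefficients can be combined in a short sketch. As a check, it uses the raw moments about the origin given in Exercise 1 below. The function names are illustrative. γ1 is given the sign of μ3, which is the same as μ3/μ2^(3/2).

```python
import math


def central_from_raw(m1, m2, m3, m4):
    """Central moments mu2..mu4 from the first four raw moments about any value A."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


def pearson_coefficients(mu2, mu3, mu4):
    """Return beta1, beta2, gamma1, gamma2."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    gamma1 = math.copysign(math.sqrt(beta1), mu3)  # sign follows mu3
    gamma2 = beta2 - 3
    return beta1, beta2, gamma1, gamma2


mu2, mu3, mu4 = central_from_raw(2.5, 21, 166, 1132)
print(mu2, mu3, mu4)  # 14.75 39.75 142.3125
beta1, beta2, gamma1, gamma2 = pearson_coefficients(mu2, mu3, mu4)
print(round(beta1, 4), round(beta2, 4))  # 0.4924 0.6541
```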

Measures of Skewness
Skewness: Another characteristic of a set of data is the shape of its distribution. Skewness means
"lack of symmetry". The measures of location and the measures of dispersion are both descriptive
characteristics of a set of data; a third characteristic of a distribution is its skewness. Four
shapes are commonly observed: symmetric, positively skewed, negatively skewed, and bimodal. As
noted before, a symmetric distribution has the same shape on either side of the median and has no
skewness. In a positively skewed distribution the long tail is to the right, the mean is larger than
the median or the mode, and the mode is at the highest point of the curve. In a negatively skewed
distribution the long tail is to the left, the mode is the largest of the three averages and is at the
highest point of the curve, while the mean is the smallest. A bimodal distribution has two peaks.
A coefficient of skewness describes the direction and degree of skewness of a distribution. The
different kinds of skewness are shown below.

[Figure: Positive Skewness (Number vs. Years of Service); Symmetric Distribution (Number vs. Hours of Useful Life); Negative Skewness (Number vs. Years of Service)]

Absolute measures of skewness (M = mean, Md = median, Mo = mode, Q1 and Q3 = first and third quartiles):


i) $Sk = M - M_d$,
ii) $Sk = M - M_o$ and
iii) $Sk = (Q_3 - M_d) - (M_d - Q_1)$
Relative measures of skewness:
1. Prof. Karl Pearson's coefficient of skewness: $Sk = \frac{M - M_o}{\sigma}$ or $Sk = \frac{3(M - M_d)}{\sigma}$
2. Prof. Bowley's coefficient of skewness: $Sk = \frac{Q_3 + Q_1 - 2M_d}{Q_3 - Q_1}$
3. Based upon moments: $Sk = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}$
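The relative measures translate directly into small functions. The summary values in the sketch are hypothetical and are chosen only to show the calculations. In the moment-based formula, √β1 is taken as non-negative here; the sign of μ3 would have to be attached separately.

```python
import math


def pearson_sk_mode(mean, mode, sd):
    return (mean - mode) / sd


def pearson_sk_median(mean, median, sd):
    return 3 * (mean - median) / sd


def bowley_sk(q1, median, q3):
    return (q3 + q1 - 2 * median) / (q3 - q1)


def moment_sk(beta1, beta2):
    # sqrt(beta1) is non-negative here; attach the sign of mu3 separately if needed
    return math.sqrt(beta1) * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))


# Hypothetical summary values, for illustration only.
print(pearson_sk_mode(50, 45, 10))      # 0.5
print(pearson_sk_median(50, 48, 10))    # 0.6
print(bowley_sk(40, 48, 60))            # 0.2
print(round(moment_sk(0.25, 3.5), 4))   # 0.2321
```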

Measures of Kurtosis

Kurtosis refers to the convexity (degree of peakedness or flatness) of the frequency curve of a distribution.

Kurtosis can be measured by the β2 coefficient or by the γ2 coefficient.


The coefficient of kurtosis: $\beta_2 = \frac{\mu_4}{\mu_2^2}$, with $\gamma_2 = \beta_2 - 3$

Types of kurtosis:
i) Leptokurtic: if $\beta_2 > 3$ or $\gamma_2 > 0$
ii) Mesokurtic: if $\beta_2 = 3$ or $\gamma_2 = 0$
iii) Platykurtic: if $\beta_2 < 3$ or $\gamma_2 < 0$


[Figure: Platykurtic, Mesokurtic and Leptokurtic curves]
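The classification rule can be written as a small function of μ2 and μ4. Computed values rarely equal 3 exactly, so the sketch treats β2 as "equal to 3" within a tolerance; the tolerance is an assumption, not part of the definition. The example moments are chosen for illustration.

```python
def kurtosis_type(mu2, mu4, tol=1e-9):
    """Classify by beta2 = mu4 / mu2**2 (equivalently gamma2 = beta2 - 3)."""
    gamma2 = mu4 / mu2 ** 2 - 3
    if abs(gamma2) <= tol:
        return "mesokurtic"
    return "leptokurtic" if gamma2 > 0 else "platykurtic"


print(kurtosis_type(9, 243))            # mesokurtic  (beta2 = 243 / 81 = 3)
print(kurtosis_type(16, 1024))          # leptokurtic (beta2 = 4)
print(kurtosis_type(14.75, 142.3125))   # platykurtic (beta2 is about 0.65)
```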

Exercises:

1. The first four raw moments of a distribution about the origin of the variable are 2.5, 21, 166
and 1132. Calculate all central moments, SD, β1 and β2. (Ans: 0, 14.75, 39.75, 142.3125, 3.84,
0.4924, 0.6541)
2. The first four moments of a distribution about the value 4 of the variable are -1.5, 17, -30 and
108. Find the moments about the mean, β1 and β2. Also find the moments about the origin. (Ans:
0, 14.75, 39.75, 142.3125, 0.4924, 0.6541, 2.5, 21, 166, 1132)
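Moments about one point can be moved to any other point with the binomial expansion (X - b)^r = Σ C(r, k)(X - a)^k (a - b)^(r-k). Taking b as the mean gives the central moments, and taking b = 0 gives the moments about the origin. A sketch with an illustrative helper `shift_moments` (it relies on `math.comb`, available from Python 3.8):

```python
from math import comb


def shift_moments(moments, a, b):
    """Convert moments about a into moments about b.

    moments[k-1] is the k-th moment about a; the 0th moment is 1.
    Uses (X - b)^r = sum_k C(r, k) (X - a)^k (a - b)^(r - k).
    """
    m = [1.0] + list(moments)
    d = a - b
    return [sum(comb(r, k) * m[k] * d ** (r - k) for k in range(r + 1))
            for r in range(1, len(m))]


about_4 = [-1.5, 17, -30, 108]
mean = 4 + about_4[0]                              # 2.5
central = shift_moments(about_4, a=4, b=mean)      # about the mean
about_origin = shift_moments(about_4, a=4, b=0)    # about the origin
print(central)       # [0.0, 14.75, 39.75, 142.3125]
print(about_origin)  # [2.5, 21.0, 166.0, 1132.0]
```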
3. The SD of a distribution is 3. What must be the value of the fourth moment about the mean in
order that the distribution be mesokurtic? What must be the value of the third moment about
the mean when β1 of the distribution is 1.5?
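Both parts invert a definition: β2 = μ4/μ2² = 3 fixes μ4, and β1 = μ3²/μ2³ = 1.5 fixes only the magnitude of μ3, so μ3 = ±√(1.5 μ2³). A short sketch of the arithmetic:

```python
import math

sd = 3
mu2 = sd ** 2                            # 9
mu4_mesokurtic = 3 * mu2 ** 2            # beta2 = mu4 / mu2^2 = 3
mu3_abs = math.sqrt(1.5 * mu2 ** 3)      # beta1 = mu3^2 / mu2^3 = 1.5, so mu3 = +/- this value
print(mu4_mesokurtic, round(mu3_abs, 2))  # 243 33.07
```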
4. For a distribution, the mean is 10, variance is 16, γ1 is +1, and β2 is 4. Obtain the first four
moments about the origin. Comment upon the nature of the distribution.
[Ans. 10, 116, 1544, 23184]
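The given coefficients fix the central moments: μ3 = γ1 μ2^(3/2) and μ4 = β2 μ2². The binomial expansion then moves them to the origin. A sketch of that route:

```python
from math import comb

mean, var, gamma1, beta2 = 10, 16, 1, 4
mu2 = var
mu3 = gamma1 * mu2 ** 1.5          # gamma1 = mu3 / mu2^(3/2), so mu3 = 64
mu4 = beta2 * mu2 ** 2             # beta2 = mu4 / mu2^2, so mu4 = 1024
central = [1, 0, mu2, mu3, mu4]    # moments about the mean, 0th to 4th

# Moments about the origin: mu'_r = sum_k C(r, k) * mu_k * mean^(r - k)
about_origin = [sum(comb(r, k) * central[k] * mean ** (r - k) for k in range(r + 1))
                for r in range(1, 5)]
print(about_origin)  # [10, 116, 1544.0, 23184.0]
```

Since γ1 > 0 the distribution is positively skewed, and since β2 = 4 > 3 it is leptokurtic.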
5. Calculate the first four moments of the following distribution about the mean and hence
find β1 and β2.
X: 52 - 54 54 - 56 56 - 58 58 - 60 60 - 62 62 - 64 64 - 66
f: 2 5 12 18 39 15 9
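For grouped data the moments are computed from the class midpoints. The sketch works directly with deviations from the mean rather than the step-deviation shortcut often used by hand; both give the same result. Sheppard's corrections are not applied here.

```python
edges = list(range(52, 68, 2))     # 52, 54, ..., 66
freqs = [2, 5, 12, 18, 39, 15, 9]
mids = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, mids)) / n


def mu(r):
    """r-th moment about the mean for the grouped table."""
    return sum(f * (x - mean) ** r for f, x in zip(freqs, mids)) / n


mu2, mu3, mu4 = mu(2), mu(3), mu(4)
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
print(round(mean, 2), round(mu2, 4), round(mu3, 4), round(mu4, 4))  # 60.36 7.3504 -8.8251 164.9357
print(round(beta1, 4), round(beta2, 4))                             # 0.1961 3.0528
```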
