Statistical Measures 2024 (Part 2) - Word


Statistical Measures and Standard Deviation (Part 2)

Dispersion and Skewness


Dispersion = the spread or variability of data.
Skewness = ‘lack of symmetry’ (where a probability distribution has no
symmetry in its shape, i.e., is ‘lop-sided’). The degree of skewness can
be measured by the difference between the mean and the mode.

Measures of Dispersion:

Overall spread of items: The range.

Spread about the mean: measuring the distance between the items and their common
mean.
i) the mean deviation
ii) the standard deviation

We are going to focus solely on the standard deviation for the purposes of this
module.

Central percentage spread of items: these measures have links with the median.
i) the 10 to 90 percentile range
ii) the quartile deviation

The Range:
The difference between the smallest and largest values of items in a set or distribution.

Example:
The daily number of books sold by two separate book stores over twelve days were:
Bookstore 1: 3, 5, 1, 4, 5, 3, 6, 8, 6, 2, 3, 7
Bookstore 2: 2, 3, 2, 1, 4, 3, 2, 2, 1, 3, 4, 1

The range of values for Bookstore 1 is 8-1=7, and for Bookstore 2 is 4-1=3.
Thus, daily sales are more variable for Bookstore 1.
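The range calculation above can be reproduced with a minimal Python sketch. The function name `value_range` is an illustrative choice, not part of the original notes; the data are copied from the example.

```python
# Daily book sales over twelve days, copied from the example above.
bookstore_1 = [3, 5, 1, 4, 5, 3, 6, 8, 6, 2, 3, 7]
bookstore_2 = [2, 3, 2, 1, 4, 3, 2, 2, 1, 3, 4, 1]


def value_range(values):
    """Return the difference between the largest and smallest values."""
    return max(values) - min(values)


range_1 = value_range(bookstore_1)
range_2 = value_range(bookstore_2)
print(range_1, range_2)  # prints: 7 3
```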

Standard Deviation

The standard deviation is a common measure of dispersion. The standard deviation
formula for a frequency distribution is adapted from the formula for a set and can be
used for both simple discrete and grouped distributions.

● The standard deviation is a measure of the average deviation from the mean value.
● It is the most common measure of dispersion. (Remember: dispersion =
spread/variability).
● It is used as a measure for comparison only when the units in the distribution are
the same and the respective means are comparable.

Formula:

Standard Deviation for:

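1) A set: $\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$

2) A frequency distribution: $\sigma = \sqrt{\dfrac{\sum f x^2}{\sum f} - \left(\dfrac{\sum f x}{\sum f}\right)^2}$

where x is the value (or class mid-point), $\bar{x}$ is the mean, n is the number of
items and f is the frequency.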

We are going to focus solely on the standard deviation of a frequency distribution for
the purposes of this module.
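As a brief check that the frequency-distribution form agrees with the definition of the standard deviation, the derivation below may help. The notation is an assumption taken from the table headings in the worked example: f is the class frequency, x the class mid-point, and $\bar{x} = \sum f x / \sum f$ the mean.

$$
\frac{\sum f (x - \bar{x})^2}{\sum f}
= \frac{\sum f x^2 - 2\bar{x} \sum f x + \bar{x}^2 \sum f}{\sum f}
= \frac{\sum f x^2}{\sum f} - 2\bar{x}^2 + \bar{x}^2
= \frac{\sum f x^2}{\sum f} - \bar{x}^2
$$

Taking the square root of the right-hand side gives the computational form, which needs only the column totals of f, fx and fx².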

Standard deviation example 1:

The data below relate to the number of successful sales made by the sales-force in a
particular quarter. Calculate the mean and standard deviation (to one decimal place).

No. of sales    No. of salespeople
0-4             1
5-9             14
10-14           23
15-19           21
20-24           15
25-29           6

Solution:

No. of sales    No. of salespeople (f)    Mid-point (x)    fx      x²     fx²
0-4             1                         2                2       4      4
5-9             14                        7                98      49     686
10-14           23                        12               276     144    3312
15-19           21                        17               357     289    6069
20-24           15                        22               330     484    7260
25-29           6                         27               162     729    4374
Totals          80                                         1225           21705

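Mean = 1225 / 80

= 15.3 sales

Standard deviation, $\sigma = \sqrt{\dfrac{21705}{80} - \left(\dfrac{1225}{80}\right)^2} = \sqrt{271.31 - 234.47} = \sqrt{36.84}$

= 6.1 sales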
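The worked solution can be checked with a short Python sketch. It assumes the computational form of the standard deviation implied by the fx and fx² columns, namely the square root of Σfx²/Σf minus the squared mean; the function name `grouped_mean_and_sd` is hypothetical.

```python
import math

# Class mid-points (x) and frequencies (f) from the solution table.
midpoints = [2, 7, 12, 17, 22, 27]
frequencies = [1, 14, 23, 21, 15, 6]


def grouped_mean_and_sd(xs, fs):
    """Mean and standard deviation of a grouped frequency distribution.

    Assumes sigma = sqrt(sum(f*x^2)/sum(f) - mean^2), the form that
    matches the fx and fx^2 columns of the solution table.
    """
    total_f = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    mean = sum_fx / total_f
    variance = sum_fx2 / total_f - mean ** 2
    return mean, math.sqrt(variance)


mean_sales, sd_sales = grouped_mean_and_sd(midpoints, frequencies)
print(round(mean_sales, 1), round(sd_sales, 1))  # prints: 15.3 6.1
```

Because the calculation uses class mid-points in place of the individual values, the result is an estimate of the true mean and standard deviation of the underlying data.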

Standard deviation example 2:

Calculate the standard deviation from the following distribution:

Number of orders    10-14    15-19    20-24    25-29    30-34    35-39
Number of weeks     3        7        15       20       9        4

This question will be completed in class.

The Coefficient of Variation

● When two distributions and their means are compared, their variability must also
be taken into account.
● While the standard deviation is the important measure of spread, it cannot be used
as the sole basis of comparing two distributions.
● This is because it is an absolute measure of dispersion: it measures variation in
the same units as the original data and takes no account of the size of the mean.
● For example, if we have a standard deviation of 10 and a mean of 5, the values
vary by an amount twice as large as the mean itself. If, on the other hand, we have
a standard deviation of 10 and a mean of 5,000, the variation relative to the mean
is insignificant. Therefore, we cannot know the dispersion of a set of data until we
know how the standard deviation compares with the mean.
● A relative measure of dispersion, which compares the mean to the standard
deviation, is the coefficient of variation, which is found by dividing the standard
deviation by the mean.

Algebraically, this is:

Coefficient of Variation = σ / μ where: σ = standard deviation, μ = mean

It is usually expressed as a percentage, i.e. (σ / μ) x 100%.

Example

Given the following data:

A: μ= 120, σ = 55

B: μ= 90, σ = 50

Calculate the Coefficient of Variation.

Solution:
A: Coefficient of Variation = (55 / 120) x 100% = 45.8%
B: Coefficient of Variation = (50 / 90) x 100% = 55.6%

B has the higher relative variability in weekly wages.
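The same comparison can be sketched in Python. The function name `coefficient_of_variation` is illustrative, and the result is expressed as a percentage to match the solution above.

```python
def coefficient_of_variation(mean, sd):
    """Return the standard deviation as a percentage of the mean."""
    return sd / mean * 100


# Means and standard deviations for A and B, as given in the example.
cv_a = coefficient_of_variation(mean=120, sd=55)
cv_b = coefficient_of_variation(mean=90, sd=50)
print(f"A: {cv_a:.1f}%  B: {cv_b:.1f}%")  # prints: A: 45.8%  B: 55.6%
```

Although A has the larger standard deviation in absolute terms (55 against 50), B varies more relative to its mean.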

Skewness

● Skewness describes the extent of non-symmetry of a distribution.


● It can be positive (for a distribution which is skewed to the right), negative (when
a distribution is skewed to the left), or zero (for a symmetric distribution).
● If a distribution is skewed, it means that values of the distribution are concentrated
at either the low end or the high end of the measuring scale on the horizontal axis.
For example, the two curves below are skewed distributions:

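[Figure: two skewed curves. Curve A is drawn over a horizontal scale of 30 to 34;
Curve B is drawn over a horizontal scale of 44 to 48.]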

● Curve A is skewed to the right (or positively skewed) because it tails off
toward the high end of the scale.
● Curve B is skewed to the left (or negatively skewed) because it tails off toward
the low end of the scale.

Measuring Skewness

The most straightforward measure of skewness (called Pearson’s skew) is a
coefficient, which gives the difference between the mean and the mode as a
proportion of the standard deviation:

Pearson’s skew (Psk) = (Mean - Mode) / Standard deviation


Notes:

● If a distribution has positive skewness (Psk > 0), the mode is smaller than the
mean, and vice versa.
● If the distribution is symmetrical (Psk = 0), then the mean equals the mode,
and the skewness is zero.
● If a distribution has negative skewness (Psk < 0), then the mode is greater than
the mean, and vice versa.
● Dividing by the standard deviation allows distributions with different units to
be compared.

Alternative Measures of Skewness

For moderately skewed distributions, the following relationship holds approximately:

Mean - Mode = 3 x (Mean - Median)

Therefore, the numerator in Pearson’s measure of skewness can be replaced by
3(Mean - Median), i.e.

Pearson’s skew = 3 x (Mean - Median) / Standard deviation

This may be useful if only the mean and median are known for a distribution.

Generally, the values of skewness are low (highly skewed distributions may have
values of ± 1, while values up to ± 3 are theoretically possible).
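Both versions of Pearson’s skew can be sketched in Python as follows. The function names and the summary statistics are hypothetical, chosen only so that Mean - Mode = 3 x (Mean - Median) holds exactly; they do not come from any data set in these notes.

```python
def pearson_skew_mode(mean, mode, sd):
    """Pearson's skew using the mode: (mean - mode) / sd."""
    return (mean - mode) / sd


def pearson_skew_median(mean, median, sd):
    """Alternative form using the median: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


# Hypothetical summary statistics, for illustration only.
mean, median, mode, sd = 50, 48, 44, 10

skew_from_mode = pearson_skew_mode(mean, mode, sd)
skew_from_median = pearson_skew_median(mean, median, sd)
print(skew_from_mode, skew_from_median)  # prints: 0.6 0.6
```

Here the mode lies below the mean, so the coefficient is positive and the hypothetical distribution is skewed to the right. With real data the two versions usually differ a little, since the relationship between mean, median and mode is only approximate.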

The Empirical Rule

● The standard deviation can be used to convey information about variability in a
collection of data.
● To illustrate this, we look at the case of a normal population - this means that the
data values have a bell-shaped histogram.

● It can be shown for such populations that about 68% of the data lie within one
standard deviation of the mean, about 95% within two standard deviations of the
mean, and about 99% within three standard deviations of the mean. This is shown
in the diagram below.

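[Figure: bell-shaped normal curve. On each side of the mean, about 34% of the data
lie within one standard deviation, a further 13.5% between one and two standard
deviations, and a further 2% between two and three standard deviations.]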

It is sometimes important to know if a sample came from a normal population. For
this to be plausible, the sample should approximately satisfy the empirical rule.

Example

The following values represent the scores of 40 students in an exam.

46 58 65 70 76 49 59 66 71 78

50 59 66 71 79 53 60 66 72 80

54 62 66 73 82 55 63 68 73 83

55 64 68 73 84 57 65 69 74 88

Given that the mean and standard deviation of these data are 67 and 10 respectively,
do the data satisfy the empirical rule?

Solution:
Examining the data, we see that 26 of the numbers lie in the range 57 - 77, i.e. within
one standard deviation of the mean. These 26 numbers represent 26/40 or 65% of the
data, which is very close to the 68% that the empirical rule places within one standard
deviation of the mean. Further calculations are shown below:

Within               Number of values    Percent    Empirical %
1 S.D. (57-77)       26                  65         68
2 S.D.'s (47-87)     38                  95         95
3 S.D.'s (37-97)     40                  100        99

As the percentages for this sample are very close to those of the empirical rule, it is
reasonable to conclude that this sample comes from a normal population.
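The counts in the table can be verified with a short Python sketch. It uses the 40 exam scores and the stated mean of 67 and standard deviation of 10; the helper `share_within` is a hypothetical name, and interval end points are treated as inclusive.

```python
# The 40 exam scores from the example, read row by row.
scores = [
    46, 58, 65, 70, 76, 49, 59, 66, 71, 78,
    50, 59, 66, 71, 79, 53, 60, 66, 72, 80,
    54, 62, 66, 73, 82, 55, 63, 68, 73, 83,
    55, 64, 68, 73, 84, 57, 65, 69, 74, 88,
]
mean, sd = 67, 10  # values given in the example, not recomputed here


def share_within(values, centre, spread, k):
    """Count and percentage of values within k spreads of the centre."""
    lower, upper = centre - k * spread, centre + k * spread
    count = sum(lower <= v <= upper for v in values)
    return count, 100 * count / len(values)


results = {k: share_within(scores, mean, sd, k) for k in (1, 2, 3)}
for k, (count, pct) in results.items():
    print(f"within {k} S.D.: {count} values ({pct:.0f}%)")
```

The printed counts are 26, 38 and 40, matching the table. Agreement with the empirical rule supports, but does not prove, the assumption of a normal population.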

DECILES

Deciles are similar to quartiles in that they are used to divide up a cumulative
distribution. However, they break the distribution up into tenths rather than
quarters (compare ‘decade’, a block of ten years). Thus,

● The first decile has 10% of values below it and 90% above it
● The second decile has 20% of values below it and 80% above it, and so on.

PERCENTILES

Percentiles are the ninety-nine points of a distribution that divide it up into one
hundred equal parts. They are normally denoted as P1, P2, … P99.

● Thus, the tenth percentile (P10) has ten percent of the values of the distribution
below it (and ninety percent of the values above it).

Note that the 50th percentile (P50) is the median, and the 25th percentile (P25) and 75th
percentile (P75) are equal to Q1 and Q3 respectively.
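The links between quartiles, deciles and percentiles can be illustrated with Python's standard `statistics.quantiles` function, applied here to the 40 exam scores from the empirical rule example. This is a sketch under one assumption: it uses the library's default interpolation method ("exclusive"). Other textbooks and packages use slightly different conventions, so individual cut points may differ a little from hand calculations.

```python
import statistics

# The 40 exam scores from the empirical rule example.
scores = [
    46, 58, 65, 70, 76, 49, 59, 66, 71, 78,
    50, 59, 66, 71, 79, 53, 60, 66, 72, 80,
    54, 62, 66, 73, 82, 55, 63, 68, 73, 83,
    55, 64, 68, 73, 84, 57, 65, 69, 74, 88,
]

quartiles = statistics.quantiles(scores, n=4)      # Q1, Q2, Q3
deciles = statistics.quantiles(scores, n=10)       # D1 ... D9
percentiles = statistics.quantiles(scores, n=100)  # P1 ... P99

# Percentile k is stored at index k - 1.
p10, p25, p50, p75 = (percentiles[k - 1] for k in (10, 25, 50, 75))

print("P25, Q1:", p25, quartiles[0])
print("P50, median:", p50, statistics.median(scores))
print("P75, Q3:", p75, quartiles[2])
print("P10, first decile:", p10, deciles[0])
```

Each printed pair should agree, reflecting that P25 = Q1, P50 = median, P75 = Q3 and P10 = the first decile.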

OUTLIERS & EXTREME VALUES

The terms outlier and extreme value are often used interchangeably. Both refer to a
data value that is atypical of the data set, i.e. a value which differs markedly from
most of the numbers in the set.

For example, suppose that the number of championship matches played by a team in
the last five years is as follows:

2 1 10 1 1

The average (mean) of these numbers is 3, which is heavily influenced by the outlier
of 10.

Discarding the outlier, we obtain a modified mean of 1.25 which is perhaps more
meaningful for comparisons or for setting a norm.

Alternatively, we could decide that the median is a better measure of average for data
sets with outliers.
Data sets should always be examined for outliers, as the reasons for such values can
vary: an outlier may be due to weather conditions, for example, or to a recording error.
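The effect of the outlier on each measure of average can be checked with a short Python sketch using the standard `statistics` module. Treating 10 as the outlier follows the example above; the variable names are illustrative.

```python
import statistics

# Championship matches played in each of the last five years.
matches = [2, 1, 10, 1, 1]

mean_all = statistics.mean(matches)            # mean of all five values
without_outlier = [m for m in matches if m != 10]
mean_modified = statistics.mean(without_outlier)
median_all = statistics.median(matches)        # middle value when sorted

print(mean_all, mean_modified, median_all)  # prints: 3 1.25 1
```

The median is 1 whether or not the outlier is kept, which is why it is often preferred for data sets containing outliers.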
