Measures of Spread and Dispersion

This document discusses various measures of spread and dispersion in data, including standard deviation, variance, interquartile range (IQR), and measures of skew. It defines standard deviation and variance, and explains how to calculate standard deviation using a formula and example data set. The document also defines IQR, explaining how to calculate the first and third quartiles. Finally, it discusses skew and the normal distribution, defining Pearson's measures of skew and how they indicate the shape and symmetry of a distribution.

Uploaded by siyeni
Measures of Spread and Dispersion
• This is a continuation of our venture into summary statistics.
• In this video we will cover:
  • Variance and Standard Deviation.
  • IQR.
  • Measures of Skew and the Concept of the Normal Distribution.
Standard Deviation
• The most commonly used measure of spread or dispersion is the Standard Deviation.
• There’s a formula for this:
$$\mathrm{sd}(\text{population}) = \sqrt{\frac{\sum_i (x_i - \text{mean}_{\text{population}})^2}{N}}$$
• The main operative aspect of this formula is the summation $\sum_i (x_i - \text{mean}_{\text{population}})^2$.
• The first step is therefore to compute the mean of the population.
Standard Deviation
• Let’s compute $\sum_i (x_i - \text{mean}_{\text{population}})^2$ on an example.
• Here is a list of student assessment scores in my (small) class.
scores = [90, 75, 60, 80, 95, 50, 75]
• The mean is:
$$\frac{90 + 75 + 60 + 80 + 95 + 50 + 75}{7} = 75$$
• Now we need to compute the rest:

Score   Diff   Squared
90       15     225
75        0       0
60      -15     225
80        5      25
95       20     400
50      -25     625
75        0       0

$$\sum_i (x_i - \text{mean}_{\text{population}})^2 = 225 + 0 + 225 + 25 + 400 + 625 + 0 = 1500$$
Standard Deviation
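• Finally, plug into my formula.
• We have $\sum_i (x_i - \text{mean}_{\text{population}})^2 = 1500$
• We also know $N = \mathrm{length}(\text{scores}) = 7$
$$\mathrm{sd}(\text{population}) = \sqrt{\frac{\sum_i (x_i - \text{mean}_{\text{population}})^2}{N}} = \sqrt{\frac{1500}{7}} = 14.639$$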
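The steps of this calculation (mean, differences, squares, sum, divide by N, square root) can be sketched in Python. This is only a minimal illustration; the function name `population_sd` is an assumption for this sketch and does not come from the slides.

```python
import math

# Scores from the worked example.
scores = [90, 75, 60, 80, 95, 50, 75]

def population_sd(values):
    """Population standard deviation: sqrt(sum((x - mean)^2) / N)."""
    n = len(values)
    mean = sum(values) / n                             # step 1: the mean (75 here)
    squared_diffs = [(x - mean) ** 2 for x in values]  # step 2: squared differences
    return math.sqrt(sum(squared_diffs) / n)           # step 3: divide by N, take the root

print(round(population_sd(scores), 3))  # 14.639
```

The standard library's `statistics.pstdev` computes the same quantity, so it can be used to cross-check the hand calculation.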
Standard Deviation of a Sample
• In some cases we are computing the standard deviation of a sample.
• The formula is only slightly different:
$$\mathrm{sd}(\text{sample}) = \sqrt{\frac{\sum_i (x_i - \text{mean}_{\text{sample}})^2}{N - 1}}$$
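As a rough sketch of the only change, the divisor becomes N − 1 instead of N. The example below treats the seven class scores as if they were a sample, which is an assumption made purely for illustration; the function name `sample_sd` is likewise hypothetical.

```python
import math
import statistics

# Assumption: the seven scores are treated as a sample for this illustration.
scores = [90, 75, 60, 80, 95, 50, 75]

def sample_sd(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (N - 1))."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

print(round(sample_sd(scores), 3))         # 15.811
print(round(statistics.stdev(scores), 3))  # 15.811 (standard-library equivalent)
```

Dividing 1500 by 6 rather than 7 gives a slightly larger value than the population figure for the same numbers.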
Sample SD versus Population SD
• Which do I use? We need to know the difference between samples and populations.
• Samples are subsets of populations.
• If, for example, I am interested in all the people in my class, and I have that data for all 30 of them, then we apply the population SD.
• By contrast, if I only had a subset, say 10 of the 30 people in my class, then that’s a sample.
• Given we are normally looking to generalise to a wider population (e.g. we study 100 people and see if this generalises to populations of millions), sample SDs are more commonly used.
• Notation: In practice, $\mu$ is the population mean (and $\sigma$ its standard deviation), whilst for samples we use $\bar{x}$ for the sample mean (and $s$ to represent its standard deviation). This is important when you read formulas.
Variance
• Sometimes you’ll read about variance being used instead of the standard deviation.
• This is simply the square of the standard deviation.
$$\mathrm{Var}(x) = \mathrm{sd}(x)^2$$
• As with Standard Deviation, it is possible to have both a population and sample variance.
• The key thing is that the variance is the square of the standard deviation!
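The relationship between variance and standard deviation can be checked numerically. The sketch below uses Python's standard `statistics` module on the class scores from the earlier example; the variable names are illustrative only.

```python
import math
import statistics

scores = [90, 75, 60, 80, 95, 50, 75]

pop_var = statistics.pvariance(scores)    # sum of squared differences / N
sample_var = statistics.variance(scores)  # sum of squared differences / (N - 1)

print(f"{pop_var:.3f}")     # 214.286
print(f"{sample_var:.3f}")  # 250.000

# Variance is the square of the matching standard deviation
# (compared with a tolerance because of floating-point rounding).
print(math.isclose(pop_var, statistics.pstdev(scores) ** 2))    # True
print(math.isclose(sample_var, statistics.stdev(scores) ** 2))  # True
```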
IQR and some other measures of spread
• Like the arithmetic mean, the standard deviation can be distorted by outliers.
• Let’s start with the Range. If I have a set of numbers, then the Range is the largest number minus the smallest:
$$\mathrm{range}(x) = \max(x) - \min(x)$$
• An example. Suppose I wanted to find the range of the following numbers:
scores = [90, 75, 60, 80, 95, 50, 75]
• I would sort those numbers from lowest to highest:
sort(scores) = [50, 60, 75, 75, 80, 90, 95]
$$\mathrm{range}(x) = \max(x) - \min(x) = 95 - 50 = 45$$
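A minimal Python sketch of the range calculation follows. The function name `value_range` and the extra score of 5 are assumptions added for illustration; the extra score is a made-up outlier, not data from the class.

```python
scores = [90, 75, 60, 80, 95, 50, 75]

def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

print(sorted(scores))       # [50, 60, 75, 75, 80, 90, 95]
print(value_range(scores))  # 45

# Hypothetical outlier: one very low score doubles the range.
print(value_range(scores + [5]))  # 90
```

Because the range depends only on the two extreme values, a single unusual score changes it substantially.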
IQR and some other measures of spread
• Now for the IQR, or interquartile range. This does not use the min and max, but the top and bottom quartiles.
• More formally, the IQR is the difference between the third (upper) and first (lower) quartiles. Given an even $N = 2n$ or odd $N = 2n + 1$ number of values:
  • The first quartile $Q_1$ is the median of the $n$ smallest values
  • The third quartile $Q_3$ is the median of the $n$ largest values
  • (The second quartile $Q_2$ is the median of all $N$ values)
• Let’s go back to our sorted array of scores:
sort(scores) = [50, 60, 75, 75, 80, 90, 95]
$$N = \mathrm{length}(\text{scores}) = 7 \implies 2n + 1 = 7 \implies n = 3$$
• Lowest $n$ numbers: [50, 60, 75], so $Q_1 = 60$
• Middle value: $Q_2 = 75$
• Highest $n$ numbers: [80, 90, 95], so $Q_3 = 90$
$$\mathrm{IQR} = Q_3 - Q_1 = 90 - 60 = 30$$
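The quartile rule described here (median of the n smallest and n largest values) can be sketched in Python as below. The function names `quartiles` and `iqr` are assumptions for this sketch. Library routines often use other quartile conventions based on interpolation, so they may return a different IQR for the same data.

```python
import statistics

scores = [90, 75, 60, 80, 95, 50, 75]

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule from the slides."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    n = len(ordered) // 2            # N = 2n (even) or N = 2n + 1 (odd)
    lower = ordered[:n]              # the n smallest values
    upper = ordered[-n:]             # the n largest values
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

print(quartiles(scores))  # (60, 75, 90)
print(iqr(scores))        # 30
```

For an odd number of values the middle value is left out of both halves, which matches the worked example where 75 is only used for the second quartile.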
Skew and Normality
• Skew is about the shape of a distribution of numbers.
• We can plot a histogram of the data, which illustrates how the data is distributed.
Skew and Normality
• We can describe the level of skew using descriptive statistics.
• One example is Pearson’s first skewness coefficient:
$$sk_1 = \frac{\text{mean} - \text{mode}}{\text{standard deviation}}$$
• Another example is Pearson’s second skewness coefficient:
$$sk_2 = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$
• The first one is more commonly used, unless you do not know the mode (or it is not stable due to a small sample size).
• A value of zero means no skew:
  • Higher magnitude (i.e. absolute value) means increased skew.
  • The sign indicates the direction of the skew (positive when the mean lies above the mode or median, negative when it lies below).
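Both coefficients are simple to compute once the mean, median, mode and standard deviation are known. The sketch below is illustrative only: the function name `pearson_skew` is hypothetical, the use of the population standard deviation is an assumption (the slides do not say which version to use), and the `skewed` list is made-up data.

```python
import statistics

def pearson_skew(values):
    """Return Pearson's first and second skewness coefficients (sk1, sk2)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    sk1 = (mean - statistics.mode(values)) / sd
    sk2 = 3 * (mean - statistics.median(values)) / sd
    return sk1, sk2

# Class scores from the earlier example: mean, median and mode are all 75.
scores = [90, 75, 60, 80, 95, 50, 75]
print(pearson_skew(scores))  # (0.0, 0.0)

# Hypothetical data with one large value pulling the mean upwards.
skewed = [1, 2, 2, 3, 10]
sk1, sk2 = pearson_skew(skewed)
print(sk1 > 0 and sk2 > 0)   # True: both coefficients are positive
```

For the class scores both coefficients are zero because the mean, median and mode coincide, which is what the formulas report as no skew.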
Skew and Normality
• Here are three examples with different degrees of Skew:
