Numerical Descriptive Measures
Measures of Central Tendency
Mean, Median, Mode, Geometric Mean
Quartiles
Measures of Variation
Range, Interquartile Range, Variance and Standard Deviation,
Coefficient of Variation
Shape
Symmetric, Skewed
Using Box-and-Whisker Plots
Coefficient of Correlation
Pitfalls in Numerical Descriptive Measures and Ethical Issues
Summary Measures
Central Tendency: Mean, Median, Mode, Geometric Mean
Quartiles
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Introduction
Think of a sample portfolio composed of three stocks:
100 shares with ARR = 10%
200 shares with ARR = 15%
100 shares with ARR = 20%
A central measure of this portfolio's ARR is 15%.
Now observe the following portfolio:
100 shares with ARR = 5%
200 shares with ARR = 15%
100 shares with ARR = 25%
A central measure of this portfolio's ARR is 15% too.
Considering only the average ARR, the two portfolios are equal.
But are they really?
Is the dispersion of ARR the same for the two portfolios?
The dispersion (variability) is an important
property when describing a set of numbers, at
least as important as the central location.
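The comparison of the two portfolios can be sketched numerically. This is only an illustration: it assumes the holdings are 100, 200 and 100 shares, weights each ARR by the number of shares, and uses a share-weighted standard deviation as the dispersion measure. The helper names are hypothetical.

```python
from math import sqrt

def weighted_mean(values, weights):
    """Mean of values, each weighted by its number of shares."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_std(values, weights):
    """Share-weighted standard deviation around the weighted mean."""
    m = weighted_mean(values, weights)
    squared = sum(w * (v - m) ** 2 for v, w in zip(values, weights))
    return sqrt(squared / sum(weights))

# Assumed holdings (shares) and ARR values in percent.
shares = [100, 200, 100]
arr_1 = [10, 15, 20]   # first portfolio
arr_2 = [5, 15, 25]    # second portfolio

mean_1 = weighted_mean(arr_1, shares)
mean_2 = weighted_mean(arr_2, shares)
std_1 = weighted_std(arr_1, shares)
std_2 = weighted_std(arr_2, shares)

print(f"Portfolio 1: mean ARR = {mean_1}%, dispersion = {std_1:.2f}")
print(f"Portfolio 2: mean ARR = {mean_2}%, dispersion = {std_2:.2f}")
```

Both portfolios have the same central value, but the second one is spread more widely around it.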
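Measures of Central Tendency
Central Tendency: Mean, Median, Mode, Geometric Mean
Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$ (sample), $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$ (population)
Geometric Mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
2004 Prentice-Hall, Inc.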
Measures of Central Tendency
The central data point reflects the
locations of all the actual data points.
How?
With one data point, clearly the central
location is at the point itself.
With two data points, the central location
should fall in the middle between them (in order
to reflect the location of both of them).
If a third data point appears in the center,
the measure of central location will remain
in the center.
But if the third data point
appears on the left-hand side
of the midrange, it should pull
the central location to the left.
Measures of Central Tendency
As more and more data points are added, the
central location moves (left and right) as required
in order to reflect the effects of all the points.
Mean (Arithmetic Mean)
Mean (Arithmetic Mean) of Data Values
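Sample mean (n = Sample Size):
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
Population mean (N = Population Size):
$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$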
Mean (Arithmetic Mean)
The Most Common Measure of Central Tendency
Affected by Extreme Values (Outliers)
[Figure: dot plot on an axis from 0 to 10, Mean = 5; dot plot with an extreme value on an axis from 0 to 14, Mean = 6]
Median
Robust Measure of Central Tendency
Not Affected by Extreme Values
[Figure: dot plot on an axis from 0 to 10, Median = 5; dot plot with an extreme value on an axis from 0 to 14, Median = 5]
In an Ordered Array, the Median is the Middle
Number
If n or N is odd, the median is the middle number
If n or N is even, the median is the average of the 2
middle numbers
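A small sketch of how an extreme value moves the mean but not the median. The two data sets are hypothetical, chosen only so that the results match the values quoted above (mean 5 and 6, median 5 in both cases).

```python
from statistics import mean, median

# Hypothetical data sets; the second replaces the largest value by an extreme one.
base = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]

mean_base = mean(base)
mean_outlier = mean(with_outlier)
median_base = median(base)
median_outlier = median(with_outlier)

print(f"mean:   {mean_base} -> {mean_outlier}")
print(f"median: {median_base} -> {median_outlier}")

# With an even number of values the median is the average of the two middle ones.
median_even = median([1, 3, 5, 7])
print(f"median of an even-sized set: {median_even}")
```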
Mode
A Measure of Central Tendency
Value that Occurs Most Often
Not Affected by Extreme Values
There May Not Be a Mode
There May Be Several Modes
Used for Either Numerical or Categorical Data
[Figure: dot plot on an axis from 0 to 14, Mode = 9; dot plot on an axis from 0 to 6, No Mode]
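The properties listed above (no mode, several modes, categorical data) can be illustrated with a short sketch. The function name `modes` and the sample data are hypothetical, and the convention that a data set has no mode when no value repeats is an assumption consistent with the "No Mode" case.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s); an empty list if no value repeats."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []          # every value occurs once: no mode
    return sorted(value for value, count in counts.items() if count == highest)

single_mode = modes([1, 3, 5, 9, 9, 9, 12, 14])
no_mode = modes([0, 1, 2, 3, 4, 5, 6])
several_modes = modes([1, 1, 2, 2, 3])
categorical_mode = modes(["red", "blue", "blue", "green"])

print(single_mode, no_mode, several_modes, categorical_mode)
```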
Geometric Mean
Useful in the Measure of Rate of Change of a
Variable Over Time
$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
Geometric Mean Rate of Return
Measures the status of an investment over time
$\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$
Example
An investment of $100,000 declined to $50,000 at the
end of year one and rebounded back to $100,000 at end
of year two:
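$R_1 = -0.5$ (or $-50\%$)
$R_2 = 1$ (or $100\%$)
Average rate of return:
$\bar{R} = \frac{(-0.5) + (1)}{2} = 0.25$ (or $25\%$)
Geometric rate of return:
$\bar{R}_G = [(1 + (-0.5)) \times (1 + 1)]^{1/2} - 1 = [(0.5) \times (2)]^{1/2} - 1 = 1^{1/2} - 1 = 0$ (or $0\%$)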
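The same example as a minimal Python sketch. The function name is hypothetical; the calculation follows the geometric mean rate of return formula, with returns written as fractions (a 50% decline is -0.5).

```python
from math import prod

def geometric_rate_of_return(returns):
    """n-th root of the product of (1 + R_i), minus 1."""
    growth = prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

# $100,000 falls to $50,000 (-50%), then recovers to $100,000 (+100%).
returns = [-0.5, 1.0]

arithmetic = sum(returns) / len(returns)
geometric = geometric_rate_of_return(returns)

print(f"arithmetic average: {arithmetic:.0%}")
print(f"geometric rate of return: {geometric:.0%}")
```

The arithmetic average suggests a 25% gain per year, while the geometric rate reflects that the investment ended where it started.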
Quartiles
Split Ordered Data into 4 Quarters
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
Position of i-th Quartile: $(Q_i) = \frac{i(n+1)}{4}$
Q1 and Q3 are Measures of Non-central Location
Q2 = Median, a Measure of Central Tendency
Quartiles
The lower half of a data set is the set of all values that are
to the left of the median value when the data has been put
into increasing order.
The upper half of a data set is the set of all values that are
to the right of the median value when the data has been
put into increasing order.
The first quartile, denoted by Q1 , is the median of
the lower half of the data set. This means that about 25%
of the numbers in the data set lie below Q1 and about 75%
lie above Q1 .
The third quartile, denoted by Q3 , is the median of
the upper half of the data set. This means that about 75%
of the numbers in the data set lie below Q3 and about 25%
lie above Q3 .
Quartiles
Data in Ordered Array: 11 12 13 16 16 17 17 18 21
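Median = 16 (the 5th value)
Position of Q1 = $\frac{1(9+1)}{4} = 2.5$, so Q1 = $\frac{12 + 13}{2} = 12.5$
Position of Q3 = $\frac{3(9+1)}{4} = 7.5$, so Q3 = $\frac{17 + 18}{2} = 17.5$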
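The position rule can be sketched in Python. The function name is hypothetical, and one detail is an assumption: when the position is not a whole number, the sketch averages the two neighbouring values, which matches the 2.5 and 7.5 cases in this example but is only one of several conventions for other fractional positions.

```python
from math import ceil, floor

def quartile(ordered, i):
    """i-th quartile of an ordered array using position i(n + 1)/4."""
    position = i * (len(ordered) + 1) / 4   # 1-based position
    lower = ordered[floor(position) - 1]
    upper = ordered[ceil(position) - 1]
    # Whole-number position: lower and upper are the same value.
    return (lower + upper) / 2

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]

q1 = quartile(data, 1)
q2 = quartile(data, 2)
q3 = quartile(data, 3)

print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}")
```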
Measures of Variation
Measures of central location fail to tell the whole
story about the distribution.
A question of interest still remains unanswered:
How much are the values of a given set
spread out around the mean value?
Measures of Variation
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Variance: Population Variance, Sample Variance
Standard Deviation: Population Standard Deviation, Sample Standard Deviation
Range
Measure of Variation
Difference between the Largest and the Smallest
Observations:
$\text{Range} = X_{\text{Largest}} - X_{\text{Smallest}}$
Ignores How Data are Distributed
[Figure: two dot plots with different distributions over the same span; in both, Range = 12 - 7 = 5]
Interquartile Range
Measure of Variation
Also Known as Midspread
Spread in the middle 50%
Difference between the First and Third Quartiles
Data in Ordered Array: 11 12 13 16 16 17 17 18 21
$\text{Interquartile Range} = Q_3 - Q_1 = 17.5 - 12.5 = 5$
Not Affected by Extreme Values
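A short sketch contrasting the range and the interquartile range when an extreme value is introduced. It relies on `statistics.quantiles` with `method="exclusive"`, which to my understanding places the i-th quartile at position i(n + 1)/4 with linear interpolation. The helper names and the modified data set (21 replaced by 100) are hypothetical.

```python
from statistics import quantiles

def data_range(data):
    """Largest observation minus smallest observation."""
    return max(data) - min(data)

def interquartile_range(data):
    """Q3 minus Q1, with quartiles from the exclusive method."""
    q1, _, q3 = quantiles(data, n=4, method="exclusive")
    return q3 - q1

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]
with_extreme = data[:-1] + [100]   # hypothetical: largest value replaced by 100

range_original = data_range(data)
range_extreme = data_range(with_extreme)
iqr_original = interquartile_range(data)
iqr_extreme = interquartile_range(with_extreme)

print(f"range: {range_original} -> {range_extreme}")
print(f"IQR:   {iqr_original} -> {iqr_extreme}")
```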
Variance
Important Measure of Variation
Shows Variation about the Mean
Sample Variance:
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Population Variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
The Variance
Example
Find the variance of the following set of numbers,
representing annual rates of returns for a group of
mutual funds. Assume the set is (i) a sample, (ii) a
population: -2, 4, 5, 6.9, 10
Solution:
The Variance
Solution:
Assuming a sample
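The calculation for both cases can be sketched with Python's standard `statistics` module, where `variance` divides by n - 1 (sample) and `pvariance` divides by N (population). The variable names are illustrative.

```python
from statistics import mean, pvariance, variance

# Annual rates of return for the group of mutual funds.
returns = [-2, 4, 5, 6.9, 10]

average = mean(returns)
sum_of_squares = sum((x - average) ** 2 for x in returns)

sample_variance = variance(returns)        # sum of squares / (n - 1)
population_variance = pvariance(returns)   # sum of squares / N

print(f"mean = {average:.2f}")
print(f"sum of squared deviations = {sum_of_squares:.3f}")
print(f"(i) sample variance = {sample_variance:.3f}")
print(f"(ii) population variance = {population_variance:.4f}")
```

The sum of squared deviations is the same in both cases; only the divisor changes, so the sample variance is the larger of the two.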
Standard Deviation
Most Important Measure of Variation
Shows Variation about the Mean
Has the Same Units as the Original Data
Sample Standard Deviation:
$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Population Standard Deviation:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Standard Deviation
Example
The daily percentages of defective items over two weeks
of production (10 working days) were calculated for
two production lines.
Which line produces good items more consistently?
Line 1: 8.3, 6.2, 20.9, 2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05
Line 2: 12.1, 2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, 1.3, 11.4
Standard Deviation
Solution:
Line 1:
Standard Deviation
Solution:
Line 2:
Standard Deviation
Line 1 should be considered less consistent
because the standard deviation of its percentage
of defective items is larger. The percentage of good
items is 100 minus the percentage of defective items,
so its standard deviation is the same and is also larger.
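The comparison of the two production lines can be checked with a minimal sketch, treating the 10 days as a sample so that `statistics.stdev` (divisor n - 1) applies. The variable names are illustrative.

```python
from statistics import mean, stdev

# Daily percentage of defective items over 10 working days.
line_1 = [8.3, 6.2, 20.9, 2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05]
line_2 = [12.1, 2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, 1.3, 11.4]

mean_1, mean_2 = mean(line_1), mean(line_2)
s_1, s_2 = stdev(line_1), stdev(line_2)

# Percentage of good items = 100 - percentage of defective items;
# shifting and negating the data leaves the standard deviation unchanged.
s_good_1 = stdev([100 - x for x in line_1])

print(f"Line 1: mean = {mean_1:.3f}, s = {s_1:.2f}")
print(f"Line 2: mean = {mean_2:.3f}, s = {s_2:.2f}")
print("Less consistent line:", "Line 1" if s_1 > s_2 else "Line 2")
```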
Interpreting the Standard Deviation
The standard deviation can be used to:
compare the variability of several distributions;
make a statement about the general shape of a distribution.
When describing the shape of a distribution we refer to:
a distribution with any shape;
a mound-shaped distribution.
Standard Deviation
From a Frequency Distribution
(continued)
Approximating the Standard Deviation
Used when the raw data are not available and the
only source of data is a frequency distribution
$S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n - 1}}$
$n$ = sample size
$c$ = number of classes in the frequency distribution
$m_j$ = midpoint of the jth class
$f_j$ = frequency of the jth class
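A minimal sketch of the approximation. The class midpoints and frequencies are hypothetical, and the mean is itself approximated from the frequency distribution as the frequency-weighted average of the midpoints, which is an assumption about how the mean is obtained when raw data are unavailable.

```python
from math import sqrt

def grouped_std(midpoints, frequencies):
    """Approximate sample standard deviation from a frequency distribution."""
    n = sum(frequencies)                                    # sample size
    approx_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    squared = sum(f * (m - approx_mean) ** 2
                  for m, f in zip(midpoints, frequencies))
    return sqrt(squared / (n - 1))

# Hypothetical frequency distribution with c = 5 classes.
midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]

approx_s = grouped_std(midpoints, frequencies)
print(f"approximate standard deviation = {approx_s:.3f}")
```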
Comparing Standard Deviations
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
[Figure: dot plots of Data A, Data B and Data C on a common axis from 11 to 21]
Coefficient of Variation
Measure of Relative Variation
Always in Percentage (%)
Shows Variation Relative to the Mean
Used to Compare Two or More Sets of Data
Measured in Different Units
$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
Sensitive to Outliers
Comparing Coefficient of Variation
Stock A:
Average price last year = $50
Standard deviation = $2
Stock B:
Average price last year = $100
Standard deviation = $5
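Coefficient of Variation:
Stock A: $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \left(\frac{\$2}{\$50}\right) \cdot 100\% = 4\%$
Stock B: $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \left(\frac{\$5}{\$100}\right) \cdot 100\% = 5\%$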
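The same comparison as a small sketch. The function name is hypothetical; it takes a standard deviation and a mean in the same units and returns the coefficient of variation in percent.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Standard deviation relative to the mean, in percent."""
    if mean_value == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean_value * 100

cv_a = coefficient_of_variation(2, 50)    # Stock A: S = $2, mean = $50
cv_b = coefficient_of_variation(5, 100)   # Stock B: S = $5, mean = $100

print(f"Stock A: CV = {cv_a:.0f}%")
print(f"Stock B: CV = {cv_b:.0f}%")
```

Stock B varies more relative to its average price, even though its price level is twice that of Stock A.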
Shape of a Distribution
Describe How Data are Distributed
Measures of Shape
Symmetric or skewed
Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean
Exploratory Data Analysis
Box-and-Whisker Plot
Graphical display of data using 5-number summary
$X_{\text{smallest}}$, $Q_1$, Median ($Q_2$), $Q_3$, $X_{\text{largest}}$
Distribution Shape & Box-and-Whisker Plot
[Figure: box-and-whisker plots for Left-Skewed, Symmetric and Right-Skewed distributions, each marked with Q1, Q2 and Q3]
The Empirical Rule
For Data Sets That Are Approximately Bell-
shaped:
Roughly 68% of the Observations Fall Within 1
Standard Deviation Around the Mean
Roughly 95% of the Observations Fall Within 2
Standard Deviations Around the Mean
Roughly 99.7% of the Observations Fall Within 3
Standard Deviations Around the Mean
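The three percentages can be checked against an exact normal distribution, which is one bell-shaped case. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name is hypothetical, and real data sets that are only approximately bell-shaped will match these figures only roughly.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1 = share_within(1)
within_2 = share_within(2)
within_3 = share_within(3)

for k, share in ((1, within_1), (2, within_2), (3, within_3)):
    print(f"within {k} standard deviation(s): {share:.1%}")
```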