Chapter 3 Descriptive Statistics
University of
Gondar, Ethiopia
Learning outcomes
• After completing this chapter, a student will be able to:
07/09/2021 2
An average should possess the following properties:
Arithmetic Mean / Simple Mean ($\bar{x}$)
• Definition:
The arithmetic mean is the sum of all observations
divided by the number of observations. It is usually denoted
by $\bar{x}$:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Example
• Consider the data on the birth weight of 10 newborn
children, in kilograms, at the University of Gondar Hospital:
2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43.
Then the average birth weight can be computed as:
$$\bar{x} = \frac{2.51 + 3.01 + \dots + 2.43}{10} = \frac{25.72}{10} = 2.572 \text{ kg}$$
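As a minimal sketch, the calculation for the birth-weight example can be reproduced in Python. The function name `arithmetic_mean` is an illustrative choice, not part of the original slides.

```python
# Birth weights (kg) of the 10 newborns listed in the example.
weights = [2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43]


def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)


mean_weight = arithmetic_mean(weights)
print(round(mean_weight, 3))  # mean birth weight in kg
```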
Arithmetic mean cont…
When the data are given in the form of a frequency
distribution, i.e. there are k distinct values such that a value $x_i$
has frequency $f_i$ ($i = 1, 2, \dots, k$), the arithmetic mean is
given by
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
Example
Solution:
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{18 \times 2 + 19 \times 1 + 20 \times 4 + \dots + 29 \times 12}{2 + 1 + 4 + \dots + 12} = \frac{3180}{130} \approx 24.46$$
Exercise
• Consider the following frequency distribution table:
Data value   10   20   30   40   50   60   70   80   90   100   110
Frequency     3    5    6    8   10   12   15   10   10    12     5
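Mean for Grouped Data
For a grouped frequency distribution, each class is represented by its
class midpoint $X_m$, and the mean is
$$\bar{x} = \frac{\sum f_i X_m}{\sum f_i}$$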
Example: Calculate the mean for the grouped frequency distribution
given below:
Class Frequency
6-10 35
11-15 23
16-20 15
21-25 12
26-30 9
31-35 6
Example cont…
• Solution
Class    Class mid (Xm)   Frequency (fi)   fi × Xm
6-10           8               35            280
11-15         13               23            299
16-20         18               15            270
21-25         23               12            276
26-30         28                9            252
31-35         33                6            198
Total                         100          1,575
• Therefore,
$$\bar{x} = \frac{\sum f_i X_m}{\sum f_i} = \frac{1575}{100} = 15.75$$
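The grouped-mean table above can be checked with a short script. This is a sketch only: the tuple layout `(lower limit, upper limit, frequency)` and the name `grouped_mean` are illustrative assumptions, and the midpoint is taken as the average of the two class limits, as in the table.

```python
# (lower limit, upper limit, frequency) for each class in the example.
classes = [
    (6, 10, 35),
    (11, 15, 23),
    (16, 20, 15),
    (21, 25, 12),
    (26, 30, 9),
    (31, 35, 6),
]


def grouped_mean(table):
    """Mean of grouped data: sum(f * midpoint) / sum(f)."""
    total_f = sum(f for _, _, f in table)
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in table)
    return total_fx / total_f


print(grouped_mean(classes))  # mean of the grouped distribution
```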
Properties of the arithmetic mean
• The mean can be used as a summary measure for both discrete
and continuous data. In general, however, it is not appropriate for
nominal or ordinal data.
• For a given set of data there is one and only one arithmetic mean.
Median
• An alternative measure of central location, perhaps
second in popularity to the arithmetic mean.
• Suppose there are n observations in a sample.
• If these observations are ordered from smallest to largest,
then the median is defined as follows:
• The median is a value such that at least half of the
observations are less than or equal to the median and
at least half of the observations are greater than or
equal to the median.
• The median is the midpoint of the data array.
Median
Ungrouped data
• If the number of observations is odd, the median is defined
as the [(n+1)/2]th observation.
• If the number of observations is even, the median is the
average of the two middle values, the (n/2)th and [(n/2)+1]th, i.e.
$$\tilde{x} = \frac{x_{(n/2)} + x_{(n/2+1)}}{2}$$
• To find the median of a data set:
• Arrange the data in ascending order.
• Find the middle observation of this ordered data.
Example 1: where n is even: 19, 20, 20, 21, 22, 24, 27, 27,
27, 34
Here n = 10, so the median is the average of the 5th and 6th values:
(22 + 24)/2 = 23.
Example 2
The number of children with asthma during a specific year in
seven local district clinics is shown. Find the median for this
data set.
253, 125, 328, 417, 201, 70, 90
Solution:
First we must arrange the data in ascending order:
70, 90, 125, 201, 253, 328, 417
Since n = 7 is odd, the median is the [(7+1)/2]th = 4th observation,
i.e. the median value is 201.
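The odd and even rules for ungrouped data can be sketched in a few lines of Python. The function name `median` is an illustrative choice; the two calls reuse the data from Example 1 and Example 2.

```python
def median(values):
    """Median of ungrouped data using the odd/even rules above."""
    ordered = sorted(values)          # step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd n: the [(n+1)/2]th observation (index mid, 0-based)
        return ordered[mid]
    # even n: average of the (n/2)th and [(n/2)+1]th observations
    return (ordered[mid - 1] + ordered[mid]) / 2


example_1 = [19, 20, 20, 21, 22, 24, 27, 27, 27, 34]
example_2 = [253, 125, 328, 417, 201, 70, 90]
print(median(example_1), median(example_2))
```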
Exercise
• The actual waiting times for the first job for a
selected sample of nine people with different fields
of specialization are given below.
Median cont…
Median for grouped data.
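- If the data are given in the form of a continuous frequency
distribution, the median is defined as:
$$\tilde{x} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - C\right)$$
where $L_{med}$ is the lower class boundary of the median class, $w$ is the
class width, $f_{med}$ is the frequency of the median class, $n$ is the total
number of observations and $C$ is the cumulative frequency of the class
preceding the median class.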
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 4
Median for grouped data …
• Solution
Class Frequency Cumulative frequency
40-44 7 7
45-49 10 17
50-54 22 39
55-59 15 54
60-64 12 66
65-69 6 72
70-74 4 76
Total 76 76
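• Here n = 76, so n/2 = 38. The first class whose cumulative frequency
reaches 38 is 50-54, so it is the median class, with lower class boundary
L = 49.5, class width w = 5, frequency 22 and preceding cumulative
frequency C = 17.
Median in grouped data …
$$\tilde{x} = 49.5 + \frac{5}{22}(38 - 17) = 49.5 + 4.77 \approx 54.27$$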
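The grouped-median calculation for the table above can be sketched as follows. This is a sketch under stated assumptions, not the slides' own worked solution: it uses the usual interpolation formula, median = L + (w / f) × (n/2 − C), and assumes integer class limits, so that the lower class boundary is the lower limit minus 0.5 (the same convention the slides use for the mode, where class 30-34 has L = 29.5). The name `grouped_median` is illustrative.

```python
# (lower limit, upper limit, frequency) for each class in the example.
classes = [
    (40, 44, 7),
    (45, 49, 10),
    (50, 54, 22),
    (55, 59, 15),
    (60, 64, 12),
    (65, 69, 6),
    (70, 74, 4),
]


def grouped_median(table):
    """Interpolated median of a grouped frequency distribution."""
    n = sum(f for _, _, f in table)
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, freq in table:
        if cumulative + freq >= half:
            boundary = lower - 0.5        # assumed lower class boundary
            width = upper - lower + 1     # class width for integer limits
            return boundary + (width / freq) * (half - cumulative)
        cumulative += freq
    raise ValueError("frequency table is empty")


print(round(grouped_median(classes), 2))
```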
Merits and Demerits of Median
Merits:
Demerits:
Mode for Grouped data
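In grouped data, we usually refer to the modal class, the class
with the highest frequency. If a single value for the mode of
grouped data must be specified, it is taken as:
$$\hat{x} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w$$
where $L$ = the lower class boundary of the modal class;
$w$ = the class width;
$\Delta_1 = f_{mod} - f_1$ and $\Delta_2 = f_{mod} - f_2$, with $f_{mod}$ the frequency of the
modal class, $f_1$ the frequency of the class preceding it and $f_2$ the
frequency of the class following it.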
Class    Frequency
15-19        6
20-24       19
25-29       50
30-34       57
35-39       48
40-44       27
45-49       21
Total      228
The Mode…
Solution: By inspection (simply looking at the
frequencies), the mode lies in the fourth class, where
$L = 29.5$, $f_{mod} = 57$, $f_1 = 50$, $f_2 = 48$, $w = 5$, and
$\Delta_1 = 57 - 50 = 7$, $\Delta_2 = 57 - 48 = 9$.
Therefore, the modal age is
$$\hat{x} = 29.5 + \frac{7}{7 + 9} \times 5 = 29.5 + 2.2 = 31.7$$
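The modal-age calculation above can be reproduced with a short sketch. It assumes integer class limits (so the lower class boundary is the lower limit minus 0.5, matching L = 29.5 in the solution) and treats a missing neighbouring class as having frequency 0; the name `grouped_mode` is illustrative.

```python
# (lower limit, upper limit, frequency) for each class in the example.
classes = [
    (15, 19, 6),
    (20, 24, 19),
    (25, 29, 50),
    (30, 34, 57),
    (35, 39, 48),
    (40, 44, 27),
    (45, 49, 21),
]


def grouped_mode(table):
    """Mode of grouped data: L + d1 / (d1 + d2) * w."""
    freqs = [f for _, _, f in table]
    i = freqs.index(max(freqs))                   # modal class (first if tied)
    lower, upper, f_mod = table[i]
    f1 = freqs[i - 1] if i > 0 else 0             # class before the modal class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # class after the modal class
    d1 = f_mod - f1
    d2 = f_mod - f2
    boundary = lower - 0.5                        # assumed lower class boundary
    width = upper - lower + 1
    return boundary + d1 / (d1 + d2) * width


print(round(grouped_mode(classes), 1))
```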
Properties of Mode
• The mode can be used as a summary measure for nominal,
ordinal, discrete and continuous data. In general, however, it
is more appropriate for nominal and ordinal data.
Merits and Demerits of Mode
Merits:
• It is not affected by extreme observations.
• It is easy to calculate and simple to understand.
• It can be calculated for distributions with open-ended classes.
Demerits:
• It is not rigidly defined.
• It is not based on all observations.
• It is not suitable for further mathematical treatment.
• It is not a stable average, i.e. it is affected by fluctuations of sampling
to some extent.
• Often its value is not unique.
Quartiles
B. Even:
Quartiles
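• For grouped data, the quartiles are given by
$$Q_i = L_{Q_i} + \frac{w}{f_{Q_i}}\left(\frac{iN}{4} - C\right), \quad i = 1, 2, 3$$
where $L_{Q_i}$ is the lower class boundary of the class containing $Q_i$, $w$ is the
class width, $f_{Q_i}$ is the frequency of that class, $N$ is the total number of
observations and $C$ is the cumulative frequency of the preceding class.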
Measure of variation/dispersion
Definition:
Measure of variation cont…
A good measure of variation should possess the following properties:
• It should be easy to compute and understand.
• It should be based on all observations.
• It should be uniquely defined.
• It should be capable of further algebraic treatment.
• It should be affected as little as possible by extreme values.
Measure of variation Cont…
Absolute and relative measures
Range cont…
Quartiles and Inter-quartile Range, Percentiles
35 35 36 37 37 38 42 43 43 44 45 48 48 51 55
• IQR = 48 – 37 = 11
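The IQR above can be reproduced with a short sketch. The slides' own rule for locating quartiles is not preserved here, so the code assumes the common convention that $Q_i$ sits at position $i(n+1)/4$ in the ordered data, with linear interpolation between neighbours when that position is not a whole number. The name `quartile` is illustrative.

```python
data = [35, 35, 36, 37, 37, 38, 42, 43, 43, 44, 45, 48, 48, 51, 55]


def quartile(values, i):
    """i-th quartile (i = 1, 2, 3) at 1-based position i * (n + 1) / 4."""
    ordered = sorted(values)
    n = len(ordered)
    position = i * (n + 1) / 4
    whole = int(position)          # integer part of the position
    fraction = position - whole    # fractional part, used for interpolation
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    below = ordered[whole - 1]     # convert 1-based position to 0-based index
    above = ordered[whole]
    return below + fraction * (above - below)


q1 = quartile(data, 1)
q3 = quartile(data, 3)
iqr = q3 - q1
print(q1, q3, iqr)
```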
Quartile deviation (QD)
The range expresses the extreme variability of
observations of a variable.
The quartile deviation (QD) is half of the interquartile range:
$$QD = \frac{Q_3 - Q_1}{2}$$
Coefficient of quartile deviation (CQD)
It gives the average amount by which the
two quartiles differ from the median.
$$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Variance and Standard Deviation
• Variance measures how far, on average, scores deviate
from the mean.
Variance
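That is, the sample variance, denoted by $s^2$, of a set of $n$
observations $x_1, x_2, \dots, x_n$ with mean $\bar{x}$ is
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$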
Standard deviation
• The variance has a drawback: because the deviations
are squared, its units are also squared. To return to
the original unit of measurement, we take the square
root of the variance: $s = \sqrt{s^2}$.
Standard cont…
• Consider the following three datasets
Next subtract the mean from each value and square it:
$X$    $X - \bar{X}$    $(X - \bar{X})^2$
Cont…
• Sum up all the squared values.
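The steps above (subtract the mean, square, sum) can be sketched end to end in Python. The data set is hypothetical, chosen only for illustration because the slides' own data sets are not preserved, and the final division by n − 1 assumes the sample variance is wanted.

```python
import math

data = [2, 4, 6, 8, 10]  # hypothetical data, not from the slides

mean = sum(data) / len(data)

# Subtract the mean from each value and square the result.
squared_deviations = [(x - mean) ** 2 for x in data]

# Sum up all the squared values.
sum_of_squares = sum(squared_deviations)

# Divide by n - 1 for the sample variance (assumption), then take the root.
variance = sum_of_squares / (len(data) - 1)
std_dev = math.sqrt(variance)

print(mean, sum_of_squares, variance, round(std_dev, 4))
```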
Exercise
• The areas of sprayable surfaces with DDT from a sample of 15
houses were measured as follows (in m²):
101, 105, 110, 114, 115, 124, 125, 125, 130, 133, 135, 136, 137, 140, 145
Example 2
• Find the variance and the standard deviation for the
frequency distribution of the given data set below.
Class Frequency Midpoint
5.5 – 10.5 1 8
10.5 – 15.5 2 13
15.5 – 20.5 3 18
20.5 – 25.5 5 23
25.5 – 30.5 4 28
30.5 – 35.5 3 33
35.5 – 40.5 2 38
Cont…
• Solution
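The worked solution from the slide is not preserved, so the following is only a sketch of how the calculation could be carried out. It assumes the sample formula with n − 1 in the denominator, $s^2 = \sum f_i (X_m - \bar{x})^2 / (n - 1)$, applied to the class midpoints from the table; the name `grouped_variance` is illustrative.

```python
import math

# (midpoint, frequency) pairs from the Example 2 table.
table = [(8, 1), (13, 2), (18, 3), (23, 5), (28, 4), (33, 3), (38, 2)]


def grouped_variance(rows):
    """Sample variance of grouped data from midpoints and frequencies."""
    n = sum(f for _, f in rows)
    mean = sum(f * x for x, f in rows) / n
    sum_sq = sum(f * (x - mean) ** 2 for x, f in rows)
    return sum_sq / (n - 1)  # n - 1 denominator is an assumption


variance = grouped_variance(table)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))
```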
Special properties of standard deviation /variance
Special properties of standard deviation
$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$$
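As a small sketch of the pooled-variance formula, $S_p^2 = [(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2] / (n_1 + n_2 - 2)$, the function below combines the variances of two samples. The sample sizes and variances in the call are hypothetical values chosen for illustration, and the name `pooled_variance` is not from the slides.

```python
def pooled_variance(n1, var1, n2, var2):
    """Pooled variance of two samples with sizes n1, n2 and variances var1, var2."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


# Hypothetical inputs: n1 = 10 with S1^2 = 4, n2 = 15 with S2^2 = 9.
sp2 = pooled_variance(10, 4.0, 15, 9.0)
print(round(sp2, 4))
```

The pooled value always lies between the two sample variances, weighted toward the larger sample.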
Coefficient of variation
• The standard deviation is an absolute measure of deviation of
observations around their mean and is expressed with the same
unit of the data.
• Because of this, the standard deviation is not directly used
for comparing the variability of different data sets.
• The coefficient of variation is often used for this purpose.
• The coefficient of variation (CV) is defined by:
$$CV = \frac{S}{\bar{x}} \times 100\%$$
Examples:
1. An analysis of the monthly wages paid (in Birr) to
workers in two firms A and B belonging to the same
pharmaceutical industry gives the following results
Coefficient of variation cont…
Exercise
2. A meteorologist interested in the consistency of
temperatures in three cities during a given week collected
the following data. The temperatures for the five days of
the week in the three cities were:
City 1: 25 24 23 26 17
City 2: 22 21 24 22 20
City 3: 32 27 35 24 28
Which city has the most consistent temperature, based
on these data?
When to use the coefficient of variation
• When comparison groups have very different means (CV is
suitable as it expresses the standard deviation relative to its
corresponding mean)
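To illustrate why the CV helps when means differ widely, here is a minimal sketch using the standard library's `statistics` module. It assumes the usual definition, CV = (standard deviation / mean) × 100%, with the sample standard deviation. Both data sets are hypothetical and were chosen so that they have the same standard deviation but very different means.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation relative to the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical data: identical spread, very different means.
weights_kg = [50, 55, 60, 65, 70]
heights_cm = [150, 155, 160, 165, 170]

cv_weights = coefficient_of_variation(weights_kg)
cv_heights = coefficient_of_variation(heights_cm)
print(round(cv_weights, 2), round(cv_heights, 2))
```

Although the two standard deviations are equal, the group with the smaller mean has the larger CV, so it is relatively more variable.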
Exercise
1. Based on the data set given below:
a. Calculate the mean, median and mode.
b. Calculate the variance, standard deviation and coefficient of
variation.
15, 7, 13, 9, 10, 11
2. Calculate the variance and standard deviation for the data set
given below:
5, 17, 12, 10, 8
Standard Score
If X is a measurement from a distribution
with mean $\bar{X}$ and standard
deviation S, then its value in standard units is
$$Z = \frac{X - \bar{X}}{S}$$
Relatively speaking:
a) Which group is more consistent in its performance?
b) Suppose person A from group one takes 9.2 minutes while
person B from group two takes 9.3 minutes. Who was faster in
performing the task? Why?
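The standard score can be sketched as a one-line function, assuming the usual definition Z = (X − mean) / S. The numbers in the calls are hypothetical and are not the group data from the exercise above, which is not preserved.

```python
def z_score(x, mean, std_dev):
    """Value of x in standard units: distance from the mean in standard deviations."""
    return (x - mean) / std_dev


# Hypothetical values: a score of 70 where the mean is 60 and S is 5,
# and a score of 70 where the mean is 65 and S is 10.
z_a = z_score(70, 60, 5)
z_b = z_score(70, 65, 10)
print(z_a, z_b)
```

The same raw score of 70 is further above average, in relative terms, in the first distribution than in the second.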
Moments
$$m_r = \frac{\sum (X_i - \bar{X})^r}{n}, \quad r = 0, 1, 2, \dots$$
• For continuous grouped data it is given by:
$$m_r = \frac{\sum f_i (X_i - \bar{X})^r}{n}$$
Example:
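$$m_1 = \frac{\sum (X_i - \bar{X})^1}{n} = \frac{(2 - 4) + (3 - 4) + (7 - 4)}{3} = 0$$
$$m_2 = \frac{\sum (X_i - \bar{X})^2}{n} = \frac{(2 - 4)^2 + (3 - 4)^2 + (7 - 4)^2}{3} = 4.67$$
$$m_3 = \frac{\sum (X_i - \bar{X})^3}{n} = \frac{(2 - 4)^3 + (3 - 4)^3 + (7 - 4)^3}{3} = 6$$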
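The three moments in this example can be checked with a short sketch. The function name `central_moment` is illustrative; it divides by n, as in the definition of $m_r$ above, and is applied to the example data 2, 3, 7 (mean 4).

```python
def central_moment(values, r):
    """r-th moment about the mean: sum((x - mean) ** r) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


data = [2, 3, 7]
m1 = central_moment(data, 1)
m2 = central_moment(data, 2)
m3 = central_moment(data, 3)
print(m1, round(m2, 2), m3)
```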
Measures of shape
a. Skewness
• Skewness is the degree of asymmetry or departure
from symmetry of a distribution.
• A skewed frequency distribution is one that is not
symmetrical.
• Skewness is concerned with the shape of the curve, not its size.
• If the frequency curve (smoothed frequency polygon) of a
distribution has a longer tail to the right of the central
maximum than to the left, the distribution is said to be skewed
to the right, or to have positive skewness.
Concept of skewness
• The skewness of a distribution is defined as the lack
of symmetry.
• In a symmetrical distribution, mean, median, and
mode are equal to each other.
Skewness
• If it has a longer tail to the left of the central
maximum than to the right, it is said to be skewed to
the left, or to have negative skewness.
• For a moderately skewed distribution, the following
relation holds among the three commonly used
measures of central tendency:
Mean - Mode = 3 × (Mean - Median)
Skewness
Measures of Skewness
Karl Pearson's Coefficient of Skewness ($S_k$):
$$S_k = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} \quad \text{or} \quad S_k = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$
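Both forms of Pearson's coefficient can be sketched as small functions. This is illustrative only: the function names are not from the slides, the sample standard deviation from the `statistics` module is assumed for the denominator, and the data set 2, 3, 7 is reused from the moments example purely for demonstration.

```python
import statistics


def pearson_skew_mode(mean, mode, std_dev):
    """Pearson's coefficient using the mode: (mean - mode) / sd."""
    return (mean - mode) / std_dev


def pearson_skew_median(mean, median, std_dev):
    """Pearson's coefficient using the median: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / std_dev


data = [2, 3, 7]
sk = pearson_skew_median(
    statistics.mean(data),
    statistics.median(data),
    statistics.stdev(data),  # sample standard deviation (assumption)
)
print(round(sk, 3))
```

A positive result indicates a longer right tail; a negative result indicates a longer left tail.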
Remarks Related with Skewness
In a negatively skewed distribution, smaller observations
are less frequent than larger observations, i.e. the majority
of the observations have a value above the average.
Kurtosis
• Kurtosis is the degree of peakedness of a distribution, usually
taken relative to a normal distribution.
The peakedness of a distribution can be classified into three types:
• Leptokurtic:
- A distribution having a relatively high peak
- A large number of observations have the same values
• Mesokurtic:
- Normal peak
- The curve is properly peaked
• Platykurtic:
- Flat-topped
- A large number of observations have low frequency and are
spread across the middle interval
Kurtosis
Measures of kurtosis
$$\beta_2 = \frac{m_4}{m_2^2}$$
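The moment-based measure of kurtosis, the fourth central moment divided by the square of the second, can be sketched as below. The names `central_moment` and `kurtosis` are illustrative, the moments divide by n as in the moments section, and the data set 2, 3, 7 is reused from the moments example only for demonstration.

```python
def central_moment(values, r):
    """r-th moment about the mean: sum((x - mean) ** r) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


def kurtosis(values):
    """Moment coefficient of kurtosis: m4 / m2 ** 2."""
    m2 = central_moment(values, 2)
    m4 = central_moment(values, 4)
    return m4 / m2 ** 2


data = [2, 3, 7]
print(round(kurtosis(data), 4))
```

For a normal distribution this ratio equals 3, so values above 3 suggest a leptokurtic shape and values below 3 a platykurtic one.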
Thank You