Unit 3 Descriptive Statistics
DESCRIPTIVE STATISTICS
Mathematics Department
XAVIER UNIVERSITY-ATENEO DE CAGAYAN
Descriptive Statistics – numerical
measures that are used to describe
certain characteristics of the data
Common Types of Descriptive Measures
1. Measures of Central Location – mean, median, mode
2. Measures of Variability – range, variance, standard deviation, standard error, coefficient of variation
3. Measures of Shape – skewness, kurtosis
4. Other Summary Statistics – sum, count
5. Measures of Location – fractiles (percentile, decile, quartile)
Measures of Central Location
- Any single value which is used to identify the “center”
or the typical value in the data set; it is oftentimes
referred to as the average.
1. Mean
– sum of all values of the observations divided by the number
of observations in the data set
Example. The following table shows the waiting time (in minutes) of 10 randomly selected customers at a certain fast-food chain before their orders are served. Determine the mean waiting time of the customers using the formula.
Customer  Time (min)
A         10
B         16
C         14
D         20
E         18
F         18
G          8
H         14
I         18
J         10
Answer: 146 / 10 = 14.6 mins
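The slide's formula image did not survive extraction. Assuming the usual sample mean, $\bar{x} = \frac{\sum x_i}{n}$, a minimal Python sketch of the calculation is shown below. The helper name `mean` is illustrative, not from the slides.

```python
# Waiting times (minutes) for customers A to J, taken from the table above
times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]


def mean(values):
    """Illustrative helper: sum of the observations divided by their number."""
    return sum(values) / len(values)


print(sum(times), len(times))  # 146 10
print(mean(times))             # 14.6
```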
■ Example. The following table lists the calories per 100 milliliters of a random sample of 25 popular milk tea products.
43 37 42 40 53 62 36 32 50 49
26 53 73 48 45 39 45 48 40 56
41 36 58 42 39
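The slide does not state the question or the answer for this data set. Since it sits under the mean, the sketch below only illustrates applying the same sum-divided-by-count calculation to the 25 calorie values; the variable names are my own.

```python
# Calories per 100 mL for the 25 sampled milk tea products (table above)
calories = [
    43, 37, 42, 40, 53, 62, 36, 32, 50, 49,
    26, 53, 73, 48, 45, 39, 45, 48, 40, 56,
    41, 36, 58, 42, 39,
]

assert len(calories) == 25  # guard against typos when copying the table

mean_calories = sum(calories) / len(calories)
print(sum(calories), mean_calories)  # 1133 45.32
```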
2. Median – a value that divides an ordered set of data
(array) into two equal parts (usually denoted by Md)
Md = middle value in the array when n is odd
Md = mean of the two middle values when n is even
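The odd/even rule above can be sketched in Python as follows. The function name `median` is an illustrative helper; it sorts the data into an array first, then applies the rule.

```python
def median(values):
    """Illustrative helper for Md: middle value of the array when n is odd,
    mean of the two middle values when n is even."""
    array = sorted(values)          # arrange the data in ascending order
    n = len(array)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:                  # odd n: a single middle value
        return array[mid]
    return (array[mid - 1] + array[mid]) / 2  # even n: average the two middle values


times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]  # waiting times, customers A to J
print(median(times))       # 15.0
print(median([5, 1, 3]))   # 3
```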
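EXAMPLE. Using the previous data, determine the median waiting time of the customers.
Answer: In ascending order, the times are 8, 10, 10, 14, 14, 16, 18, 18, 18, 20. Since n = 10 is even, Md = (14 + 16)/2 = 15 minutes.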
Measures of Variability
1. Range – highest value minus lowest value
Customer  Time (min)
A         10
B         16
C         14
D         20
E         18
F         18
G          8
H         14
I         18
J         10
Determine the following: MAX, MIN, RANGE
Answer:
MAX = 20 minutes
MIN = 8 minutes
RANGE = 20 - 8 = 12 minutes
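The same three quantities can be checked with Python's built-in `max` and `min`; this is only a short sketch of the arithmetic above.

```python
times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]  # minutes, customers A to J

maximum = max(times)            # highest value
minimum = min(times)            # lowest value
data_range = maximum - minimum  # range = highest value - lowest value

print(maximum, minimum, data_range)  # 20 8 12
```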
2. Standard deviation – a measure of dispersion which
indicates the extent of scattering of the observations from the
mean
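The slide's formula is not shown. Assuming the sample standard deviation with an n − 1 divisor, $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$, which reproduces the Excel values 4.115 and 16.933 in the output below, a sketch is given here. The helper names are illustrative; `statistics.stdev` is used only as a cross-check.

```python
import math
import statistics

times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]  # minutes, customers A to J


def sample_variance(values):
    """Illustrative helper: squared deviations from the mean, summed, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Illustrative helper: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


print(round(sample_variance(times), 4))  # 16.9333
print(round(sample_std(times), 4))       # 4.115
# Cross-check against the standard library, which also divides by n - 1
print(math.isclose(sample_std(times), statistics.stdev(times)))  # True
```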
The standard deviation of the sample means is known as the standard error of the mean (SE).
Note: The standard error of the mean is useful when constructing a confidence interval for the mean.
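Following the Excel annotation "standard deviation divided by square root of n", a minimal sketch of SE = s / √n for the waiting times:

```python
import math
import statistics

times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]  # minutes, customers A to J

s = statistics.stdev(times)                  # sample standard deviation (n - 1 divisor)
standard_error = s / math.sqrt(len(times))   # SE = s / sqrt(n)

print(round(standard_error, 4))  # 1.3013
```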
Excel: Data – Data Analysis – Descriptive Statistics
Excel Output. Measures of Variability
                     Time
Mean                 14.6
Standard Error       1.30128142     (standard deviation divided by the square root of n)
Median               15
Mode                 18
Standard Deviation   4.115013163    (roughly the typical distance of the values from the mean)
Sample Variance      16.93333333    (square of the standard deviation)
Kurtosis             -1.257893944
Skewness             -0.407572294
Range                12             (highest value - lowest value)
Minimum              8              (lowest value)
Maximum              20             (highest value)
Sum                  146
Count                10
Available Excel functions:
=stdev(data_range) =max(data_range)
=var(data_range) =min(data_range)
SPSS: Analyze – Descriptive Statistics - Descriptives
5. Coefficient of Variation
Formula: $CV = \frac{s}{\bar{x}} \times 100\%$
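Assuming the usual definition CV = (s / x̄) × 100%, a short sketch for the waiting-time data follows. Because the CV is unitless, it lets you compare variability between data sets measured in different units. The function name is an illustrative helper.

```python
import statistics

times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]  # minutes, customers A to J


def coefficient_of_variation(values):
    """Illustrative helper: sample standard deviation relative to the mean, in percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100


print(round(coefficient_of_variation(times), 2))  # 28.19
```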
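3. Measures of Shape: Skewness
[Figure: Sk = 0 or close to zero (symmetric); Sk > 0 (positively skewed); Sk < 0 (negatively skewed)]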
Determining if skewness is significantly non-normal
Alternative formula: $Sk = \frac{3(\bar{x} - Md)}{s}$
EXAMPLE. Determine the skewness coefficient of the customers' waiting times using the following formula.
Customer  Time (min)
A         10
B         16
C         14
D         20
E         18
F         18
G          8
H         14
I         18
J         10
First, determine the following:
mean = 14.6
std. dev = 4.115
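Formula:
$Sk = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{X_i - \bar{x}}{s} \right)^3$
Answer:
Xi    (Xi - mean)/s    ((Xi - mean)/s)^3
10    -1.12            -1.40
16     0.34             0.04
14    -0.15             0.00
20     1.31             2.26
18     0.83             0.56
18     0.83             0.56
 8    -1.60            -4.13
14    -0.15             0.00
18     0.83             0.56
10    -1.12            -1.40
Summation              -2.93
$Sk = \frac{10}{(9)(8)}(-2.93) = -0.407$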
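The formula above (the same adjusted sample skewness that Excel's SKEW reports) can be sketched as follows. The function name is an illustrative helper. Working with unrounded z-scores gives -0.4076, matching Excel, whereas the hand calculation with two-decimal z-scores gives -0.407.

```python
import statistics

times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]  # minutes, customers A to J


def skewness(values):
    """Illustrative helper: n / ((n-1)(n-2)) times the sum of cubed z-scores."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    xbar = statistics.mean(values)
    s = statistics.stdev(values)                      # sample SD, n - 1 divisor
    sum_z3 = sum(((x - xbar) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * sum_z3


print(round(skewness(times), 4))  # -0.4076
```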
Excel: Data – Data Analysis – Descriptive Statistics
Excel Output. Measure of Skewness (same output as for the measures of variability; relevant row shown)
Time
Skewness -0.407572294
Available Excel functions:
=skew(data_range)
SPSS: Analyze – Descriptive Statistics - Descriptives
Skewness = -0.408
Standard error (S.E.) of skewness = 0.687
Conclusion: Since Sk = -0.407 lies within the interval, the distribution may be considered symmetric.
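SPSS's S.E. of 0.687 can be reproduced with the standard formula for the standard error of skewness, $SES = \sqrt{\frac{6n(n-1)}{(n-2)(n+1)(n+3)}}$. The slide does not state which interval it uses. The ±2 × SES band below is a commonly used rule of thumb and is an assumption here, not necessarily the slide's criterion.

```python
import math


def se_skewness(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


n = 10
sk = -0.407572294              # skewness from the Excel/SPSS output above
ses = se_skewness(n)
# Assumed rule of thumb: |Sk| <= 2 * SES is treated as "not significantly skewed"
lower, upper = -2 * ses, 2 * ses

print(round(ses, 3))                     # 0.687
print(round(lower, 3), round(upper, 3))  # -1.374 1.374
print(lower <= sk <= upper)              # True
```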
Example: Use the alternative formula (if you are only provided with the 3 summary statistics).
mean 14.6
median 15
std. dev 4.115
Alternative Formula: $Sk = \frac{3(\bar{x} - Md)}{s}$
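The alternative formula's image is garbled in this copy. Because the example supplies the mean, the median and (from earlier) the standard deviation, the sketch below assumes it is Pearson's median coefficient, 3(x̄ − Md)/s. That identification is an assumption. Being a different coefficient, it need not equal the -0.407 above, although both are negative here.

```python
def pearson_skewness(x_bar, md, s):
    """Assumed alternative formula (Pearson's median coefficient): 3(mean - median) / s."""
    return 3 * (x_bar - md) / s


x_bar, md, s = 14.6, 15, 4.115  # the three summary statistics

print(round(pearson_skewness(x_bar, md, s), 3))  # -0.292
```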
Kurtosis
[Figure: Ku > 0 (leptokurtic); Ku = 0 or close to zero (mesokurtic); Ku < 0 (platykurtic)]
■ Kurtosis is a measure of the tailedness of a distribution, that is, whether the data are heavy-tailed or light-tailed relative to a normal distribution. Tailedness describes how often outliers occur.
■ Positive excess kurtosis indicates that the distribution is peaked and has thick tails. Leptokurtic distributions have positive kurtosis values.
■ A leptokurtic distribution has a higher peak (thin bell) and heavier (i.e., fatter) tails than a normal distribution. Data sets with high kurtosis tend to have heavy tails, or outliers.
■ Negative excess kurtosis indicates that the distribution is flat and has thin tails. Platykurtic distributions have negative kurtosis values. A platykurtic distribution is flatter (less peaked) than the normal distribution, with fewer values in its shorter (i.e., lighter and thinner) tails. Data sets with low kurtosis tend to have light tails, or a lack of outliers.
Determining if kurtosis is significantly non-normal
Customer  Time (min)
A         10
B         16
C         14
D         20
E         18
F         18
G          8
H         14
I         18
J         10
First, determine the following:
mean = 14.6
std. dev = 4.115
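Formula:
$Ku = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{X_i - \bar{x}}{s} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$
Answer:
Xi    (Xi - mean)/s    ((Xi - mean)/s)^4
10    -1.12            1.56
16     0.34            0.01
14    -0.15            0.00
20     1.31            2.97
18     0.83            0.47
18     0.83            0.47
 8    -1.60            6.62
14    -0.15            0.00
18     0.83            0.47
10    -1.12            1.56
Summation             14.12
$Ku = \frac{10(11)}{(9)(8)(7)}(14.12) - \frac{3(9)^2}{(8)(7)} = -1.2575$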
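A sketch of the same excess-kurtosis formula (the one Excel's KURT uses) is given below. The function name is an illustrative helper. With unrounded z-scores, the result matches Excel's -1.2579; the hand calculation's -1.2575 comes from rounding the table entries.

```python
import statistics

times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]  # minutes, customers A to J


def excess_kurtosis(values):
    """Illustrative helper: adjusted sample excess kurtosis (normal distribution -> about 0)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    xbar = statistics.mean(values)
    s = statistics.stdev(values)                      # sample SD, n - 1 divisor
    sum_z4 = sum(((x - xbar) / s) ** 4 for x in values)
    first = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum_z4
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first - correction


print(round(excess_kurtosis(times), 4))  # -1.2579
```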
Excel: Data – Data Analysis – Descriptive Statistics
Excel Output. Measure of Kurtosis (same output as for the measures of variability; relevant row shown)
Time
Kurtosis -1.257893944
Available Excel functions:
=kurt(data_range)
Optional:
SPSS: Analyze – Descriptive Statistics - Descriptives
Kurtosis = -1.258
Standard error (S.E.) of kurtosis = 1.334
Conclusion: Since Ku = -1.258 lies within the interval, the distribution may be considered mesokurtic.
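SPSS's S.E. of kurtosis (1.334) follows from the standard error of skewness via $SEK = 2 \cdot SES \sqrt{\frac{n^2 - 1}{(n-3)(n+5)}}$. As with skewness, the ±2 × SE band is an assumed rule of thumb, since the slide does not state its interval.

```python
import math


def se_skewness(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample excess kurtosis, built from the SE of skewness."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


n = 10
ku = -1.257893944               # kurtosis from the Excel/SPSS output above
sek = se_kurtosis(n)
lower, upper = -2 * sek, 2 * sek  # assumed +/- 2 SE rule of thumb

print(round(sek, 3))          # 1.334
print(lower <= ku <= upper)   # True
```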
4. Other Summary Statistics
1. Sum – computed by adding numerical observations
2. Count – total number of observations
Customer  Time (min)
A         10
B         16
C         14
D         20
E         18
F         18
G          8
H         14
I         18
J         10
Excel Output (same as for the measures of variability; relevant rows shown)
Sum    146
Count  10
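5. Measures of Location: Fractiles
(Percentile, Decile, Quartile)
Formula:
$P_i$ = the value of the $\left[\frac{i(n+1)}{100}\right]^{th}$ item in the array
$Q_i$ = the value of the $\left[\frac{i(n+1)}{4}\right]^{th}$ item in the array
=PERCENTILE.EXC(B2:B11, 0.4)   (40th percentile, P40)
=QUARTILE.EXC(B2:B11, 2)       (second quartile, Q2, which equals the median)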
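Below is a sketch of the (n + 1) position rule with linear interpolation between neighbouring items, which is how PERCENTILE.EXC works. It assumes that B2:B11 holds the ten waiting times. The function name is an illustrative helper. Python's `statistics.quantiles` with `method="exclusive"` follows the same convention, so it serves as a cross-check for the quartiles.

```python
import math
import statistics

times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]  # minutes, customers A to J


def percentile_exc(values, p):
    """Illustrative helper: percentile via the (n + 1) position rule.
    p is a fraction, e.g. 0.4 for the 40th percentile."""
    x = sorted(values)                 # the array
    n = len(x)
    position = p * (n + 1)             # 1-based position of the item
    if position < 1 or position > n:
        raise ValueError("p is outside the range the exclusive method allows")
    k = math.floor(position)
    frac = position - k
    if k == n:
        return float(x[-1])
    # interpolate between the k-th and (k+1)-th items
    return x[k - 1] + frac * (x[k] - x[k - 1])


print(percentile_exc(times, 0.4))  # P40 -> 14.0
print(percentile_exc(times, 0.5))  # Q2 (median) -> 15.0
print(statistics.quantiles(times, n=4, method="exclusive"))  # [10.0, 15.0, 18.0]
```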