Unit 3 Descriptive Statistics

STATISTICAL ANALYSIS WITH SOFTWARE APPLICATION

Uploaded by

20220025082
UNIT 3

DESCRIPTIVE STATISTICS

Mathematics Department
XAVIER UNIVERSITY-ATENEO DE CAGAYAN
Descriptive Statistics – numerical measures that are used to describe certain characteristics of the data

Common Types of Descriptive Measures
1. Measures of Central Location – mean, median, mode
2. Measures of Variability – range, variance, standard deviation, standard error, coefficient of variation
3. Measures of Shape – skewness, kurtosis
4. Other Summary Statistics – sum, count
5. Measures of Location – quartile
Measures of Central Location
- Any single value which is used to identify the “center”
or the typical value in the data set; it is oftentimes
referred to as the average.
1. Mean
– sum of all values of the observations divided by the number
of observations in the data set
Example. The following table shows the waiting time (in minutes) of 10 randomly selected customers at a certain fast food chain before their order is served. Determine the mean waiting time of the customers using the formula.

Formula: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Customer  Time
A         10
B         16
C         14
D         20
E         18
F         18
G         8
H         14
I         18
J         10

Answer: $\bar{x} = 146/10 = 14.6$ mins
Example. The following table lists the calories per 100 milliliters of a
random sample of 25 popular milk tea products.

43 37 42 40 53 62 36 32 50 49
26 53 73 48 45 39 45 48 40 56
41 36 58 42 39
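
The definition of the mean (sum of the observations divided by their count) can be sketched in Python as follows. This is only an illustration of the definition; the function name `sample_mean` and the variable names are hypothetical and not part of the original material.

```python
# Waiting times (minutes) for customers A-J, from the first example.
waiting_times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]

# Calories per 100 mL for the 25 milk tea products.
milk_tea_calories = [
    43, 37, 42, 40, 53, 62, 36, 32, 50, 49,
    26, 53, 73, 48, 45, 39, 45, 48, 40, 56,
    41, 36, 58, 42, 39,
]


def sample_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


mean_wait = sample_mean(waiting_times)          # 146 / 10
mean_calories = sample_mean(milk_tea_calories)  # 1133 / 25

print(f"Mean waiting time: {mean_wait:.1f} minutes")
print(f"Mean calories: {mean_calories:.2f} per 100 mL")
```

For the waiting-time data this reproduces the 14.6 minutes given in the answer; for the milk tea data it gives 45.32 calories per 100 mL.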
2. Median – a value that divides an ordered set of data
(array) into two equal parts (usually denoted by Md)
Md = middle value in the array when n is odd
Md = mean of the two middle values when n is even
EXAMPLE. Using the previous data, determine the median waiting time of customers.

Original Data        Ordered Data
Customer  Time       Customer  Time
A         10         G         8
B         16         A         10
C         14         J         10
D         20         C         14
E         18         H         14
F         18         B         16
G         8          E         18
H         14         F         18
I         18         I         18
J         10         D         20

Answer: n = 10 is even, so the median is the mean of the two middle values (14 and 16).
Median: 15 minutes
3. Mode – the value in the data set that occurs
with the greatest frequency
EXAMPLE. Determine the modal length of waiting time of customers.

Customer  Time
A         10
B         16
C         14
D         20
E         18
F         18
G         8
H         14
I         18
J         10

Answer:
Mode: 18 minutes (18 appears three times, more often than any other value)
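
The median and mode rules above can be checked with a short Python sketch. The manual median follows the odd/even rule stated in the definition; `multimode` from the standard `statistics` module (Python 3.8+) is used for the mode because it returns every most-frequent value. Variable names are illustrative only.

```python
from statistics import multimode

# Waiting times (minutes) for customers A-J.
waiting_times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]

# Median: sort the data, then take the middle value (n odd)
# or the mean of the two middle values (n even).
ordered = sorted(waiting_times)
n = len(ordered)
if n % 2 == 1:
    md = ordered[n // 2]
else:
    md = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Mode: the value(s) occurring with the greatest frequency.
modes = multimode(waiting_times)

print("Ordered data:", ordered)
print("Median:", md)
print("Mode(s):", modes)
```

With n = 10 the two middle values are 14 and 16, so the median is 15, and 18 is the only mode.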
EXCEL: Data – Data Analysis – Descriptive Statistics

Customer  Time
A         10
B         16
C         14
D         20
E         18
F         18
G         8
H         14
I         18
J         10

Excel Output (Time)
Statistic            Value           Note
Mean                 14.6            sum of the observations divided by 10
Standard Error       1.30128142
Median               15              the average of the two middle values, 14 and 16
Mode                 18              appeared thrice in the data set
Standard Deviation   4.115013163
Sample Variance      16.93333333
Kurtosis             -1.257893944
Skewness             -0.407572294
Range                12
Minimum              8
Maximum              20
Sum                  146
Count                10

Available Excel functions:


=average(data_range)
=median(data_range)
=mode(data_range)
SPSS: Analyze – Descriptive Statistics - Descriptives
Jamovi: Analyses – Exploration – Descriptives – Statistics
PSPP: Analyze – Descriptive Statistics - Descriptives
Measures of Variability/Dispersion
- Numerical descriptive measures which indicate
the extent to which individual observations in a
set of data are scattered about an average.

Common Measures of Variability
1. Range
2. Standard deviation
3. Variance
4. Standard error
5. Coefficient of variation
1. Range – the difference between the maximum and the minimum values in the data set
Maximum – the highest value in the data set
Minimum – the lowest value in the data set

Determine the following: MAX, MIN, RANGE

Customer  Time
A         10
B         16
C         14
D         20
E         18
F         18
G         8
H         14
I         18
J         10

Answer:
MAX = 20 minutes
MIN = 8 minutes
RANGE = 20 - 8 = 12 minutes
2. Standard deviation – a measure of dispersion which
indicates the extent of scattering of the observations from the
mean

3. Variance – the square of the standard deviation
- the average squared deviation of the observations from the mean (for a sample, the sum of the squared deviations is divided by n - 1)
Compute the variance and standard deviation of the waiting time of customers using the formula.

Formula: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, $s = \sqrt{s^2}$

(mean = 14.6)

Customer  Time  x – mean  (x – mean)^2
A         10    -4.6      21.16
B         16    1.4       1.96
C         14    -0.6      0.36
D         20    5.4       29.16
E         18    3.4       11.56
F         18    3.4       11.56
G         8     -6.6      43.56
H         14    -0.6      0.36
I         18    3.4       11.56
J         10    -4.6      21.16
sum                       152.4

Answer:
Sample variance: $s^2 = 152.4/9 = 16.933$ minutes^2
Sample standard deviation: $s = \sqrt{16.933} = 4.115$ minutes
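
The table above can be reproduced step by step in Python. This is a minimal sketch of the sample (n - 1) variance and standard deviation, matching the values Excel reports as Sample Variance and Standard Deviation; the variable names are illustrative.

```python
import math

# Waiting times (minutes) for customers A-J.
waiting_times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]

n = len(waiting_times)
mean = sum(waiting_times) / n  # 14.6

# Squared deviation of each observation from the mean,
# i.e. the last column of the table.
squared_deviations = [(x - mean) ** 2 for x in waiting_times]
sum_of_squares = sum(squared_deviations)  # about 152.4

# Sample variance divides by n - 1, not n.
sample_variance = sum_of_squares / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(f"Sum of squared deviations: {sum_of_squares:.1f}")
print(f"Sample variance: {sample_variance:.3f} minutes^2")
print(f"Sample standard deviation: {sample_sd:.3f} minutes")
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and can serve as a cross-check.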
4. Standard error (of the mean) – measures how well the obtained statistic estimates the target parameter. The smaller the standard error, the better the statistic estimates the parameter.
- It is estimated by dividing the standard deviation by the square root of the sample size.

Formula: standard error of the mean, $SE = \frac{s}{\sqrt{n}}$

EXAMPLE. Determine the standard error of the mean waiting time.

Answer: standard error = $4.115/\sqrt{10}$ = 1.301 minutes

Note: The standard error of the mean is useful when constructing a confidence interval for the mean.
The standard deviation of sample means is known as the standard error of the mean (SE).
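
As a quick check of the rule "standard deviation divided by the square root of the sample size", the following sketch computes the standard error for the waiting-time data with Python's standard `statistics` module. The variable names are illustrative.

```python
import math
from statistics import stdev

# Waiting times (minutes) for customers A-J.
waiting_times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]

n = len(waiting_times)
s = stdev(waiting_times)           # sample standard deviation (n - 1 divisor)
standard_error = s / math.sqrt(n)  # standard error of the mean

print(f"s = {s:.3f}, n = {n}")
print(f"Standard error of the mean: {standard_error:.3f} minutes")
```

The result agrees with the 1.301 minutes in the answer and with the Standard Error row of the Excel output.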
Excel: Data – Data Analysis – Descriptive Statistics
Excel Output. Measures of Variability (Time)

Statistic            Value           Note
Mean                 14.6
Standard Error       1.30128142      standard deviation divided by square root of n
Median               15
Mode                 18
Standard Deviation   4.115013163     typical deviation of each value from the mean
Sample Variance      16.93333333     square of the standard deviation
Kurtosis             -1.257893944
Skewness             -0.407572294
Range                12              highest value - lowest value
Minimum              8               lowest value
Maximum              20              highest value
Sum                  146
Count                10
Available Excel functions:
=stdev(data_range)
=var(data_range)
=max(data_range)
=min(data_range)
SPSS: Analyze – Descriptive Statistics - Descriptives
5. Coefficient of Variation

The coefficient of variation (CV) measures how scattered the data are relative to the mean. It is a relative measure of variation that is always expressed as a percentage.

Formula: $CV = \frac{s}{\bar{x}} \times 100\%$

The coefficient of variation is very useful when comparing two or more data sets that have different means and/or are measured in different units of measurement.
Example. The following table shows the waiting time (in minutes) of customers in Jolly A and Jolly B fastfood chains. Which fastfood chain has a lower coefficient of variation?

        Jolly A   Jolly B
        10        10
        16        16
        14        15
        20        25
        18        20
        18        18
        8         9
        14        14
        18        21
        10        12
Stdev   4.115     5.077
Mean    14.6      16
CV      28.19%    31.73%

Answer: Jolly A has the lower coefficient of variation (28.19% < 31.73%), so its waiting times are less scattered relative to its mean.
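
The comparison in the table can be reproduced with a small helper. This is a sketch that assumes the CV is computed from the sample standard deviation, as the Stdev row suggests; the function name `coefficient_of_variation` is hypothetical.

```python
from statistics import mean, stdev

# Waiting times (minutes) from the table above.
jolly_a = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]
jolly_b = [10, 16, 15, 25, 20, 18, 9, 14, 21, 12]


def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean, as a percentage."""
    return stdev(values) / mean(values) * 100


cv_a = coefficient_of_variation(jolly_a)
cv_b = coefficient_of_variation(jolly_b)

print(f"Jolly A: CV = {cv_a:.2f}%")
print(f"Jolly B: CV = {cv_b:.2f}%")
print("Lower CV:", "Jolly A" if cv_a < cv_b else "Jolly B")
```

Because the CV is unit-free, the same helper could be used to compare data sets measured in different units.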
Measures of Shape
1. Measure of Skewness
– refers to the degree of asymmetry, or departure from symmetry, of a distribution.

[Figure: three distribution shapes labeled Sk > 0, Sk = 0 or close to zero, and Sk < 0]
Determining if skewness is significantly non-normal

■ Skewness. In statistical analysis, the question arises of how skewed a distribution can be before the skewness is considered a problem. One way of determining whether a distribution is "significantly skewed" is to compare the numerical value for "Skewness" with twice the "Standard Error of Skewness", that is, with the range from minus twice the Std. Error of Skewness to plus twice the Std. Error of Skewness. If the value for Skewness falls within this range, symmetry is considered not seriously violated.
Simplified Guideline
i) If │ skewness │ ≤ 2*standard error, then symmetric.
ii) If │skewness │ > 2*standard error, then skewed right if Sk is positive;
or skewed left if Sk is negative .

where std. error ≈ $\sqrt{6/n}$ (rough estimate)


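Formula:
$Skew = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$

Alternative formula:
$Skew = \frac{3(\bar{x} - Md)}{s}$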
EXAMPLE. Determine the skewness coefficient of the waiting time of customers using the following formula.

Formula: $Skew = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$

First, determine the following:
mean      14.6
std. dev  4.115

Customer  Xi   (Xi – mean)/s   [(Xi – mean)/s]^3
A         10   -1.12           -1.40
B         16   0.34            0.04
C         14   -0.15           0.00
D         20   1.31            2.26
E         18   0.83            0.56
F         18   0.83            0.56
G         8    -1.60           -4.13
H         14   -0.15           0.00
I         18   0.83            0.56
J         10   -1.12           -1.40
Summation                      -2.93

Answer:
Skew = [10 / ((9)(8))] (-2.93) = -0.407
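
The calculation in the table (standardize each value, cube it, sum, then scale by n/((n-1)(n-2))) can be sketched in Python as below. The scaling factor is inferred from the worked numbers in this example, which agree with the Excel output; the function name `sample_skewness` is hypothetical.

```python
from statistics import mean, stdev

# Waiting times (minutes) for customers A-J.
waiting_times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]


def sample_skewness(values):
    """Adjusted sample skewness: n/((n-1)(n-2)) * sum of cubed z-scores."""
    n = len(values)
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 divisor)
    sum_z_cubed = sum(((x - m) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * sum_z_cubed


skew = sample_skewness(waiting_times)
print(f"Skewness: {skew:.6f}")
```

Carrying full precision gives about -0.4076, which the table rounds to -0.407 because it uses the rounded summation -2.93.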
Excel: Data – Data Analysis – Descriptive Statistics
Excel Output. Measure of Skewness
Time

Skewness  -0.407572294
(The remaining rows are the same as in the Descriptive Statistics output shown earlier.)
Available Excel functions:
=skew(data_range)
SPSS: Analyze – Descriptive Statistics - Descriptives

Skewness = -0.408
Standard error (S.E.) of skewness = 0.687

Twice the std. error is 2 × 0.687 = 1.374

The distribution is symmetric if Sk is within the interval [-1.374, 1.374]

Conclusion: Since Sk = -0.408 is within the interval, the distribution may be considered symmetric.
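
The decision rule can be sketched in Python. The source does not say how the reported standard error of skewness (0.687) is computed; the sketch assumes the commonly used exact formula sqrt(6n(n-1) / ((n-2)(n+1)(n+3))), which reproduces that value for n = 10. The function name `se_skewness` is hypothetical.

```python
import math


def se_skewness(n):
    """Standard error of skewness (assumed exact formula, see lead-in)."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


n = 10
skewness = -0.408  # value reported in the output above

se = se_skewness(n)
limit = 2 * se

# Simplified guideline: symmetric if |skewness| <= 2 * standard error.
if abs(skewness) <= limit:
    verdict = "symmetric"
elif skewness > 0:
    verdict = "skewed right"
else:
    verdict = "skewed left"

print(f"S.E. of skewness: {se:.3f}")
print(f"Interval: [{-limit:.3f}, {limit:.3f}]")
print("Conclusion:", verdict)
```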
Example: Use the alternative formula (if you are only provided with the 3 summary statistics).

mean      14.6
median    15
std. dev  4.115

Alternative Formula: $Skew = \frac{3(\bar{x} - Md)}{s}$

Answer: Skew = 3(14.6 - 15)/4.115 = -0.292 (estimate)


2. Measure of Kurtosis
– measures the extent to which observations cluster around a
central point

[Figure: three distribution shapes labeled Ku > 0, Ku = 0 or close to zero, and Ku < 0]
■ Kurtosis is a measure of the tailedness of a distribution, that is, of whether the data are heavy-tailed or light-tailed relative to a normal distribution. Tailedness is how often outliers occur.
■ Positive excess kurtosis values indicate that the distribution is peaked and has thick tails. Leptokurtic distributions have positive kurtosis values.
■ A leptokurtic distribution has a higher peak (thin bell) and fatter, heavier tails than a normal distribution. Data sets with high kurtosis tend to have heavy tails, or outliers.
■ Negative excess kurtosis values indicate that the distribution is flat and has thin tails. Platykurtic distributions have negative kurtosis values. A platykurtic distribution is flatter (less peaked) than the normal distribution, with fewer values in its shorter (i.e., lighter and thinner) tails. Data sets with low kurtosis tend to have light tails, or a lack of outliers.
Determining if kurtosis is significantly non-normal

■ Kurtosis. The same numerical process can be used to check whether the kurtosis is significantly non-normal. One way of determining this is to compare the numerical value for "Kurtosis" with twice the "Standard Error of Kurtosis", that is, with the range from minus twice the Std. Error of Kurtosis to plus twice the Std. Error of Kurtosis. If the value for Kurtosis falls within this range, normality with respect to kurtosis is considered not seriously violated.
Simplified Guideline
i) If │ kurtosis│ ≤ 2*standard error, then Mesokurtic.
ii) If │kurtosis│ > 2*standard error, then Leptokurtic if Ku is positive;
or Platykurtic if Ku is negative .

where std. error ≈ $\sqrt{24/n}$ (estimate)


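Formula:
$Kurtosis = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$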
Example: Determine the kurtosis coefficient of the waiting time of customers.

Formula: $Kurtosis = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$

First, determine the following:
mean      14.6
std. dev  4.115

Customer  Xi   (Xi – mean)/s   [(Xi – mean)/s]^4
A         10   -1.12           1.56
B         16   0.34            0.01
C         14   -0.15           0.00
D         20   1.31            2.97
E         18   0.83            0.47
F         18   0.83            0.47
G         8    -1.60           6.62
H         14   -0.15           0.00
I         18   0.83            0.47
J         10   -1.12           1.56
Summation                      14.12

Answer:
Kurtosis = [10(11) / ((9)(8)(7))] (14.12) - 3(9)^2 / ((8)(7)) = -1.2575
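
The same table-driven calculation can be sketched in Python. The formula used here (excess kurtosis with the small-sample correction terms) is inferred from the worked numbers in this example, which agree with the Excel output; the function name `sample_excess_kurtosis` is hypothetical.

```python
from statistics import mean, stdev

# Waiting times (minutes) for customers A-J.
waiting_times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]


def sample_excess_kurtosis(values):
    """Sample excess kurtosis built from fourth powers of the z-scores."""
    n = len(values)
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 divisor)
    sum_z_fourth = sum(((x - m) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * sum_z_fourth - correction


kurtosis = sample_excess_kurtosis(waiting_times)
print(f"Kurtosis: {kurtosis:.6f}")
```

At full precision the result is about -1.2579; the hand computation gives -1.2575 because it uses the rounded summation 14.12.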
Excel: Data – Data Analysis – Descriptive Statistics
Excel Output. Measure of Kurtosis
Time

Kurtosis  -1.257893944
(The remaining rows are the same as in the Descriptive Statistics output shown earlier.)
Available Excel functions:
=kurt(data_range)
Optional:
SPSS: Analyze – Descriptive Statistics - Descriptives

Kurtosis = -1.258
Standard error (S.E.) of kurtosis = 1.334

Twice the std. error is 2 × 1.334 = 2.668

The distribution is mesokurtic if Ku is within the interval [-2.668, 2.668]

Conclusion: Since Ku = -1.258 is within the interval, the distribution may be considered mesokurtic.
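
The kurtosis check can be sketched the same way. The source does not say how the reported standard error of kurtosis (1.334) is computed; the sketch assumes the commonly used exact formula sqrt(24n(n-1)^2 / ((n-3)(n-2)(n+3)(n+5))), which reproduces that value for n = 10. The function name `se_kurtosis` is hypothetical.

```python
import math


def se_kurtosis(n):
    """Standard error of kurtosis (assumed exact formula, see lead-in)."""
    numerator = 24 * n * (n - 1) ** 2
    denominator = (n - 3) * (n - 2) * (n + 3) * (n + 5)
    return math.sqrt(numerator / denominator)


n = 10
kurtosis = -1.258  # value reported in the output above

se = se_kurtosis(n)
limit = 2 * se

# Simplified guideline: mesokurtic if |kurtosis| <= 2 * standard error.
if abs(kurtosis) <= limit:
    verdict = "mesokurtic"
elif kurtosis > 0:
    verdict = "leptokurtic"
else:
    verdict = "platykurtic"

print(f"S.E. of kurtosis: {se:.3f}")
print(f"Interval: [{-limit:.3f}, {limit:.3f}]")
print("Conclusion:", verdict)
```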
4. Other Summary Statistics
1. Sum – computed by adding numerical observations
2. Count – total number of observations

Customer  Time
A         10
B         16
C         14
D         20
E         18
F         18
G         8
H         14
I         18
J         10

Excel Output (Time)
Sum    146
Count  10
5. Measures of Location: Fractiles
(Percentile, Decile, Quartile)

1. Percentiles – are values that divide an ordered set of
observations into 100 equal parts. These values, denoted by
P1, P2,…,P99 are defined such that 1% of the data fall below
P1, 2% of the data fall below P2, … and 99% of the data fall
below P99, respectively.
2. Deciles – are values that divide an ordered set of
observations into 10 equal parts. These values, denoted by
D1, D2,…,D9 are defined such that 10% of the data fall below
D1, 20% of the data fall below D2, … and 90% of the data fall
below D9, respectively.
3. Quartiles – are values that divide an ordered set of
observations into 4 equal parts. These values, denoted by Q1,
Q2, and Q3 are defined such that 25% of the data fall below Q1,
50% of the data fall below Q2, and 75% of the data fall below
Q3, respectively.

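Formula:
Pi = the value of the $\left[ \frac{i(n+1)}{100} \right]$th item

Di = the value of the $\left[ \frac{i(n+1)}{10} \right]$th item

Qi = the value of the $\left[ \frac{i(n+1)}{4} \right]$th item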


Example. Determine the following:
1) P75
2) D4
3) Q2

Customer  Time
A         10
B         16
C         14
D         20
E         18
F         18
G         8
H         14
I         18
J         10
First, arrange the data from lowest to highest.

Position  Customer  Time
1st       G         8
2nd       A         10
3rd       J         10
4th       C         14
5th       H         14
6th       B         16
7th       E         18
8th       F         18
9th       I         18
10th      D         20

1) P75 = the value of the [75(10 + 1)/100]th = 8.25th item
   = 8th + 0.25 (9th - 8th)
   = 18 + 0.25 (18 - 18)
   = 18 mins

2) D4 = the value of the [4(10 + 1)/10]th = 4.4th item
   = 4th + 0.4 (5th - 4th)
   = 14 + 0.4 (14 - 14)
   = 14 mins

3) Q2 = the value of the [2(10 + 1)/4]th = 5.5th item
   = 5th + 0.5 (6th - 5th)
   = 14 + 0.5 (16 - 14)
   = 15 mins
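
The position-and-interpolation steps used above can be sketched as one Python helper. It follows the i(n + 1)/parts position rule from the worked example; the function name `fractile` and its parameters are hypothetical, and clamping positions that fall outside the data to the smallest or largest value is a choice made for this sketch, not something stated in the source.

```python
def fractile(values, i, parts):
    """Value at position i(n + 1)/parts in the ordered data (1-based),
    with linear interpolation between neighboring items."""
    ordered = sorted(values)
    n = len(ordered)
    position = i * (n + 1) / parts
    lower = int(position)        # whole part, e.g. 8 for 8.25
    fraction = position - lower  # fractional part, e.g. 0.25
    if lower < 1:                # position before the first item
        return ordered[0]
    if lower >= n:               # position at or past the last item
        return ordered[-1]
    below = ordered[lower - 1]   # the lower-th item
    above = ordered[lower]       # the (lower + 1)-th item
    return below + fraction * (above - below)


# Waiting times (minutes) for customers A-J.
waiting_times = [10, 16, 14, 20, 18, 18, 8, 14, 18, 10]

p75 = fractile(waiting_times, 75, 100)  # 8.25th item
d4 = fractile(waiting_times, 4, 10)     # 4.4th item
q2 = fractile(waiting_times, 2, 4)      # 5.5th item

print(f"P75 = {p75} mins, D4 = {d4} mins, Q2 = {q2} mins")
```

The three results match the hand computations: 18, 14, and 15 minutes.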
How to compute Percentile, Decile and Quartile in Excel

=PERCENTILE.EXC(B2:B11, 0.4)
=QUARTILE.EXC(B2:B11, 2)

Note: D4 is equivalent to P40, so the first function above returns D4.
(There is no direct function for deciles in Excel.)
