Lesson 5 (Descriptive Statistics Part 1)_Oct 2024
Descriptive Statistics
(Part 1)
i. Introduction
ii. Types of data
i. Introduction
For example:
○ In science, statistical techniques are used to analyze data obtained from experiments.
○ In manufacturing, quality control is achieved with the aid of statistics.
○ In business, marketing surveys are carried out to determine how well a product fits economic and social demand.
○ In education, statistical techniques are used to analyze the performance of students in an examination.
Definition
Population: The entire collection of individuals or objects whose characteristics are being studied.
Sample: A subset of the population selected for study.
Standard Notation
Measure              Sample   Population
Mean                 x̄        μ
Standard deviation   s        σ
Variance             s²       σ²
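The notation above can be tried out with a minimal sketch, assuming Python's standard `statistics` module. The five data values are made up for illustration; the same list is treated first as a sample and then as a whole population to show where the two sets of symbols differ.

```python
import statistics

# Hypothetical data, used only to illustrate the notation.
data = [2, 4, 4, 6, 9]

x_bar = statistics.mean(data)        # sample mean (x̄); mu is computed the same way
s2 = statistics.variance(data)       # sample variance s^2, divides by n - 1
s = statistics.stdev(data)           # sample standard deviation s
sigma2 = statistics.pvariance(data)  # population variance sigma^2, divides by n
sigma = statistics.pstdev(data)      # population standard deviation sigma

print(f"mean = {x_bar}")
print(f"s^2 = {s2}, s = {s:.3f}")
print(f"sigma^2 = {sigma2}, sigma = {sigma:.3f}")
```

The sample versions divide by n - 1 and the population versions by n, so for the same numbers s is always a little larger than σ.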
Branches of Statistics
Statistics
  Descriptive
  Inferential
    Point
    Interval
Definition
Descriptive Statistics: Methods for organizing and summarizing data by using tables, graphs and summary measures.
Inferential Statistics: The branch of statistics that includes methods that use sample results to help make decisions or predictions about a population.
Why Descriptive Statistics
Process of Statistical Inference
Types of Data
Data
  Quantitative (ratio and interval)
    Discrete
    Continuous
  Qualitative (nominal and ordinal)
Nominal Data
Example:
What is your gender? (please tick)
  Male
  Female
Did you enjoy the film? (please tick)
  Yes
  No
Ordinal Data
Example:
How satisfied are you with the level of service you have received? (please tick)
  Very satisfied
  Somewhat satisfied
  Neutral
  Somewhat dissatisfied
  Very dissatisfied
Are you satisfied with your education at U of L?
  Dissatisfied 1 2 3 4 5 Satisfied
Ratio Data
o Ratio data are measured on a continuous scale and have a true zero point.
Examples:
o Age
o Weight
o Height
Summary of “types of data” and “scale of measurement”
Example 1
Classify each set of data as discrete or continuous.
1) The number of suitcases lost by an airline.
2) The height of corn plants.
3) The grade level of students.
4) The number of green M&M's in a bag.
5) The time it takes for a car battery to die.
6) The production of tomatoes by weight.
Example 2
a) Gender
b) Age
c) Educational Background
d) Position in Botanic Gardens
e) Working Experience (years)
Example 3
Age Month Sex Head.L Head.W Neck.G Length Chest.G Weight Name
19 7 1 10 5 15 45 23 65 Allen
19 7 2 11 6.5 20 47.5 24 70 Berta
20 8 2 12 6 17 57 27 74 Berta
23 11 2 12.5 5 20.5 59.5 38 142 Berta
29 5 2 12 6 18 62 31 121 Berta
19 7 1 11 5.5 16 53 26 80 Clyde
20 8 1 12 5.5 17 56 30.5 108 Clyde
55 7 1 16.5 9 28 67.5 45 344 Doc
67 7 1 16.5 9 27 78 49 371 Doc
81 9 1 15.5 8 31 72 54 416 Quincy
10 1 16 8 32 77 52 432 Kooch
115 7 1 17 10 31.5 72 49 348 Charlie
117 9 1 15.5 7.5 32 75 54.5 476 Charlie
124 4 1 17.5 8 32 75 55 478 Charlie
140 8 1 15 9 33 75 49 386 Charlie
Statistical Measures
a) Measure of central tendency
   measure of location: to show where the center of the data is
b) Measure of dispersion
   measure of spread: to show how spread out the data are around the center
c) Measure of skewness
   measure of asymmetry: to show whether the frequency distribution is symmetrical about the mean or skewed
Numerical Descriptions
(a) Measures of central tendency
▪ Also called measures of location or average
▪ It refers to the middle point (central value) of a
distribution.
(b) Measures of dispersion
▪ It describes how spread out or scattered a set or distribution of numeric data is about the central point, or “how far apart the data values are from each other”.
(c) Measures of skewness
▪ Skewness is the statistical term for asymmetry or “lop-sidedness”.
▪ A measure of skewness summarizes the extent to which the items are symmetrically distributed.
Mean
➢ Advantages
➢ it is widely understood
➢ the value of every item is included in the computation of the
mean.
➢ it is well suited to further statistical analysis.
➢ Disadvantages
➢ its value may not correspond to any actual value.
➢ it might be distorted by extremely high or low values.
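The two disadvantages listed above can be seen in a minimal sketch, assuming Python's standard `statistics` module. The values are hypothetical; the median is shown only as a point of comparison.

```python
import statistics

# Hypothetical values, chosen only to illustrate the behaviour of the mean.
values = [30, 32, 35, 36, 37]
with_extreme = values + [200]   # the same data plus one extremely high value

mean_before = statistics.mean(values)
median_before = statistics.median(values)
mean_after = statistics.mean(with_extreme)
median_after = statistics.median(with_extreme)

print(f"without extreme value: mean = {mean_before}, median = {median_before}")
print(f"with extreme value:    mean = {mean_after:.2f}, median = {median_after}")
print("mean is an actual data value:", mean_before in values)
```

In this made-up data set the mean (34) is not one of the observed values, and a single extreme value pulls it from 34 to about 61.67 while the median barely moves.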
Median
➢ Advantages
➢ can be used when certain end values of a set or distribution are difficult, expensive or impossible to obtain; it is particularly appropriate to ‘life’ data.
➢ can be used with non-numeric data if desired, provided the measurements can be naturally ordered.
➢ will often assume a value equal to one of the original data values.
➢ Disadvantages
➢ it is difficult to handle theoretically in more advanced statistical
work, so its use is restricted to analysis at a basic level.
➢ it fails to reflect the full range of values.
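The first two advantages can be illustrated with a minimal sketch, assuming Python's standard `statistics` module. Both data sets are hypothetical: the lifetimes stand for a test in which two items had not yet failed when recording stopped (represented here by `math.inf`), and the responses reuse the five-level satisfaction scale from the ordinal data example.

```python
import math
import statistics

# Hypothetical 'life' data: three observed lifetimes (hours) and two items
# still running, whose true lifetimes are unknown but larger than the rest.
lifetimes = [100, 120, 150, math.inf, math.inf]
median_life = statistics.median(lifetimes)   # only the middle value matters

# Hypothetical ordinal responses, ordered from lowest to highest level.
levels = ["Very dissatisfied", "Somewhat dissatisfied", "Neutral",
          "Somewhat satisfied", "Very satisfied"]
responses = ["Neutral", "Very satisfied", "Somewhat satisfied",
             "Somewhat satisfied", "Very dissatisfied"]
ranks = [levels.index(r) for r in responses]          # position on the scale
median_response = levels[statistics.median_low(ranks)]

print("median lifetime:", median_life)
print("median response:", median_response)
```

`median_low` is used for the ordinal data because it always returns one of the observed values, so the result maps back to a category.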
Mode
➢ Advantages
➢ it is a more appropriate average to use in situations where it is useful to know the most common value.
➢ easy to understand, not difficult to calculate and can be used when a distribution has open-ended classes.
➢ it is not affected by extreme values.
➢ Disadvantages
➢ it ignores dispersion around the modal value and it does not take all
the values into account.
➢ it is unsuitable for further statistical analysis.
➢ although it ignores extreme values, it is thought to be too much
affected by the most popular class when a distribution is significantly
skewed.
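A minimal sketch of the mode, assuming Python's standard `statistics` module (`multimode` requires Python 3.8 or later). The shoe sizes and colours are hypothetical and serve only to show the most common value and its insensitivity to an extreme value.

```python
import statistics

# Hypothetical shoe sizes sold in a day, including one extreme value (55).
sizes = [38, 39, 39, 40, 39, 41, 40, 39, 55]
modal_size = statistics.mode(sizes)          # the most frequently sold size

# The mode also works for non-numeric data; multimode returns every value
# tied for the highest frequency, in order of first appearance.
colours = ["red", "blue", "red", "green", "blue"]
modal_colours = statistics.multimode(colours)

print("modal size:", modal_size)
print("modal colours:", modal_colours)
```

Removing the extreme value 55 would leave the modal size unchanged, which is the sense in which the mode is not affected by extreme values.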
Summary of when to use the Mean,
Median & Mode
b) Measures of Dispersion
Range Standard deviation
STUDENT A: $\bar{x} = 30$, $s = 4$
$CV = \frac{4}{30} \times 100\% = 13.33\%$
STUDENT B: $\bar{x} = 25$, $s = 6$
$CV = \frac{6}{25} \times 100\% = 24\%$
Since the coefficient of variation of Student A is lower than that of Student B, Student A is more consistent in completing the projects.
Example 6
             MEAN WEEKLY SALARY (RM)   STANDARD DEVIATION (RM)   NO OF WORKERS
FACTORY A    345                       50                        476
FACTORY B    285                       45                        524
Factory A: $CV = \frac{50}{345} \times 100\% = 14.49\%$
Factory B: $CV = \frac{45}{285} \times 100\% = 15.79\%$
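The coefficient of variation calculations for the students and the factories can be reproduced with a short sketch. The function name `coefficient_of_variation` is an illustrative choice, not something defined in the lesson; the inputs are the means and standard deviations quoted in the two examples.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation (standard deviation / mean) as a percentage."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100

# Students: (standard deviation, mean) taken from the worked example.
cv_student_a = coefficient_of_variation(4, 30)
cv_student_b = coefficient_of_variation(6, 25)

# Factories: standard deviation and mean weekly salary (RM) from Example 6.
cv_factory_a = coefficient_of_variation(50, 345)
cv_factory_b = coefficient_of_variation(45, 285)

for label, cv in [("Student A", cv_student_a), ("Student B", cv_student_b),
                  ("Factory A", cv_factory_a), ("Factory B", cv_factory_b)]:
    print(f"{label}: CV = {cv:.2f}%")
```

Because the CV expresses spread relative to the mean, it allows groups with different means to be compared: Factory A's salaries (14.49%) vary slightly less relative to their mean than Factory B's (15.79%).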
Standard Deviation
➢ Advantages
➢ it takes all values into account; therefore, it can be regarded
as truly representative of the data.
➢ it is suitable for further statistical analysis.
➢ Disadvantages
➢ it is more difficult to understand than some other measures
of dispersion.
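The claim that the standard deviation takes all values into account can be checked with a minimal sketch, assuming Python's standard `statistics` module. The two data sets are hypothetical and were chosen to have the same range, so that only the standard deviation can tell them apart.

```python
import statistics

# Two hypothetical data sets with the same minimum, maximum and mean.
set_a = [10, 20, 20, 20, 30]   # most values sit at the centre
set_b = [10, 10, 20, 30, 30]   # values sit towards the extremes

range_a = max(set_a) - min(set_a)   # the range uses only two values
range_b = max(set_b) - min(set_b)

sd_a = statistics.stdev(set_a)      # the sample standard deviation uses every value
sd_b = statistics.stdev(set_b)

print(f"set A: range = {range_a}, s = {sd_a:.2f}")
print(f"set B: range = {range_b}, s = {sd_b:.2f}")
```

Both sets have a range of 20, but set B has the larger standard deviation because more of its values lie far from the mean.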
c) Measures of Skewness
[Figure: a negatively skewed curve and a positively skewed curve]
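The sign of a skewness measure can be illustrated with a small sketch. It assumes the moment coefficient of skewness (third central moment divided by the 1.5 power of the second central moment); the lesson does not say which formula is intended, and statistical packages often apply a sample-size correction, so treat this as one possible definition. The data sets are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5, using central moments divided by n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical data sets.
right_tail = [1, 2, 2, 3, 3, 3, 4, 10]        # long tail towards high values
left_tail = [11 - x for x in right_tail]      # mirror image: long tail towards low values
symmetric = [1, 2, 3, 4, 5]

print(f"positively skewed: {skewness(right_tail):.3f}")
print(f"negatively skewed: {skewness(left_tail):.3f}")
print(f"symmetric:         {skewness(symmetric):.3f}")
```

A positive result corresponds to a positively skewed distribution (tail to the right), a negative result to a negatively skewed one, and a value of zero to a symmetric one.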
➢ Types of kurtosis:
➢ Platykurtic – when the kurtosis < 0 (measured as excess kurtosis, i.e. relative to the normal distribution), the frequencies throughout the curve are closer to being equal (i.e., the curve is flatter and wider). Thus, negative kurtosis indicates a relatively flat distribution.
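A small sketch of the kurtosis sign convention, assuming the moment-based excess kurtosis (fourth central moment divided by the squared second central moment, minus 3). The lesson does not state its formula and software packages may apply sample-size corrections, so this is an illustration rather than the definitive computation. Both data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3, using central moments divided by n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

# Hypothetical data sets.
flat = list(range(1, 11))        # every value occurs equally often
peaked = [0] + [5] * 8 + [10]    # most values at the centre, two far out

print(f"flat data:   {excess_kurtosis(flat):.3f}")    # negative: platykurtic
print(f"peaked data: {excess_kurtosis(peaked):.3f}")  # positive: leptokurtic
```

The equal-frequency data give a negative value, matching the platykurtic description, while the sharply peaked data with heavy tails give a positive value.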
➢ These two distributions have the same variance, approximately the same
skewness, but differ markedly in kurtosis.
[Figure: a leptokurtic curve and a platykurtic curve]
Comparison of Central Tendency
for Three Curves
[Figure: Curve A, Curve B and Curve C]
Comparison of Dispersion of Two Curves
Comparison of Two Skewed
Curves
Curve A: Positively Skewed
Curve B: Negatively Skewed
Example 9
(a) Obtain a computer output for the summary of data that includes
the mean, mode, median, range, quartiles, inter-quartile range
(IQR), standard deviation, skewness and kurtosis of the data set.
(b) Is Adam's test score data skewed to the left or to the right?
(c) Which measure of spread is larger? Which measure of spread
will give a more accurate picture of Adam's Maths
performance?
(d) Which measure of center is higher? Which measure of center
gives a more accurate picture of Adam's Maths performance?
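One way to obtain the kind of summary output requested in part (a) is sketched below, assuming Python's standard `statistics` module. Adam's actual test scores are not reproduced in this section, so `hypothetical_scores` is a made-up placeholder that must be replaced with the real data. The quartiles use the module's default "exclusive" method, and the skewness and kurtosis are the simple moment-based versions; other software may use different conventions and report slightly different numbers.

```python
import statistics

def summarize(values):
    """Return the summary measures listed in part (a) as a dictionary."""
    n = len(values)
    mean = statistics.mean(values)
    # Central moments (divided by n) for the moment-based shape measures.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "quartiles": (q1, q2, q3),
        "iqr": q3 - q1,
        "std_dev": statistics.stdev(values),   # sample standard deviation
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
    }

# Placeholder data, NOT Adam's real scores: replace before drawing conclusions.
hypothetical_scores = [55, 60, 62, 65, 65, 70, 72, 90]
summary = summarize(hypothetical_scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With real data in place, the sign of the skewness answers part (b), and comparing the range, IQR and standard deviation, and the mean with the median, gives the figures needed for parts (c) and (d).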
(a)