Describing Data: Numerical Measures: Nguyen Thi Lien

This document provides an overview of key concepts for describing data numerically that will be covered in Chapter 3, including measures of central tendency (mean, median, mode), variation (range, variance, standard deviation), and distribution shapes. Examples are given to demonstrate calculating the mean, weighted mean, geometric mean, median, and mode. The chapter will also cover population summary measures, the empirical rule, five number summaries and box plots, covariance and correlation, and issues to consider when using these descriptive measures.

Uploaded by

Trịnh Anh Vũ
Copyright
© All Rights Reserved

Chapter 3

Describing Data: Numerical Measures
Nguyen Thi Lien
Faculty of Mathematical Economics, NEU
Email: [email protected]

04/11/2021 STATISTICS FOR BUSINESS AND ECONOMICS 1


What are we going to learn?
• Compute and interpret the mean, median, and mode for a set of data
• Find the range, variance, standard deviation, and coefficient of variation, and know what these values mean
• Apply the empirical rule to describe the variation of population values around the mean


After this chapter, you should be able to use:
Measures of central tendency, variation, and shape
◦ Mean, median, mode, geometric mean, quartiles
◦ Range, interquartile range, variance and standard deviation, coefficient of variation
◦ Symmetric and skewed distributions
Population summary measures
◦ Mean, variance, and standard deviation
◦ The empirical rule and the Bienaymé-Chebyshev rule
Five-number summary and box-and-whisker plots
Covariance and coefficient of correlation
Pitfalls in numerical descriptive measures and ethical considerations


Describing Data Numerically
◦ Central Tendency: Arithmetic Mean, Median, Mode
◦ Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

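Mean
Arithmetic Mean
◦ Raw data: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
◦ Weighted data: $\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
◦ Grouped data: $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$, where $x_i$ is the midpoint and $f_i$ the frequency of class $i$
Geometric Mean
◦ $\bar{x}_G = \sqrt[n]{x_1 x_2 \cdots x_n}$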


Exercise 3.1: Calculate the Arithmetic Mean
Course                  Grade Points
Algebra                 3.63
Introduction to Logic   4.20
Microeconomics          3.46
Statistics              4.00
Total                   15.29

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
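
A minimal Python sketch of the arithmetic mean for the four grade points in Exercise 3.1. The names `grade_points` and `arithmetic_mean` are illustrative, not part of the slides. The four values sum to 15.29, so the mean is 15.29 / 4.

```python
# Grade points from Exercise 3.1 (one value per course).
grade_points = {
    "Algebra": 3.63,
    "Introduction to Logic": 4.20,
    "Microeconomics": 3.46,
    "Statistics": 4.00,
}


def arithmetic_mean(values):
    """Return sum(x_i) / n for a non-empty collection of numbers."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


total = sum(grade_points.values())
mean_gp = arithmetic_mean(grade_points.values())
print(f"Total = {total:.2f}, mean = {mean_gp:.4f}")
```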


Exercise 3.2: Calculate the Weighted Mean
Suppose we also know the number of credits for each course. The number of credits is the weight $w_i$ and the grade points are the value $x_i$:

$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$

Course                  Number of Credits   Grade Points   Grade Points x Credits
Algebra                 3                   3.63           10.89
Introduction to Logic   2                   4.20           8.40
Microeconomics          3                   3.46           10.38
Statistics              3                   4.00           12.00
Total                   11                                 41.67

Weighted mean = 41.67 / 11 ≈ 3.79
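
The weighted mean in Exercise 3.2 can be sketched in Python as follows. The function name `weighted_mean` and the tuple layout of `courses` are assumptions made for illustration; credits play the role of the weights and grade points the role of the values.

```python
# (course, credits w_i, grade points x_i) from Exercise 3.2.
courses = [
    ("Algebra", 3, 3.63),
    ("Introduction to Logic", 2, 4.20),
    ("Microeconomics", 3, 3.46),
    ("Statistics", 3, 4.00),
]


def weighted_mean(weights, values):
    """Return sum(w_i * x_i) / sum(w_i)."""
    weights, values = list(weights), list(values)
    if not weights or len(weights) != len(values):
        raise ValueError("weights and values must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


gpa = weighted_mean([c[1] for c in courses], [c[2] for c in courses])
print(f"Weighted mean = {gpa:.2f}")
```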


Exercise 3.3: Calculate the Weighted Mean
A sample of 33 students were asked to rate themselves on whether they
were outgoing or not using this five point scale: 1 = extremely
extroverted, 2 = extroverted, 3 = neither extroverted nor introverted, 4
= introverted, or 5 = extremely introverted. The results are shown in
the table below:
Rating 1 2 3 4 5
Frequency 1 7 20 5 0

Calculate the sample mean.

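
One way to check Exercise 3.3 is to expand the frequency table back into the 33 individual ratings and take their ordinary mean, which is the same as weighting each rating by its frequency. A small sketch, with illustrative variable names:

```python
from collections import Counter
from statistics import mean

# Rating -> frequency, as in the table of Exercise 3.3.
ratings = Counter({1: 1, 2: 7, 3: 20, 4: 5, 5: 0})

# elements() repeats each rating as many times as its frequency
# (ratings with frequency 0 are skipped).
observations = list(ratings.elements())

sample_mean = mean(observations)
print(f"n = {len(observations)}, sample mean = {sample_mean:.2f}")
```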


Exercise 3.4: Calculate the Geometric Mean
Year    GDP relative to previous year (%)
2012    105.25
2013    105.42
2014    105.98
2015    106.68
2016    106.21
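
A hedged sketch of Exercise 3.4 in Python. It assumes the tabulated figures are year-over-year indices (100 means no change), so the average growth factor is the geometric mean of the five values. The helper `geometric_mean` is written out by hand here rather than taken from the slides.

```python
import math

# GDP relative to the previous year, in percent (assumed to be growth indices).
gdp_index = {2012: 105.25, 2013: 105.42, 2014: 105.98, 2015: 106.68, 2016: 106.21}


def geometric_mean(values):
    """Return the n-th root of the product of n positive values."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs a non-empty set of positive values")
    # Averaging logarithms is equivalent to taking the n-th root of the product.
    return math.exp(sum(math.log(v) for v in values) / len(values))


avg_index = geometric_mean(gdp_index.values())
print(f"Average index = {avg_index:.2f}%, average growth = {avg_index - 100:.2f}%")
```

The geometric mean is never larger than the arithmetic mean of the same values, which gives a quick sanity check on the result.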


Mean
Compare the means of the following data sets:
◦ Data 1: {10, 10, 11, 12, 12} (mean = 11)
◦ Data 2: {10, 10, 11, 12, 120} (mean = 32.6)
The mean is easily affected by extreme values (outliers). This can lead to biased comparisons, so another measure should be used.


Median
No class width (ungrouped data):
• Put all the numbers in numerical order
• An odd number of observations ($\sum f_i = 2m+1$): Median $= x_{m+1}$
• An even number of observations ($\sum f_i = 2m$): Median $= \frac{x_m + x_{m+1}}{2}$
Class width (grouped data):
• Find the class containing the median
• Calculate the median: Median $= x_{\text{Median(min)}} + h_{\text{Median}} \cdot \frac{\frac{\sum f_i}{2} - S_{\text{Median}-1}}{f_{\text{Median}}}$
where
$x_{\text{Median(min)}}$ : Lower boundary of the class containing the median
$h_{\text{Median}}$ : Width of the class containing the median
$\sum f_i$ : Number of observations
$S_{\text{Median}-1}$ : Cumulative frequency of the previous class
$f_{\text{Median}}$ : Frequency of the class containing the median

Exercise 3.5: Calculate the Median
• Data: {5, 6, 9, 5, 6}
=> Ordered data: {5, 5, 6, 6, 9}: Median = 6
• Data: {5, 7, 9, 8, 6, 11}
=> Ordered data: {5, 6, 7, 8, 9, 11}: Median = (7 + 8)/2 = 7.5
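
The odd/even rule for the median of ungrouped data can be sketched directly in Python. The function below is an illustrative implementation of that rule (Python's built-in `statistics.median` gives the same results).

```python
def median(data):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    m = n // 2
    if n % 2 == 1:
        # n = 2m + 1: the median is x_{m+1} (index m when counting from 0).
        return xs[m]
    # n = 2m: the median is the average of x_m and x_{m+1}.
    return (xs[m - 1] + xs[m]) / 2


print(median([5, 6, 9, 5, 6]))
print(median([5, 7, 9, 8, 6, 11]))
```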


Exercise 3.6: Calculate the Median
Number of tracks on CD   Number of CDs   Cumulative Frequency
8                        1               1
9                        4               5
10                       1               6
11                       3               9
13                       2               11
Total                    11

The number of observations is odd (11 = 2m + 1), so Median = x_{m+1}


Exercise 3.7: Calculate the Median
Number of tracks on CD   Number of CDs   Cumulative Frequency
8                        1               1
9                        4               5
10                       1               6
11                       3               9
12                       1               10
13                       2               12
Total                    12

The number of observations is even (12 = 2m), so Median = (x_m + x_{m+1})/2
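
For a frequency table without class widths (Exercises 3.6 and 3.7), one simple approach is to expand the table into individual observations and take the median of the expanded list. A sketch under that assumption, with hypothetical names `median_from_frequencies`, `cds_3_6` and `cds_3_7`:

```python
from statistics import median


def median_from_frequencies(table):
    """table: iterable of (value, frequency) pairs; returns the median value."""
    expanded = [value for value, count in table for _ in range(count)]
    return median(expanded)


# (number of tracks, number of CDs)
cds_3_6 = [(8, 1), (9, 4), (10, 1), (11, 3), (13, 2)]            # 11 CDs
cds_3_7 = [(8, 1), (9, 4), (10, 1), (11, 3), (12, 1), (13, 2)]   # 12 CDs

print(median_from_frequencies(cds_3_6))  # 6th of 11 ordered values
print(median_from_frequencies(cds_3_7))  # average of the 6th and 7th values
```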


Exercise 3.8: Calculate the Median
Age       Number of users   Cumulative Frequency
10 – 20   3                 3
20 – 30   7                 10
30 – 40   18                28
40 – 50   20                48
50 – 60   12                60
Total     60
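
A sketch of the grouped-data median for Exercise 3.8: find the first class whose cumulative frequency reaches n/2, then interpolate inside it using the class lower boundary, width, frequency and the cumulative frequency of the previous class. The function name and the `(lower, upper, frequency)` layout are assumptions for illustration.

```python
def grouped_median(classes):
    """classes: list of (lower, upper, frequency), ordered by lower boundary."""
    n = sum(freq for _, _, freq in classes)
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= half:
            width = upper - lower
            return lower + width * (half - cumulative) / freq
        cumulative += freq
    raise ValueError("the frequency table is empty")


age_classes = [(10, 20, 3), (20, 30, 7), (30, 40, 18), (40, 50, 20), (50, 60, 12)]
print(grouped_median(age_classes))
```

Here n/2 = 30 falls in the 40 – 50 class (cumulative frequency 28 before it), so the median is 40 + 10 × (30 − 28)/20 = 41.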


Median
Compare the mean and median of the following data sets:
◦ Data 1: {10, 10, 11, 12, 12}
◦ Data 2: {2, 3, 4, 6, 40}
The median is not affected by outliers.
It depends only on the position of the values in the ordered data.
It applies to quantitative variables only.
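
A short Python check of this comparison, using the standard `statistics` module. Both data sets have the same mean, but the outlier 40 in Data 2 leaves its median far below the mean.

```python
from statistics import mean, median

data_1 = [10, 10, 11, 12, 12]
data_2 = [2, 3, 4, 6, 40]

for name, data in (("Data 1", data_1), ("Data 2", data_2)):
    print(f"{name}: mean = {mean(data)}, median = {median(data)}")
```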


Mode
• Can be applied to both quantitative and qualitative variables
• The mode is the value that occurs most often, i.e. the value with the largest frequency
• Find the mode:
◦ Qualitative data: the mode is the category with the largest frequency
◦ Quantitative data: the mode is the value with the largest frequency


Exercise 3.9: Calculate the Mode
• Qualitative Data
◦ Data: {Yellow, Yellow, Red, Blue, Green}
◦ Mode is “Yellow”

• Quantitative Data
◦ Data 1: {5, 6, 6, 7, 7, 7, 9} -> Mode = 7
◦ Data 2: {5, 6, 7, 8, 9} -> No mode
◦ Data 3: {5, 6, 9, 5, 6} -> Mode = 5 and 6
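
The three cases in Exercise 3.9 (one mode, no mode, two modes) can be sketched with a frequency count. The helper `modes` is hypothetical, and it encodes the convention used on the slide: when every distinct value occurs equally often, the data set is reported as having no mode.

```python
from collections import Counter


def modes(data):
    """Return the list of most frequent values, or [] if all values tie."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    # Slide convention (an assumption here): if all distinct values share
    # the same frequency, there is no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == top]


print(modes(["Yellow", "Yellow", "Red", "Blue", "Green"]))
print(modes([5, 6, 6, 7, 7, 7, 9]))
print(modes([5, 6, 7, 8, 9]))
print(modes([5, 6, 9, 5, 6]))
```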


Exercise 3.10: Calculate the Mode
Age       Number of users
10 – 20   3
20 – 30   7
30 – 40   18
40 – 50   20
50 – 60   12
Total     60


Quartile
• Quartiles divide the ordered data into 4 equal parts using 3 cutoff points: the 3 quartiles
• 2nd quartile: Q2 = Median

25%  |  25%  |  25%  |  25%
     Q1      Q2      Q3


Quartile
Find a quartile by its position in the ordered data:
◦ First quartile position: Q1 = 0.25(n+1)
◦ Second quartile position: Q2 = 0.50(n+1)
◦ Third quartile position: Q3 = 0.75(n+1)
where n is the number of observed values
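
A minimal sketch of the position rule in Python. When the position p(n+1) is not a whole number, this version interpolates linearly between the two neighbouring ordered values; that interpolation step is an assumption, since the slide only gives the positions. The function name `quartile` is illustrative.

```python
def quartile(data, p):
    """Value at position p * (n + 1) in the ordered data (1-based positions)."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("quartiles of an empty data set are undefined")
    position = p * (n + 1)
    lower_rank = int(position)          # whole part of the position
    fraction = position - lower_rank    # fractional part, used to interpolate
    if lower_rank < 1:
        return xs[0]
    if lower_rank >= n:
        return xs[-1]
    below, above = xs[lower_rank - 1], xs[lower_rank]
    return below + fraction * (above - below)


data = [5, 7, 9, 8, 6, 11]  # ordered: 5, 6, 7, 8, 9, 11 (n = 6)
for label, p in (("Q1", 0.25), ("Q2", 0.50), ("Q3", 0.75)):
    print(label, quartile(data, p))
```

With n = 6 the positions are 1.75, 3.5 and 5.25, so Q1 lies three quarters of the way from 5 to 6, Q2 is halfway between 7 and 8, and Q3 is a quarter of the way from 9 to 11.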


Quartile
For grouped data:
◦ Lower Quartile (Q1): $Q_1 = LCB + h \cdot \frac{\frac{n}{4} - S}{f}$
◦ Median (Q2): $Q_2 = LCB + h \cdot \frac{\frac{n}{2} - S}{f}$
◦ Upper Quartile (Q3): $Q_3 = LCB + h \cdot \frac{\frac{3n}{4} - S}{f}$
where
LCB = lower boundary of the class containing the item,
n = number of observations,
S = cumulative frequency of the previous class,
f = frequency of the class containing the item,
h = width of the class containing the item


Exercise 3.11: Calculate the Quartile
Age       Frequency   Cumulative Frequency
10 – 20   3           3
20 – 30   7           10
30 – 40   18          28
40 – 50   20          48
50 – 60   12          60
Total     60
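
A sketch of the grouped-data quartiles for Exercise 3.11. It assumes S is the cumulative frequency of the classes before the class containing the quartile, and uses a hypothetical helper `grouped_quantile` that handles Q1, Q2 and Q3 through the proportion p = 0.25, 0.50, 0.75.

```python
def grouped_quantile(classes, p):
    """classes: list of (lower, upper, frequency); p: proportion below the quantile."""
    n = sum(freq for _, _, freq in classes)
    target = p * n          # n/4, n/2 or 3n/4 for the three quartiles
    cumulative = 0          # S: cumulative frequency before the current class
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            return lower + (upper - lower) * (target - cumulative) / freq
        cumulative += freq
    raise ValueError("the frequency table is empty")


age_classes = [(10, 20, 3), (20, 30, 7), (30, 40, 18), (40, 50, 20), (50, 60, 12)]
q1, q2, q3 = (grouped_quantile(age_classes, p) for p in (0.25, 0.50, 0.75))
print(f"Q1 = {q1:.2f}, Q2 = {q2:.2f}, Q3 = {q3:.2f}, IQR = {q3 - q1:.2f}")
```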


Mean, Mode, Median
◦ Left skewed: Mean < Median < Mode
◦ Symmetric: Mean = Median = Mode
◦ Right skewed: Mode < Median < Mean

Measures of Dispersion
Describing Data Numerically
◦ Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

Range
• The difference between the largest and the smallest value in a data set.
Range = xmax - xmin

• Pros: simple
• Cons: affected by outliers


Exercise 3.12: Calculate the Range
Worker      Firm A   Firm B
Worker 1    400      1480
Worker 2    400      1485
Worker 3    600      1486
Worker 4    600      1488
Worker 5    700      1490
Worker 6    800      1503
Worker 7    900      1505
Worker 8    2000     1520
Worker 9    2600     1521
Worker 10   6000     1522

• Range (A) = 6000 – 400 = 5600
• Range (B) = 1522 – 1480 = 42
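
The two ranges in Exercise 3.12 can be verified with a few lines of Python (variable and function names are illustrative). The largest and smallest values of Firm B are 1522 and 1480, which gives a range of 42.

```python
firm_a = [400, 400, 600, 600, 700, 800, 900, 2000, 2600, 6000]
firm_b = [1480, 1485, 1486, 1488, 1490, 1503, 1505, 1520, 1521, 1522]


def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


print("Range (A) =", data_range(firm_a))
print("Range (B) =", data_range(firm_b))
```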


Interquartile Range
The Interquartile Range (IQR) is the range between Q3 and Q1:

IQR = Q3 - Q1

The IQR is the width of the middle 50% of the data.


Exercise 3.13: Calculate the IQR
Age       Frequency   Cumulative Frequency
10 – 20   3           3
20 – 30   7           10
30 – 40   18          28
40 – 50   20          48
50 – 60   12          60
Total     60


Variance
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
µ = population mean
N = population size
xi = ith value of the variable x

Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
$\bar{X}$ = arithmetic mean
n = sample size
Xi = ith value of the variable X


Standard Deviation
Population standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
µ = population mean
N = population size
xi = ith value of the variable x

Sample standard deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
$\bar{X}$ = arithmetic mean
n = sample size
Xi = ith value of the variable X


Exercise 3.14: Calculate the Variance
Compare the variability of coffee sales in two branches of Starbucks:
A: 20 40 50 60 80
B: 20 49 50 51 80

[Figure: two bar charts, "Coffee sales in A" and "Coffee sales in B", showing the five values of each branch]
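
A sketch of Exercise 3.14 using Python's standard `statistics` module, treating each branch as a sample (divisor n − 1). Both branches have the same mean and range, so only the variance and standard deviation separate them.

```python
from statistics import mean, stdev, variance

sales_a = [20, 40, 50, 60, 80]
sales_b = [20, 49, 50, 51, 80]

for name, sales in (("A", sales_a), ("B", sales_b)):
    # variance() and stdev() use the sample formulas with n - 1.
    print(f"Branch {name}: mean = {mean(sales)}, "
          f"s^2 = {variance(sales)}, s = {stdev(sales):.2f}")
```

Branch A gives a sum of squared deviations of 2000 and branch B gives 1802, so with n − 1 = 4 the sample variances are 500 and 450.5.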


Exercise 3.15: Calculate the Variance
Age       Frequency (fi)   Midpoint (xi)   fi(xi − x̄)²
10 – 20   3                15              1900.08
20 – 30   7                25              1610.19
30 – 40   18               35              480.50
40 – 50   20               45              467.22
50 – 60   12               55              2640.33
Total     60                               7098.33

x̄ = 2410 / 60 ≈ 40.17
s² = 7098.33 / (60 − 1) = 120.31
s = √120.31 ≈ 10.97
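
The grouped-data calculation in Exercise 3.15 can be sketched as follows. It assumes each class is represented by its midpoint and that the sample divisor n − 1 is used, which is what reproduces the value 120.31. The function name is hypothetical.

```python
import math


def grouped_sample_variance(classes):
    """classes: list of (lower, upper, frequency); uses class midpoints."""
    n = sum(freq for _, _, freq in classes)
    if n < 2:
        raise ValueError("at least two observations are needed")
    midpoints = [((lower + upper) / 2, freq) for lower, upper, freq in classes]
    mean = sum(freq * x for x, freq in midpoints) / n
    squared_deviations = sum(freq * (x - mean) ** 2 for x, freq in midpoints)
    return mean, squared_deviations / (n - 1)


age_classes = [(10, 20, 3), (20, 30, 7), (30, 40, 18), (40, 50, 20), (50, 60, 12)]
mean_age, s2 = grouped_sample_variance(age_classes)
print(f"mean = {mean_age:.2f}, s^2 = {s2:.2f}, s = {math.sqrt(s2):.2f}")
```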


Coefficient of Variation
• Used to compare variability among:
- Different variables
- The same variable when the means are different
• This is the ratio of the standard deviation to the mean:
$CV = \frac{SD}{\text{mean}} \times 100$


Exercise 3.16: Calculate the CV
An investor is considering the relative risks associated with two
projects:
- The first project has a mean expected profit of £5000 with a
standard deviation of £707.11
- The second project has a mean expected profit of £500 with a
standard deviation of £112.13
Use the measures of dispersion to establish which project has
the lowest degree of risk.

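
A small Python sketch for Exercise 3.16, comparing the two projects by their coefficient of variation (standard deviation divided by mean, times 100). The function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """CV in percent: (standard deviation / mean) * 100."""
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return sd / mean * 100


cv_project_1 = coefficient_of_variation(707.11, 5000)
cv_project_2 = coefficient_of_variation(112.13, 500)
print(f"Project 1: CV = {cv_project_1:.2f}%")
print(f"Project 2: CV = {cv_project_2:.2f}%")
```

Although the second project has the smaller standard deviation in pounds, its standard deviation is larger relative to its mean profit, so on this measure the first project carries the lower relative risk.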


Chebyshev's Theorem
For any population with mean μ and standard deviation σ, and k > 1, the percentage of observations that fall within the interval [μ ± kσ] is at least $100\left[1 - \frac{1}{k^2}\right]\%$

Examples:
At least                 Within
(1 - 1/1.5²) = 55.6%     k = 1.5 (μ ± 1.5σ)
(1 - 1/2²) = 75%         k = 2 (μ ± 2σ)
(1 - 1/3²) = 89%         k = 3 (μ ± 3σ)
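
The Chebyshev bound 1 − 1/k² is easy to tabulate. A minimal sketch (the function name is an assumption) that reproduces the three example rows:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%} within mu ± {k} sigma")
```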


The Empirical Rule
If the data distribution is bell-shaped, then the interval:
• μ ± 1σ contains about 68% of the values
• μ ± 2σ contains about 95% of the values
• μ ± 3σ contains about 99.7% of the values


Box and Whisker plot
• Boxplot 1: [figure: box plot with whiskers ending at the Lower Limit and the Upper Limit]
• Boxplot 2: [figure: box plot with Q1 – 1.5IQR and Q3 + 1.5IQR marked]
Lower limit: the maximum of (min, Q1 – 1.5*IQR)
Upper limit: the minimum of (max, Q3 + 1.5*IQR)
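
A sketch of the five-number summary and the whisker limits, applied to the Firm A data from Exercise 3.12 because it contains an obvious outlier. It assumes quartiles are located at positions 0.25(n+1), 0.50(n+1) and 0.75(n+1) with linear interpolation, which corresponds to `method="exclusive"` in Python's `statistics.quantiles`.

```python
from statistics import quantiles

firm_a = [400, 400, 600, 600, 700, 800, 900, 2000, 2600, 6000]

# Quartiles by the (n + 1) position rule.
q1, q2, q3 = quantiles(firm_a, n=4, method="exclusive")
iqr = q3 - q1

# Whisker ends as defined above.
lower_limit = max(min(firm_a), q1 - 1.5 * iqr)
upper_limit = min(max(firm_a), q3 + 1.5 * iqr)

# Values beyond the limits would be drawn as individual points.
outliers = [x for x in firm_a if x < lower_limit or x > upper_limit]

five_number_summary = (min(firm_a), q1, q2, q3, max(firm_a))
print("Five-number summary:", five_number_summary)
print("Limits:", lower_limit, upper_limit, "Outliers:", outliers)
```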


Skewness (Sk)
◦ Sk = –1.3: left long tail
◦ Sk = –0.3: left short tail
◦ Sk = 0: two-tail (symmetric)
◦ Sk = 0.3: right short tail
◦ Sk = 1.3: right long tail


Covariance
• Measures the combined variability of X and Y
• Sample covariance: $Cov(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

[Figure: two scatter plots with reference lines at the mean of X and the mean of Y, one showing positive covariance and one showing negative covariance]


Exercise 3.17: Calculate the Covariance
Calculate the covariance of the following sample data of four
(X, Y) pairs: (1, 5), (2, 10), (4, 7), and (5, 9)
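
A minimal sketch of the sample covariance for the four pairs in Exercise 3.17, assuming the sample divisor n − 1. The function name is illustrative.

```python
def sample_covariance(pairs):
    """Sum of (x - mean_x) * (y - mean_y) over the pairs, divided by n - 1."""
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return cross_products / (n - 1)


pairs = [(1, 5), (2, 10), (4, 7), (5, 9)]
cov_xy = sample_covariance(pairs)
print(f"Cov(X, Y) = {cov_xy:.3f}")
```

The means are 3 and 7.75, the cross products sum to 5, and dividing by n − 1 = 3 gives a positive covariance of about 1.667.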


Correlation Coefficient
$r = \frac{Cov(X, Y)}{s_X s_Y}$

$-1 \le r \le 1$, no unit, measures the linear relationship between X and Y

◦ r = –1: perfect negative linear relationship
◦ –1 < r < 0: negatively correlated
◦ r = 0: not correlated
◦ 0 < r < 1: positively correlated
◦ r = 1: perfect positive linear relationship


Correlation Coefficient
Size of correlation Interpretation
0.9-1.0 (-0.9 to -1.00) Very high positive (negative) correlation
0.7-0.9 (-0.7 to -0.9) High positive (negative) correlation
0.5-0.7 (-0.5 to -0.7) Moderate positive (negative) correlation
0.3-0.5 (-0.3 to -0.5) Low positive (negative) correlation
0.0-0.3 (-0.0 to -0.3) Negligible correlation



Correlation
• Graph and Correlation Coefficient
[Figure: scatter plots labeled Positively (Moderate, High), Negatively, and Negligibly correlated]

Exercise 3.18: Calculate the correlation coefficient
No         X       Y       X-MeanX   Y-MeanY   (X-MeanX)(Y-MeanY)   (X-MeanX)²   (Y-MeanY)²
1          1       5       -2        -2.75     5.50                 4            7.5625
2          2       10      -1        2.25      -2.25                1            5.0625
3          4       7       1         -0.75     -0.75                1            0.5625
4          5       9       2         1.25      2.50                 4            1.5625
Total      12      31                          5                    10           14.75
Mean       3       7.75
Variance   3.333   4.917
SD         1.8257  2.2174

r = 5 / √(10 × 14.75) ≈ 0.412

There is a low positive relationship between price and quantity supplied.

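
The table in Exercise 3.18 can be reproduced with a short Python sketch. It computes r from the sums of cross products and squared deviations; the n − 1 divisors of the covariance and the two standard deviations cancel. The function name is hypothetical.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 4, 5]
y = [5, 10, 7, 9]
r = correlation(x, y)
print(f"r = {r:.3f}")
```
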
Summary Statistics in SPSS



Summary Statistics in Excel
Data -> Data Analysis



Summary Statistics in SPSS
Net Sales
Mean 77.6005
Standard Error 5.566494
Median 59.705
Mode 31.6
Standard Deviation 55.66494
Sample Variance 3098.585
Kurtosis 3.149955
Skewness 1.714996
Range 274.36
Minimum 13.23
Maximum 287.59
Sum 7760.05
Count 100
