ES Lecture Note
Elementary Statistics
This course teaches quantitative skills that can be applied to problems commonly encountered in
daily life. It aims to cultivate students' interest in the quantitative techniques needed for
further studies and to strengthen their ability to apply these techniques in different areas.
Upon completion of the course, students should have acquired solid training in fundamental statistical skills.
They should be able to apply these skills in practice, and to analyse and present data using basic statistical
methods.
Topics
1. Introduction to Survey and Statistics
2. Probability Distributions
3. Sampling Distributions and Central Limit Theorem
4. Estimation
5. Time Series
6. Price Index
Assessment
Individual assignments: 60 marks
Final examination: 40 marks
References
1. Haeussler, Ernest F., Paul, Richard S. & Wood, R. J. Introductory Mathematical Analysis for Business,
Economics, and the Life and Social Sciences, 14th edition, Pearson Education Limited.
2. Berenson, Mark L., Levine, David M. & Szabat, Kathryn A. Basic Business Statistics: Concepts and
Applications, 14th edition, Pearson Education Limited.
Class Lecturer
Name:
Email contact:
General Reminders:
1. Remember to bring a HKEA approved calculator with SD (statistics) function to classes and
examination. Calculators with graphical display will not be allowed in the examination.
2. Check SOUL course link and class link frequently for updated information about the course and class
management.
We start this chapter with Example 1: suppose the manager of a large company with 3000
employees wants to collect information about the employees' level of satisfaction with the company.
How should the manager plan this survey?
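[Figure: two ovals. The left oval (population) holds the scores of all employees, e.g. Johnny: 9,
May: 7, Peter: 8, David: 9, ... The right oval (sample) holds the scores of the selected employees,
e.g. May: 7, Amy: 9, ...]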
The left oval represents the population of this study: it includes every employee in the company.
There are 3000 employees, and each of them gives a score according to his or her level of
satisfaction with the company. Suppose every score ranges from 0 to 10, where 0 means totally
unsatisfied and 10 means totally satisfied.
In a census, data are collected from every employee (3000 data values are collected).
Once data collection is complete, we try to understand the current situation through data
analysis (for example, by calculating the mean and standard deviation of the scores). A mean
satisfaction score of 9.5 and a mean of 2.4 would represent very different situations.
However, a census is sometimes impractical because of limitations such as limited time and
budget. In that case a sample survey is conducted instead. A sample is selected by a fair and
random procedure (as far as possible), and then data are collected from the sample and analysed.
The sample is represented by the right oval.
After a sample survey, we also analyse the data in order to draw conclusions about the current
situation. However, as the data collection is incomplete, we need to be very careful when drawing
conclusions. The reliability of a conclusion drawn from a sample survey depends very much on
how well the sample represents the population.
Probability Sampling
When selecting a probability sample, we need to ensure that every element has a chance of being
selected. To do so, an up-to-date sampling frame has to be prepared. A sampling frame is a data file
that contains information on every object in the population. An up-to-date sampling frame is
particularly important for selecting probability samples, even though preparing one can be quite
difficult. A sampling frame may be a telephone directory, a student registration list, an
employment record, etc.
4928088924357790028381163072758986302348
6187041657074680861298083973492077545091
4389865923250788612978496976539155008078
6299393912304548459856095206641287264647
Example 1:
Select a sample of size 500 from 3000 employees by simple random sampling method.
Solution:
Sampling Steps:
1. Assign a unique identity number to each employee, 0001 – 3000.
2. Read the random number table in groups of 4 digits (4 digits because the population size,
3000, has 4 digits); each group is a candidate employee number.
3. Discard numbers that are outside the range 0001 – 3000.
4. Discard any number that has already been selected.
5. Suppose we start from row 1, column 1 of the random number table. The numbers read are:
4928 0889 2435 7790 0283 8116 3072 7589 8630 2348 ... ...
Of these, only 0889, 2435, 0283 and 2348 lie within 0001 – 3000, so these employees are selected.
6. Continue the selection until 500 different employees are selected.
7. Employees with these employee numbers are chosen for the survey.
Computer software can generate random numbers much faster; the underlying logic is
the same as using the random number table.
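The table-reading steps above can be sketched in Python. This is only an illustration under stated assumptions: the helper name `read_random_table` and the seed value are hypothetical, and `random.sample` stands in for whatever software is actually used in practice.

```python
import random

def read_random_table(digits, width, low, high):
    """Read a digit string in fixed-width groups, keeping numbers that are
    in range and not already chosen (steps 2 to 4)."""
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        if low <= number <= high and number not in chosen:
            chosen.append(number)
    return chosen

# Row 1 of the random number table
row1 = "4928088924357790028381163072758986302348"
picked = read_random_table(row1, width=4, low=1, high=3000)
print(picked)  # [889, 2435, 283, 2348]

# Software shortcut: draw 500 distinct employee numbers directly.
rng = random.Random(2020)  # arbitrary seed, only for reproducibility
sample_ids = rng.sample(range(1, 3001), k=500)
print(len(sample_ids), len(set(sample_ids)))  # 500 500
```

Only four of the first ten groups survive the range check, which is why reading a table by hand is slow compared with drawing the numbers by software.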
where \( k = N/n \) is the population size divided by the sample size.
Example 1:
Select a sample of size 500 from 3000 employees by systematic sampling method.
Solution:
Sampling Steps:
1. Assign a unique identity number to each employee, 0001 – 3000.
2. Compute the ratio \( k = 3000/500 = 6 \).
3. Select the first subject, a, randomly from the first k employees (the random number table
may help). Suppose we start from row 1, column 1 of the random number table; the first
digit is 4, so a = 4.
4. Then select a, a + k, a + 2k, …, and so on, until 500 employees are selected:
4, 10, 16, 22, 28, ..., 2998
5. Employees with these employee numbers are chosen for the survey.
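The systematic selection can also be written as a short Python sketch. The function name `systematic_sample` is hypothetical, and the sketch assumes the population size is an exact multiple of the sample size, as it is in this example.

```python
def systematic_sample(population_size, sample_size, start):
    """Return the IDs start, start + k, start + 2k, ... with k = N / n."""
    k = population_size // sample_size
    if not 1 <= start <= k:
        raise ValueError("start must be one of the first k IDs")
    return [start + j * k for j in range(sample_size)]

ids = systematic_sample(3000, 500, start=4)
print(ids[:5], ids[-1])  # [4, 10, 16, 22, 28] 2998
```

Only the starting point is random; once it is fixed, every other selected ID is determined by k.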
Example 1:
Select a stratified sample of size 500 from 3000 employees, of whom 600 are managers and the
other 2400 are junior staff.
Solution:
Sampling Steps:
1. Compute the sample size for each subgroup, which should be proportional to the population
size of that subgroup.
Sample size for managers: \( 500 \times \frac{600}{3000} = 100 \)
Sample size for junior staff: \( 500 \times \frac{2400}{3000} = 400 \)
2. Randomly select a sample of 100 managers and a sample of 400 junior staff.
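A minimal Python sketch of the proportional allocation follows. The function name, the seed, and the ID layout (managers holding IDs 1 to 600, junior staff 601 to 3000) are assumptions made only for illustration; the notes do not say how the IDs are assigned.

```python
import random

def proportional_allocation(strata_sizes, sample_size):
    """Sample size per stratum, proportional to the stratum's population size."""
    total = sum(strata_sizes.values())
    # round() is enough here because the shares are whole numbers; in general
    # the rounded sizes may need adjusting so that they sum to sample_size.
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

strata_sizes = {"managers": 600, "junior staff": 2400}
allocation = proportional_allocation(strata_sizes, 500)
print(allocation)  # {'managers': 100, 'junior staff': 400}

# Hypothetical ID layout: managers 1-600, junior staff 601-3000.
frames = {"managers": range(1, 601), "junior staff": range(601, 3001)}
rng = random.Random(1)  # arbitrary seed, only for reproducibility
stratified = {name: rng.sample(frames[name], allocation[name]) for name in frames}
```

Each subgroup is sampled separately, so both managers and junior staff are guaranteed to appear in the sample in the same proportions as in the population.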
Non-probability Samples
A non-probability sample is selected in a convenient way (e.g. street interviews). It is practical
when no sampling frame is available.
Example 2:
How should a sample of 500 teenagers be selected in order to study their level of satisfaction
with a brand of cola?
Solution:
The population, all teenagers in Hong Kong, is very large, so it is impossible to prepare a
sampling frame. A more practical way is to invite 500 teenagers to join the survey by convenience.
Types of Data
A single survey would deal with a variety of variables. The data, which are the observed outcomes
of these variables, will virtually always differ from person to person. There are two types of
variables: numerical variable and categorical variable.
[Diagram: Data branches into Numerical and Categorical.]
Example 3:
This is the result of part of the survey. How many variables are there? What are the data types of the
variables?
Solution:
There are four variables.
Gender is a categorical variable which uses "M" and "F" as the names of the two categories. It is on
a nominal scale, as there is no natural order between "M" and "F".
Number of previous full-time employment is a numerical variable, and it is discrete.
Highest education level is a categorical variable. It is on an ordinal scale: "Undergraduate",
"HKALE graduate" and "HKCEE graduate" represent decreasing education levels of the
three groups of employees.
Total working hours on 1/9/2019 is a numerical variable, and it is continuous.
We usually use a capital letter, e.g. X, to denote the variable and a small letter, x, to denote the
collected data. Suppose X represents the gender of an employee; then x1 = "F", x2 = "M", x3 = "F",
x4 = "F", x5 = "M". Sample size is usually denoted by n (here n = 5) and population size by N
(here N = 3000).
Summary Measures
In order to extract helpful information from a messy numerical data set, we review a list of
commonly used summary measures. There are three major types of descriptive
measures which help to describe a set of numerical data: central tendency, variation, and shape.
(a) Mean
With μ denoting the population mean and x̄ denoting the sample mean, the formulae for this
most commonly used measure are as follows:
Population mean: \( \mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum x}{N} \)
Sample mean: \( \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n} \)
The mean shares the total equally among all the data values.
Example 4:
The sales records (number of items sold in August 2020) of a sample of 15 salespersons selected
from a company were as follows (in an ordered array):
300 345 368 420 488 522 627 668 731 774 802 890 902 904 990
Solution:
Sample mean \( \bar{x} = \frac{300 + 345 + \cdots + 990}{15} = \frac{9731}{15} = 648.7333 \) items
Some salespersons had better sales records and some did not perform as well as the others. If
we pool all the records and share them equally, then on average each person sold
648.73 items. Multiplying the mean by 15 gives back the total of 9731 items sold in August
2020.
Remark:
The mean is the most common measure of central tendency.
It is affected by extreme values.
Multiplying the mean by the number of data values gives the total value.
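As a quick check of the sample mean, here is a short Python sketch using only the 15 sales figures of Example 4; the variable names are illustrative.

```python
sales = [300, 345, 368, 420, 488, 522, 627, 668,
         731, 774, 802, 890, 902, 904, 990]

total = sum(sales)          # total items sold by the 15 salespersons
mean = total / len(sales)   # sample mean: total shared equally
print(total, round(mean, 4))  # 9731 648.7333
```

Multiplying `mean` by `len(sales)` recovers the total, which is the property stated in the remark.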
Let’s take a look at this example. In a university, all year 1 students (e.g. N = 2000) have to take
the course “General Statistics”. The following are the examination results (marks) of all students:
The population mean is a unique value. It summarises the whole population.
It can only be calculated when a census is conducted.
When students are randomly assigned to different classes, each of size 30, the
mean result of each class can be calculated:
Class 1: 28, 32, …, 95, 97   mean result = \( \frac{28 + 32 + \cdots + 95 + 97}{30} = 75.3 \)
Class 2: 33, 35, …, 96, 98   mean result = \( \frac{33 + 35 + \cdots + 96 + 98}{30} = 81.4 \)
Class 3: 30, 31, …, 88, 91   mean result = \( \frac{30 + 31 + \cdots + 88 + 91}{30} = 74.2 \)
...
The sample mean is not unique. Its value depends on which data are selected in the sample. The
idea of selecting a “good representative” sample is to avoid subjective selection of data so that the
sample mean is, hopefully, reasonably close to the unknown population mean.
There is an interesting relationship between random sample means and the population mean. We will
discuss it in Chapter 3. In Chapter 4, we will use the mean of a randomly selected sample to
estimate the population mean with a reasonably high level of accuracy.
In this chapter, we highlight the difference between the population mean and the sample mean and
how to calculate them correctly.
(b) Mode
The mode is the value that occurs most frequently in the data set. Unlike the mean, the mode is
not affected by extreme values.
Example 4:
In the above example with sample size n = 15:
300 345 368 420 488 522 627 668 731 774 802 890 902 904 990
there is no mode, as every value occurs exactly once.
Remark:
The mode can also be used for categorical data.
Step 2: Compute the index i as the number of data in group 1, where \( i = n \times \frac{p}{100} \)
for the pth percentile, so that the number of data in group 2 is \( n - n \times \frac{p}{100} \).
Step 3: Adjust i to give the position of the pointer.
(Think about the two cases for handling the median when the number of data is odd and when it
is even!)
Example 4:
In the above example with sample size n = 15:
300 345 368 420 488 522 627 668 731 774 802 890 902 904 990
Find the 10th percentile, 40th percentile, and 90th percentile.
Solution:
The bottom 10% of salespersons sold fewer than 345 items and the top 10% sold more
than 904 items. 40% of the salespersons sold fewer than 574.5 items.
Remark:
Some percentiles have special names:
25th percentile = first quartile (Q1)
50th percentile = second quartile (Q2) = median
75th percentile = third quartile (Q3)
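The percentile answers above can be reproduced with a short Python sketch. The adjustment rule used here is an assumption inferred from the worked answers, not a rule stated in the notes: when i = n × p/100 is not a whole number it is rounded up to the next position, and when it is a whole number the ith and (i + 1)th values are averaged. The function name `percentile` is illustrative, and other textbooks and software use different conventions.

```python
import math

def percentile(sorted_data, p):
    """pth percentile (0 < p < 100) of data already sorted in ascending order."""
    n = len(sorted_data)
    i = n * p / 100
    if i == int(i):
        # Whole number: pointer falls between the i-th and (i+1)-th values.
        i = int(i)
        return (sorted_data[i - 1] + sorted_data[i]) / 2
    # Not a whole number: round up to the next position.
    return sorted_data[math.ceil(i) - 1]

sales = [300, 345, 368, 420, 488, 522, 627, 668,
         731, 774, 802, 890, 902, 904, 990]

for p in (10, 40, 50, 90):
    print(p, percentile(sales, p))
```

With this rule the 10th, 40th and 90th percentiles come out as 345, 574.5 and 904, matching the solution, and the 50th percentile is 668.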
Measures of Variation
Variation is the amount of dispersion, or spread, in the data. Two data sets with the same mean may
have completely different spreads (for example, class A and class B both have a mean test score of
82.5, but the scores in class A are very close together while those in class B vary widely). The
measures of central tendency and variation together give a good picture of a data set.
(a) Range
The range is the difference between the largest and smallest observations in a set of data.
Solution:
For the sales data in Example 4, Range = 990 – 300 = 690 items.
Remarks:
The range is affected by extreme values.
IQR = Q3 – Q1
Example 4:
In the above example with sample size n = 15:
300 345 368 420 488 522 627 668 731 774 802 890 902 904 990
Find the interquartile range of the data.
Solution:
Remarks:
1. The interquartile range measures the spread of the middle 50% of the data.
2. It is not affected by extreme values.
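Variance
Population: \( \sigma^2 = \frac{\sum (x - \mu)^2}{N} \)
Sample: \( s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \)
Standard deviation
Population: \( \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \)
Sample: \( s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \)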
Consider \( (x - \mu)^2 \) as a new variable, which measures the squared difference between a data
point and the mean. The variance is the average of these squared differences. When the variance is
small, the differences between the data points and the mean are small, which means the data points
are located close together. A more commonly used measure of variation is the standard deviation,
which is simply the square root of the variance.
Besides using the formula to calculate the sample variance and then the sample standard deviation,
the data set can be entered into a calculator, which generates the sample standard deviation
directly (see the calculator appendix).
Example 4:
In the above example with sample size n = 15:
300 345 368 420 488 522 627 668 731 774 802 890 902 904 990
Find the sample variance and standard deviation.
Solution:
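\( \bar{x} = \frac{300 + 345 + \cdots + 990}{15} = 648.7333 \) items
Sample variance:
\( s^2 = \frac{(300 - 648.7333)^2 + (345 - 648.7333)^2 + \cdots + (990 - 648.7333)^2}{15 - 1} = 52690.50 \) items²
Alternatively,
sample standard deviation s = 229.5441 items (from calculator)
sample variance \( s^2 = (229.5441)^2 = 52690.4952 \) items².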
On average, a salesperson sold 648.73 items. However, no one sold exactly
648.73 items: some sold more and some sold less. If we use 648.73 items as a reference point,
then the typical difference between a data value and this mean is about 229.54 items.
Remarks:
1. Both variance and standard deviation are non-negative.
2. Variance is in the squared units of the original units of the data, e.g. squared dollars. Thus,
the standard deviation is more commonly used as it is in the original units of the data, e.g.
dollars.
3. When we work with sample survey data, we report the sample standard deviation as
the summary. We cannot find the population standard deviation because the data from the
rest of the population are missing.
4. The denominator in the sample variance is n - 1 instead of n, which makes the sample
variance an unbiased estimator of the population variance. (We learn the concept of an
estimator in more detail in Chapter 4.)
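The sample variance and standard deviation of Example 4 can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n − 1 denominator. This is only a cross-check of the hand calculation; `pvariance` is shown for contrast and would be appropriate only if the 15 values were the whole population.

```python
import statistics

sales = [300, 345, 368, 420, 488, 522, 627, 668,
         731, 774, 802, 890, 902, 904, 990]

s2 = statistics.variance(sales)      # sample variance, denominator n - 1
s = statistics.stdev(sales)          # sample standard deviation
sigma2 = statistics.pvariance(sales) # population formula, denominator n

print(round(s2, 4), round(s, 4))
print(round(sigma2, 4))  # smaller than s2, since it divides by n instead of n - 1
```

The first line of output should agree with the calculator result quoted in the solution (s about 229.5441 items).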
Skewness of data
When the relative frequency of a variable is plotted against its data values, the probability density
function of the variable is visualized. A distribution can have many different shapes. We can
classify distributions according to their skewness. A distribution is symmetric if the parts above and
below its centre are mirror images in the density function. A distribution is skewed to the right if
the right tail is longer, and skewed to the left if the left tail is longer.
(a) Symmetric
Q2 – Q1 = Q3 – Q2
(example: the heights of 10-year-old boys)
In summary:
Q2 – Q1 = Q3 – Q2: symmetric distribution
Q2 – Q1 > Q3 – Q2: left-skewed distribution
Q2 – Q1 < Q3 – Q2: right-skewed distribution
Example 4:
In the above example with sample size n = 15:
300 345 368 420 488 522 627 668 731 774 802 890 902 904 990
Comment on the skewness with reason.
Solution:
There is no standard rule for constructing the summary. However, naming the variable and
presenting the central tendency supported by a measure of variation, together with any special
observations, gives a simple picture of the basic characteristics of the variable.
Example 4:
The variable of this study was the sales (number of items sold in August 2020) of a salesperson. 15
salespersons were selected from the company to form a sample. On average, a salesperson sold
648.73 items, with a standard deviation of 229.54 items. The range was 690 items. The bottom
10% of salespersons sold fewer than 345 items, while the top 10% sold more than 904 items;
the median was 668 items.
Y as a linear function of X
Sometimes, instead of analysing only the given variable, we are also interested in analysing a
function of it. A simple linear function, which involves multiplying by a constant,
adding a constant, or both, is often encountered in daily applications.
Y = a + bX
Once the summary statistics of variable X have been calculated, the summary statistics of variable Y
can be calculated directly, without regenerating the data set, using the following relationships:
Summary statistic     Y = a + bX
Mean                  Mean(Y) = a + b Mean(X)
Percentile            pth(Y) = a + b pth(X)   (when b > 0)
Range                 Range(Y) = |b| Range(X)
IQR                   IQR(Y) = |b| IQR(X)
Standard deviation    SD(Y) = |b| SD(X)
Variance              Variance(Y) = b² Variance(X)
Example 4:
After reviewing the summary of the number of items sold by a salesperson in August 2020,
senior management also wants a summary of the monthly salary of a salesperson.
Without regenerating the data set of monthly salaries, find the mean, 10th percentile, median, 90th
percentile, range, standard deviation, and variance of the monthly salary, where the monthly salary
consists of a basic salary of $20000 plus an allowance of $30 for each item sold.
Solution:
Let X denote the number of items sold in a month and Y the monthly salary of that
salesperson; then Y = 20000 + 30X.
The variable is the monthly salary earned by a salesperson in August 2020. On average, a
salesperson earned $39462.00, with a standard deviation of $6886.32. The range was $20700. The
bottom 10% of salespersons earned less than $30350, while the top 10% earned more than
$47120; the median was $40040.
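The linear-function relationships can be checked numerically. The sketch below applies the rules to the summary statistics of X and, purely as a cross-check, also regenerates the salary data set (which the exercise asks us to avoid doing by hand). Variable names are illustrative.

```python
import statistics

sales = [300, 345, 368, 420, 488, 522, 627, 668,
         731, 774, 802, 890, 902, 904, 990]
a, b = 20000, 30  # basic salary and allowance per item: Y = a + bX

# Using the rules for Y = a + bX
mean_y = a + b * statistics.mean(sales)
median_y = a + b * statistics.median(sales)
range_y = abs(b) * (max(sales) - min(sales))
sd_y = abs(b) * statistics.stdev(sales)
var_y = b ** 2 * statistics.variance(sales)

# Cross-check by regenerating the salary data
salary = [a + b * x for x in sales]
print(round(mean_y, 2), statistics.mean(salary))
print(round(sd_y, 2), round(statistics.stdev(salary), 2))
print(range_y, median_y)  # 20700 40040
```

Both routes give the same values, because adding a shifts every data value equally (no effect on spread) and multiplying by b stretches every distance from the mean by |b|.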
Data Set:
163.6 156.2 166.3 179.3 157.8 165.4 159.5 161.7 160.4
3. Input data
163.6 DT 156.2 DT 166.3 DT 179.3 DT
157.8 DT 165.4 DT 159.5 DT 161.7 DT
160.4 DT
5. Change Data
Example : change the first data ‘163.6’ to ‘183.6’
▲/▼ (until you see x1=163.6) 183.6 EXE
6. Delete Data
Example : delete the second data ‘156.2’
▲/▼ (until you see x2=156.2) SHIFT DT
Probability
Probability is the likelihood or chance that a particular event will occur.
Classical probability
When tossing a fair die, the probability of observing 1 is P(1) = 1/6.
When tossing a fair die, the probability of observing an odd number is P(odd) = 3/6 = 1/2.
Combination
When 3 students are randomly selected from 10 students, there are 10C3 = 120 possible
combinations.
Empirical probability
When tossing an unfair die, the probabilities are estimated from the following frequency
table:
Observation 1 2 3 4 5 6
Frequency 12 20 18 19 22 29
\( P(1) = \frac{12}{120} = 0.1 \)
\( P(\text{odd}) = \frac{12 + 18 + 22}{120} = 0.4333 \)
If the die is tossed twice independently, then based on the above frequency table, the probability
that the first toss gives 1 and the second toss gives 6 is
\( P(\text{first 1 and second 6}) = \frac{12}{120} \times \frac{29}{120} = 0.0242 \)
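These probabilities can be reproduced in Python. The sketch uses exact fractions so that rounding happens only when printing; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

print(comb(10, 3))  # number of ways to choose 3 students from 10: 120

# Frequency table of the unfair die
freq = {1: 12, 2: 20, 3: 18, 4: 19, 5: 22, 6: 29}
total = sum(freq.values())  # 120 tosses in all

# Empirical probability = relative frequency
p = {face: Fraction(count, total) for face, count in freq.items()}

p_odd = p[1] + p[3] + p[5]
p_1_then_6 = p[1] * p[6]  # independent tosses: multiply the probabilities

print(float(p[1]), round(float(p_odd), 4), round(float(p_1_then_6), 4))
# 0.1 0.4333 0.0242
```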
Understanding the probability distribution of a variable gives us insight for predicting the outcome
of an uncertain situation.
In general, when the outcome of “something” is unpredictable and can be expressed numerically, it is
considered a numerical random variable.
Example 1:
Which side will face up when you toss a die?
Example 2:
Imagine you are a librarian working at the borrowing counter in a public library. How many books
will the next reader borrow?
Example 3:
Imagine you are a tour guide. Every day you need to take care of a group of 10 tourists. Each tourist
would choose to visit one and only one of the two theme parks, Ocean Park or Disneyland. Today,
how many out of 10 tourists may go to Ocean Park?
Example 4:
Imagine you are a researcher doing analysis for a telecommunication company. You are required to
collect information about the duration of long-distance calls. How long will a long-distance call
last?
Discrete vs Continuous
In Chapter 1, we discussed the difference between discrete data and continuous data:
Discrete: data take only particular, separate values
Continuous: data can take any value within a range
Let’s see if you can identify which variable(s) in Example 1 to Example 4 are discrete
and which are continuous:
Discrete variable:
Continuous variable:
The probability distribution function, pdf, of a discrete variable X can be represented by a formula,
a table, or a graph that provides the probability P(X = x) = p(x) corresponding to each value of x.
It has the following properties:
1. \( 0 \le p(x) \le 1 \),
2. \( \sum_x p(x) = 1 \), where the summation is over all possible values of x with non-zero probability.
Remarks:
1. Summarizing the discrete variable as a probability distribution function helps us to
a. understand the possible range of the outcome
b. evaluate which outcome has a relatively higher chance of happening than the others
2. There are many ways to prepare the probability distribution function, by theoretical approach,
experiment, observation, survey, …
Let's review the probability distribution functions for our discrete random variables.
Example 1:
Variable X: result of tossing a die
If the die is fair, it is expected that the chance of happening for each possible outcome is the same, so
the probability distribution function should be:
x 1 2 3 4 5 6
p(x) 1/6 1/6 1/6 1/6 1/6 1/6
However, if the die is unfair, we cannot simply assume that each outcome has the same chance of
happening. An alternative way to find the probability distribution function is by experiment.
Suppose we carry out a series of experiments (for example, toss this single die 100 times) and take
the observed relative frequency of each possible outcome as its empirical probability. If
the following is the result of tossing this single die 100 times,
x 1 2 3 4 5 6
Frequency 18 12 32 16 12 10
then dividing each frequency by 100 gives the empirical probability distribution function:
x 1 2 3 4 5 6
p(x) 0.18 0.12 0.32 0.16 0.12 0.10
Example 2:
Variable X: number of books a reader borrows from the public library
We can generate the probability distribution function by referring to the book borrowing records in
the library system. The idea is to use the relative frequency to project the probability, as in
Example 1.
Here, we do not present the whole borrowing record; instead, the probability distribution
function is presented as follows:
x 1 2 3 4 5 6 7 8
P(X =x) 0.02 0.07 0.15 0.28 0.33 0.10 0.03 0.02
Example 3:
Variable Y: the number of tourists, out of a group of 10 tourists, who will go to Ocean Park
There are two ways to prepare this probability distribution function.
Method 1: Check the past records of the number of tourists who went to Ocean Park each day over a
long period and use the relative frequency as a projection.
Method 2: Suppose that, based on previous experience, 60% of individual tourists visited Ocean
Park, while the other 40% visited Disneyland.
In a later section, we will derive the probability distribution function of the number of tourists
(among 10) who visit Ocean Park theoretically, using a Binomial distribution. The result is
summarized here:
y 0 1 2 3 4 5 6 7 8 9 10
p(y) 0.0001 0.002 0.011 0.042 0.111 0.201 0.251 0.215 0.121 0.040 0.006
From the table, it is easy to see that most likely around 4 to 8 of the 10 tourists (with a
probability of 0.899) will visit Ocean Park.
Definitions
For a discrete random variable X with probability distribution p(x),
1. The expectation (or expected value or mean) of X, denoted by \( \mu_X \) or E(X), may be considered
as its weighted average over all possible outcomes, the “weights” being the probabilities
associated with each of the outcomes, i.e.
\[ E(X) = \sum_x x \, p(x) \]
The mean of X can be interpreted as the average value of X in the long run.
Example 2:
In this table, X represents the number of books a reader borrows in one visit to the public library.
x 1 2 3 4 5 6 7 8
P(X =x) 0.02 0.07 0.15 0.28 0.33 0.10 0.03 0.02
On average, how many books does a reader borrow in one visit to the public library?
Solution:
E(X) = 1(0.02) + 2(0.07) + 3(0.15) + 4(0.28) + 5(0.33) + 6(0.10) + 7(0.03) + 8(0.02) = 4.35 books
On average, a reader borrows 4.35 books per visit to the public library.
Of course, not every reader borrows the same number of books, and in fact no reader can
borrow exactly 4.35 books. How large is the variation between readers in the number of books they
borrow?
2. The variance of X, denoted by \( \sigma_X^2 \) or Var(X), may be defined as the weighted average of the
squared discrepancies (i.e. differences) between each possible outcome and the mean:
\[ \mathrm{Var}(X) = \sum_x (x - \mu_X)^2 \, p(x) = E(X^2) - (E(X))^2 \]
The positive square root of the variance gives the standard deviation of X:
\[ \sigma_X = \sqrt{\mathrm{Var}(X)} \]
Example 2:
In this table, X represents the number of books a reader borrows in one visit to the public library.
x 1 2 3 4 5 6 7 8
P(X =x) 0.02 0.07 0.15 0.28 0.33 0.10 0.03 0.02
Find the variance and standard deviation of the number of books a reader borrows from the public
library.
Solution:
Using the definition (the formula is similar to that in Chapter 1):
\( \mathrm{Var}(X) = (1 - 4.35)^2(0.02) + (2 - 4.35)^2(0.07) + (3 - 4.35)^2(0.15) + (4 - 4.35)^2(0.28) \)
\( \quad + (5 - 4.35)^2(0.33) + (6 - 4.35)^2(0.10) + (7 - 4.35)^2(0.03) + (8 - 4.35)^2(0.02) \)
= 1.8075
Alternatively, using Var(X) = E(X²) − (E(X))²:
\( \mathrm{Var}(X) = 1^2(0.02) + 2^2(0.07) + 3^2(0.15) + 4^2(0.28) + 5^2(0.33) + 6^2(0.10) + 7^2(0.03) + 8^2(0.02) - 4.35^2 \)
= 1.8075
On average, a reader borrows 4.35 books from the public library, with a standard deviation
of \( \sqrt{1.8075} = 1.34 \) books.
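The expectation and both forms of the variance can be verified with a few lines of Python. This is a sketch for checking the arithmetic; the dictionary `pmf` simply holds the table of Example 2, and the variable names are illustrative.

```python
from math import sqrt

# Probability distribution of X, the number of books borrowed per visit
pmf = {1: 0.02, 2: 0.07, 3: 0.15, 4: 0.28, 5: 0.33, 6: 0.10, 7: 0.03, 8: 0.02}

mean = sum(x * p for x, p in pmf.items())                      # E(X)
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2  # E(X^2) - (E(X))^2
sd = sqrt(var_definition)

print(round(mean, 2), round(var_definition, 4), round(var_shortcut, 4), round(sd, 2))
# 4.35 1.8075 1.8075 1.34
```

The two variance formulas agree, as they must; the shortcut form usually needs less arithmetic by hand.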
Example 5:
We need to evaluate the profit made by selling a particular brand of photocopier, “PhotoC”. We want
to check whether it meets the target of an average weekly profit of $5000.
Suppose this is all we know about “PhotoC”:
1. Below is the probability distribution function of the number of “PhotoC” sold in a week, X, based
on past records.
x 0 1 2 3 4 5
p(x) 0.05 0.22 0.38 0.24 0.10 0.01
Based on this information, a direct analysis of the weekly sales, in terms of the expected value
and standard deviation, gives:
On average, the weekly sales of PhotoC are 2.15 items, with a standard deviation of 1.06
items.
As our target is to evaluate the weekly profit, we need to link sales and profit by denoting the
weekly profit by Y and treating Y as a function of X:
Y = 4340X – 3500
1st method for the calculation of the expected value of Y = f(X): by substitution
After substituting each value of x into the formula Y = 4340X – 3500, we have the probability
distribution of Y:
x 0 1 2 3 4 5
y=f(x) -3500 840 5180 9520 13860 18200
p(y) 0.05 0.22 0.38 0.24 0.10 0.01
On average, the weekly profit from selling PhotoC is $5831, with a standard deviation
of $4608.38.
When Y is a linear function of X, so that Y = a + bX, where a and b are constants, then
E(Y) = a + b E(X),
Var(Y) = b² Var(X),
σ(Y) = |b| σ(X)
2nd method for the calculation of the expected value of Y (applies only to linear functions):
Y = 4340X – 3500 = – 3500 + 4340X
With a = – 3500, b = 4340
On average, the weekly profit from selling PhotoC is $5831, with a standard deviation
of $4608.38.
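Both methods for the PhotoC profit can be compared in a short Python sketch. The helper `mean_sd` is a hypothetical name introduced here; the distribution and the profit function Y = 4340X − 3500 come from the example.

```python
from math import sqrt

def mean_sd(pmf):
    """Expected value and standard deviation of a discrete distribution."""
    mean = sum(v * p for v, p in pmf.items())
    var = sum((v - mean) ** 2 * p for v, p in pmf.items())
    return mean, sqrt(var)

# Weekly sales X of PhotoC
pmf_x = {0: 0.05, 1: 0.22, 2: 0.38, 3: 0.24, 4: 0.10, 5: 0.01}
mean_x, sd_x = mean_sd(pmf_x)

# 1st method: substitute each x into Y = 4340X - 3500, keeping the probabilities
pmf_y = {4340 * x - 3500: p for x, p in pmf_x.items()}
mean_y1, sd_y1 = mean_sd(pmf_y)

# 2nd method: linear-function rules with a = -3500, b = 4340
a, b = -3500, 4340
mean_y2 = a + b * mean_x
sd_y2 = abs(b) * sd_x

print(round(mean_x, 2), round(sd_x, 2))    # 2.15 1.06
print(round(mean_y1, 2), round(sd_y1, 2))
print(round(mean_y2, 2), round(sd_y2, 2))
```

The two methods give the same expected profit and standard deviation. Note that the unrounded σ(X) should be carried through the second method; using the rounded 1.06 would shift the result by a few dollars.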
If we denote one of the two outcomes as success and the other as failure, then after one
experiment you will have either one success or one failure. If you repeatedly conduct or observe a
series of identical experiments, the number of successes obtained is uncertain. Sometimes
we are interested in the probability distribution function of the number of successes after a
series of identical experiments.
As in Example 3 (the tour guide example), when we ask just one tourist, the tourist may go to either
Ocean Park or Disneyland. If we define visiting Ocean Park as success, then visiting Disneyland must
be defined as failure. By asking 10 tourists, you may obtain 10 successes, 9 successes, 8 successes,
…, 1 success or 0 successes. So there are 11 possibilities for the number of tourists who go to
Ocean Park. In this section, we derive the probability of each of these 11 possibilities theoretically.
Let X denote the number of successes among n identical and independent trials, where the
probability of success in each trial is p. The variable X is said to follow the binomial distribution,
commonly denoted as
X ~ Bin(n, p)
\[ P(X = x) = p(x) = {}_{n}C_{x} \, p^x (1 - p)^{n - x}, \quad x = 0, 1, \ldots, n \]
where \( {}_{n}C_{x} = \frac{n!}{x!\,(n - x)!} \).
Our interest is the total number of successes in n trials. The possible number of successes must be an
integer from 0 to n. The product \( p^x (1 - p)^{n - x} \) is the probability of obtaining
exactly x successes out of n observations in one particular sequence, where n − x is the number of
failures. The term \( {}_{n}C_{x} \) is the number of possible sequences or arrangements (combinations)
of the x successes among the n observations.
Example 3:
Suppose the chance that one tourist visits Ocean Park is 0.6 (p = 0.6).
If you ask only 2 tourists about their preferences, you may get the following answers
(O = Ocean Park, D = Disneyland):
OO, OD, DO, DD
Converting the outcomes to the number of tourists who visit Ocean Park, Y, a simple mapping gives
OO → y = 2
OD → y = 1
DO → y = 1
DD → y = 0
(Do you remember how to construct the two-level tree diagram?)
Y is a random variable and y = 0, 1, or 2.
p(0) = P(DD) = (0.4)(0.4) = 0.16
p(1) = P(OD) + P(DO) = 0.6(0.4) + (0.4)(0.6) = 2(0.6)(0.4) = 0.48
p(2) = P(OO) = 0.6(0.6) = 0.36
In summary, let Y be the number of tourists, among 2 tourists, who go to Ocean Park; then
Y ~ Bin(2, 0.6),
y 0 1 2
P(Y = y) 0.16 0.48 0.36
Now we have a group of 10 tourists, and we need to know how many of them prefer visiting
Ocean Park. As discussed, the number of them who go to Ocean Park can be 0, 1,
2, 3, 4, 5, 6, 7, 8, 9 or 10. To obtain the probabilities, we do the same mapping as before on
a larger scale.
p(0) = P(DDDDDDDDDD) = (0.4)(0.4) … (0.4)
= \( (0.4)^{10} \) = 0.0001
To summarize, let Y be the number of tourists, among 10 tourists, who go to Ocean Park, where
the chance that each tourist goes to Ocean Park is 0.6;
then Y ~ Bin(10, 0.6).
For y = 0 to y = 10, we can calculate the probability of each y by the formula
\[ P(Y = y) = {}_{10}C_{y} \, (0.6)^y (1 - 0.6)^{10 - y} \]
and summarize the probability distribution function of Y as:
y 0 1 2 3 4 5 6 7 8 9 10
p(y) 0.0001 0.002 0.011 0.042 0.111 0.201 0.251 0.215 0.121 0.040 0.006
Most likely, 6 out of 10 tourists will go to Ocean Park. There is a relatively high chance that
around 4 to 8 out of 10 tourists go to Ocean Park.
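The probability table for Y ~ Bin(10, 0.6) can be regenerated with a short Python sketch based on the binomial formula. The function name `binomial_pmf` is illustrative; `math.comb` supplies the number of combinations.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Probability distribution of Y ~ Bin(10, 0.6)
table = {y: binomial_pmf(y, 10, 0.6) for y in range(11)}
for y, prob in table.items():
    print(y, round(prob, 4))

most_likely = max(table, key=table.get)            # value of y with the largest p(y)
prob_4_to_8 = sum(table[y] for y in range(4, 9))   # P(4 <= Y <= 8)
print(most_likely, round(prob_4_to_8, 3))  # 6 0.899
```

The printed values agree with the table above to the number of decimal places shown there.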
Example 5:
In a casino, the probability of winning a certain game is 0.3. Suppose you are going to play the game
4 times. What is the probability that
(a) you will win exactly two games?
(b) you will win at least two games?
Solution:
Let X be the number of games you win; then X ~ Bin(4, 0.3).
(a) \( P(X = 2) = {}_{4}C_{2} (0.3)^2 (0.7)^2 = 0.2646 \)
(b) P(X ≥ 2) = P(X = 2) + P(X = 3) + P(X = 4)
= \( {}_{4}C_{2} (0.3)^2 (0.7)^2 + {}_{4}C_{3} (0.3)^3 (0.7)^1 + {}_{4}C_{4} (0.3)^4 (0.7)^0 \) = 0.2646 + 0.0756 + 0.0081
= 0.3483
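The casino answers can be checked in Python. The sketch also computes part (b) through the complement, 1 − P(X = 0) − P(X = 1), which is not the route taken in the solution but gives the same value and is often shorter when n is large. The function name is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 4, 0.3  # X ~ Bin(4, 0.3)

p_exactly_2 = binomial_pmf(2, n, p)
p_at_least_2 = sum(binomial_pmf(x, n, p) for x in range(2, n + 1))
p_complement = 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)

print(round(p_exactly_2, 4), round(p_at_least_2, 4), round(p_complement, 4))
# 0.2646 0.3483 0.3483
```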
y 0 1 2 3 4 5 6 7 8 9 10
p(y) 0.0001 0.002 0.011 0.042 0.111 0.201 0.251 0.215 0.121 0.040 0.006
Suppose we have many groups of tourists, each with 10 tourists. What is the average
number of tourists per group who go to Ocean Park?
As in Example 3, for a group of 10 tourists in which each tourist goes to Ocean
Park with probability 0.6,
E(Y) = 10(0.6) = 6
Var(Y) = 10(0.6)(0.4) = 2.4
σ(Y) = √(10(0.6)(0.4)) = 1.549
Remark:
The expectation of a Binomial variable tells us roughly how many occurrences to expect
in a group. In our example, with 10 tourists in a group, the expected number of tourists who
go to Ocean Park is 6, which means that, most likely, about 6 tourists go to Ocean Park.
The standard deviation of 1.549 helps us widen this into a range that also covers the
neighbouring values with relatively high probability. (You may compare this
with the table on page 38.)
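The shortcut formulas E(Y) = np and Var(Y) = np(1 − p) can be checked against the definition of expectation, using the probability table directly. The following is a small sketch under the same Bin(10, 0.6) model; the variable within_one_sd is an illustrative quantity we introduce here, not one used in the lecture note.

```python
from math import comb, sqrt

n, p = 10, 0.6
pmf = [comb(n, y) * p**y * (1 - p)**(n - y) for y in range(n + 1)]

mean = sum(y * q for y, q in enumerate(pmf))                 # E(Y) from the definition
var = sum((y - mean) ** 2 * q for y, q in enumerate(pmf))    # Var(Y) from the definition
sd = sqrt(var)

# Probability that Y lies within one standard deviation of its mean (y = 5, 6, 7).
within_one_sd = sum(q for y, q in enumerate(pmf) if abs(y - mean) <= sd)
print(round(mean, 4), round(var, 4), round(sd, 3), round(within_one_sd, 3))
```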
41
42
(a) This graph shows the time a baby needs to finish a simple task in a regular body check.
According to the graph, a baby takes 1 to 5 minutes to finish the task. Unlike a
discrete random variable, there are infinitely many possible values between 1 and 5 minutes. A
horizontal probability density function means that every possible time between 1 and 5 minutes
is equally likely.
(b) This graph shows the time a student spends on revision in a week. This random variable
takes any value greater than 0 and the curve shows a decreasing (exponential decay) pattern. It
shows that most students do not spend much time on revision.
43
Normal Distribution
The normal distribution is the most widely used continuous distribution in statistics. For simplicity,
if a random variable X is normally distributed with population mean μ and population variance σ^2,
we say that
X ~ N(μ, σ^2).
[Figure: normal curve for X, with the horizontal axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ]
Example 6:
The waiting time for checking in to a room in a hotel is normally distributed with mean 18 minutes and
standard deviation 4 minutes. Denoting the waiting time by X, X ~ N(18, 4^2). A simple review of
the waiting time gives us the following:
Half of the customers have to wait for more than 18 minutes to check in to a room.
There is about a 34.1% chance that a customer has to wait 18 to 22 minutes. There is another
34.1% chance that a customer has to wait 14 to 18 minutes.
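These quick facts can be verified numerically. The sketch below uses NormalDist from Python's standard statistics module (available from Python 3.8); it is only a check of the figures quoted in Example 6, not the method used in the lecture, which relies on the standard normal table.

```python
from statistics import NormalDist

X = NormalDist(mu=18, sigma=4)           # waiting time in minutes

p_more_than_18 = 1 - X.cdf(18)           # half of the customers
p_18_to_22 = X.cdf(22) - X.cdf(18)       # within one standard deviation above the mean
p_14_to_18 = X.cdf(18) - X.cdf(14)       # within one standard deviation below the mean
print(round(p_more_than_18, 3), round(p_18_to_22, 3), round(p_14_to_18, 3))
```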
44
Example 7:
The spending on a cup of coffee in a café follows a normal distribution with mean $50 and standard
deviation $10. Denoting the spending on a cup of coffee by a customer as X, X ~ N(50, 10^2). A
simple review of the spending tells us:
Half of the customers spend more than 50 dollars on a cup of coffee.
About 68.2% of customers spend between 40 and 60 dollars.
About 2.3% of customers spend less than 30 dollars.
Besides knowing the above basic information, can we do further analysis, such as
(a) What is the probability that a customer spends more than $53 on a cup of coffee?
(b) What is the minimum spending on a cup of coffee for the top 10% of customers?
45
Z = (X − μ)/σ, Z ~ N(0, 1^2)
[Figure: standard normal curve for Z, with the horizontal axis marked at −3, −2, −1, 0, 1, 2, 3]
46
Below are the first few rows of the standard normal table, which gives the probability
P(0 < Z < z), where z is any positive number correct to 2 decimal places. Let’s see how to use it to
read the probability related to the position z = 0.32.
[Figure: standard normal curve with the area between 0 and 0.32 shaded]
The entries in Table I are the probabilities that a random variable having the standard
normal distribution will take on a value between 0 and z. They are given by the area of
the gray region under the curve in the figure.
z   0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
For z = 0.32, read the row labelled 0.3 and the column for the second decimal place 0.02:
P(0 < Z < 0.32) = 0.1255.
47
The entries in Table I are the probabilities that a random variable having the standard
normal distribution will take on a value between 0 and z. They are given by the area of
the gray region under the curve in the figure.
z   0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4648 0.4656 0.4664 0.4671 0.4678 0.4685 0.4692 0.4699 0.4706
1.9 0.4713 0.4719 0.4725 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
Also, for z = 4.0, 5.0 and 6.0, the areas are 0.49997, 0.4999997, and 0.499999999.
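Any entry of the table can be recomputed from the error function, since P(0 < Z < z) = 0.5 × erf(z/√2) for a standard normal Z. The sketch below is only a way to check table readings; the function name area_0_to_z is our own and is not part of the lecture note.

```python
from math import erf, sqrt

def area_0_to_z(z):
    """P(0 < Z < z) for a standard normal Z and z >= 0."""
    return 0.5 * erf(z / sqrt(2))

print(round(area_0_to_z(0.32), 4))   # row 0.3, second decimal 0.02
print(round(area_0_to_z(1.00), 4))   # row 1.0, second decimal 0.00
print(round(area_0_to_z(1.96), 4))   # row 1.9, second decimal 0.06
```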
48
P( <Z< )
Example 7:
(a) What is the probability that a customer spends more than $53 for a cup of coffee when the
spending is normally distributed with mean of $50 and standard deviation of $10?
Solution:
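[Figure: normal curve for X with mean 50 and the area to the right of 53 shaded; on the Z scale, 0 corresponds to 50 and 0.30 to 53]
P(X > 53) = P(Z > (53 − 50)/10) = P(Z > 0.3)
= 0.5 − 0.1179 = 0.3821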
In conclusion, the chance that a customer spends more than $53 for a cup of coffee is 0.3821.
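The same answer can be obtained in code, either by standardizing first (as in the table method) or by working with X directly. This is a sketch using Python's standard statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=10)      # spending on a cup of coffee
z = (53 - 50) / 10                   # standardized score of $53
p_table = 1 - NormalDist().cdf(z)    # P(Z > 0.3) on the standard normal scale
p_direct = 1 - X.cdf(53)             # same probability without standardizing
print(round(z, 2), round(p_table, 4), round(p_direct, 4))
```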
49
To find the value of k which fulfils a specified probability requirement, we can follow the
procedure below:
1. Locate the unknown normal score (k) sensibly on the normal curve. Make sure you are
aware of whether the normal score should be smaller or bigger than the mean.
3. Rewrite the probability statement for variable Z and find the value of a from the standard
normal table.
P(a < Z < 0) where a should be negative
or P(0 < Z < a) where a should be positive
4. Transform a back to k:
k = μ + aσ
50
Example 7:
(b) In the café, the spending on a cup of coffee, X, is known to follow normal distribution with
mean $50 and standard deviation $10. The manager of the café wants to know the minimum
spending for a cup of coffee for the top 10% customers. For P(X > k) = 0.1, what is the value of k?
Solution:
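[Figure: normal curve for X with mean 50 and k to its right; 50% of the area lies below 50, 40% between 50 and k, and 10% above k. On the Z scale, 0 corresponds to 50 and a corresponds to k.]
P(0 < Z < a) = 0.4; the closest table entry is 0.3997, so a = 1.28
k = 50 + 1.28(10) = 62.8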
The minimum spending for the top 10% customers would be $62.8.
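The reverse lookup (from a probability back to a value k) can be sketched in code with the inverse cumulative distribution function. The example assumes Python's statistics.NormalDist and gives a slightly more precise z-score than the 2-decimal table.

```python
from statistics import NormalDist

mu, sigma = 50, 10
a = NormalDist().inv_cdf(0.90)    # z-score with 90% of the area below it
k = mu + a * sigma                # transform back to the dollar scale: k = mu + a * sigma
print(round(a, 2), round(k, 1))
```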
51
Example 8:
Besides coffee, the café also sells sliced cakes. It is known that the spending on a
piece of cake follows a normal distribution with mean $32 and standard deviation $6.
(a) What is the probability that a customer spends less than $30 on a piece of cake?
(b) 60% of customers spend more than $k on a piece of cake. What is the value of k?
Solution:
Use Y to denote the spending on a piece of cake, Y ~ N(32, 6^2)
[Figure: (a) normal curve with the area for Y < 30 shaded; (b) normal curve with 60% of the area lying to the right of k]
(a) P(Y < 30) = P(Z < (30 − 32)/6) = P(Z < −0.33) = 0.5 − 0.1293 = 0.3707
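Both parts of Example 8 can be checked numerically. The sketch below assumes Python's statistics.NormalDist; because it does not round z to two decimal places, part (a) comes out slightly different from the table-based 0.3707. For part (b), P(Y > k) = 0.6 is rewritten as P(Y < k) = 0.4 before applying the inverse cumulative distribution function.

```python
from statistics import NormalDist

Y = NormalDist(mu=32, sigma=6)    # spending on a piece of cake

p_less_than_30 = Y.cdf(30)        # part (a), without rounding z to -0.33
k = Y.inv_cdf(0.40)               # part (b): P(Y > k) = 0.6 means P(Y < k) = 0.4
print(round(p_less_than_30, 4), round(k, 2))
```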
52
If X ~ N(μ, σ^2) and Y = a + bX, then Y ~ N(a + bμ, (bσ)^2)
Example 7:
In the café, the spending on a cup of coffee, X, is known to follow normal distribution with mean $50
and standard deviation $10. Suppose the owner of the café is considering adjustment of the selling
price of each cup of coffee by marking up the original price by 8% and then a discount of $2 will be
applied.
(a) What are the (i) mean and (ii) standard deviation of the selling price of a cup of coffee after
the adjustment?
(b) After the adjustment, what is the probability that a customer buys a coffee which costs $54 or
more?
Solution:
(a) With X denoting the original price of a cup of coffee and Y denoting the
adjusted price, Y = 1.08X − 2
(i) Mean of Y = 1.08E(X) − 2 = 1.08(50) − 2 = $52
(ii) Standard deviation of Y = 1.08σ(X) = 1.08(10) = $10.8
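The linear transformation rule can be sketched in code as follows. The probability for part (b) is computed here with Python's statistics.NormalDist as an illustration; it is our own calculation under the stated model Y = 1.08X − 2, not a figure taken from the lecture note.

```python
from statistics import NormalDist

mu_x, sigma_x = 50, 10
a, b = -2, 1.08                       # Y = a + bX = 1.08X - 2

mu_y = a + b * mu_x                   # mean of the adjusted price
sigma_y = abs(b) * sigma_x            # the constant a shifts the mean but not the spread
p_54_or_more = 1 - NormalDist(mu_y, sigma_y).cdf(54)
print(round(mu_y, 2), round(sigma_y, 2), round(p_54_or_more, 4))
```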
53
Example 9:
In the café, the spending on a cup of coffee, X, is known to follow normal distribution with mean $50
and standard deviation $10. It is also known that, the spending on a piece of cake, Y, follows a
normal distribution with mean $32 and standard deviation $6,
X ~ N(50, 10^2);
Y ~ N(32, 6^2)
Every day, there are many customers buying one cup of coffee and one piece of cake.
Imagine you want to review the total spending of a customer on a cup of coffee and a piece of cake:
Solution:
(a) Let T be the total spending, T = X + Y. Assuming the spending on coffee and the spending on
cake are independent,
T ~ N(50 + 32, 10^2 + 6^2);
T ~ N(82, 11.6619^2)
On average, a customer will spend $82 when ordering a cup of coffee and a piece of cake.
The standard deviation of the total spending on the two items is $11.6619.
(b) P(T > 80) = P(Z > (80 − 82)/11.6619) = P(Z > −0.17) = 0.5 + 0.0675 = 0.5675
[Figure: normal curve with the area for T > 80 shaded]
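The sum of two normal variables can be sketched with statistics.NormalDist, which supports adding two instances on the assumption that they represent independent variables. That independence is an assumption made here for illustration; the means add and the variances (not the standard deviations) add.

```python
from statistics import NormalDist

X = NormalDist(50, 10)    # coffee
Y = NormalDist(32, 6)     # cake
T = X + Y                 # valid only if X and Y are independent

p_more_than_80 = 1 - T.cdf(80)
print(T.mean, round(T.stdev, 4), round(p_more_than_80, 4))
```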
54
Chapter 3
Sampling Distributions and Central Limit Theorem
In Chapter 2, we looked at the distribution function of a variable. In Chapter 3, we treat the sample
mean as a variable and study its distribution function, which we call
the sample mean distribution. We will also look at the distribution function of the sample
proportion.
By understanding the sampling distribution, we prepare ourselves for the study of
inferential statistics in the next chapter.
55
In Chapter 1, page 13, we have briefly reviewed the idea that sample mean is not unique, but a
variable which depends on the data in the sample.
In a university, all year 1 students have to take “General Statistics”. The population mean and
standard deviation of the results of all N students are:
population mean score μ = (x1 + x2 + ⋯ + xN)/N = 68.65
and population standard deviation σ = √[((x1 − 68.65)^2 + (x2 − 68.65)^2 + ⋯ + (xN − 68.65)^2)/N] = 12.4
When students are randomly assigned to different classes, each of size 30, the
average score of each class can be calculated:
Class 1: 28, 32, …, 95, 97    mean score = (28 + 32 + ⋯ + 95 + 97)/30 = 65.3
Class 2: 33, 35, …, 96, 98    mean score = (33 + 35 + ⋯ + 96 + 98)/30 = 61.4
Class 3: 30, 31, …, 88, 91    mean score = (30 + 31 + ⋯ + 88 + 91)/30 = 64.2
It is easy to understand from the above example that sample mean is not unique, but a variable.
If sample mean is a random variable, what are the characteristics of this random variable? Is it
discrete or continuous? What are the mean and standard deviation of this random variable?
56
Firstly, make sure you understand that the sample mean, X̄, is a random variable. Then, we can take a
look at the characteristics of this random variable.
[Figure: a population of variable X, with population mean μ and population variance σ^2, from which Sample 1, Sample 2 and Sample 3 are drawn, giving sample means x̄1, x̄2 and x̄3]
Imagine we repeatedly select samples of size n from a population whose
population mean μ and population standard deviation σ are given. The sample mean of each sample is
calculated and denoted as x̄. The characteristics of the sample mean distribution can be summarized as
follows:
57
If the sample size is reasonably large (n ≥ 30), the sample mean distribution is well approximated by
a normal distribution. (Central Limit Theorem)
X̄ ~ N(μ, σ^2/n)
With the requirement (i) or (ii) (or both) fulfilled, further analysis by using the normal variable
characteristics can be conducted.
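The behaviour of the sample mean can be illustrated by simulation. The sketch below is our own illustration with a hypothetical population (uniform on [0, 1], which is clearly not normal): it draws many samples of size 30 and compares the mean and standard deviation of the sample means with μ and σ/√n. The seed and the number of repetitions are arbitrary choices.

```python
import random
from statistics import mean, pstdev

random.seed(1)                 # fixed seed so the run is repeatable
n, repeats = 30, 2000          # sample size and number of repeated samples

# Hypothetical non-normal population: uniform on [0, 1], mu = 0.5, sigma = sqrt(1/12).
mu, sigma = 0.5, (1 / 12) ** 0.5

sample_means = [mean(random.random() for _ in range(n)) for _ in range(repeats)]

print(round(mean(sample_means), 3))      # should be close to mu
print(round(pstdev(sample_means), 4))    # should be close to sigma / sqrt(n)
print(round(sigma / n ** 0.5, 4))        # theoretical standard error
```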
58
Let’s look at some examples to have a better understanding of the sample mean distribution.
Example 1:
As mentioned earlier, suppose the examination result of General Statistics is as follows:
Population mean score: 68.65
Population standard deviation: 12.4
When every 30 students are randomly assigned to a class, the sample mean distribution of the class
average is as follows:
Mean of class mean: 68.65
Variance of class mean: 12.4^2/30 = 5.1253
Standard error of class mean: 12.4/√30 = 2.2639
If you compare the performance of individual students, the mean score is 68.65 and the
standard deviation is 12.4. However, if you compare the performance of different classes,
using the class mean to represent the performance of each class, the average of the mean scores is
68.65 and the standard deviation is 2.2639. It is not a surprise that a comparison between classes
is more stable than a comparison between students, as each class contains both stronger
and weaker students. The class mean balances the high
marks against the low marks.
59
Example 2:
A report indicates that on average a tourist spends $5000 on a 3-day trip to Taiwan. The standard
deviation of the spending is $600, so the variance is 360000 ($^2). Imagine you are a tour guide and
you take care of a group (sample) of 40 tourists every day. If you keep a long-term record of the
mean spending of each group of 40 tourists, you should be aware that the mean spending in
each group is not constant, but a variable. Use X to denote the spending of an individual and X̄ to
denote the mean spending of a sample of 40 tourists, then
E(X̄) = 5000
Var(X̄) = 600^2/40 = 9000
SE(X̄) = 600/√40 = 94.87
Because of the large sample size (n = 40 > 30), the sample mean spending approximately follows a normal
distribution,
X̄ ~ N(5000, 94.87^2)
Example 3:
Samples of 25 light bulbs are randomly selected from the factory regularly for quality control
checking. The machine has been set up with the mean lifetime of the light bulbs at 1000 hours and
the standard deviation at 80 hours. It is reasonable to assume the lifetime of the light bulbs follows a
normal distribution. Use X to denote the lifetime of a light bulb and X̄ to denote the mean lifetime
of a sample of 25 light bulbs, then
E(X̄) = 1000
Var(X̄) = 80^2/25 = 256
SE(X̄) = 80/√25 = 16
Because the lifetimes of light bulbs follow a normal distribution, the sample mean lifetimes also
follow a normal distribution,
X̄ ~ N(1000, 16^2)
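The standard errors in Examples 1 to 3 all come from the same formula, σ/√n. A short sketch, with a helper name of our own choosing:

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

se_class = standard_error(12.4, 30)     # Example 1: class mean of 30 students
se_tour = standard_error(600, 40)       # Example 2: mean spending of 40 tourists
se_bulbs = standard_error(80, 25)       # Example 3: mean lifetime of 25 light bulbs
print(round(se_class, 4), round(se_tour, 2), round(se_bulbs, 2))
```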
60
When the population variable is a numerical variable (e.g. examination result), the sample is usually
summarized by the calculation of the sample mean. When the population variable is a categorical
variable (e.g. gender of a student), the sample is then summarized by the calculation of the sample
proportion.
Example 1:
Imagine that, in the same group of 2000 students taking the course “General Statistics”, there are 1500
males and 500 females. The variable gender is a categorical variable. Here, we use p to denote the
population proportion of males, so p = 1500/2000 = 0.75.
If a class of 30 students has 24 males and 6 females, we can use p̂ to denote the class proportion of
males (sample proportion), so that p̂ = 24/30 = 0.8.
Imagine now we select another class of 30 students. It is easy to see that the proportion of males in this
class may or may not be the same as in the previous class. We have many classes of students and
each class has its own class proportion of males. Again, we should consider the sample proportion as a
random variable.
Now, try to include all possible sample proportions and review their probability density function.
61
When p is used to denote the given population proportion, the characteristics of the density function
of the sample proportion 𝑝̂ can be summarized as follows:
SE(p̂) = √(p(1 − p)/n)
When the sample size is reasonably large (n ≥ 30), the sample proportion distribution is well
approximated by a normal distribution. (Central Limit Theorem)
p̂ ~ N(p, p(1 − p)/n)
for n > 30, np > 5, n(1 − p) > 5
62
Example 1:
For all year 1 students taking the course “General Statistics”, it is known that the population
proportion of male, p = 0.75.
When every 30 students are randomly assigned to a class, the distribution of the proportion of males in a
class is as follows:
Mean of class proportion of male = 0.75
Variance of class proportion of male = 0.75 × 0.25/30 = 0.00625
Standard error of class proportion of male = √(0.75 × 0.25/30) = 0.0791
On average there are about 75% males in each class, but the proportion is not fixed. The proportion of
males in a class has a standard deviation of 7.91% around the true level of 75%.
Example 4:
Assume that among all customers of a jewelry shop, 40% are classified as “high
spending”. If random samples of size 70 are selected, and each time the sample proportion of
customers classified as “high spending” is calculated and denoted as p̂, then
E(p̂) = 0.4
Var(p̂) = 0.4(0.6)/70 = 0.0034
SE(p̂) = √(0.4(0.6)/70) = 0.05855
As the sample size n = 70 is reasonably large, the sample proportion is approximately normally distributed:
p̂ ~ N(0.4, 0.05855^2)
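The variance and standard error of the sample proportion can be checked with a short sketch. The helper name proportion_se is illustrative and not part of the lecture note.

```python
from math import sqrt

def proportion_se(p, n):
    """Standard error of the sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

se_class = proportion_se(0.75, 30)   # proportion of males in a class of 30
se_shop = proportion_se(0.40, 70)    # proportion of "high spending" customers, n = 70
print(round(se_class, 4), round(se_shop, 5))
print(round(se_class ** 2, 5), round(se_shop ** 2, 4))   # the corresponding variances
```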
63
Chapter 4 Estimation
In the previous chapter, for a continuous random variable X with given population mean μ and
population standard deviation σ, the sample mean distribution for samples with sample size n
has the following characteristics:
(i) E(X̄) = μ
(ii) Var(X̄) = σ^2/n
(iii) SE(X̄) = σ/√n
Similarly, for any categorical random variable X where the population proportion in favour of one
particular option is denoted as p, the sample proportion distribution for samples with sample size n
has the following characteristics:
(i) E(p̂) = p
(ii) Var(p̂) = p(1 − p)/n
(iii) SE(p̂) = √(p(1 − p)/n)
In this chapter, because of the above sampling distribution characteristics, we are going to study the
technique of estimating the unknown population mean (population proportion) by using the sample
mean (sample proportion) obtained from the survey.
64
Example 1:
You are asked to review the lifetime of the light bulbs produced in a factory by reporting the
population mean lifetime. Lifetime is a continuous random variable. According to the information
provided by the factory, the population mean lifetime μ is unknown while the population standard
deviation σ is known to be 80 hours. How can we estimate the population mean lifetime without
conducting a census, using only a survey with sample size n = 50?
65
Example 1:
In order to estimate the population mean lifetime, a random sample of 50 light bulbs is selected. The
sample mean lifetime is calculated as 680 hours.
Solution:
The point estimate of the population mean lifetime is 680 hours.
The sample mean is a point estimate of the population mean. Definitely, a certain level of error in
the estimation is expected. The problem is: can the error in the estimation be calculated?
66
How large is this sampling error? We cannot derive the sampling error for a particular sample as the
population mean is unknown (you must remember this point). However, we can derive the maximum
sampling error at a certain confidence level, e.g. the 95% confidence level (some statisticians call this
maximum sampling error the margin of error). In order to derive the sampling error at a certain
confidence level, we must be familiar with the normal characteristics of the sampling distribution.
Example 1:
As you remember, the lifetime of the light bulbs in a factory has the following
characteristics:
population mean μ, which is unknown,
population standard deviation σ = 80 hours
In order to estimate the population mean lifetime, a random sample of 50 light bulbs is selected. If
we do not focus on one particular sample, but imagine repeatedly selecting many samples,
each with sample size n = 50, then the sample mean distribution is:
X̄ ~ N(μ, 80^2/50)
As 95% of z-scores lie between −1.96 and 1.96,
95% of sample means lie between (μ − 1.96 × 80/√50, μ + 1.96 × 80/√50)
Proof:
P(−1.96 < Z < 1.96) = P(−1.96 < (X̄ − μ)/(80/√50) < 1.96) = P(−1.96 × 80/√50 < X̄ − μ < 1.96 × 80/√50)
67
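[Figure: standard normal curve with the middle 95% between −1.96 and 1.96 and an area of 0.025 in each tail; the matching X̄ axis runs from μ − 1.96 σ/√n through μ to μ + 1.96 σ/√n]
That means in 95% of cases the error of the estimation is less than 22.175 hours.
We call z_{α/2} the critical value, where α/2 is the upper tail area under the normal curve.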
68
There is a 95% chance that the difference between the calculated sample mean and the true
population mean is no more than 22.17 hours. Only 5% chance the difference is more than
22.17 hours.
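The sampling error at a chosen confidence level can be sketched in code. The critical value is obtained here from Python's statistics.NormalDist rather than from the table, so it is 1.95996... instead of the rounded 1.96; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, confidence = 80, 50, 0.95
alpha = 1 - confidence
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95% confidence
margin = z_crit * sigma / sqrt(n)              # sampling error at this confidence level
print(round(z_crit, 2), round(margin, 2))
```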
69
(x̄ − z_{α/2} × σ/√n, x̄ + z_{α/2} × σ/√n)
Let’s take a look at how to construct the 95% confidence interval estimate. As the confidence level
is set at 95%, the sampling error is calculated as 1.96 × σ/√n. If repeated sampling is conducted and
an interval x̄ ± 1.96 × σ/√n is constructed from each sample, we obtain a picture like the following:
[Figure: sampling distribution of X̄ with an area of 0.025 in each tail and half-width 1.96 σ/√n, drawn above the intervals constructed from sample 1, sample 2, …, sample 10, ……]
We can see from the above diagram that most of the constructed intervals can cover the true
unknown population mean, only a few cannot. In fact, of all these constructed intervals, 95% can
cover the true unknown population mean.
Practically, if only one random sample is selected, there is 95% chance that the constructed
confidence interval can successfully include the unknown population mean.
70
Example 1:
As the sample mean lifetime of 50 light bulbs is 680 hours and the 95% sampling error is calculated
as 22.1749 hours, the 95% confidence interval estimate of the population mean lifetime is:
(680 − 1.96 × 80/√50, 680 + 1.96 × 80/√50) = (657.8251, 702.1749) hours
As a summary,
The unbiased point estimate of the population mean is 𝑥̅
Example 2:
The manager of a beauty counter wants to review the spending of the customers. The population
mean spending is unknown and the population standard deviation is $180. He estimates the
population mean spending by randomly selecting 60 customers. The sample mean spending of the
selected 60 customers is $880.
(a) What is the point estimate of the population mean?
(b) What is the sampling error at 90% confidence level?
(c) What is the 90% confidence interval estimate of the population mean?
Solution:
(a) The point estimate of the population mean spending is $880
(b) With σ = 180, n = 60,
the sampling error at 90% confidence level = 1.645 × 180/√60 = $38.2264
(c) The 90% confidence interval estimate of the population mean is:
(880 − 1.645 × 180/√60, 880 + 1.645 × 180/√60) = $(841.77, 918.23)
The population mean spending is point estimated as $880 with a 90% sampling error of
$38.2264.
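The three steps (point estimate, sampling error, interval) can be wrapped in a small function. This is a sketch for the case where σ is known; the function name z_interval is hypothetical, and the critical value is computed rather than read from the table, so the last decimals differ slightly from the hand calculation with 1.645.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for the population mean when sigma is known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = z_interval(880, 180, 60, 0.90)   # Example 2
print(round(low, 2), round(high, 2))
```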
71
where σ is the population standard deviation and s is the sample standard deviation, which is the
best estimator of the unknown population standard deviation.
72
[Figure: the standardized normal (Z) curve and the t curve plotted together, both centered at 0]
The t-distribution is very similar to the standard normal distribution, but has relatively
fatter tails. As the degrees of freedom (defined as sample size minus 1, df =
n − 1) increase, the t-distribution becomes more similar to the standard normal distribution. The
reason is that a larger sample size makes the sample standard deviation a more accurate
estimator of the population standard deviation. It is well accepted that when the degrees of freedom
are greater than 29, the t-distribution is well approximated by the standard normal distribution.
Let’s use the standard normal table and t-table to look up the middle 95% of the data:
Standard normal distribution: −1.96 to 1.96
t-distribution with degrees of freedom 5 (sample size = 6): −2.571 to 2.571
t-distribution with degrees of freedom 13 (sample size = 14): −2.160 to 2.160
t-distribution with degrees of freedom larger than 29: approximately −1.96 to 1.96
73
The entries in Table II are values for which the area to their right under the t distribution with
given degrees of freedom (the gray area in the figure) is equal to α.
TABLE II VALUE OF t
d.f. t0.050 t0.025 t0.010 t0.005 d.f.
74
By using the t-distribution in place of the standard normal distribution, we can now estimate the
population mean with the 3-step procedure:
where t_{α/2} is the critical value with α/2 as the upper tail area and n − 1 as the degrees of freedom.
Remark:
The t-distribution is developed with the assumption that the random variable X follows a normal
distribution. Practically, we can also use the t-distribution to estimate the population mean when the
sample size is large enough (n > 30).
Example 3:
In order to estimate the population mean age of patients of a dentist, a random sample of 20 patients
is selected. The sample mean age is 37.4 and the sample standard deviation is 7.8. Assume that the
age of all patients follow a normal distribution.
(a) What is the point estimate of the population mean?
(b) What is the sampling error at 90% confidence level?
(c) What is the 90% confidence interval estimate of the population mean?
Solution:
With x̄ = 37.4, s = 7.8, n = 20, d.f. = 19, t_{19, 0.05} = 1.729
(a) The point estimate of the population mean age is 37.4
(b) The 90% sampling error is 1.729 × 7.8/√20 = 3.0156
(c) The 90% C.I. of the population mean is (37.4 − 1.729 × 7.8/√20, 37.4 + 1.729 × 7.8/√20)
= (34.3844, 40.4156)
The population mean age of all patients is point estimated as 37.4 with a 90% sampling
error of 3.0156.
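The calculation in Example 3 can be sketched as follows. Python's standard library has no t-distribution quantile function, so the critical value 1.729 is simply typed in from the t-table, exactly as in the hand calculation.

```python
from math import sqrt

x_bar, s, n = 37.4, 7.8, 20
t_crit = 1.729                      # t-table value for d.f. = 19, upper tail area 0.05

margin = t_crit * s / sqrt(n)       # 90% sampling error
interval = (x_bar - margin, x_bar + margin)
print(round(margin, 4), tuple(round(v, 4) for v in interval))
```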
75
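[Figure: standard normal curve with an area of α/2 in each tail beyond −z_{α/2} and z_{α/2}; the matching p̂ axis runs from p − z_{α/2} √(p(1 − p)/n) through p to p + z_{α/2} √(p(1 − p)/n)]
When the population proportion is unknown, we use the sample proportion,
together with the normal distribution characteristics, to do the estimation.
The sampling error at 100(1 − α)% confidence level is z_{α/2} √(p̂(1 − p̂)/n)
The 100(1 − α)% confidence interval estimate of the population proportion is
(p̂ − z_{α/2} √(p̂(1 − p̂)/n), p̂ + z_{α/2} √(p̂(1 − p̂)/n))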
76
Example 4:
A large insurance company is conducting a survey to find the proportion of employees who have taken
the professional examination in the past two years. A survey of 200 employees indicates that 125
of them have taken the professional examination in the past two years.
(a) What is the point estimate of the population proportion of employees who have taken the
examination?
(b) What is the sampling error at 90% confidence level?
(c) What is the 90% confidence interval estimate of the population proportion of employees who have
taken the examination?
Solution:
(a) The point estimate of the population proportion of employees who have taken the examination =
125/200 = 0.625
(b) The sampling error at 90% confidence level = 1.645 × √(0.625(0.375)/200) = 0.0563
(c) The 90% C.I. of p = (0.625 − 1.645 × √(0.625(0.375)/200), 0.625 + 1.645 × √(0.625(0.375)/200))
= (0.5687, 0.6813)
The population proportion of employees who have taken the professional examination in the past two
years is point estimated as 62.5% with a 90% sampling error of 5.63%.
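The same estimate can be sketched in code. The critical value is computed with statistics.NormalDist instead of being read from the table, so it is 1.6449 rather than the rounded 1.645; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

successes, n, confidence = 125, 200, 0.90
p_hat = successes / n                                        # point estimate
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)      # about 1.645
margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)              # sampling error
print(p_hat, round(margin, 4), round(p_hat - margin, 4), round(p_hat + margin, 4))
```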
77
A time series is a sequence of measurements of the same variable collected over time. Most
commonly, the measurements are made at regular time interval, which can be daily, weekly,
monthly, quarterly, or yearly.
78
Based on the graph, we can observe the trend, cyclical and seasonal changes of the variable over time.
1. The nearly straight line through the middle of the graph shows the overall direction of
movement of the variable with time. There is a general upward movement. This type of
variation is called the trend. The trend maintains the general direction for a long time.
2. The gently curving line which moves from side to side of the trend line represents cyclical
variation in the time series. This is an approximately periodic variation in the data values, and
the period, in as much as it can be discerned, will be of several years’ duration. The
amplitude of the cycles may also vary from one part of the time series to another.
3. The variation highlighted by the lines joining the plotted points together is the seasonal
variation. This is the variation from one part of the year to another. Seasonal variation is
usually regular in its period and its amplitude. In the example shown, we have larger values
in the second and third quarters of the year than in the first and fourth quarters.
4. A fourth kind of variation, which cannot be effectively represented on the graph, is called
random variation. Random variation is associated with random one-off occurrences such as
strikes or natural disasters, with sampling errors that occur in data collection, or with
rounding errors in processing and presentation of the data.
79
Our attention will be restricted to short time series, in which the cyclical variation does not manifest
itself and is usually linked with the trend, so we concentrate on three components only: trend, seasonal
and random.
Additive Model
With the observed past data available, it is possible to build a time series model (e.g. a moving
average model) that smooths out short-term fluctuations and focuses on longer-term trends.
Let’s use t, s and r to denote the trend, seasonal variation and the random variation respectively.
Additive Model expresses the time series data as the sum of the three components with the
assumption that seasonal and random factors are independent of the trend.
y=t+s+r
Besides additive model, there are also multiplicative model and other models. In this course, we start
from the basic and assume the time series data following the additive model.
Analysis Approach
We will follow the procedures below to analyse the time series data:
1. Find the trend by moving average
2. Find the seasonal variation
80
The first of the five-term moving averages = (y1 + y2 + y3 + y4 + y5)/5, and this moving average is centered on the 3rd position.
The second of the five-term moving averages = (y2 + y3 + y4 + y5 + y6)/5, which is centered on the 4th position.
Remark:
Notice that the average is in line with the middle value of the set being worked on. Be aware that
there are no trend values corresponding to the first and last original values.
81
To demonstrate the technique, a set of moving averages of period 5 is calculated below
for a set of values.
Example 1:
Below is the number of applications received at the reception counter of a driving school during
the past two weeks (the counter opens on weekdays only).
                         Week 1                    Week 2
                         Mon  Tue  Wed  Thu  Fri   Mon  Tue  Wed  Thu  Fri
Number of applications   12   10   11   11   9     11   10   10   11   10
Solution:
The first average, located at the 3rd position, is formed by (12 + 10 + 11 + 11 + 9)/5 = 10.6.
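The full set of 5-term moving averages for this example can be computed with a short function. This is a minimal sketch; the function name moving_average is our own. Note that no trend value is produced for the first two and last two positions.

```python
def moving_average(values, k):
    """Return the k-term moving averages of values, in order."""
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]

applications = [12, 10, 11, 11, 9, 11, 10, 10, 11, 10]   # Week 1 and Week 2, Mon to Fri
trend = moving_average(applications, 5)

# For a 5-term average, the first value belongs to the 3rd position, the next to the 4th, and so on.
for position, value in enumerate(trend, start=3):
    print(position, value)
```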
82
Example 2:
(a) Describe the trend and seasonal variation by reading the time series plot.
(b) Find the trend by calculating the 4-period moving average.
83
Solution:
(a) The sales revenue has a general upward trend. Within a year, a better performance is
observed in Quarter 4, followed by Quarter 1. The revenue is relatively weaker in Quarter 3
and the worst is observed in Quarter 2.
(b) The first average, located at the 2.5th position, is 75.50.
84
When values are obtained to describe seasonal variation, they are sometimes known as seasonal
values or factors and are expressed as deviations (i.e. ‘+’ or ‘–’) from the underlying trend. They
show, on average, by how much a particular season tends to raise or lower the value relative to the underlying trend.
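One possible way to obtain seasonal factors under the additive model is sketched below. The quarterly figures are hypothetical (they are not the data of Example 2), and centering the 4-period moving averages by averaging adjacent pairs is an assumption on our part; the lecture's own worked solution may arrange the calculation differently.

```python
# Hypothetical quarterly data for three years, starting at Quarter 1 (not from the lecture note).
y = [80, 60, 70, 95, 88, 68, 78, 103, 96, 76, 86, 111]

# 4-period moving averages fall between two quarters (positions 2.5, 3.5, ...),
# so average each adjacent pair to center the trend on an actual quarter.
ma4 = [sum(y[i:i + 4]) / 4 for i in range(len(y) - 3)]
trend = [(ma4[i] + ma4[i + 1]) / 2 for i in range(len(ma4) - 1)]

# Deviation from trend, y - t, grouped by quarter (key 0 is Quarter 1).
deviations = {q: [] for q in range(4)}
for i, t in enumerate(trend):
    pos = i + 2                       # 0-based index of the quarter this trend value sits on
    deviations[pos % 4].append(y[pos] - t)

# Seasonal factor of each quarter = average deviation observed in that quarter.
seasonal = {q + 1: sum(d) / len(d) for q, d in deviations.items()}
print(seasonal)
```

In this made-up series the factors are positive for Quarters 1 and 4 and negative for Quarters 2 and 3, meaning those quarters tend to sit above or below the trend respectively.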
Example 2:
The following data shows the sales revenue of a small company:
Continue with the previous calculation, find the seasonal variation for Quarter 1, 2, 3, 4 respectively.
85
Solution:
Remarks:
1. The seasonal factors parallel our previous observation that the revenue is relatively higher in
quarter 4, followed by quarter 1, then quarter 3, while the performance in quarter 2 is the
worst.
2. Time series analysis extends well beyond these two steps, for example to forecasting
future values and supporting decisions in the stock market.
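$PI_L = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100$
where $p_0$ and $q_0$ are the price and quantity of an item in the base period, $p_n$ is its price in the current period, and the sums run over all items.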
The index number in the base year is 100. A higher price level results in an index greater than 100
while a lower price level results in an index lower than 100.
The percentage change in the price level relative to the base year is (PI − 100)%.
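A Laspeyres price index weights both base-period and current-period prices by the base-period quantities. The sketch below shows one way to compute it; the three-item basket is hypothetical and the function name `laspeyres_index` is an illustrative choice.

```python
def laspeyres_index(base_prices, base_quantities, current_prices):
    """Laspeyres price index: prices weighted by base-period quantities."""
    numerator = sum(p * q for p, q in zip(current_prices, base_quantities))
    denominator = sum(p * q for p, q in zip(base_prices, base_quantities))
    return numerator / denominator * 100


# Hypothetical basket of three items (not taken from the notes).
base_prices = [2.0, 5.0, 10.0]
base_quantities = [10, 4, 2]
current_prices = [2.5, 5.5, 11.0]

index = laspeyres_index(base_prices, base_quantities, current_prices)
print(round(index, 2))        # index value for the current period
print(round(index - 100, 2))  # percentage change relative to the base period
```

For this made-up basket the index rounds to 115, so the price level is about 15% above the base period.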
Example 1
By using the information in the table below, calculate and interpret the Laspeyres price index for the
years 2012 and 2013, using 2011 as the base year.
Solution
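$PI_L(2011) = \frac{\sum p_{2011} q_{2011}}{\sum p_{2011} q_{2011}} \times 100 = 100$
$PI_L(2012) = \frac{\sum p_{2012} q_{2011}}{\sum p_{2011} q_{2011}} \times 100 = 116.6519$
$PI_L(2013) = \frac{\sum p_{2013} q_{2011}}{\sum p_{2011} q_{2011}} \times 100 = 127.7700$
Here $p$ and $q$ with a year subscript denote each item's price and quantity in that year, and each sum runs over the items in the table.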
The price level increased by 16.65% from 2011 to 2012 and increased by 27.77% from 2011 to 2013.
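$PI_P = \frac{\sum p_n q_n}{\sum p_0 q_n} \times 100$
where $p_0$ is the price of an item in the base period, $p_n$ and $q_n$ are its price and quantity in the current period, and the sums run over all items.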
The index number in the base year is 100. A higher price level results in an index greater than 100
while a lower price level results in an index lower than 100.
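A Paasche price index uses the current-period quantities as weights in both the numerator and the denominator. A minimal sketch follows; the basket is hypothetical and the function name `paasche_index` is an illustrative choice.

```python
def paasche_index(base_prices, current_prices, current_quantities):
    """Paasche price index: prices weighted by current-period quantities."""
    numerator = sum(p * q for p, q in zip(current_prices, current_quantities))
    denominator = sum(p * q for p, q in zip(base_prices, current_quantities))
    return numerator / denominator * 100


# Hypothetical basket of three items (not taken from the notes).
base_prices = [2.0, 5.0, 10.0]
current_prices = [2.5, 5.5, 11.0]
current_quantities = [8, 5, 2]

index = paasche_index(base_prices, current_prices, current_quantities)
print(round(index, 2))
```

Because the weights change with every period, the current-period quantities have to be collected each time the index is compiled.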
Example 1
By using the information in the table below, calculate and interpret the Paasche price index for the
years 2012 and 2013, using 2011 as the base year.
Solution
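$PI_P(2011) = \frac{\sum p_{2011} q_{2011}}{\sum p_{2011} q_{2011}} \times 100 = 100$
$PI_P(2012) = \frac{\sum p_{2012} q_{2012}}{\sum p_{2011} q_{2012}} \times 100 = 116.5262$
$PI_P(2013) = \frac{\sum p_{2013} q_{2013}}{\sum p_{2011} q_{2013}} \times 100 = 127.6058$
Here $p$ and $q$ with a year subscript denote each item's price and quantity in that year, and each sum runs over the items in the table.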
The price level increased by 16.53% from 2011 to 2012 and increased by 27.61% from 2011 to 2013.
Remark:
The Laspeyres Price Index and the Paasche Price Index are very similar; the main difference is the
quantities used as weights. The Laspeyres Price Index uses the base-period quantities, whereas the
Paasche Price Index uses the current-period quantities.
The Consumer Price Index (CPI) measures changes in the price levels of consumer goods and services
generally purchased by households. With the quantity and quality of the items in the basket held fixed,
the CPI measures the relative change over time in the total cost of a specified basket of consumer
goods and services.
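$CPI = \sum (w \times I)$
where $w$ is the expenditure weight of the item and $I$ is the price index of the item, i.e.
(price in the current period ÷ price in the base period) × 100.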
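The weighted sum can be sketched in a few lines. The basket below is hypothetical, and the function name `consumer_price_index` is made up for this illustration; the only assumption about the method is that each item's index is its current price relative to its base price, multiplied by 100.

```python
def consumer_price_index(weights, base_prices, current_prices):
    """Weighted sum of item price indices; weights are fractions summing to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("expenditure weights must sum to 1")
    total = 0.0
    for w, p0, pn in zip(weights, base_prices, current_prices):
        item_index = pn / p0 * 100  # price index of one item
        total += w * item_index
    return total


# Hypothetical basket (not the data of any example in the notes).
weights = [0.5, 0.3, 0.2]
base_prices = [40, 20, 10]
current_prices = [44, 21, 12]

cpi = consumer_price_index(weights, base_prices, current_prices)
print(round(cpi, 2))
```

Items with a larger expenditure weight pull the CPI more strongly towards their own price change.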
In Hong Kong, the Census and Statistics Department compiles separate CPI series relating to
households in different expenditure ranges. CPI(A) relates to households which are in the relatively
low expenditure range; CPI(B) relates to households which are in the medium expenditure range;
CPI(C) relates to households which are in the relatively high expenditure range. Composite CPI
relates to all of the above households taken together. Different CPI series are compiled
because the expenditure patterns of households in different expenditure ranges vary, as shown in the
graph.
For more information about the Consumer Price Index in Hong Kong, you can refer to Introduction to
the Consumer Price Index: https://www.statistics.gov.hk/pub/B8XX0021.pdf
Example 2
Referring to the prices and expenditure weights indicated in the following table:
Item Expenditure weight Price in 2010 ($) Price in 2012 ($) Price in 2014 ($)
A 30% 50 55 58
C 10% 30 39 42
(a) Compile the CPIs for 2012 and 2014, using 2010 as the base year.
(b) Find the percentage change in prices from 2010 to 2012.
(c) Find the percentage change in prices from 2012 to 2014.
Solution
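(a) CPI in 2012 = $0.3 \times \frac{55}{50} \times 100 + 0.6 \times \frac{P_{B,2012}}{P_{B,2010}} \times 100 + 0.1 \times \frac{39}{30} \times 100 = 109$
CPI in 2014 = $0.3 \times \frac{58}{50} \times 100 + 0.6 \times \frac{P_{B,2014}}{P_{B,2010}} \times 100 + 0.1 \times \frac{42}{30} \times 100 = 114.8$
where $P_{B,\text{year}}$ is the price in the given year of the item with expenditure weight 60%.
(b) Percentage change in prices from 2010 to 2012 = $\frac{109 - 100}{100} \times 100\% = 9\%$
(c) Percentage change in prices from 2012 to 2014 = $\frac{114.8 - 109}{109} \times 100\% = 5.32\%$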
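The percentage change between two periods is measured relative to the earlier index value, not as a simple difference in index points. The short sketch below checks this using the CPI values 109 and 114.8 from the solution; the function name `percentage_change` is an illustrative choice.

```python
def percentage_change(earlier_index, later_index):
    """Percentage change from an earlier index value to a later one."""
    return (later_index - earlier_index) / earlier_index * 100


cpi_2010, cpi_2012, cpi_2014 = 100, 109, 114.8

change_2010_2012 = round(percentage_change(cpi_2010, cpi_2012), 2)
change_2012_2014 = round(percentage_change(cpi_2012, cpi_2014), 2)

print(change_2010_2012)  # change from the base year 2010 to 2012
print(change_2012_2014)  # change from 2012 to 2014
```

From 2012 to 2014 the index rises by 5.8 points, but the percentage change is about 5.32% because it is measured against the 2012 value of 109 rather than the base value of 100.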
Example 3
The table below shows the Consumer Price Indices for group A (relatively low expenditure range),
group B (medium expenditure range) and group C (relatively high expenditure range) in the years 2017,
2018 and 2019.
(a) Find the percentage change in prices from 2017 to 2018 for CPI(A), CPI(B), and CPI(C)
respectively.
(b) Find the percentage change in prices from 2018 to 2019 for CPI(A), CPI(B), and CPI(C)
respectively.
(c) Which household group experienced the highest percentage rise in prices from 2018 to 2019?
Solution
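(a) Percentage change in prices from 2017 to 2018 for
CPI(A) = $\frac{CPI(A)_{2018} - CPI(A)_{2017}}{CPI(A)_{2017}} \times 100\% = 2.66\%$
CPI(B) = $\frac{CPI(B)_{2018} - CPI(B)_{2017}}{CPI(B)_{2017}} \times 100\% = 2.30\%$
CPI(C) = $\frac{CPI(C)_{2018} - CPI(C)_{2017}}{CPI(C)_{2017}} \times 100\% = 2.21\%$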
(c) Households in relatively low expenditure range experienced the highest percentage rise in
prices (3.34%) from 2018 to 2019.