
ST2334 Notes

Chapter 1: Basic Concepts of Probability


Basic Concepts and Definitions
Observation
Any recording of information, be it numerical or categorical
Statistical Experiment
Any procedure that generates a set of observations
Sample Space
The set of all possible outcomes of a statistical experiment, represented by the symbol S .

💡 S = {1, 2, 3, 4, 5, 6} for a die toss.


💡 S = {even, odd} if we only record whether the number shown is even or odd, but
S = {(H, H), (H, T), (T, H), (T, T)} for two coin flips. Note that both (H, T) and (T, H) are shown so that each outcome is equally likely, although this does not really matter.
Sample Points
Every outcome in a sample space

💡 (H, H) is a sample point for the above example


Events
A subset of a sample space
Simple Event
Consists of exactly one outcome or sample point
Compound Event
Consists of more than one outcome or sample point

💡 The event may even be expressed as A = {t : 0 ≤ t < 5}.


Sure Event
The sample space itself
Null Event
Event with no outcomes or sample points, i.e. ∅.
Operations of Events
Complement Events
A′ is the set of all elements in S that are not in A
Mutually Exclusive Events
A and B are mutually exclusive or disjoint if A ∩ B = ∅
Union of Events
A∪B is the event containing all sample points in A or B or both
⋃_{i=1}^n Ai = A1 ∪ A2 ∪ ⋯ ∪ An
Intersection of Events
A∩B is the event containing all elements common to A and B
⋂_{i=1}^n Ai = A1 ∩ A2 ∩ ⋯ ∩ An

Basic Properties
1. A ∩ A′ = ∅
2. A ∩ ∅ = ∅
3. A ∪ A′ = S
4. (A′)′ = A
5. (A ∩ B)′ = A′ ∪ B′
6. (A ∪ B)′ = A′ ∩ B′
7. A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
8. A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
9. A ∪ B = A ∪ (B ∩ A′)
10. A = (A ∩ B) ∪ (A ∩ B′)
De Morgan's Law
1. (A1 ∪ A2 ∪ ⋯ ∪ An)′ = A1′ ∩ A2′ ∩ ⋯ ∩ An′
2. (A1 ∩ A2 ∩ ⋯ ∩ An)′ = A1′ ∪ A2′ ∪ ⋯ ∪ An′
Contained
A⊂B if all elements in event A are in event B
If A ⊂ B and B ⊂ A then A = B
In this module, we assume contained means it's a proper subset
Counting Methods
Multiplication Principle (OP1 ∧ OP2)
If an operation can be performed in n1 ways, and for each of these ways a second operation can be performed in n2 ways, then the two operations can be
performed together in n1 n2 ways
For k such operations, we have n1 n2 ⋯ nk ways
Addition Principle (OP1 ∨ OP2)
If a first procedure can be performed in n1 ways, and a second procedure in n2 ways, and it is not possible to perform both together, then the number of ways to perform either the first or the second procedure is n1 + n2.
For k such procedures, we have n1 + n2 + ⋯ + nk ways

💡 Application of the above concepts:


How many even three-digit numbers can we form from 0, 1, 2, 5, 6, 9? Each digit can only be used once.
Case A: 0 is used for the ones.
Number of ways = 5 × 4 = 20 for the hundreds and tens places.
Case B: 0 is not used for the ones.
The ones digit must be 2 or 6 (2 ways), the hundreds digit cannot be 0 or the chosen ones digit (4 ways), and the tens digit has 4 remaining choices. Number of ways = 4 × 4 × 2 = 32.
Total ways = 20 + 32 = 52
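A minimal sketch that double-checks the case count by brute force, under the same rules (digits 0, 1, 2, 5, 6, 9, no repeats, no leading zero):

```python
# Sketch: verify the counting-by-cases answer (52) by enumeration.
from itertools import permutations

digits = [0, 1, 2, 5, 6, 9]
count = sum(
    1
    for h, t, o in permutations(digits, 3)
    if h != 0 and o % 2 == 0  # valid three-digit number with an even ones digit
)
print(count)  # expected: 52
```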

Permutation
An arrangement of r objects from a set of n objects, where r ≤ n
Number of permutations of n distinct objects taken r at a time: nPr = n! / (n − r)!

Permutations around a Circle


Number of ways = (n − 1)!
Permutations when not all objects are distinct
If we have ni elements of the i-th kind, for i = 1, ⋯, k, where n1 + n2 + ⋯ + nk = n, then the number of distinct permutations is
n! / (n1! n2! ⋯ nk!)

Combination
Number of ways to select r objects from n objects without regard to the order
Number of combinations of n distinct objects taken r at a time: (n choose r) = nCr = n! / (r! (n − r)!)

(n choose r) = (n−1 choose r) + (n−1 choose r−1) for 1 ≤ r ≤ n

💡 If 2 balls are randomly drawn from an urn containing 6 white and 5 black balls, what is the probability that 1 is white and 1 is black?
We have (6 choose 1) × (5 choose 1) / (11 choose 2) = 30/55.
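A one-line check of the same calculation, assuming Python's math.comb for binomial coefficients:

```python
# Sketch: the urn example, computed with binomial coefficients.
from math import comb

p = comb(6, 1) * comb(5, 1) / comb(11, 2)
print(p)  # 30/55 ≈ 0.5455
```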

Approaches to Probability
Classical
Assume each outcome has equal probability, hence n outcomes = 1/n probability
Also known as the a priori approach
Relative Frequency
fA = nA / n is the relative frequency of A in n repetitions of the experiment E, where nA is the number of times A occurs.
Not the same as probability, but we can assert that Pr(A) = lim_{n→∞} fA
Subjective
For outcomes that cannot really be calculated, i.e. unrepeatable experiments
Axioms of Probability
1. 0 ≤ Pr(A) ≤ 1
2. Pr(S) = 1
3. If A1, A2, ⋯ are mutually exclusive (disjoint) events, i.e. Ai ∩ Aj = ∅ when i ≠ j, then
Pr(⋃_{i=1}^∞ Ai) = ∑_{i=1}^∞ Pr(Ai)

Basic Properties of Probability


1. Pr(∅) = 0.
2. If A1, A2, ⋯, An are mutually exclusive events, then Pr(⋃_{i=1}^n Ai) = ∑_{i=1}^n Pr(Ai)
3. For any event A, Pr(A′) = 1 − Pr(A)
4. For any two events A and B, Pr(A) = Pr(A ∩ B) + Pr(A ∩ B′)
5. For any two events A and B, Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
6. For any three events A, B, C, Pr(A ∪ B ∪ C) = Pr(A) + Pr(B) + Pr(C) − Pr(A ∩ B) − Pr(A ∩ C) − Pr(B ∩ C) + Pr(A ∩ B ∩ C)
(See the Inclusion-Exclusion Principle)
7. If A ⊂ B, then Pr(A) ≤ Pr(B).
Inclusion-Exclusion Principle
Pr(A1 ∪ A2 ∪ ⋯ ∪ An) = ∑_{i=1}^n Pr(Ai) − ∑_{i=1}^{n−1} ∑_{j=i+1}^n Pr(Ai ∩ Aj) + ∑_{i=1}^{n−2} ∑_{j=i+1}^{n−1} ∑_{k=j+1}^n Pr(Ai ∩ Aj ∩ Ak) − ⋯ + (−1)^{n+1} Pr(A1 ∩ A2 ∩ ⋯ ∩ An)

Birthday Problem

pn = Pr(A) = 1 − qn, where A is the event that at least two of the n people share a birthday and qn = (365 × 364 × ⋯ × (365 − n + 1)) / 365^n is the probability that all n birthdays are distinct.

Once you have 23 people, the probability of having two people sharing the same birthday exceeds 1/2.
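A small sketch that computes pn directly from the product form of qn (assuming a 365-day year; the helper name is hypothetical):

```python
# Sketch: p_n = 1 - q_n for the birthday problem.
def birthday_collision_prob(n: int) -> float:
    q = 1.0
    for i in range(n):
        q *= (365 - i) / 365  # survival factors 365/365, 364/365, ...
    return 1 - q

print(birthday_collision_prob(23))  # ≈ 0.507, the first n for which p_n > 1/2
```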



Inverse Birthday Problem
How large does a group of randomly selected people have to be such that the probability that someone is sharing his or her birthday with me is larger than
0.5?
We need n such that 1 − (364/365)^n ≥ 0.5.
Solving this, we get
n ≥ log(0.5) / log(364/365) ≈ 252.7,
so the group must have at least 253 people.
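The same bound, computed numerically (a sketch, assuming the 364/365 model above):

```python
# Sketch: smallest n with Pr(someone shares my birthday) ≥ 0.5,
# i.e. 1 - (364/365)**n ≥ 0.5.
from math import ceil, log

n = ceil(log(0.5) / log(364 / 365))
print(n)  # 253
```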

Conditional Probability
Pr(A∣B) = Pr(A ∩ B) / Pr(B), if Pr(B) ≠ 0

Multiplicative Rule of Probability


Pr(A ∩ B) = Pr(A)Pr(B∣A) = Pr(B)Pr(A∣B)

Pr(A ∩ B ∩ C) = Pr(A)Pr(B∣A)Pr(C∣A ∩ B)

Law of Total Probability


Pr(B) = ∑_{i=1}^n Pr(B ∩ Ai) = ∑_{i=1}^n Pr(Ai) Pr(B∣Ai)

Assuming the events A1 , ⋯ , An are mutually exclusive and exhaustive events


Bayes' Theorem
Let A1 , A2 , ⋯ , An be a partition of the sample space S. Then
Pr(Ak∣B) = Pr(Ak) Pr(B∣Ak) / ∑_{i=1}^n Pr(Ai) Pr(B∣Ai)
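A short sketch applying the law of total probability and Bayes' theorem to hypothetical numbers (three sources with assumed priors and defect rates, not from the notes):

```python
# Sketch: total probability and Bayes' theorem on a made-up example.
# A1, A2, A3 produce 50%, 30%, 20% of items (a partition of S);
# defect rates are 1%, 2%, 3%; B = "item is defective".
prior = [0.5, 0.3, 0.2]          # Pr(A_i)
likelihood = [0.01, 0.02, 0.03]  # Pr(B | A_i)

pr_b = sum(p * l for p, l in zip(prior, likelihood))           # law of total probability
posterior = [p * l / pr_b for p, l in zip(prior, likelihood)]  # Pr(A_i | B)

print(pr_b)       # 0.017
print(posterior)  # Pr(A1 | B) ≈ 0.294, etc.
```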

Independent Events
Two events A and B are independent if and only if Pr(A ∩ B) = Pr(A)Pr(B)
Properties of Independent Events
1. Pr(B∣A) = Pr(B) and Pr(A∣B) = Pr(A)
2. A and B cannot be mutually exclusive if they are independent, supposing Pr(A), Pr(B) > 0
3. A and B cannot be independent if they are mutually exclusive
4. The sample space S and the empty set ∅ are independent of any event
5. If A ⊂ B, then A and B are dependent unless B = S.
Theorem about Complementary of Independent Events
If A and B are independent, then so are A and B ′ , A′ and B, A′ and B ′
n Pairwise Independent Events
A set of events A1, A2, ⋯, An is said to be pairwise independent if and only if Pr(Ai ∩ Aj) = Pr(Ai) Pr(Aj) for all i ≠ j, where i, j = 1, ⋯, n
n Mutually Independent Events
The events A1, A2, ⋯, An are mutually independent if and only if for any subset {Ai1, Ai2, ⋯, Aik},
Pr(Ai1 ∩ Ai2 ∩ ⋯ ∩ Aik) = Pr(Ai1) Pr(Ai2) ⋯ Pr(Aik)

💡 Mutual independence basically means the multiplicative rule holds for every subset of the set of events.
Mutual independence implies pairwise independence, but pairwise independence does not imply mutual independence!
The complements of any number of the above events will also be mutually independent with the remaining events.

Chapter 2: Concepts of Random Variables


Random Variables
Random Variable
A real-valued function X which assigns a number to every element s ∈ S
Range space, RX = {x∣x = X(s), s ∈ S}
Equivalent Events
Let B be an event with respect to RX , i.e. B ⊂ RX
If A = {s ∈ S∣X(s) ∈ B}, then A and B are equivalent events and Pr(A) = Pr(B)
Note that A contains all sample points that fit the criteria
Table Format
Rolling Two Dice
x 2 3 4 5 6 7 8 9 10 11 12
Pr(X = x) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
Discrete Probability Distributions
Discrete Random Variable
If the number of possible values of X is finite or countably infinite, we call X a discrete random variable
Probability Function
Each value of X has a certain probability f (x), and this function f (x) is called the probability function (p.f.) or probability mass function (p.m.f.)
The collection of pairs (xi , f (xi )) is called the probability distribution of X
It must satisfy the following two conditions
1. f(xi) ≥ 0 for all xi
2. ∑_{i=1}^∞ f(xi) = 1
Tossing Two Coins
x 0 1 2
f(x) = Pr(X = x) 1/4 1/2 1/4
If we plot these values on a graph, we get a probability histogram. The total area of all rectangles is 1.
Another View of Probability Function
We can also think of a probability function as specifying a mathematical model for a finite population.
Continuous Probability Distributions
Continuous Random Variable
If RX , the range space of a random variable X , is an interval or a collection of intervals, then X is a continuous random variable
Probability Density Function
The probability density function (p.d.f.) f (x) of a continuous random variable must satisfy the following conditions
1. f(x) ≥ 0 for all x ∈ RX. This also means that we may set f(x) = 0 for x ∉ RX; note that Pr(A) = 0 does not imply A = ∅.
2. ∫_{RX} f(x) dx = 1, or equivalently ∫_{−∞}^{∞} f(x) dx = 1, since f(x) = 0 for x not in RX.
3. For any (c, d) ⊂ RX, c < d, Pr(c ≤ X ≤ d) = ∫_c^d f(x) dx.
4. Pr(X = x0) = ∫_{x0}^{x0} f(x) dx = 0.

For number 4: in the continuous case, the probability of X being equal to a fixed value is 0, and
Pr(c ≤ X ≤ d) = Pr(c ≤ X < d) = Pr(c < X ≤ d) = Pr(c < X < d)
We can use ≤ and < interchangeably for a probability density function.
Cumulative Distribution Function
Let X be a random variable (can be discrete or continuous).
We define F (x) to be the cumulative distribution function (c.d.f.) of the random variable X where
F (x) = Pr(X ≤ x)

CDF for Discrete Random Variables


F(x) = ∑_{t ≤ x} f(t) = ∑_{t ≤ x} Pr(X = t)

This c.d.f. is a step function.


For any a ≤ b, Pr(a ≤ X ≤ b) = Pr(X ≤ b) − Pr(X < a) = F(b) − F(a−),
where a− is the largest possible value of X that is strictly less than a.
CDF for Continuous Random Variables
F(x) = ∫_{−∞}^{x} f(t) dt

The reverse is also true, if a derivative exists:


f(x) = dF(x)/dx
For any a ≤ b, Pr(a ≤ X ≤ b) = Pr(a < X ≤ b) = F (b) − F (a)
Note that this c.d.f. is a non-decreasing function.
Expressing CDF
To express a c.d.f., we can write it in cases, e.g.
F(x) = 0,    if x < 0,
       0.3,  if 0 ≤ x < 1,
       0.9,  if 1 ≤ x < 2,
       1,    if 2 ≤ x.

Expectation
Expected Value of Discrete Random Variable
The mean or expected value of X , denoted by E(X) or μX is
μX = E(X) = ∑_i xi fX(xi) = ∑_x x fX(x)

Expected Value of Continuous Random Variable



μX = E(X) = ∫_{−∞}^{∞} x fX(x) dx

Expectation of Functions of Random Variables


For any function g(X) of a random variable X with p.f. or p.d.f. fX (x),
1. E[g(X)] = ∑_x g(x) fX(x) if X is discrete, provided the sum exists
2. E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx if X is continuous, provided the integral exists
Variance
The special function, g(x) = (x − μX )2 , leads us to the definition of variance.
σX² = V(X) = E[(X − μX)²] = ∑_x (x − μX)² fX(x), if X is discrete,
                          = ∫_{−∞}^{∞} (x − μX)² fX(x) dx, if X is continuous.
The positive square root of the variance is the standard deviation, σX = √V(X).

We also can calculate variance using


V (X) = E(X 2 ) − [E(X)]2

Moment
The special function, g(x) = x^k, leads us to the definition of moment. The k-th moment of X is E(X^k).
Properties of Expectation
1. E(aX + b) = aE(X) + b, where a and b are constants
2. V(X) = E(X²) − [E(X)]²
3. V(aX + b) = a²V(X)
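A quick numerical check of these definitions on a fair die, a hypothetical example (not from the notes):

```python
# Sketch: expectation and variance of a fair die roll, checking V(X) = E(X^2) - [E(X)]^2.
values = [1, 2, 3, 4, 5, 6]
f = {x: 1 / 6 for x in values}  # p.f. of a discrete uniform distribution

ex = sum(x * f[x] for x in values)       # E(X) = 3.5
ex2 = sum(x**2 * f[x] for x in values)   # E(X^2)
var = ex2 - ex**2                        # V(X) = 35/12 ≈ 2.917
print(ex, var)
```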
Chebyshev's Inequality
We cannot reconstruct X from E(X) and V (X). However, we can derive some bounds.
Let X be a random variable with E(X) = μ and V(X) = σ². Then for any positive number k, we have
Pr(|X − μ| ≥ kσ) ≤ 1/k²
Alternatively,
Pr(|X − μ| < kσ) ≥ 1 − 1/k²

💡 We are using the standard deviation σ, not the variance σ².

Chapter 3: Two-Dimensional Random Variables and Conditional Probability


Distributions
2-Dimensional Random Variables
(X, Y ) is a two-dimensional random variable, where X, Y are functions assigning a real number to each s ∈ S.
(X, Y ) is also called a random vector.

Range Space
RX,Y = {(x, y)∣x = X(s), y = Y (s), s ∈ S}
The above definition can be extended to more than two random variables, i.e. n-dimensional random variable/vector.
Discrete and Continuous
(X, Y) is a discrete random variable if the possible values of (X(s), Y(s)) are finite or countably infinite.
(X, Y ) is a continuous random variable if the possible values of (X(s), Y (s)) can assume all values in some region of the Euclidean plane R2 .

Joint Probability Functions for Discrete Random Variables


With each possible value (xi , yj ), we associate a number fX,Y (xi , yj ) representing Pr(X = xi , Y = yj ) and satisfying the following conditions:
1. fX,Y(xi, yj) ≥ 0 for all (xi, yj) ∈ RX,Y.
2. ∑_{i=1}^∞ ∑_{j=1}^∞ fX,Y(xi, yj) = ∑_{i=1}^∞ ∑_{j=1}^∞ Pr(X = xi, Y = yj) = 1.
fX,Y is the joint probability function of (X, Y ).

Expressing Joint p.f.
A joint p.f. for a discrete (X, Y) is typically presented as a two-way table, with one entry fX,Y(xi, yj) for each pair of values.
Joint Probability Density Functions for Continuous Random Variables
fX,Y (x, y)is called a joint probability density function if it satisfies the following:
1. fX,Y(x, y) ≥ 0 for all (x, y) ∈ RX,Y.
2. ∬_{(x,y)∈RX,Y} fX,Y(x, y) dx dy = 1, or ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = 1.

Marginal Probability Distributions


For discrete random variables,
fX (x) = ∑y fX,Y (x, y) and
fY (y) = ∑x fX,Y (x, y)
For continuous random variables,
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy and
fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx
Basically, we fix the value of one variable and either sum or integrate over the other. The marginal gives the probabilities of the various values of that variable without reference to the values of the other variable.
Conditional Probability Distributions
The conditional distribution of Y given that X = x is given by fY|X(y|x) = fX,Y(x, y) / fX(x), if fX(x) > 0, for each x within the range of X. Flip the variables for the conditional distribution of X given Y = y.

The conditional p.f. or p.d.f. is also a one-dimensional p.f. or p.d.f.:

1. fY|X(y|x) ≥ 0 and fX|Y(x|y) ≥ 0.
2. The sum or integral of the p.f. or p.d.f. respectively is 1.
3. For fX(x) > 0, fX,Y(x, y) = fY|X(y|x) fX(x). For fY(y) > 0, fX,Y(x, y) = fX|Y(x|y) fY(y).
Independent Random Variables
Random variables X and Y are independent if and only if fX,Y (x, y) = fX (x)fY (y) for all x, y.
This can be extended to n random variables.
Product Space
The product of 2 positive functions fX (x) and fY (y) results in a function which is positive on a product space.
If fX (x) > 0 for x ∈ A1 and fY (y) > 0 for y ∈ A2 , then fX (x)fY (y) > 0 for (x, y) ∈ A1 × A2 .
Expectation
E[g(X, Y)] = ∑_x ∑_y g(x, y) fX,Y(x, y), for discrete RVs,
           = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fX,Y(x, y) dx dy, for continuous RVs.

Covariance
Let g(X, Y) = (X − μX)(Y − μY). Recall that μX = E(X). This leads to the definition of covariance between two random variables.
Cov(X, Y ) = E[(X − μX )(Y − μY )]
1. Cov(X, Y) = E(XY) − μX μY.
2. If X and Y are independent, then Cov(X, Y) = 0. However, Cov(X, Y) = 0 does not imply independence.
3. Cov(aX + b, cY + d) = ac Cov(X, Y).
4. V(aX + bY) = a²V(X) + b²V(Y) + 2ab Cov(X, Y).
Correlation Coefficient
The correlation coefficient of X and Y, denoted by Cor(X, Y) or ρX,Y or ρ, is defined by ρX,Y = Cov(X, Y) / √(V(X) V(Y)).
1. −1 ≤ ρX,Y ≤ 1.
2. ρX,Y is a measure of the degree of linear relationship between X and Y.
3. If X and Y are independent, then ρX,Y = 0. On the other hand, ρX,Y = 0 does not imply independence.

Chapter 4: Special Probability Distributions


Discrete Distributions
Discrete Uniform Distribution
If the random variable X assumes the values x1, x2, ⋯, xk with equal probability, then X is said to have a discrete uniform distribution and the probability function is given by fX(x) = 1/k for x = x1, x2, ⋯, xk, and 0 otherwise.

1. Mean, μ = E(X) = ∑_{i=1}^k xi (1/k) = (1/k) ∑_{i=1}^k xi.
2. Variance, σ² = V(X) = ∑_{all x} (x − μ)² fX(x) = (1/k) ∑_{i=1}^k (xi − μ)².
3. Equivalently, σ² = E(X²) − μ² = (1/k)(∑_{i=1}^k xi²) − μ².
Bernoulli Distributions
Bernoulli experiments only have two possible outcomes, and we can code them as 1 and 0.
A random variable X is defined to have a Bernoulli distribution if the probability function of X is given by fX(x) = p^x (1 − p)^{1−x}, x = 0, 1, where 0 < p < 1; fX(x) = 0 for all other values of x.
1. (1 − p) is often denoted by q.
2. Pr(X = 1) = p and Pr(X = 0) = 1 − p = q.
3. Mean, μ = E(X) = p.
4. Variance, σ² = V(X) = p(1 − p) = pq.
Parameter and Family of Distributions
If fX (x) depends on a quantity assigned to any one of some possible values, and each different value results in a different probability distribution, that
quantity is called a parameter of the distribution.
p is the parameter in the Bernoulli distribution.
The collection of all probability distributions for different values of the parameter is called a family of probability distributions.
Binomial Distributions ~ B(n, p)
A random variable X is defined to have a binomial distribution with two parameters n and p if the probability function of X is given by Pr(X = x) = fX(x) = (n choose x) p^x (1 − p)^{n−x} = (n choose x) p^x q^{n−x} for x = 0, 1, ⋯, n, where p satisfies 0 < p < 1.
X is basically the number of successes that occur in n independent Bernoulli trials.
Bernoulli distribution is a special case of the binomial distribution, with n = 1.
1. Mean, μ = E(X) = np.
2. Variance, σ² = V(X) = np(1 − p) = npq.
3. Conditions for a binomial experiment: n repeated Bernoulli trials, the probability of success is the same in each trial, and the trials are independent.
Negative Binomial Distributions ~ NB(k, p)
Consider a binomial experiment, except that trials are repeated until a fixed number of successes occur. We are interested in the probability that the k-th success occurs on the x-th trial, where x is the value of the random variable.
The random variable X is said to follow a Negative Binomial distribution with parameters k and p (i.e. NB(k, p)) if the probability function of X is given by
Pr(X = x) = fX(x) = (x−1 choose k−1) p^k q^{x−k} for x = k, k + 1, k + 2, ⋯.
1. Mean, E(X) = k/p.
2. Variance, σ² = k(1 − p)/p².
Geometric Distribution ~ Geometric(p)
This is a negative binomial distribution with k = 1, i.e. we stop after the first success.
Poisson Distribution ~ P (λ)
Poisson experiments yield the number of successes occurring during a given time interval or in a specified region.
Properties
1. The number of successes occurring in one time interval or specified region is independent of the number occurring in any other disjoint time interval or region of space.
2. The probability of a single success occurring during a very short time interval or in a small region is proportional to the length of the time interval or the size of the region, and does not depend on the number of successes occurring outside this time interval or region.
3. The probability of more than one success occurring in such a short time interval or small region is negligible.
The probability distribution of a Poisson random variable X is called the Poisson distribution and the function is given by fX(x) = Pr(X = x) = e^{−λ} λ^x / x!
for x = 0, 1, 2, 3, ⋯ where λ is the average number of successes occurring in the given time interval or specified region and e ≈ 2.71828...
1. Mean, E(X) = λ.
2. Variance, σ² = V(X) = λ.
Poisson Approximation to Binomial Distribution
Let X be a binomial random variable with parameters n and p, so Pr(X = x) = fX(x) = (n choose x) p^x q^{n−x}. Suppose that n → ∞ and p → 0 such that λ = np remains constant.
Then X has approximately a Poisson distribution with parameter np. That is,
lim_{n→∞, p→0} Pr(X = x) = e^{−np} (np)^x / x!
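A sketch comparing the binomial p.f. with its Poisson approximation for hypothetical values n = 1000, p = 0.003 (so λ = np = 3), using only the standard library:

```python
# Sketch: Poisson approximation to the binomial.
from math import comb, exp, factorial

n, p = 1000, 0.003
lam = n * p

for x in range(6):
    binom = comb(n, x) * p**x * (1 - p) ** (n - x)
    poisson = exp(-lam) * lam**x / factorial(x)
    print(x, round(binom, 5), round(poisson, 5))  # the two columns nearly agree
```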

Continuous Distributions
Continuous Uniform Distribution ~ U (a, b)
A random variable has a uniform distribution over the interval [a, b], −∞ < a < b < ∞, denoted by U(a, b), with probability density function given by fX(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise.
1. Mean, E(X) = (a + b)/2.
2. Variance, σ² = (b − a)²/12.
Exponential Distribution ~ Exp(α)
A continuous random variable X assuming all nonnegative values is said to have an exponential distribution with parameter α > 0 if its probability
density function is given by fX (x) = αe−αx for x > 0 and 0 otherwise.
1. Mean, E(X) = 1/α.
2. Variance, σ² = 1/α².
3. ∫_{−∞}^{∞} fX(x) dx = 1.
4. Pr(X > t) = e^{−αt}.
5. Pr(X ≤ t) = 1 − e^{−αt}.
The p.d.f. can be rewritten in the form fX(x) = (1/μ) e^{−x/μ} for x > 0; then E(X) = μ and σ² = μ².
No Memory Property
Pr(X > s + t ∣ X > s) = Pr(X > t).

Normal Distribution ~ N(μ, σ2 )


The random variable X, assuming all real values −∞ < x < ∞, has a normal distribution if its probability density function is given by
fX(x) = (1/(√(2π) σ)) exp(−(x − μ)²/(2σ²)), −∞ < x < ∞, where −∞ < μ < ∞ and σ > 0.

1. Symmetrical about the vertical line x = μ.
2. The maximum point is at x = μ, where the value is 1/(√(2π) σ).
3. Total area under the curve is 1.
4. The mean and variance are μ and σ² respectively.
5. As σ increases, the curve flattens, and as σ decreases, the curve sharpens.
6. If X has distribution N(μ, σ²) and Z = (X − μ)/σ, then Z has the N(0, 1) distribution (the standardized normal distribution), with E(Z) = 0 and V(Z) = σZ² = 1.
7. x1 < X < x2 is equivalent to (x1 − μ)/σ < Z < (x2 − μ)/σ.


Statistical Tables
Statistical tables give values Φ(z) for a given z, where Φ(z) is the cumulative distribution function of a standardized normal random variable Z . 1 − Φ(z)
is the upper cumulative probability for a given z.
1. Φ(z) = Pr(Z ≤ z)
2. 1 − Φ(z) = Pr(Z > z)
Some statistical tables give the 100α percentage points, zα, of a standardized normal distribution, where α = Pr(Z ≥ zα) = ∫_{zα}^{∞} (1/√(2π)) exp(−z²/2) dz.
3. Pr(Z ≥ zα) = Pr(Z ≤ −zα) = α
Normal Approximation to Binomial Distribution
When n is large (the approximation works best when p is close to 1/2), we can use the normal distribution to approximate the binomial distribution. A good rule of thumb is to use the normal approximation only when np > 5 and n(1 − p) > 5.
If X is a binomial random variable with mean μ = np and variance σ² = np(1 − p), then as n → ∞, Z = (X − np)/√(npq) is approximately N(0, 1). In other words, we consider Y ~ N(μ, σ²).

Continuity Correction
1. Pr(X = k) ≈ Pr(k − 1/2 < Y < k + 1/2)
2. Pr(a ≤ X ≤ b) ≈ Pr(a − 1/2 < Y < b + 1/2)
3. Pr(a < X ≤ b) ≈ Pr(a + 1/2 < Y < b + 1/2)
4. Pr(a ≤ X < b) ≈ Pr(a − 1/2 < Y < b − 1/2)
5. Pr(a < X < b) ≈ Pr(a + 1/2 < Y < b − 1/2)
6. Pr(X ≤ c) = Pr(0 ≤ X ≤ c) ≈ Pr(−1/2 < Y < c + 1/2)
7. Pr(X > c) = Pr(c < X ≤ n) ≈ Pr(c + 1/2 < Y < n + 1/2)
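A sketch of the approximation with continuity correction, for the hypothetical case X ~ B(40, 0.5) and Pr(X = 20); the standard normal c.d.f. is built from math.erf:

```python
# Sketch: normal approximation (with continuity correction) to a binomial probability.
from math import comb, erf, sqrt

n, p = 40, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))

def phi(z: float) -> float:
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

exact = comb(n, 20) * p**20 * (1 - p) ** 20
approx = phi((20.5 - mu) / sigma) - phi((19.5 - mu) / sigma)  # continuity correction
print(exact, approx)  # ≈ 0.1254 vs ≈ 0.1256
```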

Chapter 5: Sampling and Sampling Distributions


Population
The totality of all possible outcomes or observations of a survey or experiment is called a population.
Every outcome or observation can be recorded as a numerical or categorical value. Thus, each member of a population is a value of a random variable.
Finite Population
Consists of a finite number of elements, e.g. all citizens of Singapore.
Infinite Population
Consists of an infinitely large (countably or uncountably infinite) number of elements, e.g. the results of all possible rolls of a pair of dice.
Sample
A sample is any subset of a population.
Random Sampling
A simple random sample of n members is a sample that is chosen in such a way that every subset of n observations of the population has the same
probability of being selected.
Random Sampling in General
The below sampling examples can be generalized as such:
Let X be a random variable with certain probability distribution, fX (x). Let X1 , X2 , ⋯ , Xn be n independent random variables each having the same
distribution as X . Then (X1 , X2 , ⋯ , Xn ) is a random sample of size n from a population with distribution fX (x).
The joint p.f. or p.d.f. of (X1, X2, ⋯, Xn) is given by f_{X1,X2,⋯,Xn}(x1, x2, ⋯, xn) = fX(x1) fX(x2) ⋯ fX(xn).

Sampling from a Finite Population


Sampling Without Replacement
There are (N choose n) samples of size n that can be drawn from a finite population of size N without replacement.
Each sample has a probability of 1/(N choose n) of being selected.
Sampling With Replacement


Order of selection matters here. Hence, there are N^n samples of size n that can be drawn from a finite population of size N with replacement.
Each sample has a probability of 1/N^n of being selected.

Sampling from an Infinite Population


Unfortunately, the concept of a random sample from an infinite population is more difficult to explain.
Refer to Chapter 5 slides 16-20 for some very unclear examples.
Sampling Distribution of Sample Mean
The main purpose in selecting random samples is to elicit information about unknown population parameters. Values calculated from the sample are used to make inferences concerning the true values for the population.
Statistic
A function of a random sample (X1 , X2 , ⋯ , Xn ) is called a statistic. For example, the mean is a statistic. Hence, a statistic is a random variable, and it is
meaningful to consider the probability distribution of a statistic, which is also called a sampling distribution.
Sample Mean
For a random sample of size n represented by X1, X2, ⋯, Xn, the sample mean is defined by the statistic X̄ = (1/n) ∑_{i=1}^n Xi.
If the values in the random sample are observed to be x1, x2, ⋯, xn, then the realization of the statistic X̄ is x̄ = (1/n) ∑_{i=1}^n xi.
Sampling Distribution
For random samples of size n taken from an infinite population, or from a finite population with replacement, having population mean μ and population standard deviation σ, the sampling distribution of the sample mean X̄ has mean and variance given by:
1. μ_X̄ = μ, i.e. E(X̄) = E(X).
2. σ²_X̄ = σ²/n, i.e. V(X̄) = V(X)/n.
Law of Large Numbers
Let X1 , X2 , ⋯ , Xn be a random sample of size n from a population having any distribution with mean μ and finite population variance σ2 .
For any ϵ > 0, Pr(|X̄ − μ| > ϵ) → 0 as n → ∞.
In other words, as the sample size increases, the probability that the sample mean differs from the population mean goes to zero.
Central Limit Theorem
The sampling distribution of the sample mean X̄ is approximately normal with mean μ and variance σ²/n if n is sufficiently large.
Hence Z = (X̄ − μ)/(σ/√n) follows approximately N(0, 1).

Sampling distribution properties of X̄:
1. Central tendency: μ_X̄ = μ.
2. Variation: σ_X̄ = σ/√n.
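A simulation sketch of the CLT using a uniform (non-normal) population; the values 0.5 and 1/12 are the known mean and variance of U(0, 1), and the sample size and repetition count are arbitrary choices:

```python
# Sketch: simulating the sampling distribution of the sample mean.
import random
from statistics import mean, variance

random.seed(0)
n = 30          # sample size
reps = 10_000   # number of samples

sample_means = [mean(random.random() for _ in range(n)) for _ in range(reps)]

print(mean(sample_means))      # ≈ 0.5        (= mu)
print(variance(sample_means))  # ≈ (1/12)/30  (= sigma^2 / n)
```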
Normal Sampling Distributions
1. If Xi, i = 1, 2, ⋯, n are N(μ, σ²), then X̄ is N(μ, σ²/n) regardless of the sample size n.
2. If Xi, i = 1, 2, ⋯, n are approximately N(μ, σ²), then X̄ is approximately N(μ, σ²/n) regardless of the sample size n.
Sampling Distribution of Difference of Two Sample Means


If independent samples of sizes n1 (≥ 30) and n2 (≥ 30) are drawn from two populations with means μ1 and μ2 and variances σ1² and σ2² respectively, then the sampling distribution of the difference of the sample means X̄1 and X̄2 is approximately normally distributed with mean and standard deviation given by:
1. μ_{X̄1 − X̄2} = μ1 − μ2.
2. σ_{X̄1 − X̄2} = √(σ1²/n1 + σ2²/n2).
3. Z = [(X̄1 − X̄2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2) is approximately N(0, 1).
Chi-square Distribution ~ χ2 (n)


If Y is a random variable with probability density function fY(y) = (1/(2^{n/2} Γ(n/2))) y^{(n/2)−1} e^{−y/2} for y > 0, and 0 otherwise, then Y is defined to have a chi-square distribution with n degrees of freedom, denoted by χ²(n), where n is a positive integer and Γ(⋅) is the gamma function.
The gamma function satisfies Γ(n) = ∫_0^∞ x^{n−1} e^{−x} dx = (n − 1)! for n = 1, 2, 3, ⋯.
1. If Y ~ χ²(n), then E(Y) = n and V(Y) = 2n.
2. For large n, χ²(n) is approximately N(n, 2n).
3. If Y1, Y2, ⋯, Yk are independent chi-square random variables with n1, n2, ⋯, nk degrees of freedom respectively, then Y1 + Y2 + ⋯ + Yk has a chi-square distribution with n1 + n2 + ⋯ + nk degrees of freedom. That is, ∑_{i=1}^k Yi ~ χ²(∑_{i=1}^k ni).
From Normal to Chi-square
1. If X ~ N(0, 1), then X² ~ χ²(1).
2. Let X ~ N(μ, σ²); then [(X − μ)/σ]² ~ χ²(1).
3. Let X1, X2, ⋯, Xn be a random sample from a normal population with mean μ and variance σ². Define Y = ∑_{i=1}^n (Xi − μ)²/σ². Then Y ~ χ²(n).
χ²-distribution Statistical Table
Let c be a constant satisfying Pr(Y ≥ c) = ∫_c^∞ fY(y) dy = α, where Y ~ χ²(n). We use the notation χ²(n; α) to denote this constant c. That is,
Pr(Y ≥ χ²(n; α)) = ∫_{χ²(n;α)}^∞ fY(y) dy = α.
Similarly, χ²(n; 1 − α) is the constant satisfying
Pr(Y ≤ χ²(n; 1 − α)) = ∫_0^{χ²(n;1−α)} fY(y) dy = α.
Thus, we have:
1. χ²(10; 0.9) means Pr(Y ≥ χ²(10; 0.9)) = 0.9, or equivalently Pr(Y ≤ χ²(10; 0.9)) = 0.1.
2. From the statistical table on the χ²-distribution, χ²(10; 0.9) = 4.865.
Sampling Distribution of (n − 1)S²/σ²
The statistic S² = (1/(n − 1)) ∑_{i=1}^n (Xi − X̄)² is the sample variance. However, its distribution in general has little practical application. Instead, we shall consider the sampling distribution of the random variable (n − 1)S²/σ² when Xi ~ N(μ, σ²) for all i.
If S² is the variance of a random sample of size n taken from a normal population having variance σ², then the random variable (n − 1)S²/σ² has a chi-square distribution with n − 1 degrees of freedom. That is, (n − 1)S²/σ² ~ χ²(n − 1).
t-distribution
Suppose Z ~ N(0, 1) and U ~ χ²(n). If Z and U are independent, and we let T = Z/√(U/n), then the random variable T follows the t-distribution with n degrees of freedom: Z/√(U/n) ~ t(n).
The p.d.f. is given by
fT(t) = [Γ((n + 1)/2) / (√(nπ) Γ(n/2))] (1 + t²/n)^{−(n+1)/2}, −∞ < t < ∞

1. The graph of the t-distribution is symmetric about the vertical axis and resembles the graph of the standard normal distribution.
2. The p.d.f. of the t-distribution with n d.f. (degrees of freedom) approaches the p.d.f. of the standard normal distribution as n → ∞. That is, lim_{n→∞} fT(t) = (1/√(2π)) e^{−t²/2}.
3. The values of Pr(T ≥ t) = ∫_t^∞ fT(x) dx for selected values of n and t are given in a statistical table. For example, Pr(T ≥ t_{10;0.05}) = 0.05 gives t_{10;0.05} = 1.812.
4. If T ~ t(n), then E(T) = 0 and V(T) = n/(n − 2) for n > 2.
t-distribution from a Random Sample from a Normal Population
If the random sample is selected from a normal population, then Z = (X̄ − μ)/(σ/√n) ~ N(0, 1) and U = (n − 1)S²/σ² ~ χ²(n − 1). It can be shown that X̄ and S² are independent, and so are Z and U.
Therefore,
T = (X̄ − μ)/(S/√n) = [(X̄ − μ)/(σ/√n)] / √[((n − 1)S²/σ²)/(n − 1)] = Z/√(U/(n − 1)) ~ t(n − 1).
That is, T has a t-distribution with n − 1 d.f.


F-distribution ~ F(n1, n2)
Let U and V be independent random variables having χ²(n1) and χ²(n2) distributions respectively. Then the distribution of the random variable F = (U/n1)/(V/n2) is called an F-distribution with (n1, n2) degrees of freedom.

The p.d.f. of F is given by
fF(x) = [n1^{n1/2} n2^{n2/2} Γ((n1 + n2)/2) / (Γ(n1/2) Γ(n2/2))] · x^{(n1/2)−1} / (n1 x + n2)^{(n1+n2)/2}
for x > 0 and 0 otherwise.
1. E(X) = n2/(n2 − 2), for n2 > 2.
2. V(X) = 2n2²(n1 + n2 − 2) / [n1(n2 − 2)²(n2 − 4)], for n2 > 4.
3. If F ~ F(n, m), then 1/F ~ F(m, n).

F-distribution Statistical Table
Values of the F-distribution can be found in the statistical table, which gives the values F(n1, n2; α) such that Pr(F > F(n1, n2; α)) = α.
For example, F(5, 4; 0.05) = 6.26 means Pr(F > 6.26) = 0.05, where F ~ F(5, 4).
1. F(n1, n2; 1 − α) = 1/F(n2, n1; α).

Chapter 6: Estimation based on Normal Distribution


Parameter
Assume that some characteristics of the elements in a population can be presented by a random variable X whose p.d.f. or p.f. is fX (x; θ), where the
form of the p.d.f. or p.f. is assumed known except that it contains an unknown parameter θ.
Further assume that the values x1 , x2 , ⋯ , xn of a random sample X1 , X2 , ⋯ , Xn from fX (x; θ) can be observed.
On the basis of the observed sample values x1 , x2 , ⋯ , xn , it is desired to estimate the value of the unknown parameter θ.
Statistic
A statistic is a function of the random sample which does not depend on any unknown parameters. For example, X̄ = (1/n) ∑_{i=1}^n Xi and X(n) = max(X1, X2, ⋯, Xn) are statistics.
Let W = (1/n) ∑_{i=1}^n (Xi − μ)²; then W is a statistic if and only if μ is known.
Point Estimation
Point estimation is to let the value of some statistic, say Θ = Θ(X1 , X2 , ⋯ , Xn ) to estimate the unknown parameter θ. Such a statistic is called a point
estimator.
Point Estimate of Mean
Suppose μ is the population mean. The statistic that one uses to obtain a point estimate is called an estimator.
For example, X̄ is an estimator of μ. The value of X̄, denoted by x̄, is an estimate of μ.
Unbiased Estimator
A statistic Θ is said to be an unbiased estimator of the parameter θ if E(Θ) = θ.
Interval Estimation
We define two statistics, ΘL and ΘU , where ΘL < ΘU , so that (ΘL , ΘU ) constitutes a random interval for which the probability of containing the
unknown parameter θ can be determined.
For example, suppose σ² is known. Let ΘL = X̄ − 2σ/√n and ΘU = X̄ + 2σ/√n. Then (X̄ − 2σ/√n, X̄ + 2σ/√n) is an interval estimator for μ.
Confidence Interval from Interval Estimation
An interval estimate of a population parameter θ is an interval of the form θL < θ < θU , where θL and θU depend on
. The value of the statistic Θ for a particular sample, and
. The sampling distribution of Θ.
θ L is also known as the lower confidence limit, θ the point estimate, and θ U the upper confidence limit.
Not all intervals will contain the parameter θ, since it depends on the sample. We thus seek a random interval (ΘL , ΘU ) containing θ with a given
probability 1 − α. That is, Pr(ΘL < θ < ΘU ) = 1 − α.
Then the interval θL < θ < θU , computed from the selected sample is called a (1 − α)100% confidence interval for θ, and the fraction (1 − α) is called
the confidence coefficient or degree of confidence.
Interpretation
This means that if samples of the same size n are taken, then in the long run, (1 − α)100% of the intervals will contain the unknown parameter θ, and
hence with a confidence of (1 − α)100%, we can say that the interval covers θ.
Confidence Interval for the Mean
Known Variance Case
We can compute a confidence interval for the mean when
1. The variance is known, and
2. The population is normal OR n is sufficiently large (≥ 30).
When the population is normal, or by the CLT, we can expect that X̄ ~ N(μ, σ²/n).
Thus Z = (X̄ − μ)/(σ/√n) ~ N(0, 1). Hence
Pr(−z_{α/2} < (X̄ − μ)/(σ/√n) < z_{α/2}) = 1 − α, or
Pr(X̄ − z_{α/2}(σ/√n) < μ < X̄ + z_{α/2}(σ/√n)) = 1 − α.

z_{0.025} = 1.96

Confidence Interval for Mean with Known Variance


If X̄ is the mean of a random sample of size n from a population with known variance σ², a (1 − α)100% confidence interval for μ is given by
X̄ − z_{α/2}(σ/√n) < μ < X̄ + z_{α/2}(σ/√n)

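A minimal sketch of this interval for hypothetical summary values (n = 36, x̄ = 2.6, σ = 0.3, α = 0.05):

```python
# Sketch: (1 - alpha)100% confidence interval for mu with known sigma.
from math import sqrt

n, xbar, sigma, z_half_alpha = 36, 2.6, 0.3, 1.96  # z_{0.025} = 1.96

half_width = z_half_alpha * sigma / sqrt(n)
print(xbar - half_width, xbar + half_width)  # ≈ (2.502, 2.698)
```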
Sample Size for Estimating μ


The size of the error of the point estimate is |X̄ − μ|, and Pr(|X̄ − μ| < z_{α/2} σ/√n) = 1 − α.
Let e denote the margin of error, i.e. the value we do not want the error to exceed (with probability 1 − α). Thus, we require z_{α/2}(σ/√n) ≤ e.
Hence, for a given margin of error e, the sample size is given by n ≥ (z_{α/2} σ/e)².
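The corresponding sample-size calculation, with hypothetical σ and margin of error:

```python
# Sketch: minimum n so that the margin of error is at most e at 95% confidence.
from math import ceil

sigma, e, z = 0.3, 0.05, 1.96
n = ceil((z * sigma / e) ** 2)
print(n)  # 139, since (1.96 * 0.3 / 0.05)^2 ≈ 138.3
```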
Unknown Variance Case
This case applies when
1. The population variance is unknown, and
2. The population is normal or very close to a normal distribution, and
3. The sample size is small.
Let T = (X̄ − μ)/(S/√n), where S² is the sample variance. We know that T ~ t(n − 1).
Hence, Pr(−t_{n−1;α/2} < T < t_{n−1;α/2}) = 1 − α, or
Pr(−t_{n−1;α/2} < (X̄ − μ)/(S/√n) < t_{n−1;α/2}) = 1 − α, or
Pr(−t_{n−1;α/2}(S/√n) < X̄ − μ < t_{n−1;α/2}(S/√n)) = 1 − α, or
Pr(X̄ − t_{n−1;α/2}(S/√n) < μ < X̄ + t_{n−1;α/2}(S/√n)) = 1 − α.

Confidence Interval for Mean with Unknown Variance


If X̄ and S are the sample mean and standard deviation of a random sample of size n < 30 from an approximately normal population with unknown variance σ², a (1 − α)100% confidence interval for μ is given by
X̄ − t_{n−1;α/2}(S/√n) < μ < X̄ + t_{n−1;α/2}(S/√n)

For large n > 30, the t-distribution is approximately the same as the N(0, 1) distribution, hence for large n a (1 − α)100% confidence interval for μ is given by
X̄ − z_{α/2}(S/√n) < μ < X̄ + z_{α/2}(S/√n)

Confidence Intervals for the Difference Between Two Means


If we have two populations with means μ1 and μ2 and variances σ1² and σ2² respectively, then X̄1 − X̄2 is the point estimator of μ1 − μ2.
Known Variances Case
This case applies when
1. σ1² and σ2² are known and not equal, and
2. The two populations are normal OR n1 ≥ 30, n2 ≥ 30.
We have
(X̄1 − X̄2) ~ N(μ1 − μ2, σ1²/n1 + σ2²/n2)
We can further assert that
Pr(−z_{α/2} < [(X̄1 − X̄2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2) < z_{α/2}) = 1 − α
which leads us to the following (1 − α)100% confidence interval for μ1 − μ2:
(X̄1 − X̄2) − z_{α/2} √(σ1²/n1 + σ2²/n2) < μ1 − μ2 < (X̄1 − X̄2) + z_{α/2} √(σ1²/n1 + σ2²/n2)

Unknown Variances Case


This case applies when
1. σ1² and σ2² are unknown, and
2. n1 ≥ 30, n2 ≥ 30.
We replace σ1² and σ2² by their estimates S1² and S2², giving the following (1 − α)100% confidence interval for μ1 − μ2:
(X̄1 − X̄2) − z_{α/2} √(S1²/n1 + S2²/n2) < μ1 − μ2 < (X̄1 − X̄2) + z_{α/2} √(S1²/n1 + S2²/n2)

Unknown but Equal Variances (Small Samples) Case


This case applies when
1. σ1² and σ2² are unknown but equal, and
2. The two populations are normal, and
3. The sample sizes are small, n1 ≤ 30, n2 ≤ 30.
We let σ1² = σ2² = σ², then
(X̄1 − X̄2) ~ N(μ1 − μ2, σ²(1/n1 + 1/n2))
and we obtain a standard normal variable
Z = [(X̄1 − X̄2) − (μ1 − μ2)] / √(σ²(1/n1 + 1/n2))
But this requires the actual population variance. We thus need to estimate σ² using the pooled sample variance:
Sp² = [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2)

Since the two populations are normal, the two sample variances can be combined to obtain a chi-square distribution:
[(n1 − 1)S1² + (n2 − 1)S2²] / σ² ~ χ²(n1 + n2 − 2)
Finally, substituting Sp² for σ² gives the statistic
T = [(X̄1 − X̄2) − (μ1 − μ2)] / √(Sp²(1/n1 + 1/n2)) ~ t(n1 + n2 − 2)
Note that we are using the t-distribution because the variance is unknown.

We can assert that
Pr(−t_{n1+n2−2;α/2} < [(X̄1 − X̄2) − (μ1 − μ2)] / √(Sp²(1/n1 + 1/n2)) < t_{n1+n2−2;α/2}) = 1 − α

Therefore a (1 − α)100% confidence interval for μ1 − μ2 is given by
(X̄1 − X̄2) − t_{n1+n2−2;α/2} Sp √(1/n1 + 1/n2) < μ1 − μ2 < (X̄1 − X̄2) + t_{n1+n2−2;α/2} Sp √(1/n1 + 1/n2)

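A sketch of the pooled-variance interval for hypothetical summary statistics; it assumes scipy is available for the t quantile (the notes use statistical tables instead):

```python
# Sketch: pooled two-sample t confidence interval (equal, unknown variances).
from math import sqrt
from scipy.stats import t

n1, xbar1, s1 = 12, 85.0, 4.0
n2, xbar2, s2 = 10, 81.0, 5.0
alpha = 0.05

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled sample variance
t_crit = t.ppf(1 - alpha / 2, df=n1 + n2 - 2)                # t_{n1+n2-2; alpha/2}
half = t_crit * sqrt(sp2 * (1 / n1 + 1 / n2))

print(xbar1 - xbar2 - half, xbar1 - xbar2 + half)
```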
Unknown but Equal Variances (Large Samples) Case


For large samples, we can replace t_{n1+n2−2;α/2} by z_{α/2}. Thus, a (1 − α)100% confidence interval for μ1 − μ2 is
(X̄1 − X̄2) − z_{α/2} Sp √(1/n1 + 1/n2) < μ1 − μ2 < (X̄1 − X̄2) + z_{α/2} Sp √(1/n1 + 1/n2)

Paired (Dependent) Data


When our two samples are dependent on each other, e.g. before-and-after measurements, we need to work with the differences di = xi − yi of paired observations.
We assume these differences d1, d2, ⋯, dn are normal with mean μD and unknown variance σD².
μD = μ1 − μ2
The point estimate of μD is
d̄ = (1/n) ∑_{i=1}^n di = (1/n) ∑_{i=1}^n (xi − yi)
The point estimate of the variance σD² is given by
sD² = (1/(n − 1)) ∑_{i=1}^n (di − d̄)²
We can thus establish
Pr(−t_{n−1;α/2} < T < t_{n−1;α/2}) = 1 − α
where T = (D̄ − μD)/(SD/√n) ~ t(n − 1).

We have a (1 − α)100% confidence interval for μD = μ1 − μ2:
d̄ − t_{n−1;α/2}(SD/√n) < μD < d̄ + t_{n−1;α/2}(SD/√n)

For a sufficiently large sample, n > 30, we can replace t_{n−1;α/2} by z_{α/2} and get
d̄ − z_{α/2}(SD/√n) < μD < d̄ + z_{α/2}(SD/√n)

Confidence Interval for Variances


The following applies for an (approximately) N(μ, σ²) distribution.
The sample variance,
S² = (1/(n − 1)) ∑_{i=1}^n (Xi − X̄)² = (1/(n − 1)) (∑_{i=1}^n Xi² − nX̄²)
is a point estimate of σ².
Known Mean Case
When μ is known, we have
(Xi − μ)/σ ~ N(0, 1) for all i,
or ((Xi − μ)/σ)² ~ χ²(1) for all i,
and hence ∑_{i=1}^n (Xi − μ)²/σ² ~ χ²(n).

We get a (1 − α)100% confidence interval for σ² of a N(μ, σ²) population with μ known:
∑_{i=1}^n (Xi − μ)² / χ²_{n;α/2} < σ² < ∑_{i=1}^n (Xi − μ)² / χ²_{n;1−α/2}

For the standard deviation, we just take the square root of both sides:
√(∑_{i=1}^n (Xi − μ)² / χ²_{n;α/2}) < σ < √(∑_{i=1}^n (Xi − μ)² / χ²_{n;1−α/2})

Unknown Mean Case

When μ is unknown, we have
(n − 1)S²/σ² = ∑_{i=1}^n (Xi − X̄)²/σ² ~ χ²(n − 1)
This is true for both small and large n. Hence, we have
(n − 1)S² / χ²_{n−1;α/2} < σ² < (n − 1)S² / χ²_{n−1;1−α/2}
where S² is the sample variance.

For the standard deviation, we just take the square root of both sides:
√((n − 1)S² / χ²_{n−1;α/2}) < σ < √((n − 1)S² / χ²_{n−1;1−α/2})

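A sketch of this interval for hypothetical values, assuming scipy's chi-square quantile function in place of the statistical table:

```python
# Sketch: (1 - alpha)100% confidence interval for sigma^2 with mu unknown.
from scipy.stats import chi2

n, s2, alpha = 15, 2.5, 0.05  # hypothetical sample size and sample variance
df = n - 1

lower = df * s2 / chi2.ppf(1 - alpha / 2, df)  # divide by the upper alpha/2 point
upper = df * s2 / chi2.ppf(alpha / 2, df)      # divide by the lower alpha/2 point
print(lower, upper)
```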


Confidence Interval for Ratio of Variances
Let us have two independent random samples from two approximately normal populations with unknown means.
We have (n1 − 1)S1²/σ1² ~ χ²(n1 − 1) and (n2 − 1)S2²/σ2² ~ χ²(n2 − 1), where S1² = (1/(n1 − 1)) ∑_{i=1}^{n1} (Xi − X̄)² and S2² = (1/(n2 − 1)) ∑_{j=1}^{n2} (Yj − Ȳ)².
Hence,
F = [((n1 − 1)S1²/σ1²)/(n1 − 1)] / [((n2 − 1)S2²/σ2²)/(n2 − 1)] = (S1²/σ1²) / (S2²/σ2²) ~ F(n1 − 1, n2 − 1).
Therefore, a (1 − α)100% confidence interval for the ratio σ1²/σ2² when μ1 and μ2 are unknown is
(S1²/S2²) (1/F_{n1−1,n2−1;α/2}) < σ1²/σ2² < (S1²/S2²) F_{n2−1,n1−1;α/2}
To get a confidence interval for σ1/σ2, we just take the square root of both sides.

Chapter 7: Hypotheses Testing Based on Normal Distribution


Null and Alternative Hypotheses
Null hypothesis, H0 , is the hypothesis we formulate with the hope of rejecting.
The rejection of H0 leads to the acceptance of an alternative hypothesis, denoted by H1 .
When we reject a hypothesis, we conclude that it is false. But if we accept it, it merely means we have insufficient evidence to believe otherwise.
Types I and II Error
Type I error occurs when we reject H0 given that H0 is true. This is considered as a serious type of error.
Type II error occurs when we do not reject H0 given that H0 is false.
Pr(Type I error) = Pr(reject H0 | H0 true) = α, where α is the level of significance, usually 5% or 1%
Pr(Type II error) = β, and 1 − β = Pr(reject H0 | H1 true) is the power of the test

Level of Significance
The level of significance separates all possible values of the test statistic into two regions: the rejection region (or critical region) and the acceptance region.
The value that separates the rejection and acceptance regions is called the critical value.
Hypotheses Testing Concerning Mean
Known Variance (Two-sided) - Critical Value
This is for
1. The variance σ² is known, AND
2. The underlying distribution is normal OR n > 30.
Test H0: μ = μ0 against H1: μ ≠ μ0.
We can expect that X̄ ~ N(μ, σ²/n); under H0, X̄ ~ N(μ0, σ²/n).
Using a significance level of α, we can find two critical values x1 and x2 such that
1. x1 < X̄ < x2 defines the acceptance region
2. The two tails, X̄ < x1 and X̄ > x2, constitute the critical or rejection region.
We need x1 = μ0 − z_{α/2} σ/√n and x2 = μ0 + z_{α/2} σ/√n.
If X̄ falls in the acceptance region, we accept the null hypothesis, otherwise we reject it. The critical region is often stated in terms of Z instead of X̄, where
Z = (X̄ − μ0)/(σ/√n) ~ N(0, 1)
Basically, if the (1 − α)100% confidence interval covers μ0, the null hypothesis is accepted; otherwise it is rejected.
Known Variance (Two-sided) - p-Value
The p-value is the probability of obtaining a test statistic more extreme (≤ or ≥) than the observed sample value given H0 is true. It is also called the
observed level of significance.
Here are the steps:
1. Convert the sample statistic (e.g. X̄) into a test statistic (e.g. the Z statistic)
2. Obtain the p-value
3. Compare the p-value with α/2: if the p-value < α/2, reject H0; otherwise (≥) do not reject.
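A sketch of the two-sided z-test via the p-value, with hypothetical numbers and the standard normal c.d.f. built from math.erf:

```python
# Sketch: two-sided z-test of H0: mu = mu0 with known sigma, via the p-value.
from math import erf, sqrt

n, xbar, mu0, sigma, alpha = 36, 2.72, 2.6, 0.3, 0.05  # hypothetical data

def phi(z: float) -> float:
    """Standard normal c.d.f."""
    return 0.5 * (1 + erf(z / sqrt(2)))

z = (xbar - mu0) / (sigma / sqrt(n))
p_one_tail = 1 - phi(abs(z))  # upper-tail probability beyond |z|
print(z, p_one_tail)          # z = 2.4, one-tail p ≈ 0.0082
print("reject H0" if p_one_tail < alpha / 2 else "do not reject H0")
```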
Known Variance (One-sided) - Critical Value
Same as before, but the alternative hypothesis is now either H1: μ > μ0 or H1: μ < μ0.
In both cases, let Z = (X̄ − μ0)/(σ/√n). Then we check whether the observed value of Z is greater than z_α or less than −z_α respectively.

Known Variance (One-sided) - p-Value


Same as the two-sided known variance approach, just that we will compare against the relevant side, and against α itself.
Unknown Variance (Two-sided) - Critical Value
We use this case for
1. The variance is unknown, AND
2. The underlying distribution is normal.
Let T = (X̄ − μ0)/(S/√n), where S² is the sample variance.
Then H0 is rejected if the observed value of T, say t, satisfies t > t_{n−1;α/2} or t < −t_{n−1;α/2}.
Unknown Variance (One-sided) - Critical Value
We test the relevant side: t > t_{n−1;α} or t < −t_{n−1;α}.
Hypotheses Testing Concerning Difference Between Two Means
Known Variances
1. Variances σ1² and σ2² are known, and
2. The underlying distributions are normal or both n1 ≥ 30, n2 ≥ 30.
Refer to the earlier section on the difference between two means with known variances. Since the variances are known, we use the Z statistic.
Unknown Variances (Large Samples)
1. Variances σ1² and σ2² are unknown, and
2. Both n1 ≥ 30, n2 ≥ 30.
Refer to the earlier section.
Unknown but Equal Variances (Small Samples)
1. Variances σ1² and σ2² are unknown but equal, and
2. The populations are normal, and
3. Both samples are small, n1 ≤ 30, n2 ≤ 30.
Refer to the earlier section.
Paired Data
Just refer to section before.
Hypotheses Testing Concerning Variances
One Variance
The assumption is that the underlying distribution is normal.
We wish to test H0: σ² = σ0². We know that χ² = (n − 1)S²/σ0² ~ χ²(n − 1) under H0, and we use it as our test statistic.
1. For H1: σ² > σ0², the critical region is χ² > χ²_{n−1;α}
2. For H1: σ² < σ0², the critical region is χ² < χ²_{n−1;1−α}
3. For H1: σ² ≠ σ0², the critical region is χ² < χ²_{n−1;1−α/2} or χ² > χ²_{n−1;α/2}
where Pr(W > χ²_{n−1;α}) = α with W ~ χ²(n − 1).
Ratio of Variances
1. The underlying distributions are normal
2. The means are unknown
We have F = (S1²/σ1²)/(S2²/σ2²) ~ F(n1 − 1, n2 − 1).
Under H0: σ1² = σ2², F = S1²/S2² ~ F(n1 − 1, n2 − 1), which is our test statistic.
1. For H1: σ1² > σ2², the critical region is F > F_{n1−1,n2−1;α}
2. For H1: σ1² < σ2², the critical region is F < F_{n1−1,n2−1;1−α}
3. For H1: σ1² ≠ σ2², the critical region is F < F_{n1−1,n2−1;1−α/2} or F > F_{n1−1,n2−1;α/2}
where Pr(W > F_{v1,v2;α}) = α with W ~ F(v1, v2).
