
Lecture 3. Random Variables and Probability Distributions

Hanyang University

MAT2022 Principles of Statistics, Spring 2025



Learning Objectives

1 Determine probabilities for i) discrete random variables from probability mass functions and for ii) continuous random variables from probability density functions

2 Use cumulative distribution functions in both cases

3 Calculate means and variances for discrete and continuous random variables

4 Understand the assumptions for each of the probability distributions

5 Select an appropriate probability distribution, such as normal distribution (continuous) and binomial and Poisson distributions (discrete), to calculate probabilities in specific applications

6 Calculate covariances (and correlations) between random variables and means (and
variances) for linear combinations of random variables

7 Understand the central limit theorem



3.1 Set Notation and Probability

An event is a set of outcomes of an experiment (a subset of the sample space Ω) to which a probability is assigned.

A probability is a numerical quantity that expresses the likelihood of an event. The probability of an event E is written as P(E). The probability P(E) is always a number between 0 and 1, inclusive.


Frequency Interpretation of Probability: The probability P(E) is interpreted as the relative frequency of occurrence of E in an indefinitely long series of repetitions of the chance operation:

P(E) = (# of times E occurs) / (# of times the chance operation is repeated).

In the penny tossing experiment, what is the probability of getting a head?

- http://www.stat.berkeley.edu/~aldous/Real-World/coin_tosses.html
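As an illustration (not in the original slides), here is a minimal Python simulation of the frequency interpretation: the relative frequency of heads in repeated fair-coin tosses settles near 0.5 as the number of repetitions grows.

```python
import random

random.seed(1)  # fixed seed so the demo is reproducible

# Relative frequency of heads after n tosses of a fair coin.
for n in (100, 10_000, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(n, heads / n)  # approaches P(head) = 0.5 as n grows
```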


The union of two events A and B is the event that occurs if either A or B (or both)
occurs on a single performance of the experiment. We denote the union of events A
and B by the symbol A ∪ B (read: “A union B”). A ∪ B consists of all the sample
points that belong to A or B or both.

The intersection of two events A and B is the event that occurs if both A and B
occur on a single performance of the experiment. We write A ∩ B for the
intersection of A and B (read: “A intersect B”). A ∩ B consists of all the sample
points belonging to both A and B.

The events A and B are called mutually exclusive (or disjoint) if A ∩ B is empty.

The complement of E, denoted as E′ (read: “E complement”), is the event of “not E”.


Axioms of Probability

1 The probability of an event E is always between 0 and 1. That is,

0 ≤ P (E) ≤ 1.
2 P (Ω) = 1; the probability of the whole sample space is one.

3 If E1 , E2 , . . . are mutually exclusive sets,

P (E1 ∪ E2 ∪ . . .) = P (E1 ) + P (E2 ) + . . .


Some rules

The probability that an event E does not happen, denoted by E ′ , is one minus the
probability that the event happens. That is,

P (E ′ ) = 1 − P (E).

Addition law of probability:

P (A ∪ B) = P (A) + P (B) − P (A ∩ B).


Example 1
A glass jar contains 1 red, 3 green, 2 blue, and 4 yellow marbles. If a single marble is
chosen at random from the jar, what is the probability that it is yellow or green?

A single card is chosen at random from a standard deck of 52 playing cards. What is
the probability of choosing a king or a club?
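A worked sketch of Example 1 with exact fractions (not part of the slides). The marble question uses only axiom 3, since “yellow” and “green” are mutually exclusive; the card question needs the full addition law because a card can be both a king and a club.

```python
from fractions import Fraction

# Jar: 1 red, 3 green, 2 blue, 4 yellow marbles (10 total).
p_yellow_or_green = Fraction(4, 10) + Fraction(3, 10)  # disjoint events
print(p_yellow_or_green)  # 7/10

# Deck: P(king or club) = P(king) + P(club) - P(king of clubs).
p_king_or_club = Fraction(4, 52) + Fraction(13, 52) - Fraction(1, 52)
print(p_king_or_club)  # 4/13
```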


Example 2
Let E1 denote the event that a structural component fails during a test and E2 denote
the event that the component shows some strain but does not fail. Given P (E1 ) = 0.15
and P (E2 ) = 0.3,

(a) What is the probability that a structural component does not fail during a test?

(b) What is the probability that a component either fails or shows strain during a test?

(c) What is the probability that a component neither fails nor shows strain during a
test?

3.2 Permutations and Combinations

Calculating probabilities often involves counting the number of certain arrangements, so it is helpful to know how to compute those counts.

Permutations

The number of ways you can choose r objects in order from n distinct objects is

nPr = n(n − 1)(n − 2) · · · (n − r + 1) = n! / (n − r)!

where n! = n × (n − 1) × (n − 2) × · · · × 2 × 1 and 0! = 1.


Combinations

The number of ways you can choose r objects without regard to order from n distinct objects is

nCr = nPr / rPr = n! / (r!(n − r)!)

Note: In permutations, order matters and in combinations order does not matter.
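Python’s standard library computes both counts directly (math.perm and math.comb, available since Python 3.8); a small sketch using Example 3’s five candidates:

```python
from math import comb, factorial, perm

n, r = 5, 2  # choose and then rank 2 of the 5 candidates in Example 3

print(perm(n, r))                        # 5P2 = 20 ordered choices
print(comb(n, r))                        # 5C2 = 10 unordered choices
print(factorial(n) // factorial(n - r))  # same as perm(n, r), from the formula
```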


Example 3
(Permutation) There are five candidates Adam, Bette, Carl, David, Elle. Suppose we
want to choose two of them and then rank the two chosen persons, then how many
possible rankings are there?


Example 4
(Combination) Suppose 5 cards are randomly drawn from a full deck of 52 without
replacement to form a 5-card hand. What is the probability of getting a flush?

Note: A flush is a hand of playing cards where all cards are of the same suit
(spades, hearts, clubs, diamonds).
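A sketch of the flush calculation in Example 4, using the slide’s definition (all five cards of one suit): choose a suit, then 5 of its 13 cards, and divide by the number of 5-card hands.

```python
from math import comb

# 4 suits, comb(13, 5) same-suit hands per suit, comb(52, 5) hands in total.
p_flush = 4 * comb(13, 5) / comb(52, 5)
print(p_flush)  # about 0.00198
```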

3.3 Conditional Probability and Bayes’ Theorem

3.3 Conditional Probability

Conditional probability provides us with a way to reason about the outcome of an experiment and to update the probability of one event after we learn that another event has occurred.

The conditional probability of A given B, written P(A|B), is defined by

P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0.

Here, P(A ∩ B) = P(A|B)P(B) is known as the multiplication rule of probability.


Note: The rule extends to more than two events.

P (A ∩ B ∩ C) = P (A|B ∩ C)P (B|C)P (C).

More generally,

let A1 , A2 , . . . , An be events such that P (A1 ∩ A2 ∩ · · · ∩ An−1 ) > 0. Then,

P (A1 ∩ · · · ∩ An ) = P (A1 )P (A2 |A1 ) · · · P (An |A1 ∩ · · · ∩ An−1 ).


Bayes’ Theorem

A collection P = {B1, . . . , Bk} of disjoint events is called a partition of Ω if Ω = B1 ∪ · · · ∪ Bk.

Note: A partition of sample space is simply the decomposition of the sample space
into a collection of disjoint (mutually exclusive) events with positive probability.

Let A be an event and let {B1 , . . . , Bk } be a partition.

We can write A = (A ∩ B1) ∪ · · · ∪ (A ∩ Bk), and these events (A ∩ Bi, i = 1, . . . , k) are disjoint. Then, we have

P(A) = P(B1)P(A|B1) + · · · + P(Bk)P(A|Bk).

This is called the law of total probability.


Suppose that we assign probabilities to events B1, . . . , Bk in a partition and then we learn that some event A occurs.

How do we revise the probabilities of the partition events given A?

P(Bi|A) = P(A ∩ Bi) / P(A) = P(Bi)P(A|Bi) / Σ_{j=1}^{k} P(Bj)P(A|Bj),

which is called Bayes’ Theorem.

Particularly, if the partition is simply {B, B c }, then we have

P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^c)P(B^c)].


Example 5
According to the genetics of color blindness, the condition is believed to be more common in men than in women. It is usually said that 5 men in 100 and 25 women in 10,000 suffer from the condition. Thus, the probability of being color blind is conditional on whether the person is a man or a woman.

What is the probability of being color blind, assuming that the proportions of men and
women are equal?
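A sketch of Example 5 via the law of total probability, assuming (as the question states) equal proportions of men and women:

```python
# Rates from the slide: 5 in 100 men, 25 in 10,000 women are color blind.
p_cb_given_m, p_cb_given_w = 5 / 100, 25 / 10_000
p_m = p_w = 0.5  # equal proportions of men and women

# Law of total probability over the partition {man, woman}.
p_cb = p_m * p_cb_given_m + p_w * p_cb_given_w
print(p_cb)  # 0.02625
```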


Example 6
Suppose you meet two arbitrarily chosen people. What is the probability that their
birthdays are different? What about more than two people?
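Example 6 sketched numerically (assuming 365 equally likely birthdays): for n people, P(all different) = ∏_{k=0}^{n−1} (365 − k)/365 by the general multiplication rule.

```python
def p_all_different(n, days=365):
    """P(n people all have different birthdays), assuming uniform birthdays."""
    p = 1.0
    for k in range(n):
        p *= (days - k) / days
    return p

print(p_all_different(2))   # 364/365 ≈ 0.9973
print(p_all_different(23))  # ≈ 0.4927: drops below 1/2 at 23 people
```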


Example 7
An urn contains two type A coins and one type B coin. When a type A coin is flipped, it
comes up a head with probability 1/4, whereas when a type B coin is flipped, it comes
up a head with probability 3/4. A coin is randomly chosen from the urn and flipped.
Given that the flip landed on a head, what is the probability that it was a type A coin?
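Example 7 worked with Bayes’ Theorem (a sketch with exact fractions): the partition is {type A, type B}, with prior probabilities 2/3 and 1/3.

```python
from fractions import Fraction

p_A, p_B = Fraction(2, 3), Fraction(1, 3)           # 2 type A coins, 1 type B
p_head_A, p_head_B = Fraction(1, 4), Fraction(3, 4)

# Bayes: P(A | head) = P(A)P(head|A) / [P(A)P(head|A) + P(B)P(head|B)]
posterior_A = (p_A * p_head_A) / (p_A * p_head_A + p_B * p_head_B)
print(posterior_A)  # 2/5
```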

3.4 Random Variable

A random variable is a mapping from the (abstract) sample space to a real number.
It assigns a real number to each member of the sample space.

A discrete random variable is a random variable with a finite (or countably infinite)
set of real numbers for its range.
(e.g., number of scratches on a surface, proportion of defective parts among 1000
tested, number of transmitted bits received in error)

A continuous random variable is a random variable with an interval (either finite or infinite) of real numbers for its range.
(e.g., electrical current, length, pressure, temperature, time, voltage, weight)


Random Variables: Some Properties

Suppose X is a random variable.

P (X ∈ R) = 1, where R is the set of all real numbers

0 ≤ P (X ∈ E) ≤ 1, for any set E of real numbers

P (a < X ≤ b) = P (X ≤ b) − P (X ≤ a) for any a < b

P (X > a) = 1 − P (X ≤ a) for any real number a

Example 8
Assume that the following probabilities apply to the random variable X that denotes the
life in hours of standard fluorescent tubes: P (X ≤ 5000) = 0.1,
P (5000 < X ≤ 6000) = 0.3, P (X > 8000) = 0.4. What is P (6000 < X ≤ 8000)?

3.5 Continuous Random Variables


The probability distribution or simply distribution of a random variable X is a
description of the set of the probabilities associated with the possible values for X.

The probability density function (or pdf) f (x) of a continuous random variable X is
used to determine probabilities as follows:
P(a < X < b) = ∫_a^b f(x) dx.

[Figure 3-6: Probability determined from the area under f(x). Figure 3-7: A histogram approximates a probability density function; the area of each bar equals the relative frequency of the interval.]

Properties of pdf

It is important that f (x) is used to calculate an area that represents the probability
that X assumes a value in (a, b).

f(x) ≥ 0 for all x

∫_{−∞}^{∞} f(x) dx = 1

Because every point has zero width, the probability that X equals any single point is zero:

P(X = x) = 0.

If X is a continuous random variable, for any a and b,

P (a < X < b) = P (a ≤ X < b) = P (a < X ≤ b) = P (a ≤ X ≤ b).


Example 9
Let the continuous random variable X denote the current measured in a thin copper
wire in milliamperes. Assume that the range of X is [0, 20], and assume that the
probability density function of X is uniform. What is the probability that a current
measurement is less than 10 milliamperes?

Let the continuous random variable X denote the distance in micrometers from the
start of a track on a magnetic disk until the first flaw. Historical data show that the
distribution of X can be modeled by a pdf
f(x) = (1/2000) e^{−x/2000}, x ≥ 0.
What proportion of parts is between 1000 and 2000 micrometers?
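Example 9 sketched in Python (a sketch, assuming SciPy is available): the uniform current and the exponential flaw-distance model.

```python
from scipy.stats import expon, uniform

# (a) Current uniform on [0, 20] mA: P(X < 10).
print(uniform.cdf(10, loc=0, scale=20))  # 0.5

# (b) Flaw distance with pdf (1/2000) e^{-x/2000}: P(1000 < X < 2000).
d = expon(scale=2000)                    # exponential with mean 2000 micrometers
print(d.cdf(2000) - d.cdf(1000))         # e^{-1/2} - e^{-1} ≈ 0.2387
```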


Cumulative Distribution Function

The cumulative distribution function (or cdf) of a continuous random variable X with probability density function f(x) is

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(u) du,

for −∞ < x < ∞.

For a continuous random variable X, F(x) = P(X < x) because P(X = x) = 0.

0 ≤ F(x) ≤ 1 for all x
F(x) is a non-decreasing function.
F(x) tends to 0 as x tends to −∞.
F(x) tends to 1 as x tends to ∞.
The pdf f(x) can be recovered through the fundamental theorem of calculus:

d/dx F(x) = d/dx ∫_{−∞}^{x} f(u) du = f(x).


The cdf F(x) is related to the pdf f(x) and can be used to obtain probabilities as follows:

P(a < X < b) = ∫_a^b f(x) dx = ∫_{−∞}^{b} f(x) dx − ∫_{−∞}^{a} f(x) dx = F(b) − F(a).

(Example) Consider the distance to flaws in the previous example with pdf

f(x) = (1/2000) exp(−x/2000)

for x ≥ 0. The cdf is determined from

F(x) = ∫_0^x (1/2000) exp(−u/2000) du = 1 − exp(−x/2000)

for x ≥ 0. You can check that d/dx F(x) = f(x).

3.6 Mean and Variance of a Continuous Random Variable

The mean or expected value of a continuous random variable X, denoted as µ or E(X), is

µ = E(X) = ∫_{−∞}^{∞} x f(x) dx.

The integral in E(X) is the population analogue of the sum that is used to calculate x̄.

Recall that x̄ is the balance point when an equal weight is placed at the location of
each measurement along a number line.

If f (x) is the density function of a loading on a long, thin beam, E(X) is the point
at which the beam balances.

Example 10
Suppose X has a pdf f (x) = 2x for 0 < x < 1.
(a) Show that f (x) is a valid pdf.
(b) Compute the mean of X.


The variance of a continuous random variable X is a measure of dispersion or scatter in the possible values for X. The variance of X, denoted as σ², Var(X) or V(X), is

σ² = Var(X) = ∫_{−∞}^{∞} (x − µ)² f(x) dx.

The standard deviation of X is σ = √(σ²).

Example 11
Suppose X has a pdf f (x) = 2x for 0 < x < 1.
What are the variance and standard deviation of X?
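Examples 10 and 11 checked symbolically (a sketch using SymPy, not part of the slides): for f(x) = 2x on (0, 1), verify the total probability and compute the mean, variance, and standard deviation.

```python
import sympy as sp

x = sp.symbols("x")
f = 2 * x  # pdf on (0, 1)

total = sp.integrate(f, (x, 0, 1))                 # 1, so f is a valid pdf
mu = sp.integrate(x * f, (x, 0, 1))                # E(X) = 2/3
var = sp.integrate((x - mu) ** 2 * f, (x, 0, 1))   # Var(X) = 1/18
print(total, mu, var, sp.sqrt(var))                # 1, 2/3, 1/18, sqrt(2)/6
```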


Computational Formula for Var(X)

There is also a computational formula for Var(X):

Var(X) = E(X²) − µ² = ∫_{−∞}^{∞} x² f(x) dx − µ²,

where E(X²) = ∫_{−∞}^{∞} x² f(x) dx is called the second moment of X.

Var(X) = ∫_{−∞}^{∞} (x − µ)² f(x) dx
       = ∫_{−∞}^{∞} (x² f(x) − 2xµ f(x) + µ² f(x)) dx
       = ∫_{−∞}^{∞} x² f(x) dx − 2µ ∫_{−∞}^{∞} x f(x) dx + µ² ∫_{−∞}^{∞} f(x) dx
       = ∫_{−∞}^{∞} x² f(x) dx − 2µ × µ + µ² × 1 = ∫_{−∞}^{∞} x² f(x) dx − µ².


More Complicated Example

To facilitate this example, we first state the formula of integration by parts:

∫ u dv = uv − ∫ v du.

We will be using this formula a few times.

Suppose X has a pdf

f(x) = e^{−x}, x ≥ 0.

The mean of X is

µ = E(X) = ∫_0^∞ x e^{−x} dx = [−x e^{−x}]_0^∞ + ∫_0^∞ e^{−x} dx = 0 + 1 = 1.


The second moment of X is

E(X²) = ∫_0^∞ x² e^{−x} dx = [−x² e^{−x}]_0^∞ + 2 ∫_0^∞ x e^{−x} dx = 2 E(X) = 2.

Therefore, the variance of X is

σ² = Var(X) = E(X²) − µ² = 2 − 1² = 1.
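A numerical check of these moments (a sketch, not in the slides, assuming SciPy is available):

```python
from math import exp, inf
from scipy.integrate import quad

mean, _ = quad(lambda x: x * exp(-x), 0, inf)        # E(X) = 1
second, _ = quad(lambda x: x**2 * exp(-x), 0, inf)   # E(X^2) = 2
print(mean, second, second - mean**2)                # 1.0, 2.0, Var(X) = 1.0
```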

3.7 Normal Distribution

The most widely used model for the distribution of a random variable is a normal
distribution.

A normal random variable X has probability density function

f(x) = (1 / (√(2π) σ)) e^{−(x−µ)² / (2σ²)}

for −∞ < x < ∞, with parameters −∞ < µ < ∞ and σ > 0. The distribution of X is called the normal distribution. Also, E(X) = µ and Var(X) = σ².

The notation N (µ, σ 2 ) is often used to denote a normal distribution with mean µ
and variance σ 2 .


[Figure 3-11: Normal probability density functions for selected values of the parameters µ and σ².]

Random variables with different means and variances can be modeled by normal probability density functions with appropriate choices of the center and width. The value of E(X) = µ determines the center of the probability density function, and V(X) = σ² determines the width. Figure 3-11 illustrates several normal probability density functions with selected values of µ and σ². Each has the characteristic symmetric bell-shaped curve, but the centers and dispersions differ.

Empirical Rules

The empirical rule is a rule of thumb that applies to data sets with distributions that are approximately normal:

a. About 68% of the observations will fall within one standard deviation of the mean.
b. About 95% of the observations will fall within two standard deviations of the mean.
c. About 99.7% (essentially all) of the observations will fall within three standard deviations of the mean: P(µ − 3σ < X < µ + 3σ) = 0.9973.

[Figure 3-13: Probabilities associated with a normal distribution.]

From the symmetry of f(x), P(X ≤ µ) = P(X ≥ µ) = 0.5. Because f(x) is positive for all x, the model assigns some probability to each interval of the real line. However, the probability density function decreases as x moves farther from µ. Consequently, the probability that a measurement falls far from µ is small, and at some distance from µ the probability of an interval can be approximated as zero. The area under a normal pdf beyond 3σ from the mean is quite small, which is convenient for quick, rough sketches of a normal probability density function.
3.8 Standard Normal Distribution

A normal random variable with µ = 0 and σ² = 1 is called a standard normal random variable.

A standard normal random variable is denoted as Z. Its pdf is given by

f(z) = (1/√(2π)) e^{−z²/2},

where −∞ < z < ∞.

The function

Φ(z) = P(Z ≤ z) = ∫_{−∞}^{z} (1/√(2π)) e^{−u²/2} du

is used to denote the cumulative distribution function of a standard normal random variable.

A table (or computer software) is required because the probability cannot be calculated in general by elementary methods.


[Figure 3-14: Standard normal probability density function; P(Z ≤ 1.5) = Φ(1.5) is the shaded area.]

z      0.00      0.01      0.02      0.03
0.0    0.50000   0.50399   0.50798   0.51197
...
1.5    0.93319   0.93448   0.93574   0.93699

The standard normal table can be used to find the probability that a standard normal random variable is observed below, above, or between values.

Examples:

1 P(Z < 1.53)
2 P(Z > 1.53)
3 P(−1.2 < Z < 0.8)

Standardization of Normal Random Variables

All normal densities have the same essential shape. They can be made equivalent
with respect to areas under them by suitable rescaling of the horizontal axis. The
rescaled variable is denoted by Z.

The correspondence between the Z scale and the X scale can be expressed by

Z = (X − µ) / σ, Z ∼ N(0, 1).
Creating a new random variable by this transformation is referred to as
standardizing. The random variable Z represents the distance of X from its mean in
terms of standard deviations. It is the key step in calculating a probability for an
arbitrary normal random variable.

Suppose X is a normal random variable with mean µ and variance σ². Then

P(X ≤ x) = P((X − µ)/σ ≤ (x − µ)/σ) = P(Z ≤ z),

where Z = (X − µ)/σ and z = (x − µ)/σ.


Example 12
Suppose the current measurements in a strip of wire are assumed to follow a normal distribution with a mean of 10 milliamperes and a variance of 4 (milliamperes)².

(a) What is the probability that a measurement will exceed 13 milliamperes?

(b) What is the probability that a current measurement is between 9 and 11


milliamperes?

(c) Determine the value for which the probability that a current measurement is below
this value is 0.98. (This value is also called the 98th percentile.)
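Example 12 sketched with scipy.stats.norm (mean 10, standard deviation √4 = 2):

```python
from scipy.stats import norm

X = norm(loc=10, scale=2)  # current in mA: N(10, 4)

print(1 - X.cdf(13))         # (a) P(X > 13) ≈ 0.0668
print(X.cdf(11) - X.cdf(9))  # (b) P(9 < X < 11) ≈ 0.3829
print(X.ppf(0.98))           # (c) 98th percentile ≈ 14.11 mA
```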


Some Useful Facts

There are some useful facts to manipulate standard normal probabilities.

1 P (Z > z) = 1 − P (Z ≤ z)

2 P (Z > z) = P (Z < −z)

3 P (a < Z < b) = P (Z < b) − P (Z < a) for a < b

4 P (|Z| > a) = P (Z > a) + P (Z < −a) = 2 × P (Z > a) = 2 × P (Z < −a) for a > 0

5 P (−a < Z < a) = 1 − 2 × P (Z > a) = 1 − 2 × P (Z < −a) for a > 0


Inverting Normal Probabilities

Let pα be the value of z such that the tail area lying to its left equals α; i.e., P(Z < pα) = α. Given the value of α, we can use the standard normal table to find pα. The value pα is called the 100α-th percentile of a standard normal distribution.

Examples:
1 p0.90 = 1.281552 from the standard normal table
2 p0.95 = 1.644854
3 p0.99 = 2.326348
4 p0.10 = −1.281552

Suppose X ∼ N(55, 25). Find:
1 The 70th percentile of X

2 The 20th percentile of X
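The inverse cdf (norm.ppf in SciPy) reproduces these percentiles; a sketch that also answers the two questions for X ∼ N(55, 25), i.e., σ = 5:

```python
from scipy.stats import norm

print(norm.ppf(0.90))  #  1.281552
print(norm.ppf(0.95))  #  1.644854
print(norm.ppf(0.10))  # -1.281552

# X ~ N(55, 25): x_p = 55 + 5 * z_p.
print(norm.ppf(0.70, loc=55, scale=5))  # 70th percentile ≈ 57.62
print(norm.ppf(0.20, loc=55, scale=5))  # 20th percentile ≈ 50.79
```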

3.9 Normal Probability Plot

How do we know whether a normal distribution is a reasonable model for data?

Probability plotting is a graphical method for determining whether sample data conform to a hypothesized distribution based on a subjective visual examination of the data.

A very important application of normal probability plotting is in verification of assumptions when using statistical inference procedures that require the normality assumption.

Ten observations on the effective service life in minutes of batteries used in a portable personal computer are as follows:

176, 191, 214, 220, 205, 192, 201, 190, 183, 185.

We hypothesize that battery life is adequately modeled by a normal distribution.


[Figure 3-25: Normal probability plot obtained from standardized normal scores.]

j    x(j)   (j − 0.5)/10   zj
1    176    0.05           −1.64
2    183    0.15           −1.04
3    185    0.25           −0.67
4    190    0.35           −0.39
5    191    0.45           −0.13
6    192    0.55            0.13
7    201    0.65            0.39
8    205    0.75            0.67
9    214    0.85            1.04
10   220    0.95            1.64

The zj are the hypothetical standardized normal scores of the j-th measurement if the distribution is normal. The values of zj are given by Φ(zj) = P(Z ≤ zj) = (j − 0.5)/n.

If the distribution of the measurements is closely aligned with the normal distribution, the standardized normal scores and the actual measurements should form a straight line.

Probability plots are extremely useful and are often the first technique used when we need to determine which probability distribution is likely to provide a reasonable model for data. In using probability plots, usually the distribution is chosen by subjective assessment of the plot; more formal goodness-of-fit techniques can be used in conjunction with probability plotting.
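A sketch that reproduces the zj column for the battery-life data (assuming SciPy is available): sort the data and evaluate Φ⁻¹((j − 0.5)/n).

```python
from scipy.stats import norm

life = [176, 191, 214, 220, 205, 192, 201, 190, 183, 185]
x = sorted(life)
n = len(x)

# Standardized normal scores z_j with Phi(z_j) = (j - 0.5)/n.
z = [norm.ppf((j - 0.5) / n) for j in range(1, n + 1)]
for xj, zj in zip(x, z):
    print(xj, round(zj, 2))  # matches the table; plot z against x(j) to judge linearity
```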
3.10 Discrete Random Variables

(Example) A voice communication network for a business contains 48 external lines. At a particular time, the system is observed and some of the lines are being used. Let the random variable X denote the number of lines in use. Then X can assume any of the integer values 0 through 48.

For a discrete random variable X with possible values x1, x2, . . ., the probability mass function (or pmf) is

f(x) = P(X = x), x = x1, x2, . . . .

Since f(x) is defined as a probability, we have

1 f(x) ≥ 0
2 Σ_{x = x1, x2, ...} f(x) = 1

The cumulative distribution function of a discrete random variable X is

F(x) = P(X ≤ x) = Σ_{u ≤ x} f(u).


Suppose a discrete random variable X taking values x1, x2, . . . has a pmf f(x). The mean of X is

µ = E(X) = Σ_{x = x1, x2, ...} x f(x).

And the variance of X is

σ² = Var(X) = Σ_{x = x1, x2, ...} (x − µ)² f(x).

The standard deviation of X is σ = √(σ²).

Of course, we have a computational formula for σ² again:

σ² = E(X²) − µ² = Σ_{x = x1, x2, ...} x² f(x) − µ².


Example 13
The possible values for X are {0, 1, 2, 3, 4}. Suppose that the probabilities are

P (X = 0) = 0.3, P (X = 1) = 0.3, P (X = 2) = 0.2


P (X = 3) = 0.1, P (X = 4) = 0.1

Compute E(X) and Var(X).
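Example 13 sketched directly from the definitions (not part of the slides):

```python
pmf = {0: 0.3, 1: 0.3, 2: 0.2, 3: 0.1, 4: 0.1}

mean = sum(x * p for x, p in pmf.items())       # E(X) = 1.4
second = sum(x**2 * p for x, p in pmf.items())  # E(X^2) = 3.6
print(mean, second - mean**2)                   # 1.4 and Var(X) = 1.64
```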

3.11 Binomial Distribution

A random experiment consisting of n repeated trials such that


1 each trial results in only two possible outcomes, labeled as success and failure
2 the trials are independent
3 the probability of a success on each trial, denoted as p, remains constant
is called a binomial experiment.

The number of successes, X, in the experiment is called a binomial random variable. The distribution of X is called a binomial distribution and we write X ∼ B(n, p).

Example 14
1 Flip a fair coin 10 times. Let X = the number of heads obtained.
2 A multiple choice test contains 10 questions, each with four choices, and you guess
at each question. Let X = the number of questions answered correctly.
3 Of all bits transmitted through a digital transmission channel, 10% are received in
error. Let X = the number of bits in error in the next 3 bits transmitted.


(Example) Suppose we toss a coin n times.

1 Think about a coin that gives a head with probability p.
2 On every toss, we get either a head (Success, S) or a tail (Failure, F).
3 Each coin toss gives its outcome independently.
4 The probability of success is p.

If we toss a coin four times, how many ways are there of getting exactly one success?

1 The number of ways of selecting one of the four trials to be the success is the number of combinations of size 1 that can be constructed from four distinct objects, 4C1; i.e., the possible outcomes are SFFF, FSFF, FFSF, FFFS.
2 Now P(SFFF) = P(S)P(F)P(F)P(F) = p(1 − p)³.
3 All four events have the same probability.
4 Therefore, P(getting exactly one head out of four tosses) = 4C1 p(1 − p)³.


If X ∼ B(n, p), the pmf of X is given by

f(x) = nCx p^x (1 − p)^{n−x} = nCx p^x q^{n−x}, x = 0, 1, . . . , n,

where q = 1 − p. Note: nCx = n! / ((n − x)! x!) can also be written as (n choose x).

[Figure 3-32: Binomial distributions for selected values of n and p: (a) n = 20, p = 0.5; (b) n = 10, p = 0.1 and n = 10, p = 0.9.]


Example 15
Each sample of water has a 10% chance of containing high levels of organic solids.
Assume that the samples are independent with regard to the presence of the solids.

(a) Determine the probability that in the next 18 samples, exactly 2 contain high solids.

(b) Determine the probability that at least four samples contain high solids.

(c) Determine the probability that 3 ≤ X ≤ 7.

(d) Determine the probability that at least one sample contains high solids.
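Example 15 sketched with scipy.stats.binom for X ∼ B(18, 0.1):

```python
from scipy.stats import binom

n, p = 18, 0.1

print(binom.pmf(2, n, p))                       # (a) P(X = 2) ≈ 0.284
print(1 - binom.cdf(3, n, p))                   # (b) P(X >= 4) ≈ 0.098
print(binom.cdf(7, n, p) - binom.cdf(2, n, p))  # (c) P(3 <= X <= 7) ≈ 0.266
print(1 - binom.pmf(0, n, p))                   # (d) P(X >= 1) = 1 - 0.9^18 ≈ 0.850
```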


Several Facts about Binomial Distributions

If the number of trials is one, X ∼ B(1, p) has a Bernoulli distribution. The pmf of X is

f(0) = 1 − p, f(1) = p, f(x) = 0 for x ≠ 0 or 1.

If X ∼ B(n, p), then X = Σ_{i=1}^{n} Yi, where the Yi's are independent Bernoulli random variables.

If X ∼ B(n, p), Y ∼ B(m, p) and they are independent, then X + Y ∼ B(n + m, p).

If X is a binomial random variable with parameters p and n,

µ = E(X) = np and σ 2 = Var(X) = np(1 − p) = npq.


Example 16
The probability that a student will make a mistake on any single question is 0.25. If there
are 8 questions, and the student’s answers are mutually independent of each other, then
what is the probability that he or she makes exactly two mistakes?

1 Σ_{k=0}^{2} 8Ck 0.25^k 0.75^{8−k}

2 0.689

3 0.311

4 Σ_{k=2}^{6} 8Ck 0.25^k 0.75^{8−k}

5 1, since 2 is the expected value in this case

3.12 Poisson Process

The number of cars that pass through a certain point on a road during a given
period of time

The number of spelling mistakes a secretary makes while typing a single page

The number of phone calls at a call center per minute

The number of times a web server is accessed per minute

The number of road kill found per unit length of road

The number of mutations in a given stretch of DNA after a certain amount of radiation

The number of pine trees per unit area of mixed forest

The number of stars in a given volume of space

The number of light bulbs that burn out in a certain amount of time

Q. What are the common characteristics among the above items?


The number of events over an interval (such as the number of messages) is a discrete random variable that is often modeled by a Poisson distribution.

The length of the interval between events (such as the time between messages) is often modeled by an exponential distribution. These distributions are related: they describe different random variables in the same random experiment.

[Figure 3-33: In a Poisson process, events occur at random in an interval.]


In general, consider an interval T of real numbers partitioned into subintervals of small length ∆t and assume that as ∆t tends to zero,

1 the probability of more than one event in a subinterval tends to zero,

2 the probability of one event in a subinterval tends to λ∆t/T ,

3 the event in each subinterval is independent of other subintervals.

A random experiment with these properties is called a Poisson process. Here, λ is the average rate of events that occur during the specified period.


Poisson Distribution

Previous assumptions for a Poisson process imply that the subintervals can be
thought of as approximately independent Bernoulli trials with success probability
λ∆t/T and the number of trials equal to n = T /∆t.

Here, np = λ, and as ∆t tends to zero, n tends to infinity. This leads to the following definition.

The random variable X that equals the number of events in a Poisson process is a Poisson random variable with parameter λ > 0, and the probability mass function of X is

f(x) = e^{−λ} λ^x / x!, x = 0, 1, . . . .

The mean and variance of X are

E(X) = λ and Var(X) = λ.

If a Poisson random variable represents the number of events in some interval, the
mean of the random variable must be the expected number of events in the same
length of interval.


The property of the Poisson distribution suggested by the binomial analogy: if X has a Poisson(λ1) distribution and Y has a Poisson(λ2) distribution independent of X, then X + Y has a Poisson(λ1 + λ2) distribution.


Example: Prussian Cavalryman Data

Using army records, von Bortkiewicz (1898) studied the chance of a Prussian cavalryman being killed by the kick of a horse.

The records of ten army corps were examined over 20 years, giving a total of 200
observations of one corps for one year.

The total deaths from horse kicks were 122, and the average deaths per year per
corps was thus 122/200 = 0.61.

Let X be the number of deaths from horse kicks in one corps in a given year. Then X ∼ Poisson(λ) where λ = 0.61.

What is the probability that one death occurred from horse kick in one corps in a
given year?

Given that P(X = 1) = 0.3314, over the 200 corps-years observed, in how many should we expect to find exactly one death?


For the entire set of Prussian data, let

p be the pmf of the Poisson distribution (the probability of a given number of deaths per corps per year),

E be the corresponding number of corps-years in which that number of deaths is expected to occur in our 200 samples, i.e., p × 200, and

A be the actual number of corps-years in which that many deaths were observed.

Deaths p E A
0 0.54335 108.67 109
1 0.33145 66.29 65
2 0.10110 20.22 22
3 0.02055 4.11 3
4 0.00315 0.63 1
5 0.00040 0.08 0
6 0.00005 0.01 0
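The p and E columns can be reproduced with scipy.stats.poisson (a sketch):

```python
from scipy.stats import poisson

lam, corps_years = 0.61, 200
for k in range(7):
    p = poisson.pmf(k, lam)
    print(k, round(p, 5), round(corps_years * p, 2))  # deaths, p, expected count E
```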


Example 17
It has been observed that the average number of traffic accidents on the Hollywood
Freeway between 7 and 8 AM on Tuesday mornings is 1 per hour. What is the
chance that there will be 2 accidents on the Freeway, on some specified Tuesday
morning (per hour)?

Coliform bacteria are randomly distributed in a certain Arizona river at an average concentration of 1 per 20 cc of water. If we draw from the river a test tube containing 10 cc of water, what is the chance that the sample contains exactly 2 coliform bacteria?


Example 18
For the case of the thin copper wire, suppose that the number of flaws follows a Poisson
distribution with a mean of 2.3 flaws per millimeter.

(a) Determine the probability of exactly 2 flaws in 1 millimeter of wire.

(b) Determine the probability of 10 flaws in 5 millimeters of wire.
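Example 18 sketched with scipy.stats.poisson; note that part (b) rescales the mean to the 5-millimeter interval (λ = 5 × 2.3 = 11.5):

```python
from scipy.stats import poisson

print(poisson.pmf(2, 2.3))    # (a) P(2 flaws in 1 mm) ≈ 0.265
print(poisson.pmf(10, 11.5))  # (b) P(10 flaws in 5 mm), mean 5 * 2.3 = 11.5, ≈ 0.113
```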


Exponential Distribution

The random variable X that equals the distance between successive events of a
Poisson process with mean λ > 0 has an exponential distribution with parameter λ.
The pdf of X is
f (x) = λe−λx , x ≥ 0.
The exponential distribution is a continuous distribution. The mean and variance of X are

E(X) = 1/λ and Var(X) = 1/λ².

The cdf of X is given by

F (x) = 1 − e−λx , x ≥ 0.

For any value of λ, the exponential distribution is quite skewed. The formulas for
the mean and variance can be obtained by integration (by parts).

It is important to use consistent units in the calculation of probabilities, means, and variances involving exponential random variables.


Example 19
In a large corporate computer network, user log-ons to the system can be modeled as a
Poisson process with a mean of 25 log-ons per hour.

(a) What is the probability that the first log-on occurs in less than 6 minutes?

(b) What is the probability that there are no log-ons in an interval of 6 minutes?

(c) Determine the interval of time such that the probability that no log-on occurs in the
interval is 0.9.
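Example 19 sketched via the exponential cdf with λ = 25 log-ons per hour (time measured in hours, so 6 minutes = 0.1 hour):

```python
from math import exp, log

lam = 25  # log-ons per hour

print(1 - exp(-lam * 0.1))  # (a) P(first log-on within 6 min) ≈ 0.918
print(exp(-lam * 0.1))      # (b) P(no log-ons in 6 min) = e^{-2.5} ≈ 0.082
t = -log(0.9) / lam         # (c) solve e^{-25 t} = 0.9 for t (in hours)
print(t * 60)               # ≈ 0.25 minutes
```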


In previous example, the probability that there are no log-ons in a 6-minute interval
is 0.082 regardless of the starting time of the interval (our starting point for
observing the system does not matter). A Poisson process assumes that events occur
independently, with constant probability, throughout the interval of observation.

However, if there are high-use periods during the day, such as right after 8 AM,
followed by a period of low use, a Poisson process is not an appropriate model for
log-ons. In that case, modeling each of the high- and low-use periods by a separate
Poisson process might be reasonable.


In the development of a Poisson process, we assumed that an interval could be partitioned into small subintervals that were independent (the presence or absence of events in subintervals is similar to independent Bernoulli trials); knowledge of previous results does not affect the probabilities of events in future subintervals (the lack of memory property).

(Memoryless property of an exponential random variable) If X is an exponential random variable with parameter λ, then for any positive constants a and b,

P(X > a + b | X > a) = P(X > b).


The exponential distribution is often used in reliability studies as the model for the
time until failure of a device.

e.g., the lifetime of a semiconductor device might be modeled as an exponential random variable with a mean of 40,000 hours. The lack of memory property of
the exponential distribution implies that the device does not wear out. i.e.,
regardless of how long the device has been operating, the probability of a failure in
the next 1,000 hours is the same as the probability of a failure in the first 1,000
hours of operation.

The lifetime of a device with failures caused by random shocks might be appropriately modeled as an exponential random variable. However, the lifetime of
a device that suffers slow mechanical wear, such as bearing wear, is better
modeled by a distribution that does not lack memory such as the Weibull
distribution.

3.13 Connections between Binomial and Poisson Distributions

The binomial distribution models the number of successes in a binomial experiment, where the number of trials, n, is known and the probability of success for each trial, p, is fixed.

The Poisson distribution models the number of occurrences of certain event types in a fixed interval, where the average number of occurrences, λ, is fixed.

If X has a binomial distribution, the pmf of X is given by

f (x) = n Cx px (1 − p)n−x , x = 0, 1, . . . , n.

This tells us how the probability is distributed over the values of X; it is important to note that X is always between 0 and n inclusive. Moreover, the average number of successes is given by np.


Now, suppose we let n go to ∞ while keeping np constant. For instance, let

λ = np

be the constant such that p = λ/n. For any fixed x,

f(x) = [n! / ((n − x)! x!)] (λ/n)^x (1 − λ/n)^{n−x}
     = [n! / ((n − x)! n^x)] × (λ^x / x!) × (1 − λ/n)^{n−x}.

When n → ∞,

lim_{n→∞} f(x) = 1 × (λ^x / x!) × e^{−λ} = e^{−λ} λ^x / x!,

which is the pmf of the Poisson distribution.
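A numerical illustration of this limit (a sketch assuming SciPy is available): hold λ = np = 8 fixed and let n grow; the binomial pmf at a fixed x approaches the Poisson pmf.

```python
from scipy.stats import binom, poisson

lam, x = 8, 5
for n in (10, 40, 160, 640):
    print(n, binom.pmf(x, n, lam / n))  # approaches the Poisson value below
print(poisson.pmf(x, lam))              # e^{-8} 8^5 / 5! ≈ 0.0916
```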


Conclusion: When n is large and p is small so that λ = np, the pmf of binomial
distribution is close to the pmf of Poisson distribution.

[Figure: p.m.f. of B(10, 0.8), p.m.f. of B(20, 0.4), p.m.f. of B(40, 0.2), and p.m.f. of Poisson(8).]

As a general rule, if it is a binomial problem with n ≥ 20 and p ≤ 0.05, then we can use the Poisson probability distribution as an approximation to the binomial distribution.

Note: If n > 100 and np < 10, the approximation is excellent.

3.14 Connections between Binomial and Normal Distributions

If p is fixed as n increases, the Poisson approximation no longer holds; instead, a normal distribution may give a better approximation. To see this, consider the following binomial distributions.

[Figure: p.m.f. of B(10, 0.2), p.m.f. of B(20, 0.2), p.m.f. of B(40, 0.2), and p.d.f. of N(8, 6.4).]

In the case p is small, the binomial distribution with small n is right-skewed.


When n increases, the distribution becomes more symmetrical and bell-shaped.
This is the result of a more general phenomenon — Central Limit Theorem.


Binomial to Normal Approximation

If X is a binomial random variable with parameters n and p,

Z = (X − np) / √(np(1 − p))

is approximately a standard normal random variable. Consequently, probabilities computed from Z can be used to approximate probabilities for X.

The normal approximation to the binomial distribution is good if n is large enough relative to p, in particular, whenever np > 5 and n(1 − p) > 5.


Continuity correction can be used to further improve the approximation. If X has a binomial distribution, P(X ≤ x) = P(X < x + 1). Then

P(X ≤ x) = P(X ≤ x + 0.5) = P( (X − np)/√(np(1−p)) ≤ (x + 0.5 − np)/√(np(1−p)) )
         ≈ P( Z ≤ (x + 0.5 − np)/√(np(1−p)) ),

for any integer x.


Continuity Correction

[Figure: Binomial probability mass function and normal probability density function approximation for n = 6 and p = 0.5.]

If X ∼ B(n, p) and Y ∼ N(np, np(1 − p)), then we approximate

P(X ≤ x) ≈ P(Y ≤ x + 1/2),
P(X ≥ x) ≈ P(Y ≥ x − 1/2).


Example 20
In a digital communication channel, assume that the number of bits received in error
can be modeled by a binomial random variable, and assume that the probability that
a bit is received in error is 1 × 10−5 . If 16 million bits are transmitted, what is the
probability that more than 150 errors occur?

For a large sample, with np > 5 and n(1 − p) > 5, the normal approximation is
reasonable. However, if np or n(1 − p) is small, the binomial distribution is quite
skewed and the symmetric normal distribution is not a good approximation.
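Example 20 sketched numerically: with n = 16,000,000 and p = 10⁻⁵, np = 160, so the normal approximation (with continuity correction) applies; SciPy can also evaluate the binomial tail exactly for comparison.

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 16_000_000, 1e-5
mu, sigma = n * p, sqrt(n * p * (1 - p))  # 160 and about 12.65

# P(X > 150) = P(X >= 151) ~ P(Z >= (150.5 - mu)/sigma) with continuity correction.
print(norm.sf((150.5 - mu) / sigma))      # ≈ 0.77
print(binom.sf(150, n, p))                # exact tail, for comparison
```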

3.15 Connections between Poisson and Normal Distributions

Recall that the Poisson distribution was developed as the limit of a binomial
distribution as the number of trials increased to infinity. Consequently, the normal
distribution can also be used to approximate probabilities of a Poisson random
variable. The approximation is good for

λ>5

and a continuity correction can also be applied.

Poisson to Normal approximation: If X has a Poisson distribution with mean λ, then

P(X ≤ x) = P(X ≤ x + 0.5) = P( (X − λ)/√λ ≤ (x + 0.5 − λ)/√λ ) ≈ P( Z ≤ (x + 0.5 − λ)/√λ )

for any integer x.


Example 21
Assume that the number of contamination particles in a liter water sample follows a
Poisson distribution with a mean of 1000.
(a) If a sample is analyzed, what is the probability that fewer than or equal to 950
particles are found?

(b) What is the probability that more than 25 particles are found in 20 milliliters of water?
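Example 21 sketched with the Poisson-to-normal approximation; part (b) rescales the mean to 20 milliliters (20 mL is 1/50 liter, so λ = 1000/50 = 20):

```python
from math import sqrt
from scipy.stats import norm, poisson

# (a) lambda = 1000 per liter: P(X <= 950) with continuity correction.
print(norm.cdf((950.5 - 1000) / sqrt(1000)))  # ≈ 0.059
print(poisson.cdf(950, 1000))                 # exact, for comparison

# (b) lambda = 20 per 20 mL: P(X > 25).
print(norm.sf((25.5 - 20) / sqrt(20)))        # ≈ 0.109
print(poisson.sf(25, 20))                     # exact, for comparison
```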

3.16 Independence of Events and Random Variables

Very often, multiple variables are measured in a single experiment. It is of interest to explore whether the variables are related or dependent on each other. To understand the relationship, we need the following concepts.

Events A and B are said to be independent if and only if

P (A ∩ B) = P (A)P (B).

If two events are not independent, they are dependent.


Example 22
Consider an experiment of tossing two dice, let A be the event that the first die is a “1”
and B be the event that the sum is “7”.


A more general definition of independence is as follows.

- Events A1, A2, . . . , An are independent if and only if

P(∩_{i=1}^{n} Ai) = P(A1 ∩ A2 ∩ . . . ∩ An) = P(A1)P(A2) . . . P(An) = ∏_{i=1}^{n} P(Ai).

- Random variables X1, X2, . . . , Xn are independent if and only if

P(∩_{i=1}^{n} {Xi ∈ Ai}) = ∏_{i=1}^{n} P(Xi ∈ Ai)

for any sets A1, . . . , An.


Example 23
The diameter of a shaft in a storage drive is normally distributed with mean 0.2508
inch and standard deviation 0.0005 inch. The specifications on the shaft are
0.2500 ± 0.0015 inch. The probability that a diameter meets specifications was
determined to be 0.919. What is the probability that 10 diameters all meet
specifications, assuming that the diameters are independent?

Suppose X1, X2, and X3 represent the thickness in micrometers of a substrate, an active layer, and a coating layer of a chemical product, respectively. Assume that
X1 , X2 , and X3 are independent and normally distributed with µ1 = 10, 000,
µ2 = 1000, µ3 = 80, σ1 = 250, σ2 = 20, σ3 = 4. The specifications for the
thickness of the substrate, active layer, and coating layer are 9200 < x1 < 10, 800,
950 < x2 < 1050, and 75 < x3 < 85, respectively. What proportion of chemical
products meets all thickness specifications?
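Example 23 sketched using independence (probabilities multiply across the independent diameters and layers):

```python
from scipy.stats import norm

# First part: ten independent diameters, each meeting spec with probability 0.919.
print(0.919 ** 10)  # ≈ 0.430

# Second part: product of the three independent layer probabilities.
p1 = norm.cdf(10_800, 10_000, 250) - norm.cdf(9_200, 10_000, 250)
p2 = norm.cdf(1_050, 1_000, 20) - norm.cdf(950, 1_000, 20)
p3 = norm.cdf(85, 80, 4) - norm.cdf(75, 80, 4)
print(p1 * p2 * p3)  # ≈ 0.778
```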

Some additional applications of independence frequently occur in the area of system analysis. Consider a system that consists of devices that are either functional or failed. It is assumed that the devices are independent.

Example 24
(Series System): The system shown here operates only if there is a path of functional components from left to right. The probability that each component functions is shown in the diagram. Assume that the components function or fail independently. What is the probability that the system operates?

[Diagram: components C1 (0.9) and C2 (0.95) connected in series.]

Solution: Let C1 and C2 denote the events that components 1 and 2 are functional, respectively. For the system to operate, both components must be functional. The probability that the system operates is

P(C1 ∩ C2) = P(C1)P(C2) = (0.9)(0.95) = 0.855.

Note that the probability that the system operates is smaller than the probability that any component operates. This system fails whenever any component fails. A system of this type is called a series system.
Example 25
(Parallel System): The system shown here operates only if there is a path of functional components from left to right. The probability that each component functions is shown. Assume that the components function or fail independently. What is the probability that the system operates?

[Diagram: components C1 (0.9) and C2 (0.95) connected in parallel.]

Solution: Let C1 and C2 denote the events that components 1 and 2 are functional, respectively. Also let C1′ and C2′ denote the events that components 1 and 2 fail, respectively, with associated probabilities P(C1′) = 1 − 0.9 = 0.1 and P(C2′) = 1 − 0.95 = 0.05. The system will operate if either component is functional. The probability that the system operates is 1 minus the probability that the system fails, and this occurs whenever both independent components fail. Therefore, the requested probability is

P(C1 ∪ C2) = 1 − P(C1′ ∩ C2′) = 1 − P(C1′)P(C2′) = 1 − (0.1)(0.05) = 0.995.

Reliability

The probability that a component does not fail over the time of its mission is called its reliability. Let ri denote the reliability of component i in a system that consists of k components, and let r denote the probability that the system does not fail over the time of the mission; i.e., r is the system reliability.

The previous examples can be extended to obtain the following results for a series
system
r = r1 r2 · · · rk

and for a parallel system

r = 1 − (1 − r1 )(1 − r2 ) · · · (1 − rk ).

The analysis of a complex system can be accomplished by a partition into subsystems, which are sometimes called blocks.
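A small sketch of the two reliability formulas, checked against Examples 24 and 25:

```python
def series(r):
    """Reliability of a series system: every component must work."""
    out = 1.0
    for ri in r:
        out *= ri
    return out

def parallel(r):
    """Reliability of a parallel system: fails only if every component fails."""
    out = 1.0
    for ri in r:
        out *= (1 - ri)
    return 1 - out

print(series([0.9, 0.95]))    # 0.855, as in Example 24
print(parallel([0.9, 0.95]))  # 0.995, as in Example 25
```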

3.17 Functions of Random Variables

Let X be a random variable (either continuous or discrete) with mean µ and variance σ², and let c be a constant. Define a new random variable Y as

Y = X + c

It follows that

E(Y ) = E(X) + c = µ + c
Var(Y ) = Var(X) = σ 2

Now suppose that the random variable X is multiplied by a constant, resulting in

Y = cX

It follows that

E(Y ) = cE(X) = cµ
Var(Y ) = c2 Var(X) = c2 σ 2


Linear Functions of Independent Random Variables

Let C0, C1, C2, . . . , Cn be constants, and let X1, X2, . . . , Xn be independent random variables with means E(Xi) = µi and variances Var(Xi) = σi², i = 1, 2, . . . , n.

The mean and variance of the linear function of independent random variables

Y = C0 + C1 X1 + C2 X2 + . . . + Cn Xn

are

E(Y ) = C0 + C1 µ1 + C2 µ2 + . . . + Cn µn and Var(Y ) = C12 σ12 + C22 σ22 + . . . + Cn2 σn2 .

If X1 , X2 , . . . , Xn are independent and normally distributed,

Y = C0 + C1 X1 + C2 X2 + . . . + Cn Xn

has a normal distribution with mean and variance

E(Y ) = C0 + C1 µ1 + C2 µ2 + . . . + Cn µn and Var(Y ) = C12 σ12 + C22 σ22 + . . . + Cn2 σn2 .


Example 26
Suppose that the random variables X1 and X2 represent the length and width,
respectively, of a manufactured part. For X1 , suppose that we know that µ1 = 2
centimeters and σ1 = 0.1 centimeter and for X2 , we know that µ2 = 5 centimeters and
σ2 = 0.2 centimeter. Also, assume that X1 and X2 are independent.

(a) We wish to determine the mean and standard deviation of the perimeter
(Y = 2X1 + 2X2 ) of the part.

(b) Suppose that X1 and X2 are normally distributed. Determine the probability that
the perimeter of the part exceeds 14.5 centimeters.


Linear Functions of Random Variables That Are Not Independent

We first have to discuss the concept of covariance. Covariance measures how strongly two random variables are linearly related.

If X1 and X2 are two random variables and their means are µ1 and µ2 respectively,
then the covariance between X1 and X2 is

Cov(X1 , X2 ) = E{(X1 − µ1 )(X2 − µ2 )} = E(X1 X2 ) − µ1 µ2 .

Given a joint distribution of X1 and X2 , one can compute the covariance.

The correlation coefficient between X1 and X2 is

ρ = Cov(X1, X2) / (σ1 σ2),

where σ1 and σ2 are the standard deviations of X1 and X2, respectively.

Note: −1 ≤ ρ ≤ 1 and the sample correlation coefficient r is an estimate of ρ.


Joint Probability for Discrete Random Variables

Sometimes we are interested in looking at the probabilities of multiple outcomes simultaneously.

Multiple variables have some relationship or dependency on each other.


- Collecting multiple variables (e.g., temperature, precipitation, wind speed)
- Studying the relationship between two variables (e.g., is smoking related to incidents
of lung cancer?)

The joint probability mass function for X and Y is defined as

fX,Y(a, b) = P(X = a, Y = b) = P({X = a} ∩ {Y = b})

and must satisfy the usual rules:

- fX,Y(a, b) ≥ 0, for all a, b,
- Σi Σj fX,Y(ai, bj) = 1, where ai, bj are all possible outcomes for X and Y, and
- fX,Y(a, b) = fX(a) × fY(b) for all pairs (a, b) if and only if X and Y are independent.


Example 27
You are sending a binary message over a wireless network. Each bit sent has some
probability of being corrupted. Let S be a binary random variable representing a sent bit
and R be a binary random variable representing the corresponding received bit. The joint
pmf, P (S = a, R = b), is given by four probabilities.
S\R 0 1
0 0.45 0.08
1 0.06 0.41

(a) What is the probability of sending a 0 and receiving 0 or 1?

(b) What is P (R = 1)?

(c) Are S and R independent?
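Example 27 sketched from the joint pmf table (marginals are row/column sums; independence requires the table to factor into the marginals):

```python
joint = {(0, 0): 0.45, (0, 1): 0.08, (1, 0): 0.06, (1, 1): 0.41}  # P(S=a, R=b)

p_S0 = joint[(0, 0)] + joint[(0, 1)]  # (a) P(S = 0, R = 0 or 1) = P(S = 0) = 0.53
p_R1 = joint[(0, 1)] + joint[(1, 1)]  # (b) P(R = 1) = 0.49
print(p_S0, p_R1)

# (c) Independent iff P(S=a, R=b) = P(S=a) P(R=b) for every cell.
print(joint[(0, 0)], p_S0 * (1 - p_R1))  # 0.45 vs 0.2703, so not independent
```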


Joint Probability for Continuous Random Variables

The joint probability density function for X and Y, fX,Y(x, y), is defined through

P[(x, y) ∈ A] = ∫∫_A fX,Y(x, y) dx dy,
P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_a^b ∫_c^d fX,Y(x, y) dy dx,

and must satisfy the usual rules:

- fX,Y(x, y) ≥ 0, for all x, y,
- ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = 1,
- fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy, −∞ < x < ∞, and
- fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx, −∞ < y < ∞.


Example 28
Let X and Y be two continuous random variables with joint probability density function,

fX,Y (x, y) = 4xy, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1

(a) What is P (Y < X)? Let B be the region of the xy-plane with (x, y) ∈ [0, 1]² and
y < x, i.e., B = {(x, y) | 0 ≤ y < x ≤ 1}.

(b) What are fX (x) and fY (y)?

(c) Are X and Y independent?
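A numerical check of part (a), with parts (b) and (c) noted in comments (scipy assumed; note that scipy.integrate.dblquad expects the integrand as f(y, x)):

```python
# Sketch for Example 28: f(x, y) = 4xy on the unit square.
from scipy.integrate import dblquad

f = lambda y, x: 4 * x * y   # dblquad integrates over y first, then x

# (a) P(Y < X): integrate over B = {(x, y) : 0 <= y < x <= 1}
p_a, _err = dblquad(f, 0, 1, lambda x: 0.0, lambda x: x)   # 0.5
print(p_a)

# (b) fX(x) = integral of 4xy over y in [0, 1] = 2x, and fY(y) = 2y by symmetry
# (c) 4xy = (2x)(2y) = fX(x) fY(y), so X and Y are independent
```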


Example 29
Suppose X1 and X2 have the following joint pmf:
X1 \X2 2 4
1 0.1 0.2
2 0.3 0.3
3 0 0.1

(a) Compute the covariance between X1 and X2 .

(b) Compute the correlation coefficient between X1 and X2 .
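A minimal sketch computing both parts directly from the table (Python assumed; the pmf encoding is illustrative):

```python
# Sketch for Example 29: covariance and correlation from a joint pmf.
from math import sqrt

pmf = {(1, 2): 0.1, (1, 4): 0.2,
       (2, 2): 0.3, (2, 4): 0.3,
       (3, 2): 0.0, (3, 4): 0.1}

# expectation of any function g(X1, X2) under the joint pmf
E = lambda g: sum(p * g(x1, x2) for (x1, x2), p in pmf.items())

mu1, mu2 = E(lambda x1, x2: x1), E(lambda x1, x2: x2)   # 1.8, 3.2
cov = E(lambda x1, x2: x1 * x2) - mu1 * mu2             # (a) 0.04

var1 = E(lambda x1, x2: x1**2) - mu1**2                 # 0.36
var2 = E(lambda x1, x2: x2**2) - mu2**2                 # 0.96
rho = cov / sqrt(var1 * var2)                           # (b) ~0.068
print(cov, rho)
```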


Let X1 and X2 be random variables with means µ1, µ2 and variances σ1², σ2²,
respectively. The linear function

Y = C0 + C1 X1 + C2 X2

has mean

E(Y ) = C0 + C1 µ1 + C2 µ2

and variance

Var(Y ) = C1²σ1² + C2²σ2² + 2C1 C2 Cov(X1, X2).

Though we show the expressions only for linear functions of two random variables,
they generalize easily to more than two variables.

Can you compute E(Y ) and Var(Y ) from the previous example? See the sketch below
for one illustrative choice of coefficients.
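As one illustration, the sketch below takes Y = X1 + X2, i.e., C0 = 0 and C1 = C2 = 1 (an arbitrary choice, since the slide leaves the coefficients to the reader), and plugs in the moments found in Example 29:

```python
# Sketch: mean and variance of Y = X1 + X2 for the Example 29 pmf.
mu1, mu2 = 1.8, 3.2        # means from the joint pmf
var1, var2 = 0.36, 0.96    # variances from the joint pmf
cov = 0.04                 # covariance from Example 29(a)

c0, c1, c2 = 0.0, 1.0, 1.0   # illustrative coefficients
E_Y = c0 + c1 * mu1 + c2 * mu2                            # 5.0
Var_Y = c1**2 * var1 + c2**2 * var2 + 2 * c1 * c2 * cov   # 1.40
print(E_Y, Var_Y)
```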


Nonlinear Functions of Independent Random Variables

Many problems in engineering involve nonlinear functions of random variables, e.g.,

1 The power P dissipated by the resistance R in an electrical circuit is given by the
relationship

P = I²R,

where I is the current. If the resistance is a known constant and the current is a
random variable, the power is a random variable that is a nonlinear function of the
current.

2 We can experimentally measure the acceleration due to gravity by dropping a baseball
and measuring the time T it takes for the ball to travel a known distance d. The
relationship is

G = 2d/T².

Since the time T is measured with error, it is a random variable. Thus, the
acceleration due to gravity is a nonlinear function of the random variable T.


Suppose that the random variable Y is a function of the random variable X, say,

Y = h(X);

then a general solution for the mean and variance of Y can be difficult to obtain,
depending on the complexity of the function h(X).

(Propagation of Error Formula) If X has mean µX and variance σX², then the
approximate mean and variance of Y can be computed using the following results:

E(Y ) = µY ≈ h(µX ) and Var(Y ) = σY² ≈ (dh/dX)² σX²,

where the derivative dh/dX is evaluated at µX .
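A quick check of the formula on the gravity example, comparing the approximation against simulation; the numerical values of d, µT, and σT below are made up for illustration:

```python
# Sketch: propagation of error for G = h(T) = 2d/T^2, with a Monte Carlo check.
import numpy as np

d = 10.0                      # known drop distance (m); an assumed value
mu_T, sigma_T = 1.43, 0.02    # mean and sd of the measured time T (s); assumed

h = lambda t: 2 * d / t**2    # G = h(T)
dh = lambda t: -4 * d / t**3  # dh/dT

E_G = h(mu_T)                     # approximate mean of G
var_G = dh(mu_T)**2 * sigma_T**2  # approximate variance of G

# Monte Carlo comparison
rng = np.random.default_rng(0)
G = h(rng.normal(mu_T, sigma_T, size=1_000_000))
print(E_G, var_G, G.mean(), G.var())   # the two pairs should nearly agree
```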

3.18 Sampling Distribution and Central Limit Theorem

Independent random variables X1, X2, . . . , Xn with the same distribution are called
a random sample.

A statistic is a function of the random variables in a random sample.

The probability distribution of a statistic is called its sampling distribution.

Sampling distribution of the sample mean: Suppose X1, X2, . . . , Xn are normally
and independently distributed with mean µ and variance σ². The sample mean

X̄ = (X1 + X2 + . . . + Xn )/n

is a linear function of the Xi 's, so from the previous result we conclude that X̄ is also
normally distributed with mean µX̄ = E(X̄) = (1/n)(µ + µ + . . . + µ) = µ and variance
σX̄² = Var(X̄) = (1/n²)(σ² + σ² + . . . + σ²) = σ²/n. We write X̄ ∼ N(µ, σ²/n).


Example 30
Soft-drink cans are filled by an automated filling machine. The mean fill volume is 12.1
fluid ounces, and the standard deviation is 0.05 fluid ounce. Assume that the fill volumes
of the cans are independent, normal random variables. What is the probability that the
average volume of 10 cans selected from this process is less than 12 fluid ounces?
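A one-line check (scipy assumed):

```python
# Sketch for Example 30: P(Xbar < 12) for n = 10 cans.
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 12.1, 0.05, 10
se = sigma / sqrt(n)                      # sd of the sample mean, ~0.0158
print(norm.cdf(12.0, loc=mu, scale=se))   # ~1.3e-10: essentially never
```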


Central Limit Theorem

If we are sampling from a population that has an unknown probability distribution,
the sampling distribution of the sample mean is still approximately normal with
mean µ and variance σ²/n if the sample size n is large.

Central Limit Theorem (CLT): If X1, X2, . . . , Xn is a random sample of size n
taken from a population with mean µ and variance σ², and if X̄ is the sample mean,
then the limiting form of the distribution of

Z = (X̄ − µ)/(σ/√n)

as n → ∞ is the standard normal distribution.

In other words, the sampling distribution of the sample mean is approximately
normal if the sample size is large, regardless of the distribution of the population.
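A small simulation illustrating this with a clearly non-normal (exponential) population; the sample sizes, replication count, and seed are arbitrary choices:

```python
# Sketch: skewness of the standardized sample mean shrinks as n grows,
# as the CLT predicts (a normal distribution has skewness 0).
import numpy as np

rng = np.random.default_rng(1)
for n in (1, 5, 30):
    # 10,000 sample means, each from an exponential sample of size n (mu = sigma = 1)
    xbars = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    z = (xbars - 1.0) / (1.0 / np.sqrt(n))
    skew = ((z - z.mean())**3).mean() / z.std()**3
    print(n, round(skew, 2))   # roughly 2, 0.9, 0.4
```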



Although the central limit theorem will work well for small samples (n = 4, 5) in most
cases, particularly where the population is continuous, unimodal, and symmetric, larger
samples are required in other situations, depending on the shape of the population.

[Figure 3-43: Distributions of average scores from throwing (a) one die, (b) two dice,
(c) three dice, (d) five dice, and (e) ten dice. While the distribution of a single die is
relatively far from normal, the distribution of averages is approximated reasonably well
by the normal distribution for sample sizes as small as 5. Adapted with permission from
Box, Hunter, and Hunter (1978).]

As a rule of thumb, the CLT works if n ≥ 30 regardless of the shape of the population
distribution the data come from. If the population is reasonably symmetric, n can be
as few as 4.

Central Limit Theorem and Binomial Distribution

Suppose X ∼ B(n, p). It can be shown that X is the sum of n independent
Bernoulli random variables:

X = Y1 + Y2 + . . . + Yn ,

where the Yi are independent Bernoulli(p) random variables. Note that the mean and
variance of each Yi are µ = p and σ² = p(1 − p), respectively.

If we denote the proportion of successes as P̂ ,

P̂ = X/n = (Y1 + Y2 + . . . + Yn )/n

is essentially the sample mean of the Yi 's. By applying the CLT, P̂ is approximately
N (p, p(1 − p)/n), i.e.,

Z = (P̂ − p)/√(p(1 − p)/n) = (X − np)/√(np(1 − p))

is approximately standard normal. Therefore, we can use the normal distribution to
approximate binomial probabilities.
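A short comparison of an exact binomial probability with its normal approximation (scipy assumed; n, p, and the cutoff are arbitrary choices, and the 0.5 continuity correction is a standard refinement):

```python
# Sketch: normal approximation to a binomial cdf.
from math import sqrt
from scipy.stats import binom, norm

n, p = 50, 0.3
mu, sd = n * p, sqrt(n * p * (1 - p))

exact = binom.cdf(20, n, p)           # P(X <= 20), exact
approx = norm.cdf((20.5 - mu) / sd)   # normal approx. with continuity correction
print(exact, approx)                  # the two values should be close
```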
3.19 Other Continuous Distributions (optional): Lognormal Distribution

A variable in a system sometimes follows an exponential relationship, x = exp(w).

Let X = exp(W ) be a random variable whose distribution is of interest.

If W has a normal distribution, then the distribution of X is called a lognormal
distribution. The name follows from the transformation ln(X) = W (i.e., the
natural logarithm of X is normally distributed).

The range of X is (0, ∞). Suppose that W ∼ N (θ, ω 2 ), then the cdf of X is

F (x) = P (X ≤ x) = P (e^W ≤ x) = P [W ≤ ln(x)]
      = P [Z ≤ (ln(x) − θ)/ω] = Φ((ln(x) − θ)/ω)

for x > 0, where Z is a standard normal variable.


Let W have a normal distribution with mean θ and variance ω²; then X = exp(W )
is a lognormal random variable with probability density function

f (x) = (1/(xω√(2π))) exp[−(ln(x) − θ)²/(2ω²)], 0 < x < ∞.

The mean and variance of X are

E(X) = exp(θ + ω²/2) and Var(X) = exp(2θ + ω²)(exp(ω²) − 1).

The lifetime of a product that degrades over time is often modeled by a lognormal
random variable; e.g., this is a common distribution for the lifetime of a
semiconductor laser.


The figure below illustrates lognormal distributions for selected values of the parameters.

Since the lognormal distribution is derived from a simple exponential function of a
normal random variable, it is easy to understand and to evaluate probabilities.


Example 31
(Lifetime of a Laser) The lifetime of a semiconductor laser has a lognormal distribution
with θ = 10 and ω = 1.5 hours.

(a) What is the probability the lifetime exceeds 10,000 hours?

(b) What lifetime is exceeded by 99% of lasers?

(c) Determine the mean and standard deviation of lifetime.
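A sketch of all three parts (scipy assumed); in scipy's parameterization, lognorm(s=ω, scale=exp(θ)) corresponds to ln(X) ∼ N(θ, ω²):

```python
# Sketch for Example 31: lognormal lifetime with theta = 10, omega = 1.5.
import numpy as np
from scipy.stats import lognorm

theta, omega = 10.0, 1.5
X = lognorm(s=omega, scale=np.exp(theta))

p_a = X.sf(10_000)   # (a) P(X > 10,000) ~ 0.70
x_b = X.ppf(0.01)    # (b) lifetime exceeded by 99% of lasers, ~672 hours
print(p_a, x_b, X.mean(), X.std())   # (c) mean and sd
```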


Gamma Distribution

The gamma function is

Γ(r) = ∫_0^∞ x^(r−1) e^(−x) dx, for r > 0.

Moreover, by using integration by parts it can be shown that Γ(r) = (r − 1)Γ(r − 1)
(a generalization of the factorial function).

The random variable X with probability density function

f (x) = λ^r x^(r−1) e^(−λx) / Γ(r), for x > 0

is a gamma random variable with parameters λ > 0 and r > 0.

The mean and variance are

E(X) = r/λ and Var(X) = r/λ².


The chi-square distribution is a special case of the gamma distribution in which
λ = 1/2 and r equals one of the values 1/2, 1, 3/2, 2, . . . .

The chi-square distribution is used extensively in interval estimation and tests of
hypotheses, which are discussed in Lectures 4 and 5.


Weibull Distribution

The Weibull distribution is often used to model the time until failure of many
different physical systems.

The parameters in the distribution provide a great deal of flexibility to model
systems in which the number of failures increases with time (bearing wear),
decreases with time (some semiconductors), or remains constant with time (failures
caused by external shocks to the system).

The random variable X with probability density function

f (x) = (β/δ)(x/δ)^(β−1) exp[−(x/δ)^β], for x > 0

is a Weibull random variable with scale parameter δ > 0 and shape parameter
β > 0. Its mean and variance are

E(X) = δΓ(1 + 1/β) and Var(X) = δ²Γ(1 + 2/β) − δ²[Γ(1 + 1/β)]².

If X has a Weibull distribution with parameters δ and β, the cumulative
distribution function of X is

F (x) = 1 − exp[−(x/δ)^β], for x ≥ 0.


Example 32
(Lifetime of a Bearing) The time to failure (in hours) of a bearing in a mechanical shaft is
satisfactorily modeled as a Weibull random variable with β = 1/2 and δ = 5000 hours.

(a) Determine the mean time until failure

(b) Determine the probability that a bearing lasts at least 6000 hours
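A sketch of both parts (scipy assumed); weibull_min(c=β, scale=δ) matches the pdf given earlier:

```python
# Sketch for Example 32: Weibull lifetime with beta = 1/2, delta = 5000 hours.
from math import gamma
from scipy.stats import weibull_min

beta, delta = 0.5, 5000.0
X = weibull_min(c=beta, scale=delta)

mean = delta * gamma(1 + 1 / beta)   # (a) 5000 * Gamma(3) = 10,000 hours
p_b = X.sf(6000)                     # (b) exp(-(6000/5000)^0.5) ~ 0.334
print(mean, X.mean(), p_b)           # X.mean() agrees with the formula
```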


Beta Distribution

For probability models, a continuous distribution that is flexible but bounded over a
finite range can be useful, e.g., the proportion of solar radiation absorbed by a
material and the proportion (of the maximum time) required to complete a task in a
project are continuous random variables over the interval [0, 1].

The random variable X with probability density function

f (x) = [Γ(α + β)/(Γ(α)Γ(β))] x^(α−1) (1 − x)^(β−1), for x ∈ [0, 1]

is a beta random variable with parameters α > 0 and β > 0.

If α = β, the distribution is symmetric about x = 0.5, and if α = β = 1 the beta
distribution equals the continuous uniform distribution on [0, 1].

E(X) = α/(α + β) and Var(X) = αβ/[(α + β)²(α + β + 1)].



A closed-form expression
for the cumulative
distribution function is not
available in general, and
probabilities for beta
random variables need to
be computed numerically.

If α > 1 and β > 1, the mode is in the interior of [0, 1] and equals

mode = (α − 1)/(α + β − 2).
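A minimal illustration of the numerical evaluation (scipy assumed; α = 3 and β = 2 are arbitrary parameter values):

```python
# Sketch: beta probabilities are computed numerically (incomplete beta function).
from scipy.stats import beta as beta_dist

a, b = 3.0, 2.0
X = beta_dist(a, b)

print(X.cdf(0.5))             # P(X <= 0.5); no simple closed form in general
print(X.mean(), X.var())      # a/(a+b) = 0.6 and a*b/((a+b)^2 (a+b+1)) = 0.04
print((a - 1) / (a + b - 2))  # interior mode = 2/3 since a > 1 and b > 1
```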


Reference

Montgomery, D., Runger, G., and Hubele, N. Engineering Statistics (Fifth Edition).
Wiley. Chapter 3.

Wheelan, C. Naked Statistics. W. W. Norton.

Jeong, J. Course Materials for Principles of Statistics I (STAT 211), TAMU
