Lecture03
Hanyang University
Calculate means and variances for discrete and continuous random variables
Calculate covariances (and correlations) between random variables and means (and variances) for linear combinations of random variables
- http://www.stat.berkeley.edu/~aldous/Real-World/coin_tosses.html
The union of two events A and B is the event that occurs if either A or B (or both)
occurs on a single performance of the experiment. We denote the union of events A
and B by the symbol A ∪ B (read: “A union B”). A ∪ B consists of all the sample
points that belong to A or B or both.
The intersection of two events A and B is the event that occurs if both A and B
occur on a single performance of the experiment. We write A ∩ B for the
intersection of A and B (read: “A intersect B”). A ∩ B consists of all the sample
points belonging to both A and B.
The events A and B are called mutually exclusive (or disjoint) if A ∩ B is empty.
Axioms of Probability
1 0 ≤ P (E) ≤ 1; every probability lies between zero and one.
2 P (Ω) = 1; the probability of the whole sample space is one.
Some rules
The probability that an event E does not happen, denoted by E ′ , is one minus the
probability that the event happens. That is,
P (E ′ ) = 1 − P (E).
Example 1
A glass jar contains 1 red, 3 green, 2 blue, and 4 yellow marbles. If a single marble is
chosen at random from the jar, what is the probability that it is yellow or green?
A single card is chosen at random from a standard deck of 52 playing cards. What is
the probability of choosing a king or a club?
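Both parts of Example 1 can be checked with exact fractions; this sketch (not part of the original slides) uses the additive rule for disjoint events and, for the card question, P(A ∪ B) = P(A) + P(B) − P(A ∩ B):

```python
from fractions import Fraction

# Jar: 1 red, 3 green, 2 blue, 4 yellow -> 10 marbles total.
p_yellow = Fraction(4, 10)
p_green = Fraction(3, 10)
# Yellow and green are mutually exclusive, so probabilities simply add.
p_yellow_or_green = p_yellow + p_green               # 7/10

# Deck: 4 kings, 13 clubs, and 1 card that is both (king of clubs).
p_king, p_club, p_king_of_clubs = Fraction(4, 52), Fraction(13, 52), Fraction(1, 52)
# King and club overlap, so subtract the intersection.
p_king_or_club = p_king + p_club - p_king_of_clubs   # 16/52 = 4/13

print(p_yellow_or_green, p_king_or_club)
```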
Example 2
Let E1 denote the event that a structural component fails during a test and E2 denote
the event that the component shows some strain but does not fail. Given P (E1 ) = 0.15
and P (E2 ) = 0.3,
(a) What is the probability that a structural component does not fail during a test?
(b) What is the probability that a component either fails or shows strain during a test?
(c) What is the probability that a component neither fails nor shows strain during a
test?
Permutations
The number of ways you can choose r objects in order from n distinct objects
nPr = n(n − 1)(n − 2) · · · (n − r + 1) = n! / (n − r)!
where
n! = n × (n − 1) × (n − 2) × · · · × 2 × 1
and
0! = 1
Combinations
The number of ways you can choose r objects without regarding order from n
distinct objects
nCr = nPr / rPr = n! / (r!(n − r)!)
Note: In permutations, order matters and in combinations order does not matter.
Example 3
(Permutation) There are five candidates Adam, Bette, Carl, David, Elle. Suppose we
want to choose two of them and then rank the two chosen persons, then how many
possible rankings are there?
Example 4
(Combination) Suppose 5 cards are randomly drawn from a full deck of 52 without
replacement to form a 5-card hand. What is the probability of getting a flush?
Note: A flush is a hand of playing cards where all cards are of the same suit
(spades, hearts, clubs, diamonds).
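Example 4 can be worked with `math.comb`; a sketch counting flushes per the definition above (all 5 cards of one suit, which also counts straight flushes):

```python
from math import comb

hands = comb(52, 5)          # all possible 5-card hands
flushes = 4 * comb(13, 5)    # pick a suit, then 5 of its 13 cards
p_flush = flushes / hands
print(round(p_flush, 5))     # about 0.00198
```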
P (A|B) = P (A ∩ B) / P (B), provided P (B) > 0.
More generally,
Bayes’ Theorem
Note: A partition of sample space is simply the decomposition of the sample space
into a collection of disjoint (mutually exclusive) events with positive probability.
P (B|A) = P (A|B)P (B) / [P (A|B)P (B) + P (A|B^c)P (B^c)].
Example 5
According to the genetics of color blindness, it is believed to be more common in men
than in women. It is usually said that 5 men in 100 and 25 women in 10,000 suffer from
the condition. Thus, the probability of being color blind is conditional on whether it is for
men or women.
What is the probability of being color blind, assuming that the proportions of men and
women are equal?
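A quick numeric sketch of Example 5 using the law of total probability, with the slide's rates and the assumed 50/50 split of men and women:

```python
# P(color blind) = P(CB | man)P(man) + P(CB | woman)P(woman)
p_man = p_woman = 0.5            # assumed equal proportions
p_cb_given_man = 5 / 100         # 5 men in 100
p_cb_given_woman = 25 / 10_000   # 25 women in 10,000

p_cb = p_cb_given_man * p_man + p_cb_given_woman * p_woman
print(p_cb)   # 0.02625
```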
Example 6
Suppose you meet two arbitrarily chosen people. What is the probability that their
birthdays are different? What about more than two people?
Example 7
An urn contains two type A coins and one type B coin. When a type A coin is flipped, it
comes up a head with probability 1/4, whereas when a type B coin is flipped, it comes
up a head with probability 3/4. A coin is randomly chosen from the urn and flipped.
Given that the flip landed on a head, what is the probability that it was a type A coin?
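Example 7 is a direct application of Bayes' theorem; a sketch with exact fractions (the prior P(A) = 2/3 comes from the urn's composition):

```python
from fractions import Fraction

p_A, p_B = Fraction(2, 3), Fraction(1, 3)          # 2 type A coins, 1 type B
p_head_A, p_head_B = Fraction(1, 4), Fraction(3, 4)

# Total probability of a head, then Bayes' theorem for P(A | head).
p_head = p_head_A * p_A + p_head_B * p_B
p_A_given_head = p_head_A * p_A / p_head
print(p_A_given_head)   # 2/5
```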
A random variable is a mapping from the (abstract) sample space to a real number.
It assigns a real number to each member of the sample space.
A discrete random variable is a random variable with a finite (or countably infinite)
set of real numbers for its range.
(e.g., number of scratches on a surface, proportion of defective parts among 1000
tested, number of transmitted bits received in error)
Example 8
Assume that the following probabilities apply to the random variable X that denotes the
life in hours of standard fluorescent tubes: P (X ≤ 5000) = 0.1,
P (5000 < X ≤ 6000) = 0.3, P (X > 8000) = 0.4. What is P (6000 < X ≤ 8000)?
The probability density function (or pdf) f (x) of a continuous random variable X is
used to determine probabilities as follows:
P (a < X < b) = ∫_a^b f (x)dx.
[Figure 3-5: Density function of a loading on a long, thin beam. Figure 3-6: Probability P(a < X < b) determined from the area under f(x). Figure 3-7: A histogram approximates a probability density function; the area of each bar equals the relative frequency of the corresponding interval.]
MAT2022 (Principles of Statistics) Lecture 3 Spring 2025 23 / 112
3.5 Continuous Random Variables
Properties of pdf
It is important that f (x) is used to calculate an area that represents the probability
that X assumes a value in (a, b).
f (x) ≥ 0
∫_{−∞}^{∞} f (x)dx = 1
Because every point has zero width, the integral that determines the loading at any
point is zero
P (X = x) = 0.
Example 9
Let the continuous random variable X denote the current measured in a thin copper
wire in milliamperes. Assume that the range of X is [0, 20], and assume that the
probability density function of X is uniform. What is the probability that a current
measurement is less than 10 milliamperes?
Let the continuous random variable X denote the distance in micrometers from the
start of a track on a magnetic disk until the first flaw. Historical data show that the
distribution of X can be modeled by a pdf
f (x) = (1/2000) e^{−x/2000}, x ≥ 0.
What proportion of parts is between 1000 and 2000 micrometers?
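For this pdf the cdf is F(x) = 1 − e^{−x/2000}, so the proportion between 1000 and 2000 micrometers is F(2000) − F(1000); a sketch:

```python
from math import exp

# cdf of the exponential flaw-distance model with scale 2000
def F(x):
    return 1 - exp(-x / 2000)

p = F(2000) - F(1000)   # e^(-1/2) - e^(-1)
print(round(p, 4))      # about 0.2387
```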
The cdf F (x) can be related to the pdf f (x) and can be used to obtain probabilities
as follows:
P (a < X < b) = ∫_a^b f (x)dx = ∫_{−∞}^b f (x)dx − ∫_{−∞}^a f (x)dx = F (b) − F (a).
The integral in E(X) is the population analogue of the sum that is used to calculate x̄.
Recall that x̄ is the balance point when an equal weight is placed at the location of
each measurement along a number line.
If f (x) is the density function of a loading on a long, thin beam, E(X) is the point
at which the beam balances.
Example 10
Suppose X has a pdf f (x) = 2x for 0 < x < 1.
(a) Show that f (x) is a valid pdf.
(b) Compute the mean of X.
Example 11
Suppose X has a pdf f (x) = 2x for 0 < x < 1.
What are the variance and standard deviation of X?
Var(X) = ∫_{−∞}^{∞} (x − µ)^2 f (x)dx
= ∫_{−∞}^{∞} (x^2 f (x) − 2xµf (x) + µ^2 f (x))dx
= ∫_{−∞}^{∞} x^2 f (x)dx − 2µ ∫_{−∞}^{∞} xf (x)dx + µ^2 ∫_{−∞}^{∞} f (x)dx
= ∫_{−∞}^{∞} x^2 f (x)dx − 2µ × µ + µ^2 × 1 = ∫_{−∞}^{∞} x^2 f (x)dx − µ^2 .
For f (x) = 2x on (0, 1): E(X^2) = ∫_0^1 x^2 (2x)dx = 1/2 and µ = 2/3, so
σ^2 = Var(X) = E(X^2) − µ^2 = 1/2 − (2/3)^2 = 1/18, and σ = 1/√18 ≈ 0.236.
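The mean and variance of Examples 10-11 can be verified numerically with a simple midpoint-rule integral (stdlib only; a sketch, not part of the slides):

```python
# Midpoint-rule numerical integration over [a, b]
def integrate(g, a, b, n=100_000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: 2 * x
total = integrate(f, 0, 1)                     # should be 1 (valid pdf)
mean = integrate(lambda x: x * f(x), 0, 1)     # E(X) = 2/3
ex2 = integrate(lambda x: x * x * f(x), 0, 1)  # E(X^2) = 1/2
var = ex2 - mean**2                            # 1/18
print(round(total, 4), round(mean, 4), round(var, 4))
```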
The most widely used model for the distribution of a random variable is a normal
distribution.
f (x) = (1/(√(2π) σ)) e^{−(x−µ)^2 / (2σ^2)}
for −∞ < x < ∞ with parameters −∞ < µ < ∞ and σ > 0. The distribution of X
is called normal distribution. Also, E(X) = µ and Var(X) = σ 2 .
The notation N (µ, σ 2 ) is often used to denote a normal distribution with mean µ
and variance σ 2 .
[Figure: normal probability density functions for selected values of the parameters, µ = 5 and µ = 15 with σ^2 = 1 and σ^2 = 9.]
Empirical Rules
The empirical rule is a rule of thumb that applies to data sets with distributions that are approximately normal, as follows:
a. About 68% of the observations will fall within one standard deviation of the mean.
b. About 95% of the observations will fall within two standard deviations of the mean.
c. About 99.7% (essentially all) of the observations will fall within three standard deviations of the mean: P (µ − 3σ < X < µ + 3σ) = 0.9973.
From the symmetry of f (x), P (X ≤ µ) = P (X ≥ µ) = 0.5. Because f (x) is positive for all x, the model assigns some probability to each interval of the real line. However, the probability density function decreases as x moves farther from µ. Consequently, the probability that a measurement falls far from µ is small, and at some distance from µ the probability of an interval can be approximated as zero. The area under a normal pdf beyond 3σ from the mean is quite small. This fact is convenient for quick, rough sketches of a normal probability density function.
3.8 Standard Normal Distribution
The function
Φ(z) = P (Z ≤ z) = ∫_{−∞}^{z} (1/√(2π)) e^{−u^2/2} du
is used to denote the cumulative distribution function of a standard normal random
variable.
P (Z ≤ 1.5) = Φ(1.5) = shaded area

z      0.00     0.01     0.02     0.03
0.0    0.50000  0.50399  0.50798  0.51197
...
1.5    0.93319  0.93448  0.93574  0.93699
Standard normal table can be used to find the probability that a standard normal
random variable is observed below, above, or between values.
Examples:
1 P (Z < 1.53)
2 P (Z > 1.53)
3 P (−1.2 < Z < 0.8)
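The three probabilities above can also be computed directly from the standard normal cdf instead of the table; a sketch using Python's `statistics.NormalDist` (3.8+):

```python
from statistics import NormalDist

Phi = NormalDist().cdf         # standard normal cdf
p1 = Phi(1.53)                 # P(Z < 1.53)
p2 = 1 - Phi(1.53)             # P(Z > 1.53)
p3 = Phi(0.8) - Phi(-1.2)      # P(-1.2 < Z < 0.8)
print(round(p1, 4), round(p2, 4), round(p3, 4))   # 0.937 0.063 0.6731
```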
All normal densities have the same essential shape. They can be made equivalent
with respect to areas under them by suitable rescaling of the horizontal axis. The
rescaled variable is denoted by Z.
The correspondence between the Z scale and the X scale can be expressed by
Z = (X − µ)/σ, Z ∼ N (0, 1).
Creating a new random variable by this transformation is referred to as
standardizing. The random variable Z represents the distance of X from its mean in
terms of standard deviations. It is the key step in calculating a probability for an
arbitrary normal random variable.
Example 12
Suppose the current measurements in a strip of wire are assumed to follow a normal
distribution with a mean of 10 milliamperes and a variance of 4 (milliamperes)^2.
(c) Determine the value for which the probability that a current measurement is below
this value is 0.98. (This value is also called the 98th percentile.)
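Part (c) asks for the inverse of the cdf; a sketch for N(10, 2^2), since the variance 4 gives σ = 2:

```python
from statistics import NormalDist

# 98th percentile of N(mu=10, sigma=2): solve P(X <= x) = 0.98
x98 = NormalDist(mu=10, sigma=2).inv_cdf(0.98)
print(round(x98, 2))   # about 14.11
```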
1 P (Z > z) = 1 − P (Z ≤ z)
4 P (|Z| > a) = P (Z > a) + P (Z < −a) = 2 × P (Z > a) = 2 × P (Z < −a) for a > 0
Let pα be the value of z such that the tail area lying to its left equals α, i.e.,
P (Z < pα ) = α. Given the value of α, we are able to use the standard normal table
to find pα . The value pα is called the 100 × α percentile of a standard normal
distribution.
Examples:
1 p0.90 = 1.281552 from the standard normal table
2 p0.95 = 1.644854
3 p0.99 = 2.326348
4 p0.10 = −1.281552
176, 191, 214, 220, 205, 192, 201, 190, 183, 185.
j    x(j)   (j − 0.5)/10   zj
1    176    0.05           −1.64
2    183    0.15           −1.04
3    185    0.25           −0.67
4    190    0.35           −0.39
5    191    0.45           −0.13
6    192    0.55           0.13
7    201    0.65           0.39
8    205    0.75           0.67
9    214    0.85           1.04
10   220    0.95           1.64
[Figure: normal probability plot of the data, zj versus x(j).]
The zj are the hypothetical standardized normal scores of the j-th measurements if the distribution is normal. The values of zj are given by Φ(zj ) = P (Z ≤ zj ) = (j − 0.5)/n.
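The zj column can be reproduced by inverting the standard normal cdf at (j − 0.5)/n; a sketch:

```python
from statistics import NormalDist

# Normal scores z_j solving Phi(z_j) = (j - 0.5)/n for n = 10
n = 10
z = [round(NormalDist().inv_cdf((j - 0.5) / n), 2) for j in range(1, n + 1)]
print(z)   # [-1.64, -1.04, -0.67, -0.39, -0.13, 0.13, 0.39, 0.67, 1.04, 1.64]
```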
If the distribution of the measurements is closely aligned with the normal distribution, the standardized normal scores and the actual measurements should form a straight line.
Other Probability Plots
Probability plots are extremely useful and are often the first technique used when we need to determine which probability distribution is likely to provide a reasonable model for data. In using probability plots, usually the distribution is chosen by subjective assessment of the probability plot. More formal goodness-of-fit techniques can be used in conjunction with probability plotting. We will describe a very simple goodness-of-fit test in Section 4-10.
3.10 Discrete Random Variables
f (x) = P (X = x), x = x1 , x2 , . . . .
Suppose a discrete random variable X taking values x1 , x2 , . . . has a pmf f (x). The mean of X is
µ = E(X) = Σ_x x f (x), where the sum is over x = x1 , x2 , . . . .
Example 13
The possible values for X are {0, 1, 2, 3, 4}. Suppose that the probabilities are
Example 14
1 Flip a fair coin 10 times. Let X = the number of heads obtained.
2 A multiple choice test contains 10 questions, each with four choices, and you guess
at each question. Let X = the number of questions answered correctly.
3 Of all bits transmitted through a digital transmission channel, 10% are received in
error. Let X = the number of bits in error in the next 3 bits transmitted.
If we toss a coin four times, how many ways are there of getting only one success?
1 The number of ways of selecting one of the four trials to be a success; the number of combinations of size 1 that can be constructed from four distinct objects, 4C1 = 4. i.e., possible outcomes: SF F F, F SF F, F F SF, F F F S.
[Figure: binomial pmfs f(x); (a) n = 20, p = 0.5; (b) n = 10, p = 0.1 and n = 10, p = 0.9.]
Example 15
Each sample of water has a 10% chance of containing high levels of organic solids.
Assume that the samples are independent with regard to the presence of the solids.
(a) Determine the probability that in the next 18 samples, exactly 2 contain high solids.
(b) Determine the probability that at least four samples contain high solids.
(d) Determine the probability that at least one sample contains high solids.
If the number of trials is one, X ∼ B(1, p) has a Bernoulli distribution. The pmf of X is
f (0) = 1 − p, f (1) = p, f (x) = 0 for x ≠ 0 or 1.
If X ∼ B(n, p), then X = Σ_{i=1}^{n} Yi , where the Yi are independent Bernoulli random variables.
Example 16
The probability that a student will make a mistake on any single question is 0.25. If there
are 8 questions, and the student’s answers are mutually independent of each other, then
what is the probability that he or she makes exactly two mistakes?
1 Σ_{k=0}^{2} 8Ck (0.25)^k (0.75)^{8−k}
2 0.689
3 0.311
4 Σ_{k=2}^{6} 8Ck (0.25)^k (0.75)^{8−k}
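The probability of exactly two mistakes is a single binomial pmf term, C(8,2)(0.25)^2(0.75)^6; a sketch checking that it matches option 3:

```python
from math import comb

# Binomial pmf: P(X = k) for X ~ B(n, p)
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_two = binom_pmf(2, 8, 0.25)
print(round(p_two, 3))   # 0.311
```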
The number of cars that pass through a certain point on a road during a given
period of time
The number of spelling mistakes a secretary makes while typing a single page
The number of light bulbs that burn out in a certain amount of time
[Figure 3-33: In a Poisson process, events occur at random in an interval.]
Poisson Distribution
Previous assumptions for a Poisson process imply that the subintervals can be
thought of as approximately independent Bernoulli trials with success probability
λ∆t/T and the number of trials equal to n = T /∆t.
The random variable X that equals the number of events in a Poisson process is a
Poisson random variable with parameter λ > 0, and the probability mass function of
X is
f (x) = e^{−λ} λ^x / x!, x = 0, 1, . . . .
The mean and variance of X are E(X) = λ and Var(X) = λ.
If a Poisson random variable represents the number of events in some interval, the
mean of the random variable must be the expected number of events in the same
length of interval.
Using army records, von Bortkiewicz (1898) studied the chance of a Prussian cavalryman being killed by the kick of a horse.
The records of ten army corps were examined over 20 years, giving a total of 200
observations of one corps for one year.
The total deaths from horse kicks were 122, and the average deaths per year per
corps was thus 122/200 = 0.61.
Let X be the number of deaths from horse kicks in one corps in a given year. Then, X ∼ Poisson(λ) where λ = 0.61.
What is the probability that one death occurred from horse kick in one corps in a
given year?
Given that P (X = 1) = 0.3314, then over the 200 corps-years observed, how many years with one death should we expect to find?
For the entire set of Prussian data, let p be the pmf of the Poisson distribution (probability of a given number of deaths per corps per year), E = 200p the expected number of corps-years, and A the actual number of corps-years in which that many deaths were observed.
Deaths   p         E       A
0 0.54335 108.67 109
1 0.33145 66.29 65
2 0.10110 20.22 22
3 0.02055 4.11 3
4 0.00315 0.63 1
5 0.00040 0.08 0
6 0.00005 0.01 0
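The E column above can be reproduced directly from the Poisson pmf with λ = 0.61 and 200 observations; a sketch:

```python
from math import exp, factorial

lam, n_obs = 0.61, 200

# Poisson pmf for X ~ Poisson(lam)
def pois_pmf(x):
    return exp(-lam) * lam**x / factorial(x)

expected = [round(n_obs * pois_pmf(x), 2) for x in range(5)]
print(expected)   # [108.67, 66.29, 20.22, 4.11, 0.63]
```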
Example 17
It has been observed that the average number of traffic accidents on the Hollywood
Freeway between 7 and 8 AM on Tuesday mornings is 1 per hour. What is the
chance that there will be 2 accidents on the Freeway, on some specified Tuesday
morning (per hour)?
Example 18
For the case of the thin copper wire, suppose that the number of flaws follows a Poisson
distribution with a mean of 2.3 flaws per millimeter.
Exponential Distribution
The random variable X that equals the distance between successive events of a
Poisson process with mean λ > 0 has an exponential distribution with parameter λ.
The pdf of X is
f (x) = λe−λx , x ≥ 0.
Exponential distribution is a continuous distribution. The mean and variance of X are E(X) = 1/λ and Var(X) = 1/λ^2, and the cdf of X is
F (x) = 1 − e^{−λx} , x ≥ 0.
For any value of λ, the exponential distribution is quite skewed. The formulas for
the mean and variance can be obtained by integration (by parts).
Example 19
In a large corporate computer network, user log-ons to the system can be modeled as a
Poisson process with a mean of 25 log-ons per hour.
(a) What is the probability that the first log-on occurs in less than 6 minutes?
(b) What is the probability that there are no log-ons in an interval of 6 minutes?
(c) Determine the interval of time such that the probability that no log-on occurs in the
interval is 0.9.
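Example 19 uses the exponential model with rate λ = 25 per hour (so 6 minutes = 0.1 hour); a sketch of all three parts:

```python
from math import exp, log

lam = 25.0   # log-ons per hour

p_first_within_6min = 1 - exp(-lam * 0.1)   # (a) P(X < 0.1 hr), about 0.918
p_no_logon_6min = exp(-lam * 0.1)           # (b) P(no events in 0.1 hr), about 0.082
t_hours = log(1 / 0.9) / lam                # (c) solve exp(-lam * t) = 0.9
print(round(p_first_within_6min, 3), round(p_no_logon_6min, 3),
      round(t_hours * 60, 2))               # interval (c) is about 0.25 minutes
```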
In previous example, the probability that there are no log-ons in a 6-minute interval
is 0.082 regardless of the starting time of the interval (our starting point for
observing the system does not matter). A Poisson process assumes that events occur
independently, with constant probability, throughout the interval of observation.
However, if there are high-use periods during the day, such as right after 8 AM,
followed by a period of low use, a Poisson process is not an appropriate model for
log-ons. In that case, modeling each of the high- and low-use periods by a separate
Poisson process might be reasonable.
The exponential distribution is often used in reliability studies as the model for the
time until failure of a device.
f (x) = n Cx px (1 − p)n−x , x = 0, 1, . . . , n.
This tells us how the probability is distributed over the values of X; it is important to note that X is always between 0 and n inclusive. Moreover, the average number of successes is given by np.
Now, suppose we let n go to ∞ while keeping np constant. For instance, let λ = np. Then
lim_{n→∞} f (x) = 1 × (λ^x / x!) × e^{−λ} = e^{−λ} λ^x / x!,
which is the pmf of Poisson distribution.
Conclusion: When n is large and p is small so that λ = np, the pmf of binomial
distribution is close to the pmf of Poisson distribution.
[Figure: p.m.f. of B(10, 0.8), B(20, 0.4), B(40, 0.2), and Poisson(8); as n grows with np = 8 fixed, the binomial pmfs approach the Poisson(8) pmf.]
If p is fixed when n increases, the Poisson approximation will no longer hold, instead,
a normal distribution may give better approximation. To see this, consider the
following binomial distributions.
[Figure: binomial pmfs with p fixed as n increases; the shapes become approximately normal.]
Continuity Correction
If X ∼ B(n, p) and Y ∼ N (np, np(1 − p)), then we approximate
P (X ≤ x) ≈ P (Y ≤ x + 1/2),
P (X ≥ x) ≈ P (Y ≥ x − 1/2).
Example 20
In a digital communication channel, assume that the number of bits received in error
can be modeled by a binomial random variable, and assume that the probability that
a bit is received in error is 1 × 10−5 . If 16 million bits are transmitted, what is the
probability that more than 150 errors occur?
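A sketch of Example 20 under the normal approximation with continuity correction (note P(X > 150) = P(X ≥ 151), so the corrected cutoff is 150.5):

```python
from math import sqrt
from statistics import NormalDist

n, p = 16_000_000, 1e-5
mu, sigma = n * p, sqrt(n * p * (1 - p))    # mean 160, sd about 12.65

# P(X > 150) = P(X >= 151) ~ P(Y >= 150.5) for Y ~ N(mu, sigma^2)
p_more_than_150 = 1 - NormalDist(mu, sigma).cdf(150.5)
print(round(p_more_than_150, 3))   # about 0.774
```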
For a large sample, with np > 5 and n(1 − p) > 5, the normal approximation is
reasonable. However, if np or n(1 − p) is small, the binomial distribution is quite
skewed and the symmetric normal distribution is not a good approximation.
Recall that the Poisson distribution was developed as the limit of a binomial
distribution as the number of trials increased to infinity. Consequently, the normal
distribution can also be used to approximate probabilities of a Poisson random
variable. The approximation is good for
λ>5
Example 21
Assume that the number of contamination particles in a liter water sample follows a
Poisson distribution with a mean of 1000.
(a) If a sample is analyzed, what is the probability that 950 or fewer particles are found?
(b) What is the probability that more than 25 particles are found in 20 milliliters of water?
P (A ∩ B) = P (A)P (B).
Example 22
Consider an experiment of tossing two dice, let A be the event that the first die is a “1”
and B be the event that the sum is “7”.
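Example 22 can be checked by enumerating all 36 equally likely outcomes; a sketch:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))        # all (die1, die2) pairs
A = [o for o in outcomes if o[0] == 1]                 # first die is a "1"
B = [o for o in outcomes if sum(o) == 7]               # sum is "7"

pA = Fraction(len(A), 36)                              # 1/6
pB = Fraction(len(B), 36)                              # 1/6
pAB = Fraction(len([o for o in A if o in B]), 36)      # only (1, 6): 1/36
print(pAB == pA * pB)   # True: A and B are independent
```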
Example 23
The diameter of a shaft in a storage drive is normally distributed with mean 0.2508
inch and standard deviation 0.0005 inch. The specifications on the shaft are
0.2500 ± 0.0015 inch. The probability that a diameter meets specifications was
determined to be 0.919. What is the probability that 10 diameters all meet
specifications, assuming that the diameters are independent?
C1 (0.9) — C2 (0.95), connected in series.
Solution: Let C1 and C2 denote the events that components 1 and 2 are functional, respectively. For the system to operate, both components must be functional. The probability that the system operates is
P (C1 ∩ C2 ) = P (C1 )P (C2 ) = 0.9 × 0.95 = 0.855.
Note that the probability that the system operates is smaller than the probability that any component operates. This system fails whenever any component fails. A system of this type is called a series system.
3.16 Independence of Events and Random Variables
Example 25
(Parallel System): The system shown here operates only if there is a path of functional components from left to right. The probability that each component functions is shown. Assume that the components function or fail independently. What is the probability that the system operates?
C1 (0.9) in parallel with C2 (0.95).
Solution: Let C1 and C2 denote the events that components 1 and 2 are functional, respectively. Also let C1′ and C2′ denote the events that components 1 and 2 fail, respectively, with associated probabilities P (C1′ ) = 1 − 0.9 = 0.1 and P (C2′ ) = 1 − 0.95 = 0.05. The system will operate if either component is functional. The probability that the system operates is 1 minus the probability that the system fails, and this occurs whenever both independent components fail. Therefore, the requested probability is
P (system operates) = 1 − P (C1′ )P (C2′ ) = 1 − 0.1 × 0.05 = 0.995.
Reliability
The probability that a component does not fail over the time of its mission is called
its reliability. Let ri denote the reliability of component i in a system that consists of k components, and let r denote the probability that the system does not fail over the time of the mission; i.e., r is the system reliability.
The previous examples can be extended to obtain the following results: for a series system,
r = r1 r2 · · · rk ,
and for a parallel system,
r = 1 − (1 − r1 )(1 − r2 ) · · · (1 − rk ).
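The series and parallel reliability rules above are easy to code; a sketch reusing the component values r1 = 0.9 and r2 = 0.95 from the earlier examples:

```python
# Series system fails if any component fails: r = r1 * r2 * ... * rk
def series(*r):
    prod = 1.0
    for ri in r:
        prod *= ri
    return prod

# Parallel system fails only if every component fails.
def parallel(*r):
    prod_fail = 1.0
    for ri in r:
        prod_fail *= 1 - ri
    return 1 - prod_fail

print(series(0.9, 0.95))    # about 0.855
print(parallel(0.9, 0.95))  # about 0.995
```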
Y =X +c
It follows that
E(Y ) = E(X) + c = µ + c
Var(Y ) = Var(X) = σ 2
Y = cX
It follows that
E(Y ) = cE(X) = cµ
Var(Y ) = c2 Var(X) = c2 σ 2
The mean and variance of the linear function of independent random variables
Y = C0 + C1 X1 + C2 X2 + . . . + Cn Xn
are
E(Y ) = C0 + C1 µ1 + C2 µ2 + . . . + Cn µn ,
Var(Y ) = C1^2 σ1^2 + C2^2 σ2^2 + . . . + Cn^2 σn^2 .
Example 26
Suppose that the random variables X1 and X2 represent the length and width,
respectively, of a manufactured part. For X1 , suppose that we know that µ1 = 2
centimeters and σ1 = 0.1 centimeter and for X2 , we know that µ2 = 5 centimeters and
σ2 = 0.2 centimeter. Also, assume that X1 and X2 are independent.
(a) We wish to determine the mean and standard deviation of the perimeter
(Y = 2X1 + 2X2 ) of the part.
(b) Suppose that X1 and X2 are normally distributed. Determine the probability that
the perimeter of the part exceeds 14.5 centimeters.
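A sketch of both parts of Example 26, using the independence rules for linear combinations and the fact that a linear combination of normals is normal:

```python
from math import sqrt
from statistics import NormalDist

# Y = 2*X1 + 2*X2 with independent X1 ~ (mu1, s1^2), X2 ~ (mu2, s2^2)
mu1, s1, mu2, s2 = 2.0, 0.1, 5.0, 0.2
mean_Y = 2 * mu1 + 2 * mu2                 # 14.0 cm
var_Y = 2**2 * s1**2 + 2**2 * s2**2        # 0.2
sd_Y = sqrt(var_Y)                         # about 0.447 cm

# (b) If X1, X2 are normal, Y is normal too.
p_exceeds = 1 - NormalDist(mean_Y, sd_Y).cdf(14.5)
print(mean_Y, round(sd_Y, 3), round(p_exceeds, 3))   # 14.0 0.447 0.132
```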
We first have to discuss the concept of covariance. Covariance measures how strongly two random variables are linearly related.
If X1 and X2 are two random variables and their means are µ1 and µ2 respectively, then the covariance between X1 and X2 is
Cov(X1 , X2 ) = E[(X1 − µ1 )(X2 − µ2 )] = E(X1 X2 ) − µ1 µ2 .
- fX,Y (a, b) = fX (a) × fY (b) for all pairs (a, b) if and only if X and Y are independent.
Example 27
You are sending a binary message over a wireless network. Each bit sent has some
probability of being corrupted. Let S be a binary random variable representing a sent bit
and R be a binary random variable representing the corresponding received bit. The joint
pmf, P (S = a, R = b), is given by four probabilities.
S\R 0 1
0 0.45 0.08
1 0.06 0.41
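From the joint pmf of Example 27 we can compute the marginals and test the independence condition fX,Y(a, b) = fX(a)fY(b); a sketch:

```python
# Joint pmf P(S = a, R = b) from the table above
joint = {(0, 0): 0.45, (0, 1): 0.08, (1, 0): 0.06, (1, 1): 0.41}

# Marginal pmfs by summing over the other variable
pS = {s: sum(p for (a, b), p in joint.items() if a == s) for s in (0, 1)}
pR = {r: sum(p for (a, b), p in joint.items() if b == r) for r in (0, 1)}
print(pS, pR)           # P(S=0)=0.53, P(R=0)=0.51

independent = all(abs(joint[(a, b)] - pS[a] * pR[b]) < 1e-9
                  for (a, b) in joint)
print(independent)      # False: sent and received bits are dependent
```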
The joint probability density function for X and Y , fX,Y (x, y), is defined through
P [(x, y) ∈ A] = ∫∫_A fX,Y (x, y) dx dy,
so that
P (a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_a^b ∫_c^d fX,Y (x, y) dy dx.
Example 28
Let X and Y be two continuous random variables with joint probability density function,
(a) What is P (Y < X)? Let B be the set in the xy-plane such that (x, y) ∈ [0, 1]^2 and y < x, i.e., B = {(x, y) | 0 ≤ y < x ≤ 1}.
Example 29
Suppose X1 and X2 have the following joint pmf:
X1 \X2 2 4
1 0.1 0.2
2 0.3 0.3
3 0 0.1
Let X1 and X2 be random variables with means µ1 , µ2 and variances σ12 , σ22
respectively. The linear function
Y = C0 + C1 X1 + C2 X2
has mean
E(Y ) = C0 + C1 µ1 + C2 µ2
and variance
Var(Y ) = C1^2 σ1^2 + C2^2 σ2^2 + 2 C1 C2 Cov(X1 , X2 )
Though we are only showing the expressions for linear functions of two random
variables, it could be easily generalized to multiple variables.
Can you compute the E(Y ) and Var(Y ) from the previous example?
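The moments and covariance for Example 29's joint pmf can be computed by direct summation; a sketch (E(Y) and Var(Y) then follow from the formulas above once C0, C1, C2 are chosen):

```python
# Joint pmf of (X1, X2) from Example 29
pmf = {(1, 2): 0.1, (1, 4): 0.2,
       (2, 2): 0.3, (2, 4): 0.3,
       (3, 2): 0.0, (3, 4): 0.1}

e1 = sum(x1 * p for (x1, _), p in pmf.items())         # E(X1) = 1.8
e2 = sum(x2 * p for (_, x2), p in pmf.items())         # E(X2) = 3.2
e12 = sum(x1 * x2 * p for (x1, x2), p in pmf.items())  # E(X1*X2) = 5.8
cov = e12 - e1 * e2                                    # 0.04
print(round(e1, 2), round(e2, 2), round(cov, 2))
```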
Suppose that the random variable Y is a function of the random variable X, say,
Y = h(X)
then a general solution for the mean and variance of Y can be difficult. It depends
on the complexity of the function h(X).
(Propagation of Error Formula) If X has mean µX and variance σX^2, then the approximate mean and variance of Y = h(X) can be computed using the following first-order results:
E(Y ) ≈ h(µX ) and Var(Y ) ≈ [h′(µX )]^2 σX^2 .
Example 30
Soft-drink cans are filled by an automated filling machine. The mean fill volume is 12.1
fluid ounces, and the standard deviation is 0.05 fluid ounce. Assume that the fill volumes
of the cans are independent, normal random variables. What is the probability that the
average volume of 10 cans selected from this process is less than 12 fluid ounces?
(Central Limit Theorem) The limiting distribution of
Z = (X̄ − µ) / (σ/√n),
as n → ∞, is the standard normal distribution.
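Example 30 follows directly: the sample mean of n = 10 cans is normal with standard deviation σ/√n; a sketch:

```python
from math import sqrt
from statistics import NormalDist

# X-bar ~ N(12.1, (0.05/sqrt(10))^2) for n = 10 independent normal fills
mu, sigma, n = 12.1, 0.05, 10
z = (12 - mu) / (sigma / sqrt(n))   # about -6.32
p = NormalDist().cdf(z)             # essentially zero
print(round(z, 2))                  # -6.32
```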
[Figure: distributions of average scores from throwing dice, for (a) one die, (b) two dice, . . . , (e) ten dice. While the distribution of a single die is relatively far from normal, the distribution of averages is approximated reasonably well by the normal distribution for sample sizes as small as 5.]
where Yi are independent Bernoulli random variables. Note that the mean and
variance of Yi are µ = p and σ 2 = p(1 − p), respectively.
The range of X is (0, ∞). Suppose that W ∼ N (θ, ω^2 ); then the cdf of X is
F (x) = P (X ≤ x) = P (W ≤ ln(x)) = Φ((ln(x) − θ)/ω), x > 0.
Let W have a normal distribution with mean θ and variance ω^2 ; then X = exp(W ) is a lognormal random variable with probability density function
f (x) = (1 / (xω√(2π))) exp[−(ln(x) − θ)^2 / (2ω^2 )], 0 < x < ∞.
The lifetime of a product that degrades over time is often modeled by a lognormal
random variable. e.g., this is a common distribution for the lifetime of a
semiconductor laser.
Figure below illustrates lognormal distributions for selected values of the parameters.
Example 31
(Lifetime of a Laser) The lifetime of a semiconductor laser has a lognormal distribution
with θ = 10 and ω = 1.5 hours.
Gamma Distribution
The random variable X with probability density function
f (x) = λ^r x^{r−1} e^{−λx} / Γ(r), for x > 0,
is a gamma random variable with parameters λ > 0 and r > 0.
Weibull Distribution
The Weibull distribution is often used to model the time until failure of many different physical systems. The random variable X with probability density function
f (x) = (β/δ)(x/δ)^{β−1} exp[−(x/δ)^β ], for x > 0,
is a Weibull random variable with scale parameter δ > 0 and shape parameter β > 0.
E(X) = δ Γ(1 + 1/β) and Var(X) = δ^2 Γ(1 + 2/β) − δ^2 [Γ(1 + 1/β)]^2 .
Example 32
(Lifetime of a Bearing) The time to failure (in hours) of a bearing in a mechanical shaft is
satisfactorily modeled as a Weibull random variable with β = 1/2 and δ = 5000 hours.
(b) Determine the probability that a bearing lasts at least 6000 hours
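For the Weibull model, P(X > x) = exp[−(x/δ)^β]; a sketch of part (b) with the slide's parameters:

```python
from math import exp

# Weibull survival probability P(X > x) = exp(-(x/delta)^beta)
beta, delta = 0.5, 5000.0
p_at_least_6000 = exp(-(6000 / delta) ** beta)
print(round(p_at_least_6000, 3))   # about 0.334
```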
Beta Distribution
For probability models, a continuous distribution that is flexible, but bounded over a
finite range can be useful. e.g., the proportion of solar radiation absorbed by a
material and the proportion (of the maximum time) required to complete a task in a
project are examples of continuous random variables over the interval [0, 1].
E(X) = α/(α + β) and Var(X) = αβ / [(α + β)^2 (α + β + 1)].
A closed-form expression for the cumulative distribution function is not available in general, and probabilities for beta random variables need to be computed numerically.
Reference
Montgomery, D., Runger, G., and Hubele, N. Engineering Statistics (Fifth Edition).
Wiley. Chapter 3.