MZB127 Topic 7 Lecture Notes (Unannotated Version)
Preface
Over the next few weeks, we will consider some core concepts and techniques from the field of
probability and statistics. These concepts and techniques are useful in a wide range of contexts:
not only in engineering and other technical fields, but also in most other areas of research and
policy development.
7.1 Probabilities and Events
In probability, an event is a specific outcome (or set of outcomes) that may or may not occur in a given context. For example, if we are interested in tomorrow's weather, the possible outcomes might include "rainy",
“fine”, “snowing” and so on. Each of these possible outcomes is a separate event; but “weather”
itself is not an event for probability purposes.
In some cases, the event(s) of interest may be defined in terms of a numerical variable: for
example, the event that an AFL team scores more than 100 points in a certain match, or the
event that the crowd at the match is smaller than 10,000 people. However, the variable is also
not an event: the event is defined by a specific value or range of values for the variable, so the
variable is only part of the definition of the event.
When events are defined in words, it can be quite time-consuming to write the full definition of
the event every time we wish to refer to it. Instead, we typically define a label that we can use
to refer to that event and then use the label from then on. For example, we might say:
Let R be the event that it rains tomorrow
Let B be the event that the Broncos win their next match
Let C be the event that my car breaks down in the coming week
For events defined by values of a numerical variable, we usually start by defining and labelling
the variable. We may then label the event as well, or we may just use an expression involving
the variable. For example:
Let S be the Brisbane Lions’ score in their next match [defining a variable, not an event]
Let H be the event that this score is greater than 100 points [defining an event]
{or: Let H be the event that S > 100} [again, defining an event]
Alternatively, instead of labelling this event we could just use the expression “S > 100” to
describe (and define) it. Either way, it is important to note the distinction between the variable
(S) and the event (S > 100).
The full set of possible outcomes in a given context is called the sample space.
Given any two events A and B, we can define new events by combining them in various ways:
(a) The event that A and B both happen is denoted as A ∩ B [“A intersection B”] or just as
AB for short. The logical expression A ∧ B [“A and B”] is also sometimes used.
(b) The event that (at least) one of A or B happens, that is, that either or both of A or B
happens, is denoted as A ∪ B [“A union B”]. The logical expression A ∨ B [“A or B”] is
also sometimes used.
(c) The event that A does not happen is denoted as Ā [“A-bar”] or A^c [“A complement”].
The logical expressions ¬A or ∼A [“not A”] are also sometimes used. (A short code sketch of these three operations follows below.)
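These combinations can be explored concretely by treating events as sets of outcomes. Below is a minimal Python sketch; the sample space and events are our own illustrative choices:

    # Events as sets of outcomes for a single roll of a six-sided die.
    sample_space = {1, 2, 3, 4, 5, 6}
    A = {2, 4, 6}              # event: the roll is even
    B = {4, 5, 6}              # event: the roll is greater than 3

    print(A & B)               # A ∩ B (i.e. AB): {4, 6}
    print(A | B)               # A ∪ B: {2, 4, 5, 6}
    print(sample_space - A)    # Ā (complement of A): {1, 3, 5}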
Examples
7.1.2.1 Example – exploring events
As above, let R be the event that it rains tomorrow, B be the event that the Broncos win
their next match and C be the event that my car breaks down in the coming week.
What do the following events mean?
(a) RB
(b) R ∪ C
(c) C̄
7.1.3 Properties of Probability
Probabilities obey some basic rules:
1. The probability of an event is a number that measures how likely the event is to happen.
2. It can take values between 0 (“no chance”) and 1 (100% likely = guaranteed to happen).
3. If we list every possible event in a given context, the probability that (at least) one of
them happens is 100%. (Otherwise, we haven’t listed all the options!)
4. Redefining an event to include extra alternatives cannot decrease its probability. (In fact,
it will usually increase the probability.)
5. Conversely, redefining an event by imposing extra restrictions cannot increase its probability
(and will usually decrease the probability).
6. If two events don’t overlap, the probability that one of them happens is just the sum of
their individual probabilities.
7. If two events do overlap, we have to take the probability of the overlap into account when
finding the probability that (at least) one of the events occurs.
8. The probability that an event doesn’t happen is 1 minus the probability that it does
happen. [This follows from Properties 3 and 6.]
9. Events that don’t overlap (as in Property 6) are known as mutually exclusive (or disjoint)
events. These events cannot occur at the same time, so the probability of them both
happening at the same time is zero.
It may be helpful to see some of these properties expressed in symbols, as follows. In these
expressions A, B and E are events and Pr(E) denotes the probability of an event E:
0 ≤ Pr(E) ≤ 1 [Property 2]
Pr(A ∪ B) = Pr(A) + Pr(B) if A and B are mutually exclusive [Property 6]
Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(AB) in general [Property 7]
Pr(Ā) = 1 − Pr(A) [Property 8]
Pr(AB) = 0 if A and B are mutually exclusive [Property 9]
7.2 Estimating Probabilities
There are two broad approaches to finding the probability of an event:
1. We can make some assumptions and build up the probabilities we want using mathematical
arguments.
2. We can collect data and estimate the probabilities we want from the observed proportions.
We next go through examples that estimate probabilities using each of these two approaches.
Examples
7.2.1.1 Example – rolling one six-sided die
Consider that you are playing a game that requires a six-sided die to be rolled only once.
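The two approaches above can be contrasted with a minimal sketch, assuming a fair die: approach 1 gives the probability by counting equally likely outcomes, while approach 2 estimates it from (simulated) data.

    import random

    # Approach 1: assume all six faces are equally likely.
    p_theory = 1 / 6

    # Approach 2: estimate the same probability from simulated "data".
    rolls = [random.randint(1, 6) for _ in range(100_000)]
    p_estimate = sum(1 for r in rolls if r == 6) / len(rolls)

    print(p_theory, p_estimate)  # the two values should be close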
Examples
7.2.2.1 Example – counting cars outside QUT
Consider that you have been standing outside QUT and writing down how many cars
travel past, how many of these cars are red, and how many of these cars are electric
vehicles.
You observe 100 cars, of which 25 are red, 10 are electric, and 1 is both red and electric.
(b) Using your data, estimate the probability that, in the future, a car passing QUT
will be red.
(c) Using your data, estimate the probability that, in the future, a car passing QUT
will be not an electric vehicle.
(d) Using your data, estimate the probability that, in the future, a car passing QUT
will be red or electric.
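A sketch of these sample-proportion estimates, using the counts recorded above (the variable names are ours):

    n_cars, n_red, n_ev, n_red_ev = 100, 25, 10, 1

    p_red = n_red / n_cars                             # (b) Pr(red) = 0.25
    p_not_ev = 1 - n_ev / n_cars                       # (c) Pr(not electric) = 0.90
    p_red_or_ev = (n_red + n_ev - n_red_ev) / n_cars   # (d) 0.34, subtracting the overlap

    print(p_red, p_not_ev, p_red_or_ev)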
Based on this sample, we can estimate the theoretical probability of each of these outcomes
(for this type of circuit board) using the corresponding proportions from the sample:
[Note: This may not be a completely realistic estimate of probabilities for this situation.
(Why not?)]
7.3 Independence and Conditional Probabilities
When we talk of two events being independent, we mean that these events don’t affect or
influence one another. In terms of probabilities, this means that the likelihood of one event
occurring is completely unaffected by whether the other event occurs.
If two events A and B are independent, then Pr(AB) = Pr(A) × Pr(B).
The reverse of this is also true; that is, if Pr(AB) = Pr(A) × Pr(B) for two events A and B,
then A and B are independent. In fact, this condition is often taken to be the mathematical
definition of independent events. We can also define independence for more than two events in
a similar fashion: if events A, B, C, . . . are all independent of one another, then
Pr(ABC · · ·) = Pr(A) × Pr(B) × Pr(C) × · · ·.
Note that independence of events is a completely different notion from events being mutually
exclusive. In fact, if events A and B are mutually exclusive and you know that B has occurred,
it drastically affects the probability of A occurring (it makes the probability of A become zero!)
so A and B can’t be independent.
Examples
7.3.1.1 Example – fair vs biased dice
Consider the case that two fair six-sided dice are thrown, and let A be the event that we
get a 3 when we throw the first die and B be the event that we get a 3 when we throw
the second die. Since the throws should not affect one another, it is reasonable to assume
that their outcomes are independent, in which case the probability of getting 3s on both
dice is Pr(AB) = Pr(A) Pr(B) = 1/6 × 1/6 = 1/36.
However, now consider the case where two biased dice are thrown, each with Pr(result = 3) =
1/5. Since Pr(A) = Pr(B) = 1/5, it follows that Pr(AB) = 1/5 × 1/5 = 1/25.
Notice that this result doesn’t depend on the outcomes for either throw being equally
likely, but only on the fact that the outcomes from the two throws are independent.
Note also that the probabilities of events A and B don’t have to be equal for this principle
to apply: for example, if we have Pr(A) = 1/6 and Pr(B) = 1/5, then we will obtain
Pr(AB) = 1/6 × 1/5 = 1/30.
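A quick simulation sketch of the fair-dice case (assuming the two throws are generated independently), checking that the observed Pr(AB) is close to Pr(A) × Pr(B):

    import random

    trials = 200_000
    count_A = count_B = count_AB = 0
    for _ in range(trials):
        d1, d2 = random.randint(1, 6), random.randint(1, 6)
        count_A += (d1 == 3)
        count_B += (d2 == 3)
        count_AB += (d1 == 3 and d2 == 3)

    pA, pB, pAB = count_A / trials, count_B / trials, count_AB / trials
    print(pAB, pA * pB)  # both should be near 1/36 ≈ 0.0278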
Let A, B and C be the events that the first, second and third components fail
(respectively) and let S be the event that the signal reaches its destination. Note
that S is the event that all components are working; in other words, S is the event
ĀB̄C̄.
A system such as this communications line is said to be a series system, and can be
represented by a diagram such as Figure 7.1. Another common type of system is
the parallel system represented in Figure 7.2: in this type of system, the system
works as long as any one of the components works. Failures of components in these
systems are often independent of one another as well, which allows these systems to be
analysed straightforwardly too.
Let A, B and C be the events that the first, second and third switches fail
(respectively) and let S be the event that the system operates correctly. In this
case, the system can only fail if all switches fail, so we can see that S̄ is the event
ABC and S is the complement of this.
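Both calculations can be sketched in a few lines, assuming independent failures and using illustrative failure probabilities that we have made up for this sketch:

    # Hypothetical failure probabilities for the three components/switches.
    p_fail = [0.05, 0.10, 0.02]

    # Series system: works only if every component works (S = Ā B̄ C̄).
    p_series_works = 1.0
    for p in p_fail:
        p_series_works *= (1 - p)

    # Parallel system: fails only if every component fails (S̄ = ABC).
    p_all_fail = 1.0
    for p in p_fail:
        p_all_fail *= p
    p_parallel_works = 1 - p_all_fail

    print(p_series_works, p_parallel_works)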
Conditional probabilities are written with notation such as Pr(R | B) [read “the probability of
R given B”]: they give the probability of R occurring under the condition that some other event occurs. If
we want to indicate the probability of R when no conditions apply, we just use the expression
Pr(R), as before.
Once we have clarified our notation, we can start to consider how to calculate a conditional
probability, say Pr(A | B). The essential notion here is that the condition reduces the scope of
the context (sample space) we are working in. Thinking in terms of counting arguments, we
now only need to consider the possible outcomes in event B; it also means that we only need to
consider possible outcomes for event A that are also included in event B. In other words, the
only outcomes of A that we are interested in here are those in A ∩ B. From these considerations,
it follows that
Pr(A | B) = Pr(A ∩ B) / Pr(B) = Pr(AB) / Pr(B).
This argument is also illustrated in Figure 7.3.
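As a small numerical sketch of this formula, using the counts from the playing-card example later in this section (20 red cards in a 46-card pack, 2 of them aces):

    n_total = 46
    n_B = 20      # outcomes in B (card is red)
    n_AB = 2      # outcomes in A ∩ B (card is a red ace)

    pr_B = n_B / n_total
    pr_AB = n_AB / n_total
    pr_A_given_B = pr_AB / pr_B   # Pr(A | B) = Pr(AB) / Pr(B)

    print(pr_A_given_B)           # 0.1, the same as counting 2 red aces among 20 red cards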
To identify the presence of a condition in “word” problems, look for words such as “if . . . ”,
“when . . . ” and so on. In some situations, it may be less obvious that we are using conditional
probabilities, so you may need to think carefully.
Figure 7.3: Graphical representation of conditional probability. Event A occurs in the left
regions, and event Ā (i.e. the event that A doesn’t happen) occurs in the right regions. Event
B occurs in the upper (shaded) regions, and event B̄ (i.e. the event that B doesn’t happen)
occurs in the lower (unshaded) regions. The overall rectangle is the sample space.
Figure 7.3 is very useful, because we can extract many intuitive formulas from this figure by
considering the conditional probability space visually. Below are some formulas that we can
draw out from Figure 7.3 (although you may be able to find others as well!).
Pr(A) + Pr(Ā) = 1
Pr(A | B) + Pr(Ā | B) = 1
Pr(Ā B̄) + Pr(A ∪ B) = 1
Pr(AB̄) + Pr(AB) = Pr(A)
How does independence relate to conditional probability?
If two events (A and B, say) are independent, then the occurrence of B has no effect on the
probability of A occurring; in other words, we are saying that Pr(A | B) = Pr(A). But our
conditional probability formula then tells us that Pr(AB) = Pr(A | B)×Pr(B) = Pr(A) Pr(B) in
this situation, which is exactly the definition we gave previously for two events to be independent.
Thus we see that independence can be treated as a special case of conditional probability.
Examples
7.3.2.1 Example – milk and sugar
Suppose 55% of people like milk in their coffee, 35% like sugar and 25% like neither.
(a) What proportion of people like both milk and sugar? Define M to be the event that
a randomly chosen person likes milk in their coffee and S to be the event that they
like sugar in their coffee.
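One possible route to part (a), sketched in code: combine the complement rule (Property 8) with the addition rule for overlapping events (Property 7).

    pr_M, pr_S, pr_neither = 0.55, 0.35, 0.25

    pr_M_or_S = 1 - pr_neither             # complement rule: "likes at least one" vs "likes neither"
    pr_M_and_S = pr_M + pr_S - pr_M_or_S   # rearranging Pr(M ∪ S) = Pr(M) + Pr(S) − Pr(MS)

    print(pr_M_and_S)                      # 0.15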
(d) Why are these different? When would they be the same?
(b) If the second card is a diamond, what is the probability that the first card was also
a diamond?
(c) Now consider an ordinary pack of playing cards from which the Jokers, red Jacks,
red Queens and red Kings have been removed, so that there are 46 cards, of which
20 cards are red, 4 cards are aces, and 2 cards are red aces. Suppose that someone
draws a card at random from this pack. If the person drawing the card tells us that
the card is red, what is the probability that it is also an ace?
Let A be the event that the card is an ace and R be the event that the card is red.
There are 20 red cards (outcomes in event R), of which 2 are aces (outcomes in RA),
so we can calculate the required probability directly as Pr(A | R) = 2/20 = 0.1.
To verify the formula given above, note that there are 46 cards in the pack, so Pr(R) =
20/46, Pr(AR) = 2/46 and Pr(A | R) = Pr(AR)/ Pr(R) = (2/46)/(20/46) = 2/20 =
0.1, as before.
Note also that Pr(A | R) is not the same as the unconditional probability Pr(A) =
4/46 ≈ 0.087 in this situation.
As always, we start by defining events. Let F be the event that a randomly chosen item
passes the first inspection and S be the event that an item passes the second inspection.
Before proceeding further, we should note that the probability we have been given for
passing the second inspection is almost certainly based only on items that have already
passed the first inspection. (Why would anyone bother to check an item that has already
failed the first inspection?) So the information we have been given is actually: Pr(F ) = 0.9
and Pr(S | F ) = 0.8.
Once we know this, we can complete the calculation: Pr(F S) = Pr(S | F ) Pr(F ) =
0.8 × 0.9 = 0.72.
Recall the conditional probability formula
Pr(A | B) = Pr(A ∩ B) / Pr(B) = Pr(AB) / Pr(B),
so that Pr(AB) = Pr(A | B) Pr(B). Similarly,
Pr(B | A) = Pr(BA) / Pr(A), so that Pr(BA) = Pr(B | A) Pr(A).
Because the event AB is the same as the event BA, combining the above equations yields
Pr(A | B) = Pr(B | A) Pr(A) / Pr(B),
which is a famous theorem in statistics known as Bayes’ Theorem. We won’t dwell on this
theorem too much in the present course, but you may see it in later statistics courses; this
theorem is a direct consequence of conditional probability rules.
Examples
7.3.3.1 Example – testing for chemical residues
A test for chemical residues is found to indicate the presence of a certain compound 99%
of the time when the compound is present, but also says it is present 5% of the time when
it is not. The compound is actually present in 20% of the samples that are tested.
(a) In what proportion of inspections does the test indicate that the compound is
present?
(b) If the test indicates that the compound is present, what is the probability that it
actually is present?
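A sketch of how Bayes' Theorem applies to these numbers (writing C for "compound present" and T for "test indicates present"; these labels are ours):

    pr_C = 0.20             # compound actually present
    pr_T_given_C = 0.99     # test detects the compound when it is present
    pr_T_given_notC = 0.05  # false positive rate

    # (a) Total probability that the test indicates "present".
    pr_T = pr_T_given_C * pr_C + pr_T_given_notC * (1 - pr_C)

    # (b) Bayes' Theorem: Pr(C | T) = Pr(T | C) Pr(C) / Pr(T).
    pr_C_given_T = pr_T_given_C * pr_C / pr_T

    print(pr_T, pr_C_given_T)  # 0.238 and roughly 0.832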
We also note that E, S and O are mutually exclusive events, and one of these events
will definitely occur, so Pr(E) + Pr(S) + Pr(O) = 1. Therefore
(b) What proportion of the students in first year maths classes are studying Science?
Again, we just apply the formula for conditional probabilities. Using the same
notation as before, Pr(S | M ) = Pr(SM )/ Pr(M ) = 0.18/0.5 = 0.36.
Suppose that we collect a training sample of 2000 pieces of fruit from this region, and
observe that 450 out of 500 bananas are yellow (the rest are green); 120 out of 300 plums
are yellow, 160 are black and the rest are reddish in colour; and 300 out of 1200 apples
are yellow, with 700 being red and 200 green. Based on the training sample, what is the
probability that a yellow fruit is a banana?
First, define events as follows: let B, P and A be the events that a piece of fruit is a
banana, a plum or an apple and let Y be the event that it is yellow. To illustrate the
classification approach, we will work with the following sample probabilities:
Pr(Y | B) = 450/500 = 0.9,  Pr(Y | P) = 120/300 = 0.4,  Pr(Y | A) = 300/1200 = 0.25,
Pr(B) = 500/2000 = 0.25,  Pr(P) = 300/2000 = 0.15,  Pr(A) = 1200/2000 = 0.6.
Then Pr(Y B) = 0.9 × 0.25 = 0.225, Pr(Y P ) = 0.4 × 0.15 = 0.06, Pr(Y A) = 0.25 × 0.6 =
0.15 and Pr(Y ) = 0.225 + 0.06 + 0.15 = 0.435, so that Pr(B | Y ) = 0.225/0.435 ≈ 0.517,
Pr(P | Y ) = 0.06/0.435 ≈ 0.138, Pr(A | Y ) = 0.15/0.435 ≈ 0.345.
From this result, we conclude that a yellow fruit in this region is most likely to be a
banana and least likely to be a plum.
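The same classification calculation can be expressed as a short sketch, with the priors and conditional probabilities taken directly from the training sample above:

    priors = {"banana": 0.25, "plum": 0.15, "apple": 0.60}
    pr_yellow_given = {"banana": 0.9, "plum": 0.4, "apple": 0.25}

    # Joint probabilities Pr(Y and fruit), then the total Pr(Y).
    joint = {f: pr_yellow_given[f] * priors[f] for f in priors}
    pr_yellow = sum(joint.values())

    # Posterior Pr(fruit | Y) for each fruit type, via Bayes' Theorem.
    posterior = {f: joint[f] / pr_yellow for f in priors}
    print(max(posterior, key=posterior.get), posterior)  # banana is most likely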
7.4 Probability Distributions
Examples
7.4.1.1 Example – circuit board faults revisited
A random sample of circuit boards made on a production line is subjected to an inspection
in which the number of faults on each board is recorded (see Example 7.2.2.3). The
following estimates are obtained for the probabilities of each possible number of faults.
We can think of the number of faults as a variable, with possible events described by
values of this variable. The list (table) of probabilities gives us a probability distribution
for the relevant variable. A variable which has a probability distribution associated with
it in this way is known as a random variable.
Once we have such a set of probabilities for individual events, we can find the probabilities
of combined events in the same way that we discussed last week. In fact, because the
events in our list are mutually exclusive (disjoint) events, this makes estimation of
combined probabilities relatively straightforward. For example, if we wish to find the
(estimated) probability that a randomly chosen circuit board has no more than two faults,
we proceed as follows:
For clarity, we define N as the number of faults on a circuit board. Based on the table
from the previous page, we obtain Pr(N ≤ 2) = Pr(N = 0) + Pr(N = 1) + Pr(N = 2).
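As a sketch of this calculation (using hypothetical probabilities, since the table of estimates is not reproduced here; the events N = 0, N = 1 and N = 2 are mutually exclusive, so their probabilities simply add):

    # Hypothetical pmf for the number of faults N (NOT the values from the notes).
    pmf = {0: 0.50, 1: 0.25, 2: 0.15, 3: 0.07, 4: 0.03}

    pr_at_most_two = sum(p for n, p in pmf.items() if n <= 2)
    print(pr_at_most_two)  # 0.90 for these made-up numbers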
Type of fault:   Electrical   Software   Paper transport   Toner system   Electronics
Probability:     0.11         0.09       0.37              0.25           0.18
In this example, the type of fault can be considered as a (non-numerical) random variable
whose possible values are actually descriptions (categories). Again, the list (table)
of probabilities gives the probability distribution for the random variable. Estimate
the probability that the next fault on the photocopier involves either the electrical or
electronics components.
Let E1 be the event that a fault on the copier is electrical and E2 be the event that a
fault is in the electronics. From the table above, we have Pr(E1) = 0.11 and Pr(E2) = 0.18.
Since these fault types are mutually exclusive, Pr(E1 ∪ E2) = 0.11 + 0.18 = 0.29.
From the properties of probability that we discussed in Section 7.1.3, it follows that a probability
mass function (pmf) must satisfy two properties:
1. Pr(X = x) ≥ 0 for every possible value x; and
2. the probabilities must sum to 1 over all possible values: Σx Pr(X = x) = 1.
The probability mass function (pmf) can also be represented using a bar chart, as in Figure 7.4.
Representing the probabilities as a probability mass function is particularly useful to us when
our variable takes a numerical value.
Figure 7.4: Probability function for a discrete random variable represented using a bar chart.
Examples
n:           0      1      3      5
Pr(N = n):   0.35   0.10   0.25   0.30
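A quick sketch that checks the two pmf properties for this table and computes one combined probability:

    pmf = {0: 0.35, 1: 0.10, 3: 0.25, 5: 0.30}

    assert all(p >= 0 for p in pmf.values())     # property 1: non-negative
    assert abs(sum(pmf.values()) - 1) < 1e-9     # property 2: sums to 1

    pr_at_most_three = sum(p for n, p in pmf.items() if n <= 3)
    print(pr_at_most_three)  # 0.70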
We have seen that it is possible to associate probabilities with values of discrete numerical
variables; is this also possible for continuous variables? If so, how do we do it? The answer
is that we can associate probabilities with continuous variables; however, because continuous
variables take values in intervals, we associate the probabilities with intervals of values rather
than with individual values. For example, instead of saying that a person’s height is 1.75 metres,
we might talk instead about the person’s height being between 1.745 metres and 1.755 metres.
Ultimately, continuous values can only be measured to some specified degree of accuracy, so the
measured value always carries an implied interval around it.
The probability distribution for a continuous random variable X can be represented formally by
using a probability density function (pdf) f (x) such that
Pr(a < X < b) = ∫_a^b f(x) dx,
that is, the probability of an observed value of X lying in a specified interval is given by the
integral of f(x) over that interval. In this case, the properties of probability that we discussed
in Section 7.1.3 require that a probability density function (pdf) must satisfy two properties:
1. f(x) ≥ 0 for all x; and
2. ∫_{−∞}^{∞} f(x) dx = 1.
The way in which probabilities are found from a density function can be represented visually
by a graph of this function, as in Figure 7.5. Conceptually, the graph of a pdf is similar to
what would happen to the bar chart of a discrete probability function if the variable had a large
number of values. (See Figure 7.6 for an illustration of this.)
Figure 7.5: The probability Pr(4 < X < 9) is given by the shaded area under the pdf f (x).
An alternative method of finding probabilities for continuous variables is to use the cumulative
distribution function (cdf), which is defined as
F(x) = ∫_{−∞}^{x} f(u) du = Pr(X ≤ x).
Note here that we use u in the integrand f(u) rather than x for notational reasons only: the
same symbol should not appear both as the variable of integration and in the integral limits.
By the properties of integrals, we then have
Pr(a < X < b) = ∫_{−∞}^{b} f(x) dx − ∫_{−∞}^{a} f(x) dx = F(b) − F(a).
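These relationships can be checked numerically. A minimal sketch (assuming SciPy is available) for the uniform density f(x) = 0.1 on (0, 10) used in the next example, evaluating the probability shaded in Figure 7.5:

    from scipy.integrate import quad

    f = lambda x: 0.1                 # uniform pdf on (0, 10)
    F = lambda x: 0.1 * x             # its cdf on (0, 10)

    a, b = 4, 9
    integral, _ = quad(f, a, b)       # Pr(4 < X < 9) by direct integration
    print(integral, F(b) - F(a))      # both 0.5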
Examples
(c) Find the cumulative distribution function F (x) for this distribution.
We take “at random” here to mean that every sub-interval of a given length of time
should be equally likely; that is, any 1-minute interval should have a probability of
0.1, every 3-minute interval should have a probability of 0.3, and any time interval
of length L should have a probability of L/10. In particular, we want to have
Pr(0 < X < L) = ∫_0^L f(x) dx = L/10, where f(x) is the density function for the
distribution of X.
Using our knowledge of differentiation and integration, the pdf which satisfies this
requirement is f(x) = 1/10 = 0.1 for 0 < x < 10 (and f(x) = 0 outside this interval).
Alternatively, using the cdf, we have F(x) = ∫_0^x 0.1 du = [0.1u]_0^x = 0.1x and hence
Pr(0 < X < L) = F(L) − F(0) = L/10 − 0 = L/10, as before.
Figure 7.7: Probability density function (pdf) of the uniform distribution on the interval
(0, 10). The probability Pr(1.5 < X < 4.5) is given by the area of the shaded region.
(b) Suppose instead now that the pdf of X is given by f (x) = 1/5 − x/50, where again
0 < x < 10. What is the probability that the customer arrives between 9:05am and
9:10am in this scenario?
To obtain this probability, we simply integrate over the relevant interval, as follows:
Pr(5 < X < 10) = ∫_5^10 f(x) dx
               = ∫_5^10 (1/5 − x/50) dx
               = [x/5 − x²/100]_5^10
               = (2 − 1) − (1 − 1/4)
               = 0.25.
This result is illustrated in Figure 7.8. Unlike the previous example, in this case the
probability of X lying in a given interval is not proportional to the length of the
interval.
Figure 7.8: Probability density function (pdf) f (x) = 1/5 − x/50, where 0 < x < 10. The
probability Pr(5 < X < 10) is given by the area of the shaded region.
Alternatively, the cdf in this case is F(x) = ∫_0^x (1/5 − u/50) du = x/5 − x²/100 for
0 < x < 10, and Pr(5 < X < 10) = F(10) − F(5) = (2 − 1) − (1 − 0.25) = 1 − 0.75 = 0.25, as
before.
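As a final rough check, this pdf can be sampled by inverse-transform sampling: solving F(x) = u with F(x) = x/5 − x²/100 gives x = 10(1 − √(1 − u)). A minimal simulation sketch (the derivation of the inverse is ours):

    import random
    from math import sqrt

    def sample_x():
        u = random.random()
        return 10 * (1 - sqrt(1 - u))   # inverse of F(x) = x/5 - x^2/100

    trials = 200_000
    hits = sum(1 for _ in range(trials) if 5 < sample_x() < 10)
    print(hits / trials)                # should be close to 0.25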