Chapter 4 Discrete Probability Distribution
Objectives
After completing this chapter, you will be able to:
INTRODUCTION
A random variable can be either discrete or continuous. In this module, we cover
the first type, and the module Continuous probability distributions covers the second.
The idea of a random variable builds on the fundamental ideas of probability.
Students need to understand that random variables are conceptually different from the
mathematical variables that they have met before. A random variable is linked to
observations in the real world, where uncertainty is involved.
An informal — but important — understanding of a random variable is that it is a
variable whose numerical value is determined by the outcome of a random procedure. In
this module, we also see the more formal understanding, which is that a random variable
is a function from the event space of a random procedure to the set of real numbers.
Random variables are central to the use of probability in practice. They are used
to model random phenomena, which means that they are relevant to a wide range of
human activity. In particular, they are used extensively in many branches of research,
including agriculture, biology, ecology, economics, medicine, meteorology, physics,
psychology and others. They provide a structure for making inferences about the world,
when it is impossible to measure things comprehensively. They are used to model
outcomes of processes that cannot be predicted in advance.
Random variables have distributions. In this module, we describe the essential
properties of distributions of discrete random variables. Distributions can have many
forms, but there are some special types of distributions that arise in many different
ENGINEERING DATA ANALYSIS | MODULE 2
practical contexts. In this module, we discuss two of these special distributions: discrete
uniform distributions and geometric distributions.
This module also covers the mean of a discrete random variable, which is a
measure of central location, and the variance and standard deviation, which are
measures of spread.
Random variables
A random variable is a variable whose value is determined by the outcome of
a random procedure. The concept of a random procedure was discussed in the
module Probability. What makes the variable random is that — unlike the kind of variable
we see in a quadratic equation — we cannot say what the observed value of the random
variable is until we actually carry out the random procedure.
a. A discrete random variable takes values confined to a range of separate
or ‘discrete’ values. (More formally, a discrete random variable takes either
a finite number of values or a countably infinite number of values.)
b. A continuous random variable can take any value in an interval.
The probability that a discrete random variable X takes the value x is denoted
Pr(X = x). We read this as ‘the probability that X equals x’, which means the probability
that X takes the value x when we actually obtain an observation. For the die-rolling
example, where X is the number shown when a fair die is rolled, Pr(X = x) = 1/6 for
x = 1, 2, ..., 6 and Pr(X = x) = 0 for every other value, so Pr(X = x) is defined for
each real number x. Often, for discrete random variables, it is sufficient to specify in some
way the values with non-zero probability only; the values with zero probability are usually
clear, or clearly implied.
There are other simple random variables that can be defined for the random
procedure of rolling a die. For example:
• Let Y be the number of even numbers appearing. Then Y takes value 1 if a 2, 4 or
6 is rolled, and Y takes value 0 otherwise.
• Let Z be the number of prime numbers appearing. Here Z takes value 1 if a 2, 3 or
5 is rolled, and takes value 0 otherwise.
These two examples are not terribly interesting; but they illustrate the important
point that a single random procedure can accommodate several random variables.
Note that “Y = 1” and “Z = 0” are events, in that they define subsets of the event space E.
The event “Y = 1” is {2, 4, 6}. This is a crucial insight; it makes it feasible to obtain the
probability distribution of a random variable.
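The idea that “Y = 1” and “Z = 0” are subsets of the event space can be sketched in a few lines of Python. This is only an illustration: it assumes a fair die (each outcome has probability 1/6), and the helper names `event` and `prob` are made up for this sketch.

```python
from fractions import Fraction

# Event space for one roll of a die (assumed fair: each outcome has probability 1/6).
E = {1, 2, 3, 4, 5, 6}

def Y(outcome):
    """Number of even numbers appearing: 1 if the roll is even, else 0."""
    return 1 if outcome % 2 == 0 else 0

def Z(outcome):
    """Number of prime numbers appearing: 1 if the roll is 2, 3 or 5, else 0."""
    return 1 if outcome in {2, 3, 5} else 0

def event(rv, value):
    """Subset of E on which the random variable rv takes the given value."""
    return {outcome for outcome in E if rv(outcome) == value}

def prob(subset):
    """Probability of an event when all outcomes in E are equally likely."""
    return Fraction(len(subset), len(E))

y_is_1 = event(Y, 1)   # the event "Y = 1"
z_is_0 = event(Z, 0)   # the event "Z = 0"
print(sorted(y_is_1), prob(y_is_1))
print(sorted(z_is_0), prob(z_is_0))
```

Both random variables are defined on the same event space E, which is the point made above: one random procedure can carry several random variables.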
This leads to the definition of a random variable from a formal perspective.
Students accustomed to formal mathematical treatments of topics sometimes find the
description of a random variable given so far somewhat elusive. A random variable can
be defined formally in a way that strongly relates to mathematical topics that students
have covered elsewhere, specifically, functions.
A random variable is a numerical-valued function that maps the event space E to the
set of real numbers. Students will not be familiar with a ‘variable’ that is a function. It is
an important conceptual point, represented in the following diagram.
[Diagram: a random variable mapping each outcome in the event space E to a real number]
From this diagram, we see that the random variable must take exactly one value
for each element of the event space E. So each possible outcome in the event space
has a corresponding value for the random variable. As with functions generally, a number
of possible outcomes in E may have the same value of the random variable, and in
practice this occurs frequently.
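One way to make the function view concrete is to store the random variable as a lookup table from outcomes to values, then add up the probabilities of all outcomes that share a value. The sketch below assumes a fair die and uses the prime-number variable Z from earlier; the function name `distribution` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed model: a fair die, so each outcome in E has probability 1/6.
outcome_prob = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

# A random variable is a function on E; a lookup table is one way to write it.
# Z = 1 if the roll is prime (2, 3 or 5), else 0.
Z = {1: 0, 2: 1, 3: 1, 4: 0, 5: 1, 6: 0}

def distribution(rv, outcome_prob):
    """Add up the probabilities of all outcomes mapped to the same value."""
    pf = defaultdict(Fraction)          # missing values start at Fraction(0)
    for outcome, p in outcome_prob.items():
        pf[rv[outcome]] += p
    return dict(pf)

pf_Z = distribution(Z, outcome_prob)
print(pf_Z)
```

Three outcomes map to each value of Z, so several elements of E share the same value of the random variable, as described above.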
Probability functions
To work out the probability that a discrete random variable X takes a particular
value x, we need to identify the event (the set of possible outcomes) that corresponds to
“X = x”. In general, the function used to describe the probability distribution of a discrete
random variable is called its probability function (abbreviated as pf). The probability
function of X is the function pX : R → [0, 1] given by
\[ p_X(x) = \Pr(X = x). \]
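Suppose that X takes each of the values 1, 2, ..., m with equal probability, so that
pX(x) = 1/m for x = 1, 2, ..., m and pX(x) = 0 otherwise.
Then X has a discrete uniform distribution. This is a distribution that arises often in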
lotteries and games of chance. We have seen this distribution in the Powerball example
considered in the module Probability. In commercial lotteries, such as Powerball, it is a
regulatory requirement that each outcome is equally likely. There are 45 possible
Powerball numbers (1, 2,..., 45). So if X is the Powerball drawn on a particular occasion,
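the pf pX(x) of X is given by
\[ p_X(x) = \begin{cases} \dfrac{1}{45} & \text{if } x = 1, 2, \dots, 45, \\ 0 & \text{otherwise.} \end{cases} \]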
If this model for the drawing of Powerball numbers is correct, we should expect
that, over a large number of draws, the relative frequencies of the 45 possible numbers
are all approximately equal to 1/45 ≈ 0.0222. The following graph shows the relative
frequencies observed in the 853 draws from May 1996 to September 2012.
[Graph: relative frequencies of the 45 Powerball numbers over the 853 draws]
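To get a feel for how close relative frequencies come to 1/45 in a sample of this size, the model can be simulated. The sketch below draws 853 values from a discrete uniform distribution on 1, 2, ..., 45. The draws are simulated, not the actual Powerball results, and the seed is arbitrary.

```python
import random
from collections import Counter

M = 45          # number of possible Powerball values, from the text
N_DRAWS = 853   # same number of draws as the period quoted in the text

def uniform_pf(x, m=M):
    """pf of the discrete uniform distribution on 1, 2, ..., m."""
    return 1 / m if isinstance(x, int) and 1 <= x <= m else 0.0

rng = random.Random(0)  # fixed, arbitrary seed so the run is repeatable
draws = [rng.randint(1, M) for _ in range(N_DRAWS)]  # simulated, not real draws
counts = Counter(draws)
rel_freq = {x: counts[x] / N_DRAWS for x in range(1, M + 1)}

print(f"model probability:           {uniform_pf(1):.4f}")
print(f"smallest relative frequency: {min(rel_freq.values()):.4f}")
print(f"largest relative frequency:  {max(rel_freq.values()):.4f}")
```

Even when the uniform model is exactly right, the relative frequencies scatter around 1/45 and do not equal it, so some variation in the observed frequencies is expected.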
Geometric distribution
We have already met one special distribution that is given a name (the discrete
uniform distribution). The distribution in the previous example also has a name; it is called
a geometric distribution. Suppose that a sequence of independent ‘trials’ occurs, and at
each trial the probability of ‘success’ equals p. Define X to be the number of trials that
occur before the first success is observed. Then X has a geometric distribution with
parameter p. We introduce here a symbol used throughout the modules on probability
and statistics. If X has a geometric distribution with parameter p, we write
$X \overset{d}{=} \mathrm{G}(p)$. The symbol $\overset{d}{=}$ stands for ‘has the
distribution’, meaning the distribution indicated immediately to the right of the symbol.
Note the use of the generic terms ‘trial’ and ‘success’. They are arbitrary, but they
carry with them the idea of each observation involving a kind of test (i.e., trial) in which
we ask the question: Which one of the two possibilities will be observed, a ‘success’ or a
‘failure’? In this sense, the words ‘success’ and ‘failure’ are just labels to keep track of the
two possibilities for each trial.
Note that X can take the value 0, if a success is observed at the very first trial. Or
it can take the value 1, if a failure is observed at the first trial and then a success at the
second trial. And so on. What is the largest value that X can take? There is no upper limit,
in theory. As the values of X increase, the probabilities become smaller and smaller.
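The behaviour described above can be checked numerically. With independent trials, X = x requires x failures followed by one success, so Pr(X = x) = (1 − p)^x p for x = 0, 1, 2, .... This formula follows from the definition but does not appear in the text above, and the value p = 0.3 and the function names are chosen here only for illustration.

```python
import random

def geometric_pf(x, p):
    """Pr(X = x): x failures followed by one success, for x = 0, 1, 2, ..."""
    if not isinstance(x, int) or x < 0:
        return 0.0
    return (1 - p) ** x * p

def simulate_geometric(p, rng):
    """Count the failures observed before the first success."""
    failures = 0
    while rng.random() >= p:   # each trial succeeds with probability p
        failures += 1
    return failures

p = 0.3  # illustrative value, not from the text
probs = [geometric_pf(x, p) for x in range(10)]
print([round(q, 4) for q in probs])   # probabilities shrink as x grows

rng = random.Random(1)  # fixed, arbitrary seed
sample = [simulate_geometric(p, rng) for _ in range(10_000)]
print(sample.count(0) / len(sample), "vs", geometric_pf(0, p))
```

The probabilities never reach zero, which matches the remark that X has no upper limit in theory.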
Another important distribution arises in the context of a sequence of independent trials
that each have the same probability of success p. This is the binomial distribution. There
is an entire module devoted to it (Binomial distribution), so we do not consider it further
here.
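Suppose that a fair die is rolled 6000 times. We would expect approximately
1000 occurrences of each possible outcome. What would be the average value of the
outcomes obtained? Approximately, the average or mean would be
\[ \frac{(1000 \times 1) + (1000 \times 2) + \dots + (1000 \times 6)}{6000} = \frac{21000}{6000} = 3.5. \]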
This can be thought of as the weighted average of the six possible values 1, 2,...,
6, with weights given by the relative frequencies. Note that 3.5 is not a value that we can
actually observe. By analogy with data and relative frequencies, we can define the mean
of a discrete random variable using probabilities from its distribution, as follows. The mean
µX of a discrete random variable X with probability function pX(x) is given by
\[ \mu_X = \sum x \, p_X(x), \]
where the sum is taken over all values x for which pX(x) > 0.
The mean can be regarded as a measure of ‘central location’ of a random variable.
It is the weighted average of the values that X can take, with weights provided by the
probability distribution.
The mean is also sometimes called the expected value or expectation of X and
denoted by E(X). These are both somewhat curious terms to use; it is important to
understand that they refer to the long-run average. The mean is the value that we expect
the long-run average to approach. It is not the value of X that we expect to observe.
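The distinction between the mean and any single observed value can be illustrated with a short simulation. The sketch below assumes a fair die, computes the mean exactly as a weighted average, and then prints sample averages for increasing numbers of simulated rolls. The seed and sample sizes are arbitrary.

```python
import random
from fractions import Fraction

# pf of a fair die (assumed): each of 1, ..., 6 has probability 1/6.
die_pf = {x: Fraction(1, 6) for x in range(1, 7)}

def mean(pf):
    """Weighted average of the values, weighted by their probabilities."""
    return sum(x * p for x, p in pf.items() if p > 0)

mu = mean(die_pf)
print(mu)  # 7/2, i.e. 3.5, which is not a value the die can show

rng = random.Random(2)  # fixed, arbitrary seed
for n in (10, 1_000, 100_000):
    rolls = [rng.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)   # sample average for n simulated rolls
```

The sample averages should tend to settle near 3.5 as n grows, even though no individual roll can equal 3.5.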
Consider a random variable U that has the discrete uniform distribution with
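possible values 1, 2,...,m. The mean is given by
\[ \mu_U = \sum_{x=1}^{m} x \cdot \frac{1}{m} = \frac{1}{m} \cdot \frac{m(m+1)}{2} = \frac{m+1}{2}. \]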
Students have met the concepts of variance and standard deviation when
summarising data. These were the sample variance and the sample standard deviation.
The difference here is that we are referring to properties of the distribution of a random
variable.
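Let X be a discrete random variable with probability function pX(x) and mean µX.
The variance of X, denoted var(X), is given by
\[ \operatorname{var}(X) = \sum (x - \mu_X)^2 \, p_X(x), \]
where the sum is taken over all values of x for which pX(x) > 0.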
So the variance of X is the weighted average of the squared deviations from the
mean µ, where the weights are given by the probability function pX (x) of X.
The standard deviation of X is defined to be the square root of the variance of X.
That is,
\[ \operatorname{sd}(X) = \sqrt{\operatorname{var}(X)}. \]
In some ways, the standard deviation is the more tangible of the two measures,
since it is in the same units as X. For example, if X is a random variable measuring lengths
in metres, then the standard deviation is in metres (m), while the variance is in square
metres (m²).
Unlike the mean, there is no simple direct interpretation of the variance or
standard deviation. The variance is analogous to the moment of inertia in physics, but
that is not necessarily widely understood by students. What is important to understand is
that, in relative terms:
• a small standard deviation (or variance) means that the distribution of the
random variable is narrowly concentrated around the mean
• a large standard deviation (or variance) means that the distribution is spread
out, with some chance of observing values at some distance from the mean.
Note that the variance cannot be negative, because it is an average of squared
quantities. This is appropriate, as a negative spread for a distribution does not make
sense. Hence, var(X) ≥ 0 and sd(X) ≥ 0 always.
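These definitions translate directly into code. The sketch below computes the variance and standard deviation for a fair die (an assumed model) and for a second, made-up distribution with the same mean but values packed closer together. The name `narrow_pf` and its probabilities are hypothetical and serve only to contrast a small spread with a larger one.

```python
from fractions import Fraction
from math import sqrt

def mean(pf):
    """Weighted average of the values, weighted by their probabilities."""
    return sum(x * p for x, p in pf.items() if p > 0)

def variance(pf):
    """Weighted average of the squared deviations from the mean."""
    mu = mean(pf)
    return sum((x - mu) ** 2 * p for x, p in pf.items() if p > 0)

def sd(pf):
    """Standard deviation: the square root of the variance."""
    return sqrt(variance(pf))

die_pf = {x: Fraction(1, 6) for x in range(1, 7)}    # fair die (assumed)
narrow_pf = {3: Fraction(1, 2), 4: Fraction(1, 2)}   # hypothetical, same mean 7/2

for name, pf in (("die", die_pf), ("narrow", narrow_pf)):
    print(name, mean(pf), variance(pf), round(sd(pf), 4))
```

Both distributions have mean 7/2, but the fair die has the larger variance (35/12 compared with 1/4), reflecting its wider spread around the mean.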
Video Links:
Discrete Probability Distribution
• https://www.khanacademy.org/math/ap-statistics/random-variables-ap/discrete-random-variables/v/discrete-probability-distribution
• https://courses.lumenlearning.com/introstats1/chapter/probability-distribution-function-pdf-for-a-discrete-random-variable/
• https://study.com/academy/lesson/discrete-probability-distributions-equations-examples.html
• https://faculty.elgin.edu/dkernler/statistics/ch06/6-1.html