Lecture 1
Discrete random variables
1 We will not worry about measurability and similar subtleties in this class.
Example 1.2.4. A die is thrown and the number obtained is recorded and denoted by X. The possible values of X are S = {1, 2, 3, 4, 5, 6} and each happens with probability 1/6, so X is certainly S-valued. Since S is finite, X is discrete.
One still needs to argue that S is the support S_X of X. The alternative would be that S_X is a proper subset of S, i.e., that there are redundant elements in S. This is not the case, since all elements of S are “important”, i.e., happen with positive probability. If we removed anything from S, we would be omitting a possible value of X.
On the other hand, it is certainly true that X always takes its values in the finite set S_0 = {1, 2, 3, 4, 5, 6, 7}, i.e., that X is S_0-valued. One has to be careful with the terminology here: it is correct to say that X is an S_0-valued (or even N-valued) random variable, even though it only takes the values 1, 2, . . . , 6 with positive probability.
Discrete random variables are very nice due to the following fact: in order to be able to compute any conceivable probability involving a discrete random variable X, it is enough to know how to compute the probabilities P[X = x] for all x ∈ S_X. Indeed, if we are interested in figuring out what P[X ∈ B] is, for some set B ⊆ R (e.g., B = {5, 6, 7}, B = [3, 6], or B = [−2, ∞)), we simply pick all x ∈ S_X which are also in B and sum their probabilities. In mathematical notation, we have

P[X ∈ B] = ∑_{x ∈ S_X ∩ B} P[X = x].   (1.2.1)
The probabilities P[X = x] are usually organized into the probability mass function (pmf) of X, i.e., the function p_X given by

p_X(x) = P[X = x],  x ∈ S_X,

or into a distribution table, where the top row lists all the elements x of the support S_X of X, and the bottom row lists their probabilities p_X(x) = P[X = x]. It is easy to see that the function p_X has the following properties:
1. p_X(x) ∈ [0, 1] for all x, and
2. ∑_{x ∈ S_X} p_X(x) = 1.
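To see how the pmf and formula (1.2.1) work together computationally, here is a minimal Python sketch (my own illustration, not part of the notes), with the pmf stored as a dictionary whose keys form the support:

    # pmf of a fair die: keys are the support S_X, values are p_X(x)
    pmf = {x: 1/6 for x in range(1, 7)}

    def prob(pmf, B):
        """P[X in B]: sum p_X(x) over all x in S_X that also lie in B, as in (1.2.1)."""
        return sum(p for x, p in pmf.items() if x in B)

    print(prob(pmf, {5, 6, 7}))       # 2/6, since only 5 and 6 are in the support
    print(prob(pmf, range(3, 7)))     # P[3 <= X <= 6] = 4/6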
Here is a first round of examples of discrete random variables and their
supports.
Example 1.2.6.
1. A fair coin is tossed and the outcome (H or T) is denoted by X:

   x        H     T
   p_X(x)   1/2   1/2

2. A fair die is thrown and the number obtained is denoted by X:

   x        1     2     3     4     5     6
   p_X(x)   1/6   1/6   1/6   1/6   1/6   1/6
3. A fair coin is thrown repeatedly until the first H is observed; the number of Ts observed before that is denoted by X. In this case we know that X can take any of the values in N_0 = {0, 1, 2, . . . } and that there is no finite upper bound for it. Nevertheless, we know that X cannot take values that are not non-negative integers. Therefore, X is N_0-valued and, in fact, S_X = N_0 is its support. Indeed, we have P[X = x] = 2^(−x−1) for x ∈ N_0, i.e.,

   x        0     1     2     . . .
   p_X(x)   1/2   1/4   1/8   . . .

(a quick check that these probabilities add up to 1 appears right after this example).
4. A card is drawn randomly from a standard deck, and the result is denoted by X. This example is similar to 2. above, since X takes one of finitely many values, and all values are equally likely. The
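Returning to 3. above, here is the promised check that those probabilities add up to 1 (property 2. of a pmf): by the geometric series,

∑_{x ∈ N_0} 2^(−x−1) = 1/2 · (1 + 1/2 + 1/4 + · · · ) = 1/2 · 1/(1 − 1/2) = 1.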
Example 1.3.1. Suppose that two dice are thrown so that X1 and X2 are
the numbers obtained (both X1 and X2 are discrete random variables
with S X1 = S X2 = {1, 2, 3, 4, 5, 6}). If we are interested in the probabil-
ity that their sum is at least 9, we proceed as follows. We define the
random variable W - the sum of X1 and X2 - by W = X1 + X2 . An-
other random variable, let us call it X, is a Bernoulli random variable
defined by

   X = 1, if W ≥ 9,
   X = 0, if W < 9.
With such a set-up, X signals whether the event of interest has hap-
pened, and we can state our original problem in terms of X, namely
“Compute P[ X = 1] !”.
This example is, admittedly, a little contrived. The point, however, is that
anything can be phrased in terms of random variables; thus, if you know how
to work with random variables, i.e., know how to compute their distributions,
you can solve any problem in probability that comes your way.
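For the example above, P[X = 1] can also be found by brute force; the following Python sketch (an illustration of mine, not part of the notes) enumerates the 36 equally likely outcomes of the two dice:

    from itertools import product

    # all 36 equally likely outcomes (x1, x2) of two fair dice
    outcomes = list(product(range(1, 7), repeat=2))
    favorable = [(x1, x2) for (x1, x2) in outcomes if x1 + x2 >= 9]

    print(len(favorable), len(outcomes))       # 10 36
    print(len(favorable) / len(outcomes))      # P[X = 1] = 10/36, approximately 0.278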
Another reason Bernoulli random variables are useful is the fact that we
can do arithmetic with them.
Example 1.3.2. 70 coins are tossed and their outcomes are denoted by
W1 , W2 , . . . , W70 . All Wi are random variables with values in { H, T }
(and therefore not Bernoulli random variables), but they can be easily
recoded into Bernoulli random variables as follows:
   Xi = 1, if Wi = H,
   Xi = 0, if Wi = T.

For example, their product

   M = X1 × X2 × · · · × X70

is again a Bernoulli random variable: it takes the value 1 precisely when all 70 coins come up H, and the value 0 otherwise.
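A small Python sketch of this recoding (the simulation itself is my illustration, not part of the notes); once the outcomes are 0s and 1s, their sum counts the heads and their product is exactly the random variable M:

    import random
    from math import prod

    # simulate 70 fair coins and recode H/T into Bernoulli 0/1 variables
    W = [random.choice("HT") for _ in range(70)]
    X = [1 if w == "H" else 0 for w in W]

    num_heads = sum(X)     # arithmetic with Bernoullis: the number of heads
    M = prod(X)            # M = X1 * X2 * ... * X70, equal to 1 iff every coin is H

    print(num_heads, M)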
Example 1.4.1.
1. Bernoulli distribution. We have already encountered this distri-
bution in our discussion of indicator random variables above. It is
characterized by the distribution table of the form
   x        0       1
   p_X(x)   1 − p   p          (1.4.1)
where p can be any number in (0, 1). Strictly speaking, each value
of p defines a different distribution, so it would be more correct to
speak of a parametric family of distributions, with p ∈ (0, 1) being
the parameter.
In order not to write down the table (1.4.1) every time, we also use
the notation X ∼ B( p). For example, the Bernoulli random variable
which takes the value 1 when a fair coin falls H and 0 when it falls
T has a B(1/2)-distribution.
An experiment (random occurrence) which can end in two possible
ways (usually called success and failure, even though those names
should not always be taken literally) is often called a Bernoulli
trial. If we “encode” success as 1 and failure as 0, each Bernoulli
trial gives rise to a Bernoulli random variable.
2. Binomial distribution. A random variable whose distribution ta-
ble looks like this
   x        0     1                       . . .   n − 1                       n
   p_X(x)   q^n   (n choose 1) p q^(n−1)  . . .   (n choose n−1) p^(n−1) q    p^n

for some n ∈ N, p ∈ (0, 1) and q = 1 − p, is said to have the binomial distribution, usually denoted by b(n, p). Remember that the binomial coefficient (n choose k) is given by

   (n choose k) = n! / (k! (n − k)!),   where n! = n(n − 1)(n − 2) · . . . · 2 · 1.
Binomial distribution(s) form a parametric family with two param-
eters n ∈ N and p ∈ (0, 1), and each pair (n, p) corresponds to a
different binomial distribution.
[Figure: bar plot of the binomial pmf b(n, p) over x = 0, 1, 2, . . . , n.]
3. Geometric distribution. A random variable whose distribution table looks like this

   x        0    1     2      3      . . .
   p_X(x)   p    qp    q²p    q³p    . . .

for some p ∈ (0, 1) and q = 1 − p, is said to have the geometric distribution, usually denoted by g(p). It arises, for example, as the distribution of the number of failures before the first success in a sequence of independent Bernoulli trials (compare with 3. in Example 1.2.6).
[Figure: bar plot of the geometric pmf g(p).]
4. Poisson distribution. A random variable whose distribution table looks like this

   x        0       1        2            3            4            . . .
   p_X(x)   e^−λ    e^−λ λ   e^−λ λ²/2!   e^−λ λ³/3!   e^−λ λ⁴/4!   . . .

for some λ > 0, is said to have the Poisson distribution, usually denoted by P(λ).
[Figure: bar plot of the Poisson pmf P(λ).]
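To make the tables above concrete, here is a short Python sketch (my own illustration; the parameter values and the truncation point are arbitrary choices) that writes the b(n, p), g(p) and P(λ) probabilities as functions and checks numerically that each table adds up to 1:

    from math import comb, exp, factorial

    def binomial_pmf(n, p, k):
        """P[X = k] for X ~ b(n, p)."""
        return comb(n, k) * p**k * (1 - p)**(n - k)

    def geometric_pmf(p, x):
        """P[X = x] for X ~ g(p): x failures before the first success."""
        return (1 - p)**x * p

    def poisson_pmf(lam, x):
        """P[X = x] for X ~ P(lam)."""
        return exp(-lam) * lam**x / factorial(x)

    print(sum(binomial_pmf(10, 0.3, k) for k in range(11)))    # exactly 1, up to rounding
    print(sum(geometric_pmf(0.4, x) for x in range(100)))      # approximately 1
    print(sum(poisson_pmf(2.5, x) for x in range(100)))        # approximately 1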
The expectation of a discrete random variable X is defined by

   E[X] = ∑_{x ∈ S_X} x p_X(x),   (1.5.1)

provided that the sum converges absolutely, i.e., provided that

   ∑_{x ∈ S_X} |x| p_X(x) < ∞.   (1.5.2)
When the sum in (1.5.2) above diverges (i.e., takes the value +∞), we
say that the expectation of X is not defined.
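For example, for the fair die from Example 1.2.6, the support is finite, so (1.5.2) holds automatically and

E[X] = (1 + 2 + 3 + 4 + 5 + 6) · 1/6 = 3.5.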
In order to define the standard deviation, we first need to define the vari-
ance. Like the expectation, the variance may or may not be defined (depend-
ing on whether the sums used to compute it converge absolutely or not).
Since we will be working only with distributions for which the existence of
expectation(s) is never a problem, we do not mention this issue in the sequel.
2 This should be taken with a grain of salt. After all, what exactly do we mean by the center or the spread of a distribution?
Theorem 1.5.4. Suppose that X and Y are random variables and that α is a
constant. Then
1. Var[αX ] = α2 Var[ X ], and
2. if, additionally, X and Y are independent, then Var[X + Y] = Var[X] + Var[Y].
Caveat: These properties are not the same as the properties of the expectation. First of all, the constant comes out of the variance with a square, and second, the variance of the sum is the sum of the individual variances only if additional assumptions, such as the independence of the two variables, are imposed.
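A quick numerical illustration of both points (my own sketch, using the fair-die pmf; none of this is part of the notes): scaling a die by α = 2 multiplies its variance by 4, and for two independent dice the variance of the sum equals the sum of the variances.

    from itertools import product

    die = {x: 1/6 for x in range(1, 7)}

    def mean(pmf):
        return sum(x * p for x, p in pmf.items())

    def var(pmf):
        m = mean(pmf)
        return sum((x - m)**2 * p for x, p in pmf.items())

    # pmf of X1 + X2 for two independent dice, built by enumerating all pairs
    total = {}
    for (x1, p1), (x2, p2) in product(die.items(), repeat=2):
        total[x1 + x2] = total.get(x1 + x2, 0) + p1 * p2

    print(var({2 * x: p for x, p in die.items()}), 4 * var(die))   # Var[2X] = 4 Var[X]
    print(var(total), 2 * var(die))                                # both equal 35/6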
Example 1.5.6.
For X ∼ B(p) (with q = 1 − p), we have

   E[X] = 0 × q + 1 × p = p.
3 We will talk about independence in detail in the next lecture. An intuitive understanding is enough for now.
For X ∼ g(p), one can show that E[X] = q/p. For the second moment, conditioning on the outcome of the first trial gives

   E[X²] = p × 0 + q E[(1 + X)²] = q + 2q E[X] + q E[X²] = q + 2q²/p + q E[X²].

Solving for E[X²] gives E[X²] = q/p + 2q²/p², which yields

   Var[X] = E[X²] − (E[X])² = q/p + q²/p² = q/p²   and   sd[X] = √q / p.
and use Proposition 1.5.5. The sums can be evaluated explicitly, but since the focus of these notes is not on the evaluation of infinite sums, we skip the details.
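Whatever the distribution in question, such sums are easy to check numerically. As an illustration (my own sketch, with an arbitrary λ and truncation point), here is a check of the well-known facts E[X] = λ and Var[X] = λ for X ∼ P(λ):

    from math import exp, factorial

    lam = 2.5
    pmf = {x: exp(-lam) * lam**x / factorial(x) for x in range(100)}  # truncated P(lam) table

    m = sum(x * p for x, p in pmf.items())              # approximates E[X] = lam
    second = sum(x**2 * p for x, p in pmf.items())      # approximates E[X^2]
    print(m, second - m**2)                             # both approximately 2.5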
1.6 Problems
Problem 1.6.1. Two people are picked at random from a group of 50 and
given $10 each. After that, independently of what happened before, three
people are picked from the same group - one or more people could have
been picked both times - and given $10 each. What is the probability that at
least one person received $20?
Problem 1.6.2. A die is rolled 5 times; let the obtained numbers be given by
Y1, . . . , Y5. Use counting to compute the probability that
1. all of Y1, . . . , Y5 are even;
2. at most 4 of Y1, . . . , Y5 are odd;
3. the values of Y1, . . . , Y5 are all different from each other.
3. Y − 5, where Y ∼ g( p) (geometric),
4. 2Y, where Y ∼ P(λ) (Poisson).
Problem 1.6.4. Let Y denote the number of tosses of a fair die until the first
6 is obtained (if we get a 6 on the first try, Y = 0). The support SY of Y is
(a) {0, 1, 2, 3, 4, . . . }
(b) {1, 2, 3, 4, 5, 6}
(c) {1/6, 1/6, 1/6, 1/6, 1/6}
(d) {1/6, (5/6) × 1/6, (5/6)² × 1/6, (5/6)³ × 1/6, . . . }
Problem 1.6.5. The probability that Janet makes a free throw is 0.6. What is
the probability that she will make at least 16 out of 23 (independent) throws?
Write down the answer as a sum - no need to evaluate it.
Problem 1.6.6. Three unbiased and independent coins are tossed. Let Y1 be
the total number of heads on the first two coins, and let Y be the random
variable which is equal to Y1 if the third coin comes up heads and −Y1 if it
comes up tails. Compute Var[Y ].
Problem 1.6.7. A die is thrown and a coin is tossed independently of it. Let
Y be the random variable which is equal to the number on the die in case the
coin comes up heads and twice the number on the die if it comes up tails.
1. What is the support S_Y of Y? What is its distribution (pmf)?
2. Compute E[Y ] and Var[Y ].
Problem 1.6.8. n people vote in a general election, with only two candidates
running. The vote of person i is denoted by Yi and it can take values 0 and 1,
depending which candidate they voted for (we encode one of them as 0 and
the other as 1). We assume that votes are independent of each other and that
each person votes for candidate 1 with probability p. If the total number of
votes for candidate 1 is denoted by Y, then
Compute the expectation and the variance of u(n). You may use the following identities: 1 + 2 + · · · + n = n(n + 1)/2 and 1² + 2² + · · · + n² = n(n + 1)(2n + 1)/6.
1. P[ X ≥ 3],
2. (*) E[ X 3 ]. Note: The sum you need to evaluate is quite difficult. If you don’t know
the trick, do not worry. If you know how to use symbolic-computation software such as
Mathematica, feel free to use it. We will learn how to do this using generating functions later
in the class.
1 2 3 ...
.
C 112 C 212 C 312 ...
E[Y] = ∑_{n ∈ N} P[Y ≥ n].
Problem 1.6.16. (*) Bob and Alice alternate taking customer calls at a call
center, with Alice always taking the first call. The number of calls during a
day has a Poisson distribution with parameter λ > 0.
1. What is the probability that Bob will take the last call of the day (this includes the case when there are 0 calls)? (Hint: What is the Taylor series for the function cosh(x) = (e^x + e^−x)/2 around x = 0?)
2. Who is more likely to take the last call? Alice or Bob? As above, if there
are no calls, we give the “last call” to Bob.
Problem 1.6.17 (*). A mail lady has l ∈ N letters in her bag when she starts
her shift and is scheduled to visit n ∈ N different households during her
round. If each letter is equally likely to be addressed to any one of the n
households, and the letters are delivered independently of each other, what
is the expected number of households that will receive at least one letter?
Note: It is quite possible that some households will receive more than 1 letter.