Probability Refresher
1 Random Variables

A discrete random variable X takes values in a sample space S_X = {x_1, x_2, . . .} that is either finite or countably infinite. For example, a coin toss has sample space {heads, tails}
; a count of telephone call attempts has sample space {0, 1, 2, . . .}. This latter sample space is infinite, but still countable. It contains some very large numbers (like 10^{99}). While one may argue that the occurrence of such a large number of telephone call attempts is absurd, we find it convenient to include such outcomes. Later we can assign a vanishingly small (or even zero) probability to the absurd outcomes.
Let X be a random variable with sample space S_X. A probability mass function (pmf) for X is a mapping

    p_X : S_X \to [0, 1]

from S_X to the closed unit interval [0, 1] satisfying

    \sum_{x \in S_X} p_X(x) = 1.
The number p_X(x) is the probability that the outcome of the given random experiment is x, i.e.,

    p_X(x) := P[X = x].
Example. A Bernoulli random variable X has sample space S_X = {0, 1}. The pmf is

    p_X(0) = 1 - p,    p_X(1) = p,    where 0 \leq p \leq 1.
The sum of N independent Bernoulli random variables (independence is defined formally later),

    Y = \sum_{i=1}^{N} X_i,

has S_Y = {0, 1, . . . , N}. The pmf for Y is

    p_Y(k) = \binom{N}{k} p^k (1 - p)^{N-k},    k \in S_Y.
This represents the probability of having exactly k heads in N independent coin tosses, where
P[heads] = p.
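As a quick numerical illustration (a minimal sketch in Python, with N, p and the helper name chosen only for this example), the fragment below tabulates this pmf and checks that it sums to one:

    from math import comb

    def binomial_pmf(N, p):
        # p_Y(k) = C(N, k) p^k (1 - p)^(N - k) for k = 0, 1, ..., N.
        return {k: comb(N, k) * p**k * (1 - p)**(N - k) for k in range(N + 1)}

    pmf = binomial_pmf(N=10, p=0.3)
    print(pmf[3])              # P[exactly 3 heads in 10 tosses]
    print(sum(pmf.values()))   # should be 1 (up to rounding)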
Some Notation:
To avoid excessive use of subscripts, we will identify a random variable by the letter used in the argument of its probability mass function, i.e., we will use the convention

    p_X(x) \equiv p(x),    p_Y(y) \equiv p(y).

Strictly speaking this is ambiguous, since the same symbol p is used to identify two different probability mass functions; however, no confusion should arise with this notation, and we can always make use of subscripts to avoid ambiguity if necessary.
2 Vector Random Variables
Often the elements of the sample space S_X of a random variable X are real numbers, in which case X is a (real) scalar random variable. If the elements of S_X are vectors of real numbers, then X is a (real) vector random variable.

Suppose Z is a vector random variable with a sample space in which each element has two components (X, Y), i.e.,

    S_Z = {z_1, z_2, . . .} = {(x_1, y_1), (x_2, y_2), . . .}.
The projection of S_Z on its first coordinate is

    S_X = {x : for some y, (x, y) \in S_Z}.

Similarly, the projection of S_Z on its second coordinate is

    S_Y = {y : for some x, (x, y) \in S_Z}.
Example. If Z = (X, Y) and S_Z = {(0, 0), (1, 0), (1, 1)}, then S_X = S_Y = {0, 1}.
In general, if Z = (X, Y), then

    S_Z \subseteq S_X \times S_Y,        (1)

where

    S_X \times S_Y = {(x, y) : x \in S_X, y \in S_Y}

is the Cartesian product of S_X and S_Y. The pmf of Z, written p_{X,Y}(x, y) := P[X = x, Y = y], is called the joint pmf of X and Y. The pmf of X alone is recovered by summing the joint pmf over y:

    p_X(x) \equiv p(x) = \sum_{y \in S_Y} p_{X,Y}(x, y);

similarly,

    p_Y(y) \equiv p(y) = \sum_{x \in S_X} p_{X,Y}(x, y).

These probability mass functions are usually referred to as the marginal pmfs associated with the vector random variable (X, Y).
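As an illustration (a minimal Python sketch, with the joint probabilities over S_Z = {(0, 0), (1, 0), (1, 1)} chosen arbitrarily), the marginals are obtained by summing a joint pmf stored as a table:

    # Joint pmf p_{X,Y}(x, y) as a dictionary over S_Z (illustrative values).
    p_XY = {(0, 0): 0.5, (1, 0): 0.2, (1, 1): 0.3}

    # Marginals: sum the joint pmf over the other coordinate.
    p_X, p_Y = {}, {}
    for (x, y), pr in p_XY.items():
        p_X[x] = p_X.get(x, 0.0) + pr
        p_Y[y] = p_Y.get(y, 0.0) + pr

    print(p_X)  # {0: 0.5, 1: 0.5}
    print(p_Y)  # {0: 0.7, 1: 0.3}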
Some More Notation:
Again, to avoid the excessive use of subscripts, we will use the convention

    p_{X,Y}(x, y) \equiv p(x, y).
3 Events
An event A is a subset of the discrete sample space S. The probability of the event A is

    P[A] = P[some outcome contained in A occurs] = \sum_{x \in A} p(x).

In particular,

    P[S] = \sum_{x \in S} p(x) = 1,    P[\emptyset] = \sum_{x \in \emptyset} p(x) = 0,

where \emptyset is the empty (or null) event.
Example. A fair coin is tossed N times, and A is the event that an even number of heads
occurs. Then
    P[A] = \sum_{0 \leq k \leq N, k even} P[exactly k heads occurs]
         = \sum_{0 \leq k \leq N, k even} \binom{N}{k} (1/2)^k (1/2)^{N-k}
         = (1/2)^N \sum_{0 \leq k \leq N, k even} \binom{N}{k}
         = \frac{2^{N-1}}{2^N} = \frac{1}{2},

where we have used the fact that, for N \geq 1, \sum_{0 \leq k \leq N, k even} \binom{N}{k} = 2^{N-1}.
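This can also be checked numerically; a minimal Python sketch (with the values of N varied arbitrarily) sums the binomial terms over even k and recovers 1/2:

    from math import comb

    def prob_even_heads(N):
        # Sum of C(N, k) (1/2)^N over even k in {0, 1, ..., N}.
        return sum(comb(N, k) * 0.5**N for k in range(0, N + 1, 2))

    print([prob_even_heads(N) for N in (1, 2, 5, 10)])  # each value is 0.5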
4 Conditional Probability
Let A and B be events, with P[A] > 0. The conditional probability of B, given that A
occurred, is
    P[B | A] = \frac{P[A \cap B]}{P[A]}.

Thus, P[A | A] = 1, and P[B | A] = 0 if A \cap B = \emptyset.
Also, if Z = (X, Y) and p_X(x_k) > 0, then

    p_{Y|X}(y_j | x_k) = P[Y = y_j | X = x_k] = \frac{P[X = x_k, Y = y_j]}{P[X = x_k]} = \frac{p_{X,Y}(x_k, y_j)}{p_X(x_k)}.
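In code, the conditional pmf is just the joint table renormalized along one coordinate; the sketch below reuses the illustrative joint pmf from before (the values and the helper name are assumptions for the example only):

    p_XY = {(0, 0): 0.5, (1, 0): 0.2, (1, 1): 0.3}

    def conditional_Y_given_X(p_XY, x):
        # p_{Y|X}(y | x) = p_{X,Y}(x, y) / p_X(x), defined when p_X(x) > 0.
        p_x = sum(pr for (xx, _), pr in p_XY.items() if xx == x)
        if p_x == 0:
            raise ValueError("p_X(x) must be positive")
        return {y: pr / p_x for (xx, y), pr in p_XY.items() if xx == x}

    print(conditional_Y_given_X(p_XY, 1))  # {0: 0.4, 1: 0.6}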
The random variables X and Y are independent if
    p_{X,Y}(x, y) = p_X(x) p_Y(y)    for all (x, y) \in S_X \times S_Y.
If X and Y are independent, then

    p_{X|Y}(x | y) = \frac{p_{X,Y}(x, y)}{p_Y(y)} = \frac{p_X(x) p_Y(y)}{p_Y(y)} = p_X(x),

and

    p_{Y|X}(y | x) = \frac{p_{X,Y}(x, y)}{p_X(x)} = \frac{p_X(x) p_Y(y)}{p_X(x)} = p_Y(y),

i.e., knowledge of X does not affect the statistics of Y, and vice versa. As we will see later in the course, if X and Y are independent, then X provides no information about Y, and vice versa.
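The factorization condition can be checked mechanically for any finite joint pmf; the sketch below is an illustrative helper (pairs absent from the table are treated as having probability zero) that compares p_{X,Y}(x, y) with p_X(x) p_Y(y) over S_X \times S_Y:

    def is_independent(p_XY, tol=1e-12):
        # Check p_{X,Y}(x, y) == p_X(x) p_Y(y) for all (x, y) in S_X x S_Y.
        p_X, p_Y = {}, {}
        for (x, y), pr in p_XY.items():
            p_X[x] = p_X.get(x, 0.0) + pr
            p_Y[y] = p_Y.get(y, 0.0) + pr
        return all(abs(p_XY.get((x, y), 0.0) - p_X[x] * p_Y[y]) <= tol
                   for x in p_X for y in p_Y)

    print(is_independent({(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}))  # True
    print(is_independent({(0, 0): 0.5, (1, 0): 0.2, (1, 1): 0.3}))                   # False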
More generally, n random variables X_1, . . . , X_n are independent if their joint probability mass function factors as a product of marginals, i.e., if

    p_{X_1, . . . , X_n}(x_1, . . . , x_n) = \prod_{i=1}^{n} p_{X_i}(x_i)

for all possible values x_1, x_2, . . . , x_n. A collection X_1, . . . , X_n of random variables is said to be i.i.d. (independent, identically distributed) if they are independent and if the marginal pmfs are all the same, i.e., if p_{X_i} = p_{X_j} for all i and j.
Still More Notation:
Again, we'll avoid subscripts, and use the notation

    p_{Y|X}(y | x) \equiv p(y | x).

In the simplified notation, p(y | x) = p(x, y)/p(x) and p(x | y) = p(x, y)/p(y). Similarly, in this notation, if X_1, . . . , X_n is a collection of independent random variables, the joint probability mass function p(x_1, . . . , x_n) factors as

    p(x_1, . . . , x_n) = \prod_{i=1}^{n} p(x_i).
5 Expected Value
If X is a random variable, the expected value (or mean) of X, denoted E[X], is
    E[X] = \sum_{x \in S_X} x p_X(x).

The expected value of the random variable g(X) is

    E[g(X)] = \sum_{x \in S_X} g(x) p_X(x).
In particular, E[X^n], for n a positive integer, is the nth moment of X. Thus the expected value of X is the first moment of X. The variance of X, defined as the second moment of X - E[X], can be computed as VAR[X] = E[X^2] - E[X]^2. The variance is a measure of the spread of a random variable about its mean. Note that for any constant a, E[aX] = aE[X] and VAR[aX] = a^2 VAR[X].
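For a concrete check (a minimal sketch reusing the binomial pmf from earlier, with arbitrary parameters), the mean and both expressions for the variance can be evaluated directly from the pmf:

    from math import comb

    N, p = 10, 0.3
    pmf = {k: comb(N, k) * p**k * (1 - p)**(N - k) for k in range(N + 1)}

    mean = sum(x * pr for x, pr in pmf.items())                 # E[X], the first moment
    second_moment = sum(x**2 * pr for x, pr in pmf.items())     # E[X^2]
    var_def = sum((x - mean)**2 * pr for x, pr in pmf.items())  # E[(X - E[X])^2]
    var_formula = second_moment - mean**2                       # E[X^2] - E[X]^2

    print(mean, var_def, var_formula)  # the two variance values agree (up to rounding)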
The correlation between two random variables X and Y is the expected value of their product, i.e., E[XY]. If E[XY] = E[X]E[Y], then X and Y are said to be uncorrelated. Clearly if X and Y are independent, then they are uncorrelated, but the converse is not necessarily true.
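As a simple illustration of why the converse can fail (an added example with an arbitrarily chosen pmf), take X uniform on {-1, 0, 1} and Y = X^2: then E[XY] = E[X^3] = 0 = E[X]E[Y], so X and Y are uncorrelated, yet Y is completely determined by X. A short check:

    # X uniform on {-1, 0, 1} and Y = X^2, written as a joint pmf.
    p_XY = {(-1, 1): 1/3, (0, 0): 1/3, (1, 1): 1/3}

    E_X  = sum(x * pr for (x, _), pr in p_XY.items())
    E_Y  = sum(y * pr for (_, y), pr in p_XY.items())
    E_XY = sum(x * y * pr for (x, y), pr in p_XY.items())

    print(E_XY, E_X * E_Y)  # both 0.0: X and Y are uncorrelated
    # Not independent: p_{X,Y}(0, 1) = 0, but p_X(0) * p_Y(1) = (1/3)(2/3) > 0.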
If X_1, X_2, . . . , X_n is any sequence of random variables, then

    E[X_1 + X_2 + . . . + X_n] = E[X_1] + E[X_2] + . . . + E[X_n],

i.e., the expected value of a sum of random variables is the sum of their expected values. If, in addition, X_1, X_2, . . . , X_n are pairwise uncorrelated, then the additive property holds also for the variance, i.e.,

    VAR[X_1 + X_2 + . . . + X_n] = VAR[X_1] + VAR[X_2] + . . . + VAR[X_n].
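Both additivity properties are easy to verify exactly for small examples; the sketch below (with arbitrary illustrative pmfs for two independent, hence uncorrelated, variables) builds the pmf of the sum from the product pmf:

    def mean(p):
        return sum(x * pr for x, pr in p.items())

    def var(p):
        m = mean(p)
        return sum((x - m) ** 2 * pr for x, pr in p.items())

    # Two independent random variables with small sample spaces (illustrative pmfs).
    p1 = {0: 0.5, 1: 0.5}
    p2 = {0: 0.2, 1: 0.3, 2: 0.5}

    # pmf of X1 + X2, obtained from the product (joint) pmf.
    p_sum = {}
    for x1, q1 in p1.items():
        for x2, q2 in p2.items():
            p_sum[x1 + x2] = p_sum.get(x1 + x2, 0.0) + q1 * q2

    print(mean(p_sum) - (mean(p1) + mean(p2)))  # ~0: means add
    print(var(p_sum) - (var(p1) + var(p2)))     # ~0: variances add (uncorrelated)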
6 The Markov and Chebyshev Inequalities
If X is a random variable taking on non-negative values only and having expected value E[X], then, for every value a > 0,

    P[X \geq a] \leq \frac{E[X]}{a},

a result known as Markov's Inequality. This result can be derived from the following chain of inequalities. We have

    E[X] = \sum_{x \geq 0} x p(x)
         = \sum_{0 \leq x < a} x p(x) + \sum_{x \geq a} x p(x)
         \geq \sum_{x \geq a} x p(x)
         \geq \sum_{x \geq a} a p(x)
         = a P[X \geq a].
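As a sanity check, one can evaluate both sides of the inequality exactly for a distribution with a finite sample space; the sketch below uses a binomial pmf with arbitrarily chosen parameters and thresholds:

    from math import comb

    N, p = 20, 0.3
    pmf = {k: comb(N, k) * p**k * (1 - p)**(N - k) for k in range(N + 1)}
    mean = sum(x * pr for x, pr in pmf.items())  # E[X]

    for a in (5, 10, 15):
        tail = sum(pr for x, pr in pmf.items() if x >= a)  # P[X >= a]
        print(a, tail, mean / a, tail <= mean / a)         # the bound always holds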
Now if X is any random variable, then Y = (X - E[X])^2 is a random variable taking on non-negative values only, and hence Markov's Inequality applies. Taking a = k^2 for some positive value k, we find

    P[Y \geq k^2] = P[(X - E[X])^2 \geq k^2] = P[|X - E[X]| \geq k] \leq \frac{VAR[X]}{k^2},

a result known as Chebyshev's Inequality.
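The same kind of check works for Chebyshev's Inequality, now applied to the deviation |X - E[X]| (again with an arbitrary binomial pmf and arbitrary values of k):

    from math import comb

    N, p = 20, 0.3
    pmf = {k: comb(N, k) * p**k * (1 - p)**(N - k) for k in range(N + 1)}
    mean = sum(x * pr for x, pr in pmf.items())
    var = sum((x - mean) ** 2 * pr for x, pr in pmf.items())

    for k in (2, 3, 4):
        tail = sum(pr for x, pr in pmf.items() if abs(x - mean) >= k)  # P[|X - E[X]| >= k]
        print(k, tail, var / k**2, tail <= var / k**2)                 # the bound always holds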
7 The Weak Law of Large Numbers
Let X_1, X_2, . . . be an i.i.d. sequence of random variables with mean m and finite variance \sigma^2. Suppose we observe the first n of these variables. An estimator for the mean m is then

    M_n = \frac{1}{n} \sum_{i=1}^{n} X_i.

As the following theorem shows, if n is sufficiently large, then with high probability M_n is close to the mean m.
Theorem 1 (The Weak Law of Large Numbers) For all \epsilon > 0 and all \delta > 0 there exists a positive integer n_0 such that for all n \geq n_0,

    P[|M_n - m| \geq \epsilon] \leq \delta.
Proof: Note that M_n is a random variable with mean m and variance \sigma^2/n. It follows from Chebyshev's Inequality that

    P[|M_n - m| \geq \epsilon] \leq \frac{\sigma^2}{n \epsilon^2}.

Take n_0 = \lceil \sigma^2 / (\delta \epsilon^2) \rceil. Then for every n \geq n_0, we have P[|M_n - m| \geq \epsilon] \leq \delta.

A more complicated argument would allow us to omit the requirement that the random variables have finite variance.
We sometimes write M_n \overset{p}{\to} m (read "M_n converges in probability to m"), meaning that P[|M_n - m| \geq \epsilon] \to 0 as n \to \infty, for every \epsilon > 0.
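The convergence is easy to observe by simulation; the sketch below (a minimal NumPy example with Bernoulli(p) samples and arbitrarily chosen parameters) estimates P[|M_n - m| \geq \epsilon] empirically and compares it with the Chebyshev bound \sigma^2/(n \epsilon^2) used in the proof:

    import numpy as np

    rng = np.random.default_rng(0)
    p, eps = 0.3, 0.05          # Bernoulli(p) samples: m = p, sigma^2 = p(1 - p)
    sigma2 = p * (1 - p)

    for n in (100, 1000, 10000):
        # 2000 independent realizations of the sample mean M_n.
        M_n = rng.binomial(n, p, size=2000) / n
        emp = np.mean(np.abs(M_n - p) >= eps)  # empirical P[|M_n - m| >= eps]
        print(n, emp, sigma2 / (n * eps**2))   # empirical value vs. Chebyshev bound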