Mathematical Expectation: Examples
If A occurs, your profit is 1 − 2 = −1 dollars; that is, you lose 1 dollar. If B occurs, your
profit is 2 − 2 = 0 dollars. If C occurs, your profit is 6 − 2 = 4 dollars. Therefore, we may
compute the average profit as follows:
(−1) × (1/2) + 0 × (1/3) + 4 × (1/6) = 1/6.
That is, you can expect to make 1/6 of a dollar on average every time you play this game.
This is the mathematical expectation of the profit.
We can define a random variable X representing the profit, where X takes the values −1,
0, and 4 with probabilities 1/2, 1/3, and 1/6, respectively. Namely, P (X = −1) = 1/2,
P (X = 0) = 1/3, and P (X = 4) = 1/6. Then this mathematical expectation is written as
E(X) = \sum_{x \in \{-1, 0, 4\}} x P(X = x) = (−1) × (1/2) + 0 × (1/3) + 4 × (1/6) = 1/6.
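As a quick numerical check, here is a minimal Python sketch of this computation; the pmf dictionary and the helper name expectation are illustrative choices of ours, not part of the original notes.

    # Expectation of a discrete random variable: sum of x * P(X = x).
    def expectation(pmf):
        return sum(x * p for x, p in pmf.items())

    profit_pmf = {-1: 1/2, 0: 1/3, 4: 1/6}
    print(expectation(profit_pmf))  # 0.1666... = 1/6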
• Roll a die twice. Let X be the number of times 4 comes up. X takes three possible values
0, 1, or 2. X = 0 when the event {1, 2, 3, 5, 6} occurs on both rolls, so that P (X = 0) =
(5/6) × (5/6) = 25/36. X = 1 either when the event {1, 2, 3, 5, 6} occurs on the first roll and
the event {4} occurs on the second roll, or when the event {4} occurs on the first roll and the
event {1, 2, 3, 5, 6} occurs on the second roll, so that P (X = 1) = (5/6) × (1/6) + (1/6) × (5/6) =
10/36. Finally, X = 2 when the event {4} occurs on both rolls, so that P (X = 2) = (1/6) × (1/6) =
1/36. Note that P (X = 0) + P (X = 1) + P (X = 2) = 1. Therefore, the mathematical
expectation of X is
E(X) = \sum_{x=0,1,2} x P(X = x) = 0 × (25/36) + 1 × (10/36) + 2 × (1/36) = 12/36 = 1/3.
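A short simulation can corroborate this value; the sketch below is ours (the trial count is arbitrary), so the printed estimate only approximates 1/3.

    import random

    # Estimate E(X), where X counts how many of two die rolls show a 4.
    trials = 100_000
    total = sum(
        (random.randint(1, 6) == 4) + (random.randint(1, 6) == 4)
        for _ in range(trials)
    )
    print(total / trials)  # approximately 1/3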
• Toss a coin 3 times. Let X be the number of heads. There are 8 possible outcomes:
{TTT, TTH, THT, THH, HTT, HTH, HHT, HHH}, where H indicates “Head” and T indicates
“Tail.” X takes four possible values 0, 1, 2, and 3; for example, X = 1 for exactly the three
outcomes TTH, THT, and HTT. The probabilities are P (X = 0) = 1/8,
P (X = 1) = 3/8, P (X = 2) = 3/8, and P (X = 3) = 1/8. Therefore, the mathematical
expectation of X is
E(X) = \sum_{x=0,1,2,3} x P(X = x) = 0 × (1/8) + 1 × (3/8) + 2 × (3/8) + 3 × (1/8) = (0 + 3 + 6 + 3)/8 = 12/8 = 3/2.
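The pmf above can also be recovered by brute-force enumeration of the 8 equally likely outcomes; a minimal sketch:

    from itertools import product
    from collections import Counter

    # Tally the number of heads across all 2^3 equally likely outcomes.
    counts = Counter(seq.count("H") for seq in product("HT", repeat=3))
    pmf = {x: c / 8 for x, c in counts.items()}
    print(pmf)                                 # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
    print(sum(x * p for x, p in pmf.items()))  # 1.5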
Properties of Mathematical Expectation
Let X be a random variable and suppose that the mathematical expectation of X, E(X), exists.
1. If a is a constant, then
E(a) = a.
2. If b is a constant, then
E(bX) = bE(X).
3. If a and b are constants, then
E(a + bX) = a + bE(X). (1)
Proof: Let X be a discrete random variable whose possible values are {x1 , . . . , xn }, with
probability mass function
p_i^X = P (X = x_i ), i = 1, . . . , n.
For the proof of 1, we have
E(a) = \sum_{i=1}^{n} a p_i^X
     = a p_1^X + a p_2^X + ... + a p_n^X
     = a × (p_1^X + p_2^X + ... + p_n^X)
     = a \sum_{i=1}^{n} p_i^X
     = a,
where the last equality holds because \sum_{i=1}^{n} p_i^X = 1.
For the proof of 2, we have
E(bX) = \sum_{i=1}^{n} b x_i p_i^X
      = b x_1 p_1^X + b x_2 p_2^X + ... + b x_n p_n^X
      = b × (x_1 p_1^X + x_2 p_2^X + ... + x_n p_n^X)
      = b \sum_{i=1}^{n} x_i p_i^X
      = b E(X).
For the proof of 3, we have
E(a + bX) = \sum_{i=1}^{n} (a + b x_i) p_i^X
          = (a + b x_1) p_1^X + (a + b x_2) p_2^X + ... + (a + b x_n) p_n^X
          = (a p_1^X + a p_2^X + ... + a p_n^X) + (b x_1 p_1^X + b x_2 p_2^X + ... + b x_n p_n^X)
          = a × (p_1^X + p_2^X + ... + p_n^X) + b × (x_1 p_1^X + x_2 p_2^X + ... + x_n p_n^X)
          = a \sum_{i=1}^{n} p_i^X + b \sum_{i=1}^{n} x_i p_i^X
          = a + b E(X).
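Property 3 is easy to confirm numerically; below is a minimal sketch in which the pmf and the constants a and b are arbitrary illustrative values of ours.

    # Check E(a + bX) = a + bE(X) on a small pmf.
    def expectation(pmf):
        return sum(x * p for x, p in pmf.items())

    pmf = {-1: 0.2, 0: 0.5, 3: 0.3}
    a, b = 2.0, -4.0
    # Transformed pmf; assumes the values a + b*x are all distinct.
    lhs = expectation({a + b * x: p for x, p in pmf.items()})
    rhs = a + b * expectation(pmf)
    print(lhs, rhs)  # both -0.8 (up to floating-point rounding)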
Variance and Covariance
Let X and Y be two discrete random variables. The set of possible values for X is {x1 , . . . , xn };
and the set of possible values for Y is {y1 , . . . , ym }. The joint probability function is given by
p_{ij}^{X,Y} = P (X = x_i , Y = y_j ), i = 1, . . . , n; j = 1, . . . , m.
1.
E[X + Y ] = E[X] + E[Y ]. (2)
Proof:
E(X + Y ) = \sum_{i=1}^{n} \sum_{j=1}^{m} (x_i + y_j) p_{ij}^{X,Y}
          = \sum_{i=1}^{n} \sum_{j=1}^{m} (x_i p_{ij}^{X,Y} + y_j p_{ij}^{X,Y})
          = \sum_{i=1}^{n} \sum_{j=1}^{m} x_i p_{ij}^{X,Y} + \sum_{i=1}^{n} \sum_{j=1}^{m} y_j p_{ij}^{X,Y}          (3)
          = \sum_{i=1}^{n} x_i · (\sum_{j=1}^{m} p_{ij}^{X,Y}) + \sum_{j=1}^{m} y_j · (\sum_{i=1}^{n} p_{ij}^{X,Y})          (4)
because we can take x_i out of \sum_{j=1}^{m} since x_i does not depend on j
          = \sum_{i=1}^{n} x_i · p_i^X + \sum_{j=1}^{m} y_j · p_j^Y
because p_i^X = \sum_{j=1}^{m} p_{ij}^{X,Y} and p_j^Y = \sum_{i=1}^{n} p_{ij}^{X,Y}
          = E(X) + E(Y ).
Equation (4): To understand \sum_{i=1}^{n} \sum_{j=1}^{m} x_i p_{ij}^{X,Y} = \sum_{i=1}^{n} x_i · (\sum_{j=1}^{m} p_{ij}^{X,Y}), consider the case
of n = m = 2. Then,
\sum_{i=1}^{2} \sum_{j=1}^{2} x_i p_{ij}^{X,Y} = x_1 p_{11}^{X,Y} + x_1 p_{12}^{X,Y} + x_2 p_{21}^{X,Y} + x_2 p_{22}^{X,Y}
                                = x_1 · (p_{11}^{X,Y} + p_{12}^{X,Y}) + x_2 · (p_{21}^{X,Y} + p_{22}^{X,Y})
                                = \sum_{i=1}^{2} x_i · (\sum_{j=1}^{2} p_{ij}^{X,Y}).
Similarly, we may show that \sum_{i=1}^{2} \sum_{j=1}^{2} y_j p_{ij}^{X,Y} = \sum_{j=1}^{2} y_j · (\sum_{i=1}^{2} p_{ij}^{X,Y}).
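The following sketch checks equation (2) directly on a small joint pmf; the dictionary of joint probabilities is an arbitrary illustrative choice of ours.

    # Verify E(X + Y) = E(X) + E(Y) for a joint pmf keyed by (x_i, y_j).
    joint = {(0, 1): 0.1, (0, 2): 0.3, (1, 1): 0.4, (1, 2): 0.2}  # probabilities sum to 1

    e_sum = sum((x + y) * p for (x, y), p in joint.items())
    e_x = sum(x * p for (x, y), p in joint.items())  # E(X) via the joint pmf
    e_y = sum(y * p for (x, y), p in joint.items())  # E(Y) via the joint pmf
    print(e_sum, e_x + e_y)  # both 2.1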
4. Cov (X, Y ) = Cov (Y, X) .
5. Let a1 , a2 , b1 , and b2 be constants. Then
Cov (a1 + b1 X, a2 + b2 Y ) = b1 b2 Cov (X, Y ).
6. Cov (X, Y ) = E(XY ) − E(X)E(Y ). To see this, expand Cov (X, Y ) as follows:
Cov (X, Y ) = \sum_{i=1}^{n} \sum_{j=1}^{m} (x_i − E(X))(y_j − E(Y )) p_{ij}^{X,Y}
            = \sum_{i=1}^{n} \sum_{j=1}^{m} [x_i y_j − x_i E(Y ) − E(X) y_j + E(X)E(Y )] p_{ij}^{X,Y}
            = \sum_{i=1}^{n} \sum_{j=1}^{m} x_i y_j p_{ij}^{X,Y} − E(Y ) \sum_{i=1}^{n} \sum_{j=1}^{m} x_i p_{ij}^{X,Y} − E(X) \sum_{i=1}^{n} \sum_{j=1}^{m} y_j p_{ij}^{X,Y} + E(X)E(Y ) \sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij}^{X,Y}          (6)
            = E(XY ) − E(Y )E(X) − E(X)E(Y ) + E(X)E(Y )
            = E(XY ) − E(X)E(Y ).
Equation (6): This is similar to equation (4). Please consider the case of n = m = 2 and
convince yourself that (6) holds.
10. Let b be a constant. Show that E[(X − b)^2 ] = E(X^2 ) − 2bE(X) + b^2 . What is the value of b
that gives the minimum value of E[(X − b)^2 ]?
By the linearity of expectation (equation (1)), E[(X − b)^2 ] = E[X^2 − 2bX + b^2 ] = E(X^2 ) − 2bE(X) + b^2 .
Noting that E(X^2 ) − 2bE(X) + b^2 is a convex quadratic function of b, we may find the minimum
by differentiating E[(X − b)^2 ] with respect to b and setting the derivative equal to zero, i.e.,
\frac{\partial}{\partial b} E[(X − b)^2 ] = −2E(X) + 2b = 0,
and, therefore, the minimizing value of b is
b = E(X).
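A grid search makes the minimizer visible numerically; the pmf and the grid of b values below are arbitrary illustrative choices of ours.

    # Check that b = E(X) minimizes E[(X - b)^2].
    pmf = {-1: 0.2, 0: 0.5, 3: 0.3}
    mean = sum(x * p for x, p in pmf.items())

    def mse(b):
        return sum((x - b) ** 2 * p for x, p in pmf.items())

    grid = [i / 100 for i in range(-200, 201)]  # b in [-2, 2], step 0.01
    best = min(grid, key=mse)
    print(mean, best)  # both 0.7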
11. Let {xi : i = 1, . . . , n} and {yi : i = 1, . . . , n} be two sequences. Define the averages
x̄ = (1/n) \sum_{i=1}^{n} x_i ,
ȳ = (1/n) \sum_{i=1}^{n} y_i .
(a) \sum_{i=1}^{n} (x_i − x̄) = 0.
Proof:
\sum_{i=1}^{n} (x_i − x̄) = \sum_{i=1}^{n} x_i − \sum_{i=1}^{n} x̄
                = \sum_{i=1}^{n} x_i − n x̄
because \sum_{i=1}^{n} x̄ = x̄ + x̄ + ... + x̄ = n x̄
                = n · \frac{\sum_{i=1}^{n} x_i}{n} − n x̄
because \sum_{i=1}^{n} x_i = \frac{n}{n} \sum_{i=1}^{n} x_i = n · \frac{\sum_{i=1}^{n} x_i}{n}
                = n x̄ − n x̄
because x̄ = \frac{\sum_{i=1}^{n} x_i}{n}
                = 0.
(b) \sum_{i=1}^{n} (x_i − x̄)^2 = \sum_{i=1}^{n} x_i (x_i − x̄).
Proof: We use the result of (a) above.
\sum_{i=1}^{n} (x_i − x̄)^2 = \sum_{i=1}^{n} (x_i − x̄)(x_i − x̄)
                  = \sum_{i=1}^{n} x_i (x_i − x̄) − \sum_{i=1}^{n} x̄ (x_i − x̄)
                  = \sum_{i=1}^{n} x_i (x_i − x̄) − x̄ \sum_{i=1}^{n} (x_i − x̄)
because x̄ is a constant and does not depend on i
                  = \sum_{i=1}^{n} x_i (x_i − x̄) − x̄ · 0
because \sum_{i=1}^{n} (x_i − x̄) = 0, as shown above
                  = \sum_{i=1}^{n} x_i (x_i − x̄).
(c) \sum_{i=1}^{n} (x_i − x̄)(y_i − ȳ) = \sum_{i=1}^{n} y_i (x_i − x̄) = \sum_{i=1}^{n} x_i (y_i − ȳ).
Proof:
\sum_{i=1}^{n} (x_i − x̄)(y_i − ȳ) = \sum_{i=1}^{n} y_i (x_i − x̄) − ȳ \sum_{i=1}^{n} (x_i − x̄)
                        = \sum_{i=1}^{n} y_i (x_i − x̄) − ȳ · 0
                        = \sum_{i=1}^{n} y_i (x_i − x̄).
Also,
\sum_{i=1}^{n} (x_i − x̄)(y_i − ȳ) = \sum_{i=1}^{n} x_i (y_i − ȳ) − \sum_{i=1}^{n} x̄ (y_i − ȳ)
                        = \sum_{i=1}^{n} x_i (y_i − ȳ) − x̄ \sum_{i=1}^{n} (y_i − ȳ)
                        = \sum_{i=1}^{n} x_i (y_i − ȳ) − x̄ · 0
                        = \sum_{i=1}^{n} x_i (y_i − ȳ).
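All three identities are easy to verify on concrete numbers; the sequences below are arbitrary illustrative data of ours.

    # Verify identities (a), (b), and (c) on small sample sequences.
    xs = [1.0, 4.0, 2.0, 5.0]
    ys = [3.0, 1.0, 2.0, 6.0]
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)

    print(sum(x - xbar for x in xs))                    # (a): 0 (up to rounding)
    print(sum((x - xbar) ** 2 for x in xs),
          sum(x * (x - xbar) for x in xs))              # (b): the two values agree
    print(sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)),
          sum(y * (x - xbar) for x, y in zip(xs, ys)),
          sum(x * (y - ybar) for x, y in zip(xs, ys)))  # (c): all three agree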
Conditional Mean and Conditional Variance
We can define the conditional probability function of Y given X as
p_{ij}^{Y|X} = P (Y = y_j |X = x_i ) = \frac{P (X = x_i , Y = y_j )}{P (X = x_i )} = \frac{p_{ij}^{X,Y}}{p_i^X} ,
where p_{ij}^{X,Y} = P (X = x_i , Y = y_j ) and p_i^X = P (X = x_i ).
The conditional mean of Y given X = xi is given by
EY [Y |X = x_i ] = \sum_{j=1}^{m} y_j P (Y = y_j |X = x_i ) = \sum_{j=1}^{m} y_j p_{ij}^{Y|X} ,
where the symbol EY indicates that the expectation is taken treating Y as a random variable. The
conditional variance of Y given X = xi is given by
Var(Y |X = x_i ) = E[(Y − E[Y |X = x_i ])^2 ] = \sum_{j=1}^{m} (y_j − E[Y |X = x_i ])^2 p_{ij}^{Y|X} .
The conditional mean of Y given X can be written as EY [Y |X] without specifying a value of X.
Then, EY [Y |X] is a random variable because the value of EY [Y |X] depends on a realization of X.
1. The following shows that the unconditional mean of Y is equal to the expected value of EY [Y |X],
where the expectation is taken with respect to X; that is, EY [Y ] = EX [EY [Y |X]].
Proof: Because EY [Y |X = x_i ] = \sum_{j=1}^{m} y_j p_{ij}^{Y|X} , we have
EX [EY [Y |X]] = \sum_{i=1}^{n} EY [Y |X = x_i ] p_i^X
              = \sum_{i=1}^{n} (\sum_{j=1}^{m} y_j p_{ij}^{Y|X}) p_i^X
              = \sum_{i=1}^{n} \sum_{j=1}^{m} y_j \frac{p_{ij}^{X,Y}}{p_i^X} p_i^X
              = \sum_{i=1}^{n} \sum_{j=1}^{m} y_j p_{ij}^{X,Y}
              = \sum_{j=1}^{m} y_j \sum_{i=1}^{n} p_{ij}^{X,Y}
              = \sum_{j=1}^{m} y_j p_j^Y = EY [Y ].
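A numerical check of this result, using the same illustrative joint pmf as before (our own arbitrary choice):

    # Verify E_X[E_Y[Y|X]] = E_Y[Y] for a joint pmf keyed by (x_i, y_j).
    joint = {(0, 1): 0.1, (0, 2): 0.3, (1, 1): 0.4, (1, 2): 0.2}

    # Marginal pmf of X: p_i^X = sum over j of p_ij.
    px = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p

    def cond_mean_y(x):
        # E[Y | X = x] = sum over j of y_j * p_ij / p_i^X
        return sum(y * p / px[x] for (xi, y), p in joint.items() if xi == x)

    lhs = sum(cond_mean_y(x) * p for x, p in px.items())
    rhs = sum(y * p for (x, y), p in joint.items())
    print(lhs, rhs)  # both 1.5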
2. Let g(Y ) be some known function of Y . Show that EY [g(Y )] = EX [EY [g(Y )|X]].
Proof:
EX [EY [g(Y )|X]] = \sum_{i=1}^{n} EY [g(Y )|X = x_i ] p_i^X
                 = \sum_{i=1}^{n} (\sum_{j=1}^{m} g(y_j ) p_{ij}^{Y|X}) p_i^X
                 = \sum_{i=1}^{n} \sum_{j=1}^{m} g(y_j ) \frac{p_{ij}^{X,Y}}{p_i^X} p_i^X
                 = \sum_{i=1}^{n} \sum_{j=1}^{m} g(y_j ) p_{ij}^{X,Y}
                 = \sum_{j=1}^{m} g(y_j ) \sum_{i=1}^{n} p_{ij}^{X,Y}
                 = \sum_{j=1}^{m} g(y_j ) p_j^Y = EY [g(Y )].
3. Let g(Y ) and h(X) be some known functions of Y and X, respectively. Show that E[g(Y )h(X)] =
EX [h(X)EY [g(Y )|X]].
Proof:
EX [h(X)EY [g(Y )|X]] = \sum_{i=1}^{n} h(x_i ) EY [g(Y )|X = x_i ] p_i^X
                      = \sum_{i=1}^{n} h(x_i ) (\sum_{j=1}^{m} g(y_j ) p_{ij}^{Y|X}) p_i^X
                      = \sum_{i=1}^{n} h(x_i ) \sum_{j=1}^{m} g(y_j ) \frac{p_{ij}^{X,Y}}{p_i^X} p_i^X
                      = \sum_{i=1}^{n} \sum_{j=1}^{m} g(y_j ) h(x_i ) p_{ij}^{X,Y}
                      = E[g(Y )h(X)].
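This identity can be spot-checked the same way; the functions g and h below are arbitrary illustrative choices of ours.

    # Verify E[g(Y)h(X)] = E_X[h(X) E_Y[g(Y)|X]].
    joint = {(0, 1): 0.1, (0, 2): 0.3, (1, 1): 0.4, (1, 2): 0.2}

    def g(y):
        return y ** 2

    def h(x):
        return 2 * x + 1

    px = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p

    def cond_mean_g(x):
        # E[g(Y) | X = x]
        return sum(g(y) * p / px[x] for (xi, y), p in joint.items() if xi == x)

    lhs = sum(g(y) * h(x) * p for (x, y), p in joint.items())
    rhs = sum(h(x) * cond_mean_g(x) * p for x, p in px.items())
    print(lhs, rhs)  # both 4.9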
Alternative Proof (please compare this proof with the one above): Let EX (X) = \sum_{i=1}^{n} x_i p_i^X
and EY (Y ) = \sum_{j=1}^{m} y_j p_j^Y , and define p_{ij}^{Y|X} = Pr(Y = y_j |X = x_i ).