Selective Review - Probability

Law of Total Probability, Bayes' Rule and Independence

Events $E_1, E_2, \ldots, E_k$ are said to form a partition of the sample space $\Omega$ if the events are mutually exclusive and $\Omega = E_1 \cup E_2 \cup \cdots \cup E_k$. For a partition, $\mathbb{P}(E_1) + \mathbb{P}(E_2) + \cdots + \mathbb{P}(E_k) = 1$. The law of total probability states that for any event $A$:

$$\mathbb{P}(A) = \mathbb{P}(A \cap E_1) + \mathbb{P}(A \cap E_2) + \cdots + \mathbb{P}(A \cap E_k) = \sum_{m=1}^{k} \mathbb{P}(A \cap E_m)$$

For events $A$ and $B$: $\mathbb{P}(A) = \mathbb{P}(A \cap B) + \mathbb{P}(A \cap B^c)$ and $\mathbb{P}(B) = \mathbb{P}(A \cap B) + \mathbb{P}(A^c \cap B)$.

If $\mathbb{P}(E_m) \neq 0$ for each $m$, then, exploiting the definition of conditional probability $\mathbb{P}(A|E_m) = \mathbb{P}(A \cap E_m)/\mathbb{P}(E_m)$, the law of total probability can be written as:

$$\mathbb{P}(A) = \mathbb{P}(A|E_1)\mathbb{P}(E_1) + \mathbb{P}(A|E_2)\mathbb{P}(E_2) + \cdots + \mathbb{P}(A|E_k)\mathbb{P}(E_k) = \sum_{m=1}^{k} \mathbb{P}(A|E_m)\mathbb{P}(E_m)$$

Note that $\mathbb{P}(A \cap E_m) = \mathbb{P}(A|E_m)\mathbb{P}(E_m) = \mathbb{P}(E_m|A)\mathbb{P}(A)$, which yields

$$\mathbb{P}(E_m|A) = \frac{\mathbb{P}(A|E_m)\mathbb{P}(E_m)}{\mathbb{P}(A)}$$

This is Bayes' rule.

Finally, recall that events $A$ and $B$ are independent if and only if $\mathbb{P}(A \cap B) = \mathbb{P}(A)\mathbb{P}(B)$. This implies that $\mathbb{P}(A|B) = \mathbb{P}(A)$ when $\mathbb{P}(B) \neq 0$.

In the context of random variables, considering random variables $X$ and $Y$ with joint PDF $f_{X,Y}(x,y)$ or PMF $p_{X,Y}(x,y)$, the law of total probability takes the forms:

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy = \int_{-\infty}^{\infty} f_{X|Y}(x|y)\, f_Y(y)\,dy$$

$$p_X(x) = \sum_{y} p_{X,Y}(x,y) = \sum_{y} p_{X|Y}(x|y)\, p_Y(y)$$

$X$ and $Y$ are independent if and only if $f_{X,Y}(x,y) = f_X(x)\, f_Y(y)$ or $p_{X,Y}(x,y) = p_X(x)\, p_Y(y)$.
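As a quick numerical illustration (a minimal sketch added here, not part of the original notes), the following Python snippet checks the law of total probability and Bayes' rule on a small three-event partition; the specific probabilities are hypothetical values chosen for illustration.

```python
# Minimal sketch: verify the law of total probability and Bayes' rule numerically.
# The prior and conditional probabilities below are hypothetical values.

P_E = [0.3, 0.5, 0.2]          # P(E_m) for a partition E_1, E_2, E_3
P_A_given_E = [0.9, 0.4, 0.1]  # P(A | E_m)

# Law of total probability: P(A) = sum_m P(A | E_m) P(E_m)
P_A = sum(pa * pe for pa, pe in zip(P_A_given_E, P_E))

# Bayes' rule: P(E_m | A) = P(A | E_m) P(E_m) / P(A)
P_E_given_A = [pa * pe / P_A for pa, pe in zip(P_A_given_E, P_E)]

print(P_A)                              # total probability of A
print(P_E_given_A, sum(P_E_given_A))    # posterior over the partition sums to 1
```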

Union Bound

The probability $\mathbb{P}\left(\bigcup_{n=1}^{N} A_n\right)$ is upper bounded by $\sum_{n=1}^{N} \mathbb{P}(A_n)$.

Proof. We first prove that $\mathbb{P}(A \cup B) \le \mathbb{P}(A) + \mathbb{P}(B)$.

$\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B \cap A^c)$ and $\mathbb{P}(B) = \mathbb{P}(B \cap A) + \mathbb{P}(B \cap A^c)$. Hence,

$$\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(B \cap A).$$

Since $\mathbb{P}(B \cap A) \ge 0$, $\mathbb{P}(A \cup B) \le \mathbb{P}(A) + \mathbb{P}(B)$.

We can generalize this result to $\mathbb{P}\left(\bigcup_{n=1}^{N} A_n\right) \le \sum_{n=1}^{N} \mathbb{P}(A_n)$ using induction.

Assume that $\mathbb{P}\left(\bigcup_{n=1}^{N-1} A_n\right) \le \sum_{n=1}^{N-1} \mathbb{P}(A_n)$.

$\mathbb{P}\left(\bigcup_{n=1}^{N} A_n\right) = \mathbb{P}\left(\left(\bigcup_{n=1}^{N-1} A_n\right) \cup A_N\right)$. Using the case of two events, $\mathbb{P}\left(\left(\bigcup_{n=1}^{N-1} A_n\right) \cup A_N\right) \le \mathbb{P}\left(\bigcup_{n=1}^{N-1} A_n\right) + \mathbb{P}(A_N)$. But $\mathbb{P}\left(\bigcup_{n=1}^{N-1} A_n\right) \le \sum_{n=1}^{N-1} \mathbb{P}(A_n)$. Hence,

$$\mathbb{P}\left(\left(\bigcup_{n=1}^{N-1} A_n\right) \cup A_N\right) = \mathbb{P}\left(\bigcup_{n=1}^{N} A_n\right) \le \sum_{n=1}^{N-1} \mathbb{P}(A_n) + \mathbb{P}(A_N) = \sum_{n=1}^{N} \mathbb{P}(A_n)$$
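The short Python sketch below (an added illustration, not from the original notes) estimates both sides of the union bound by Monte Carlo for a few overlapping events defined from a uniform random variable; the events themselves are hypothetical.

```python
import random

# Minimal sketch: Monte Carlo check of the union bound for hypothetical events
# A_1 = {U < 0.3}, A_2 = {U > 0.8}, A_3 = {0.25 < U < 0.5}, with U ~ Uniform(0, 1).
events = [lambda u: u < 0.3, lambda u: u > 0.8, lambda u: 0.25 < u < 0.5]

trials = 100_000
union_count = 0
individual_counts = [0] * len(events)
for _ in range(trials):
    u = random.random()
    hits = [e(u) for e in events]
    union_count += any(hits)
    for i, h in enumerate(hits):
        individual_counts[i] += h

p_union = union_count / trials
bound = sum(c / trials for c in individual_counts)
print(p_union, "<=", bound)   # the union probability never exceeds the sum
```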

Convex and Concave Functions

A set $S \subseteq \mathbb{R}^N$ is convex if for every $x, y \in S$, $\lambda x + (1-\lambda)y \in S$ for every $\lambda \in [0,1]$.

Let $g(x)$ be a function defined over the domain $\mathcal{D} \subseteq \mathbb{R}^N$. Function $g(x): \mathcal{D} \to \mathbb{R}$ is convex if its domain, $\mathcal{D}$, is a convex set and it satisfies:

$$g(\lambda x + (1-\lambda)y) \le \lambda g(x) + (1-\lambda) g(y) \quad \text{for every } x, y \in \mathcal{D} \text{ and } \lambda \in [0,1].$$

A function $g(x)$ is concave if its domain is a convex set and it satisfies:

$$g(\lambda x + (1-\lambda)y) \ge \lambda g(x) + (1-\lambda) g(y) \quad \text{for every } x, y \in \mathcal{D} \text{ and } \lambda \in [0,1].$$

If $N = 1$ and $g(x)$ is twice differentiable, it is convex when $\frac{d^2 g(x)}{dx^2} \ge 0$ and concave when $\frac{d^2 g(x)}{dx^2} \le 0$ over $\mathcal{D}$. For general $N$, the function is convex if the Hessian matrix is positive semidefinite and concave if the Hessian matrix is negative semidefinite. Recall that if $x = (x_1, x_2, \ldots, x_N)^T$, the Hessian matrix is given by:

$$\mathcal{H} = \begin{bmatrix}
\frac{\partial^2 g}{\partial x_1^2} & \frac{\partial^2 g}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 g}{\partial x_1 \partial x_N} \\
\frac{\partial^2 g}{\partial x_2 \partial x_1} & \frac{\partial^2 g}{\partial x_2^2} & \cdots & \frac{\partial^2 g}{\partial x_2 \partial x_N} \\
\vdots & & \ddots & \vdots \\
\frac{\partial^2 g}{\partial x_N \partial x_1} & \frac{\partial^2 g}{\partial x_N \partial x_2} & \cdots & \frac{\partial^2 g}{\partial x_N^2}
\end{bmatrix}$$

Examples:

$g(x) = x^2$ is convex over $\mathbb{R}$.

$g(x) = \ln x$ is concave over $\mathbb{R}^+$.

$g(x) = \frac{1}{x}$ is convex over $\mathbb{R}^+$.

$g(x) = x_1^2 + x_2^2$ is convex over $\mathbb{R}^2$.

Important Inequality

$$\left| \sum_n z_n \right| \le \sum_n |z_n|, \quad z_n \in \mathbb{C}.$$

If we have an integral, $\left| \int f(z)\,dz \right| \le \int |f(z)|\,dz$, where $f(z)$ is a complex-valued function.

Examples:

$\left|\mathbb{E}[X^q]\right| \le \mathbb{E}[|X|^q]$ because $\left|\sum_n x_n^q\, \mathbb{P}(X = x_n)\right| \le \sum_n |x_n|^q\, \mathbb{P}(X = x_n)$ or $\left|\int x^q f_X(x)\,dx\right| \le \int |x|^q f_X(x)\,dx$.

If $\phi_X(u) = \int_{-\infty}^{\infty} e^{iux} f_X(x)\,dx$, $u \in \mathbb{R}$, then

$$|\phi_X(u)| \le \int_{-\infty}^{\infty} \left|e^{iux}\right| f_X(x)\,dx = \int_{-\infty}^{\infty} f_X(x)\,dx = 1.$$

That is, $|\phi_X(u)| \le 1$.

Existence of Moments

The $k$th moment of a random variable, $\mathbb{E}[X^k] = \sum_x x^k\, p_X(x)$ or $\mathbb{E}[X^k] = \int_{-\infty}^{\infty} x^k f_X(x)\,dx$, exists if $\mathbb{E}[|X|^k]$ is finite. Hence, for the mean $\mathbb{E}[X]$ to exist, $\mathbb{E}[|X|] < \infty$.

Theorem. If the $k$th moment exists and if $j < k$, then the $j$th moment exists.

Proof.

$$\mathbb{E}[|X|^j] = \int_{-\infty}^{\infty} |x|^j f_X(x)\,dx = \int_{|x| \le 1} |x|^j f_X(x)\,dx + \int_{|x| > 1} |x|^j f_X(x)\,dx$$

If $|x| \le 1$, then $|x|^j \le 1$, which yields $\int_{|x| \le 1} |x|^j f_X(x)\,dx \le \int_{|x| \le 1} f_X(x)\,dx \le \int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

If $|x| > 1$ and $j < k$, then $|x|^j < |x|^k$, which yields

$$\int_{|x| > 1} |x|^j f_X(x)\,dx \le \int_{|x| > 1} |x|^k f_X(x)\,dx \le \int_{-\infty}^{\infty} |x|^k f_X(x)\,dx$$

Therefore,

$$\mathbb{E}[|X|^j] \le 1 + \mathbb{E}[|X|^k]$$

If $\mathbb{E}[|X|^k]$ is finite, then $\mathbb{E}[|X|^j]$ is also finite for $j < k$.

Cauchy-Schwarz Inequality in the Context of Random Variables

Consider random variables $X$ and $Y$, which may be complex-valued. Consider the random variable $\alpha X - Y^*$, where $*$ denotes complex conjugation and $\alpha \in \mathbb{C}$.

$$\mathbb{E}\left[|\alpha X - Y^*|^2\right] \ge 0 \quad \Longrightarrow \quad |\alpha|^2\, \mathbb{E}[|X|^2] + \mathbb{E}[|Y|^2] - \alpha\, \mathbb{E}[XY] - \alpha^*\, \mathbb{E}[X^* Y^*] \ge 0$$

This inequality is valid for any $\alpha$. Set $\alpha = \frac{\mathbb{E}[X^* Y^*]}{\mathbb{E}[|X|^2]}$, assuming $\mathbb{E}[|X|^2] \neq 0$. Hence,

$$\frac{|\mathbb{E}[XY]|^2}{\mathbb{E}[|X|^2]} + \mathbb{E}[|Y|^2] - \frac{|\mathbb{E}[XY]|^2}{\mathbb{E}[|X|^2]} - \frac{|\mathbb{E}[XY]|^2}{\mathbb{E}[|X|^2]} \ge 0 \quad \Longrightarrow \quad \mathbb{E}[|Y|^2] \ge \frac{|\mathbb{E}[XY]|^2}{\mathbb{E}[|X|^2]}$$

Thus, $|\mathbb{E}[XY]|^2 \le \mathbb{E}[|X|^2]\, \mathbb{E}[|Y|^2]$, or $|\mathbb{E}[XY]| \le \sqrt{\mathbb{E}[|X|^2]\, \mathbb{E}[|Y|^2]}$. This means that when $X$ and $Y$ are real-valued, $\mathbb{E}[XY] \le |\mathbb{E}[XY]| \le \sqrt{\mathbb{E}[X^2]\, \mathbb{E}[Y^2]}$. If we replace $X$ by $X - \mathbb{E}[X]$ and $Y$ by $Y^* - \mathbb{E}[Y^*]$, we obtain the inequality $|\mathrm{Cov}(X,Y)| \le \sqrt{\mathrm{Var}(X)\, \mathrm{Var}(Y)}$. Since the correlation coefficient is $\rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\, \mathrm{Var}(Y)}}$, use of the Cauchy-Schwarz inequality allows us to prove that $|\rho_{XY}| \le 1$.

Setting $Y = 1$ in $|\mathbb{E}[XY]| \le \sqrt{\mathbb{E}[|X|^2]\, \mathbb{E}[|Y|^2]}$ means that $|\mathbb{E}[X]| \le \sqrt{\mathbb{E}[|X|^2]}$. Since $\mathrm{Var}(X) = \mathbb{E}[|X|^2] - |\mathbb{E}[X]|^2$, we can see that the variance is always nonnegative.

An important point is that the Cauchy-Schwarz inequality is achieved with equality if $X = \beta Y^*$, where $\beta \in \mathbb{C}$. If $X = \beta Y^*$, then $|\mathbb{E}[XY]|^2 = |\beta|^2 \left(\mathbb{E}[|Y|^2]\right)^2 = \mathbb{E}[|X|^2]\, \mathbb{E}[|Y|^2]$.
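As an illustrative check (added here, not in the original notes), the following Python sketch estimates the covariance, variances, and correlation coefficient of two correlated random variables from samples; the particular linear-plus-noise model is hypothetical.

```python
import random
import math

# Minimal sketch: sample-based check of the Cauchy-Schwarz / correlation bound.
# X is standard normal; Y is a hypothetical noisy linear function of X.
random.seed(0)
n = 100_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [2.0 * x + random.gauss(0.0, 3.0) for x in xs]

mx = sum(xs) / n
my = sum(ys) / n
var_x = sum((x - mx) ** 2 for x in xs) / n
var_y = sum((y - my) ** 2 for y in ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

rho = cov / math.sqrt(var_x * var_y)
print(abs(cov) <= math.sqrt(var_x * var_y))  # True
print(rho, abs(rho) <= 1.0)                  # |rho| <= 1
```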

Theorem Regarding CDF

If the $n$th moment of random variable $X$ exists, then $\lim_{x \to \infty} x^n\,(1 - F_X(x)) = 0$.

Proof. $\mathbb{E}[X^n] = \int_{-\infty}^{\infty} x^n f_X(x)\,dx$

Consider positive $\alpha$. $\mathbb{E}[X^n] = \int_{-\infty}^{\alpha} x^n f_X(x)\,dx + \int_{\alpha}^{\infty} x^n f_X(x)\,dx$

$$\mathbb{E}[X^n] \ge \int_{-\infty}^{\alpha} x^n f_X(x)\,dx + \int_{\alpha}^{\infty} \alpha^n f_X(x)\,dx = \int_{-\infty}^{\alpha} x^n f_X(x)\,dx + \alpha^n \int_{\alpha}^{\infty} f_X(x)\,dx$$

$$\mathbb{E}[X^n] \ge \int_{-\infty}^{\alpha} x^n f_X(x)\,dx + \alpha^n\,(1 - F_X(\alpha))$$

$$\alpha^n\,(1 - F_X(\alpha)) \le \mathbb{E}[X^n] - \int_{-\infty}^{\alpha} x^n f_X(x)\,dx$$

As $\alpha \to \infty$, the L.H.S., which is nonnegative, is upper bounded by a quantity that tends to zero. (Recall that $\mathbb{E}[X^n]$ is finite.)

This means that $\lim_{\alpha \to \infty} \alpha^n\,(1 - F_X(\alpha)) = 0$.

Mean of Nonnegative RV in terms of CDF & Markov's Inequality

Consider a nonnegative RV $X$ and suppose that $\mathbb{E}[X]$ exists. Then $\mathbb{E}[X] = \int_{0}^{\infty} (1 - F_X(x))\,dx$, where $1 - F_X(x) = \mathbb{P}(X > x)$ is sometimes called the complementary CDF or tail distribution.

Proof.

$$\mathbb{E}[X] = \int_{0}^{\infty} x f_X(x)\,dx = -\int_{0}^{\infty} x\, d(1 - F_X(x)) = -x\,(1 - F_X(x))\Big|_{0}^{\infty} + \int_{0}^{\infty} (1 - F_X(x))\,dx.$$

If $\mathbb{E}[X]$ is finite, then $\lim_{x \to \infty} x\,(1 - F_X(x)) = 0$. Hence, $\mathbb{E}[X] = \int_{0}^{\infty} (1 - F_X(x))\,dx$.

$1 - F_X(x)$ is a nonnegative function and, hence, $\int_{0}^{\infty} (1 - F_X(x))\,dx \ge \int_{0}^{a} (1 - F_X(x))\,dx$, where $a$ is a positive real constant. This means that

$$\mathbb{E}[X] \ge \int_{0}^{a} (1 - F_X(x))\,dx$$

$1 - F_X(x)$ is a nonincreasing function. Thus, $1 - F_X(x) \ge 1 - F_X(a)$ for all $0 \le x \le a$.

$$\mathbb{E}[X] \ge \int_{0}^{a} (1 - F_X(x))\,dx \ge \int_{0}^{a} (1 - F_X(a))\,dx = a\,(1 - F_X(a)) = a\,\mathbb{P}(X > a)$$

Actually, it can be shown that for $a > 0$ and a nonnegative continuous or discrete RV $X$ whose first moment exists, we have the following inequality:

$$\mathbb{P}(X \ge a) \le \frac{\mathbb{E}[X]}{a}$$

This is typically referred to as Markov's inequality.
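A short Python sketch (added, not from the original notes) compares $\mathbb{P}(X \ge a)$ with $\mathbb{E}[X]/a$ by Monte Carlo; the exponential distribution and the values of $a$ are hypothetical choices.

```python
import random

# Minimal sketch: Monte Carlo check of Markov's inequality P(X >= a) <= E[X]/a
# for a nonnegative RV; here X is exponential with mean 2 (a hypothetical choice).
random.seed(1)
n = 200_000
samples = [random.expovariate(0.5) for _ in range(n)]  # rate 0.5, mean 2
mean_est = sum(samples) / n

for a in (1.0, 2.0, 5.0, 10.0):
    tail = sum(x >= a for x in samples) / n
    print(a, tail, "<=", mean_est / a)
```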

Important Series

Consider the series $\sum_{n=1}^{\infty} \frac{1}{n^{\gamma}}$. It converges when $\gamma > 1$. For example, $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$.

One way to show this is to upper bound $\sum_{n=1}^{\infty} \frac{1}{n^{\gamma}}$ by $1 + \int_{1}^{\infty} \frac{1}{x^{\gamma}}\,dx = 1 + \frac{1}{\gamma - 1} < \infty$ when $\gamma > 1$.

When $\gamma = 1$, we have the harmonic series, i.e., $\frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$, which can be shown to diverge.

Note that $\sum_{n=1}^{\infty} \frac{1}{n^{\gamma}}$ can be lower bounded by $\int_{1}^{\infty} \frac{1}{x^{\gamma}}\,dx = \frac{1}{\gamma - 1}$. Hence,

$$\frac{1}{\gamma - 1} < \sum_{n=1}^{\infty} \frac{1}{n^{\gamma}} < 1 + \frac{1}{\gamma - 1}$$
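The following Python sketch (an added illustration) checks the two integral bounds numerically by truncating the series at a large index; the choice of $\gamma$ and the truncation point are arbitrary.

```python
# Minimal sketch: numerically check 1/(gamma-1) < sum 1/n^gamma < 1 + 1/(gamma-1)
# by truncating the series at a hypothetically "large enough" index.
gamma = 1.5
partial_sum = sum(1.0 / n**gamma for n in range(1, 1_000_000))
lower = 1.0 / (gamma - 1.0)
upper = 1.0 + 1.0 / (gamma - 1.0)
print(lower, "<", partial_sum, "<", upper)
```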

Taylor Series

Consider a function $g(x): \mathbb{R} \to \mathbb{R}$. Its Taylor series expansion about a point $x_0$ is given by:

$$g(x) = \sum_{k=0}^{\infty} \frac{g^{(k)}(x_0)}{k!} (x - x_0)^k, \quad \text{where } g^{(0)}(x) = g(x),\ g^{(1)}(x) = \frac{dg(x)}{dx},\ g^{(2)}(x) = \frac{d^2 g(x)}{dx^2}, \text{ and so on.}$$

Using $n + 1$ terms and a remainder term, we can write:

$$g(x) = \sum_{k=0}^{n} \frac{g^{(k)}(x_0)}{k!} (x - x_0)^k + \frac{g^{(n+1)}(\zeta)}{(n+1)!} (x - x_0)^{n+1}$$

where $\zeta = \lambda x_0 + (1 - \lambda)x$, $\lambda \in [0,1]$, i.e., $\zeta$ lies between $x$ and $x_0$.

Example: Let $g(x) = e^x$ and $x_0 = 0$. Hence,

$$e^x = \sum_{k=0}^{n} \frac{1}{k!} x^k + \frac{e^{\zeta}}{(n+1)!} x^{n+1}, \quad \zeta \in [0, x] \text{ when } x > 0 \text{ and } \zeta \in [x, 0] \text{ when } x < 0.$$

Note that if $n$ is odd, $x^{n+1}$ is nonnegative. Hence, for odd $n$,

$$e^x \ge \sum_{k=0}^{n} \frac{1}{k!} x^k \quad \text{for all } x.$$

When $n = 1$, $e^x \ge 1 + x$ for all $x$. This means that $e^{-x} \ge 1 - x$ for all $x$.

If function $g(x)$ is complex-valued,

$$g(x) = \sum_{k=0}^{n} \frac{g^{(k)}(x_0)}{k!} (x - x_0)^k + \frac{1}{n!} \int_{x_0}^{x} (x - t)^n\, g^{(n+1)}(t)\,dt$$

This expression is also valid for real-valued functions.

In the multidimensional case, when $g(x): \mathbb{R}^N \to \mathbb{R}$, the Taylor series expansion about a point $y$ is given by:

$$g(x) = g(y) + \nabla^T g(y)\,(x - y) + \frac{1}{2} (x - y)^T \mathcal{H}(y)\,(x - y) + \cdots$$

where $\nabla^T g(y) = \left[\frac{\partial g}{\partial x_1}\Big|_y\ \ \frac{\partial g}{\partial x_2}\Big|_y\ \ \cdots\ \ \frac{\partial g}{\partial x_N}\Big|_y\right]$ and $\mathcal{H}(y)$ is the Hessian matrix evaluated at point $y$.

That is,

$$g(x) = g(y) + \sum_{n=1}^{N} \frac{\partial g}{\partial x_n}\Big|_y (x_n - y_n) + \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \frac{\partial^2 g}{\partial x_n \partial x_m}\Big|_y (x_n - y_n)(x_m - y_m) + \cdots$$

Leibniz's Rule for Differentiation under the Integral Sign

Let $f(x,t)$ be a function such that the partial derivative of $f$ with respect to $t$ exists and is continuous. Then,

$$\frac{d}{dt} \int_{\alpha(t)}^{\beta(t)} f(x,t)\,dx = \int_{\alpha(t)}^{\beta(t)} \frac{\partial f}{\partial t}\,dx + f(\beta(t), t)\,\frac{d\beta}{dt} - f(\alpha(t), t)\,\frac{d\alpha}{dt}$$

Example:

$$\frac{d}{dt} \int_{0}^{t} \frac{\ln(1 + tx)}{1 + x^2}\,dx = \int_{0}^{t} \frac{x}{(1 + x^2)(1 + tx)}\,dx + \frac{\ln(1 + t^2)}{1 + t^2}$$

Another example:

$$\frac{d}{dt} \int_{t}^{\infty} e^{-\frac{x^2}{2}}\,dx = -e^{-\frac{t^2}{2}}$$
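As a numerical sanity check (added, not part of the original notes), the sketch below verifies the first example by comparing a central finite difference of the integral with the stated right-hand side, both computed with a simple trapezoidal quadrature; the value of $t$ and the step sizes are arbitrary.

```python
import math

# Minimal sketch: numerically check Leibniz's rule on the first example,
# d/dt of int_0^t ln(1 + t*x)/(1 + x^2) dx, via finite difference vs. the stated RHS.

def trapezoid(func, a, b, n=20000):
    # simple trapezoidal quadrature of func over [a, b]
    h = (b - a) / n
    s = 0.5 * (func(a) + func(b))
    for k in range(1, n):
        s += func(a + k * h)
    return s * h

def I(t):
    return trapezoid(lambda x: math.log(1 + t * x) / (1 + x * x), 0.0, t)

t, eps = 1.3, 1e-4
lhs = (I(t + eps) - I(t - eps)) / (2 * eps)
rhs = trapezoid(lambda x: x / ((1 + x * x) * (1 + t * x)), 0.0, t) \
      + math.log(1 + t * t) / (1 + t * t)
print(lhs, rhs)  # the two values should agree to several decimal places
```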

Q-Function

$$Q(x) \triangleq \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-\frac{v^2}{2}}\,dv$$

$Q(x)$ is $\mathbb{P}(X > x) = 1 - F_X(x)$ where $X \sim \mathcal{N}(0,1)$. That is, $Q(x)$ is the complementary CDF of the standard Gaussian/normal distribution.

$Q(x)$ is convex over $[0, \infty)$ and concave over $(-\infty, 0]$.

$Q(-x) = 1 - Q(x)$

If $X \sim \mathcal{N}(\mu, \sigma^2)$, then $\mathbb{P}(X > x) = Q\!\left(\frac{x - \mu}{\sigma}\right)$.

If $x \ge 0$, $Q(x) \le \frac{1}{2} e^{-\frac{x^2}{2}}$.

If $x > 0$, $\frac{1}{\sqrt{2\pi}\,x}\left(1 - \frac{1}{x^2}\right) e^{-\frac{x^2}{2}} \le Q(x) \le \frac{1}{\sqrt{2\pi}\,x}\, e^{-\frac{x^2}{2}}$.
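The following Python sketch (an added illustration) evaluates $Q(x)$ via the complementary error function and checks the bounds quoted above at a few points.

```python
import math

# Minimal sketch: Q(x) = 0.5 * erfc(x / sqrt(2)); check the bounds quoted above.
def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

for x in (0.5, 1.0, 2.0, 3.0, 5.0):
    q = Q(x)
    upper_simple = 0.5 * math.exp(-x * x / 2.0)
    upper = math.exp(-x * x / 2.0) / (math.sqrt(2.0 * math.pi) * x)
    lower = (1.0 - 1.0 / (x * x)) * math.exp(-x * x / 2.0) / (math.sqrt(2.0 * math.pi) * x)
    print(x, lower <= q <= upper, q <= upper_simple)
```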

Chi-Squared Distribution

The chi-squared distribution with $k$ degrees of freedom, $\chi^2(k)$, is the PDF of a sum of the squares of $k$ independent real-valued zero-mean unit-variance Gaussian random variables.

If $X \sim \chi^2(k)$,

$$f_X(x) = \frac{1}{2^{k/2}\,\Gamma\!\left(\frac{k}{2}\right)}\, x^{\frac{k}{2} - 1} e^{-\frac{x}{2}}\, \mathbb{I}(x \ge 0)$$

If $X_1 \sim \chi^2(k_1)$ and $X_2 \sim \chi^2(k_2)$ are independent, then $X_1 + X_2 \sim \chi^2(k_1 + k_2)$.

If $X_1 \sim \chi^2(k_1)$ and $X_2 \sim \chi^2(k_2)$ are independent, then $\frac{X_1 / k_1}{X_2 / k_2} \sim F(k_1, k_2)$ (F-distribution).

If $X_1 \sim \chi^2(k_1)$ and $X_2 \sim \chi^2(k_2)$ are independent, then $\frac{X_1}{X_1 + X_2} \sim \mathrm{Beta}\!\left(\frac{k_1}{2}, \frac{k_2}{2}\right)$ (beta distribution).

If $X \sim \frac{1}{2}\chi^2(2)$, then $f_X(x) = e^{-x}\, \mathbb{I}(x \ge 0)$, i.e., $X$ is a unit-mean exponential random variable.

PDF of the Sum of Two Random Variables

Consider random variables $X$ and $Y$ with joint PDF $f_{X,Y}(x,y)$. We are interested in the PDF of $Z = X + Y$.

$$F_Z(z) = \mathbb{P}(Z \le z) = \int_{-\infty}^{\infty} \int_{-\infty}^{z-x} f_{X,Y}(x,y)\,dy\,dx$$

$$f_Z(z) = \frac{dF_Z(z)}{dz} = \int_{-\infty}^{\infty} f_{X,Y}(x, z - x)\,dx$$

If $X$ and $Y$ are independent,

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\,dx$$

That is, the PDF of the sum of two independent random variables $X$ and $Y$ is the convolution of the PDFs of these variables. The results here are also valid for the discrete case.

Example:

Let $X$ and $Y$ be independent, $f_X(x) = \mu_1 e^{-\mu_1 x}\, \mathbb{I}(x \ge 0)$ and $f_Y(y) = \mu_2 e^{-\mu_2 y}\, \mathbb{I}(y \ge 0)$, where $\mu_1, \mu_2 \in \mathbb{R}^+$.

$$f_Z(z) = \int_{-\infty}^{\infty} \mu_1 e^{-\mu_1 x}\, \mathbb{I}(x \ge 0)\, \mu_2 e^{-\mu_2 (z-x)}\, \mathbb{I}(z - x \ge 0)\,dx = \mathbb{I}(z \ge 0)\, \mu_1 \mu_2\, e^{-\mu_2 z} \int_{0}^{z} e^{x(\mu_2 - \mu_1)}\,dx$$

If $\mu_1 \neq \mu_2$, $f_Z(z) = \frac{\mu_1 \mu_2}{\mu_2 - \mu_1}\left(e^{-\mu_1 z} - e^{-\mu_2 z}\right)\mathbb{I}(z \ge 0)$.

If $\mu_1 = \mu_2 = \mu$, $f_Z(z) = \mu^2 z\, e^{-\mu z}\, \mathbb{I}(z \ge 0)$.
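The sketch below (added, not from the original notes) draws independent exponential samples, forms their sum, and compares a crude bin-based density estimate with the closed-form $f_Z$ for $\mu_1 \neq \mu_2$; the rates and bin width are arbitrary choices.

```python
import random
import math

# Minimal sketch: Monte Carlo check of the sum-of-exponentials PDF for mu1 != mu2.
random.seed(2)
mu1, mu2 = 1.0, 3.0
n = 200_000
z_samples = [random.expovariate(mu1) + random.expovariate(mu2) for _ in range(n)]

def f_Z(z):
    return mu1 * mu2 / (mu2 - mu1) * (math.exp(-mu1 * z) - math.exp(-mu2 * z))

# crude density estimate on a few bins of width dz, compared with the formula
dz = 0.25
for k in range(1, 8):
    z = k * dz
    in_bin = sum(z - dz / 2 <= s < z + dz / 2 for s in z_samples)
    print(z, in_bin / (n * dz), f_Z(z))
```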

PDF of the Ratio of Two Random Variables

Consider random variables $X$ and $Y$ with joint PDF $f_{X,Y}(x,y)$. We are interested in the PDF of $Z = \frac{Y}{X}$ when $\mathbb{P}(X = 0) = 0$.

$$F_Z(z) = \mathbb{P}(Z \le z) = \int_{0}^{\infty} \int_{-\infty}^{xz} f_{X,Y}(x,y)\,dy\,dx + \int_{-\infty}^{0} \int_{xz}^{\infty} f_{X,Y}(x,y)\,dy\,dx$$

$$f_Z(z) = \frac{dF_Z(z)}{dz} = \int_{0}^{\infty} x\, f_{X,Y}(x, zx)\,dx + \int_{-\infty}^{0} (-x)\, f_{X,Y}(x, zx)\,dx = \int_{-\infty}^{\infty} |x|\, f_{X,Y}(x, zx)\,dx$$

If $X$ and $Y$ are independent,

$$f_Z(z) = \int_{-\infty}^{\infty} |x|\, f_X(x)\, f_Y(zx)\,dx$$

Example:

Let $X$ and $Y$ be independent, $f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$ and $f_Y(y) = \frac{1}{\sqrt{2\pi}} e^{-\frac{y^2}{2}}$.

$$f_Z(z) = \int_{-\infty}^{\infty} |x|\, \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}\, \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2 x^2}{2}}\,dx = \frac{1}{2\pi} \int_{0}^{\infty} e^{-\frac{1 + z^2}{2}\,x^2}\,d(x^2) = \frac{1}{\pi}\, \frac{1}{1 + z^2} \quad \text{(standard Cauchy PDF)}$$
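A brief added sketch: sample the ratio of two independent standard normals and compare bin frequencies with the standard Cauchy density; the bin width and grid are arbitrary.

```python
import random
import math

# Minimal sketch: ratio of two independent standard normals vs. the standard Cauchy PDF.
random.seed(3)
n = 300_000
ratios = [random.gauss(0, 1) / random.gauss(0, 1) for _ in range(n)]

cauchy = lambda z: 1.0 / (math.pi * (1.0 + z * z))
dz = 0.5
for k in range(-4, 5):
    z = k * dz
    in_bin = sum(z - dz / 2 <= r < z + dz / 2 for r in ratios)
    print(z, in_bin / (n * dz), cauchy(z))
```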

PDF of the Maximum and Minimum of Independent Random Variables

Consider i.i.d. random variables $\{X_n\}_{n=1}^{N}$ with $f_{X_n}(x) = f(x)$ and $F_{X_n}(x) = F(x)$.

Let $Y = \max(X_1, X_2, \ldots, X_N)$ and $Z = \min(X_1, X_2, \ldots, X_N)$.

$$F_Y(y) = \mathbb{P}(Y \le y) = \mathbb{P}(X_1 \le y, X_2 \le y, \ldots, X_N \le y) = \prod_{n=1}^{N} \mathbb{P}(X_n \le y) = \prod_{n=1}^{N} F_{X_n}(y) = \left(F(y)\right)^N$$

$$f_Y(y) = \frac{dF_Y(y)}{dy} = N \left(F(y)\right)^{N-1} f(y)$$

Similarly,

$$1 - F_Z(z) = \mathbb{P}(Z > z) = \mathbb{P}(X_1 > z, X_2 > z, \ldots, X_N > z) = \prod_{n=1}^{N} \mathbb{P}(X_n > z) = \prod_{n=1}^{N} \left(1 - F_{X_n}(z)\right) = \left(1 - F(z)\right)^N$$

$$F_Z(z) = 1 - \left(1 - F(z)\right)^N$$

$$f_Z(z) = \frac{dF_Z(z)}{dz} = N \left(1 - F(z)\right)^{N-1} f(z)$$

We can also write down the PDFs directly. Assume that the random variables are independent, but not necessarily identically distributed.

$$f_Y(y) = \sum_{n=1}^{N} f_{X_n}(y) \prod_{\substack{v=1 \\ v \neq n}}^{N} F_{X_v}(y)$$

The interpretation of this expression is as follows. Any of the $N$ random variables can be the maximum. If the maximum is $X_n$, then all the other random variables are less than (or equal to) it. Similarly,

$$f_Z(z) = \sum_{n=1}^{N} f_{X_n}(z) \prod_{\substack{v=1 \\ v \neq n}}^{N} \left(1 - F_{X_v}(z)\right)$$

Joint PDF of the Maximum and Minimum of i.i.d. Random Variables

Consider i.i.d. random variables $\{X_n\}_{n=1}^{N}$, $N \ge 2$, with $f_{X_n}(x) = f(x)$ and $F_{X_n}(x) = F(x)$.

Let $Y = \max(X_1, X_2, \ldots, X_N)$ and $Z = \min(X_1, X_2, \ldots, X_N)$.

$$F_{Y,Z}(\alpha, \beta) = \mathbb{P}(Y \le \alpha, Z \le \beta) = \mathbb{P}(Y \le \alpha) - \mathbb{P}(Y \le \alpha, Z > \beta)$$

$$F_{Y,Z}(\alpha, \beta) = \mathbb{P}(X_1 \le \alpha, X_2 \le \alpha, \ldots, X_N \le \alpha) - \mathbb{P}(\beta < X_1 \le \alpha, \beta < X_2 \le \alpha, \ldots, \beta < X_N \le \alpha)\, \mathbb{I}(\alpha \ge \beta)$$

Therefore,

$$F_{Y,Z}(\alpha, \beta) = \left(F(\alpha)\right)^N - \left(F(\alpha) - F(\beta)\right)^N \mathbb{I}(\alpha \ge \beta)$$

$$f_{Y,Z}(\alpha, \beta) = \frac{\partial^2 F_{Y,Z}(\alpha, \beta)}{\partial \alpha\, \partial \beta} = N(N-1)\, f(\alpha)\, f(\beta) \left(F(\alpha) - F(\beta)\right)^{N-2} \mathbb{I}(\alpha \ge \beta)$$

This expression can be interpreted as follows. If $\alpha < \beta$, then the PDF is zero, as the maximum cannot be less than the minimum. For $\alpha \ge \beta$, one of the $N$ random variables is the maximum, and one of the remaining $N - 1$ random variables is the minimum. The other $N - 2$ random variables are between the minimum and maximum.

Important Limits

$$\lim_{m \to \infty} \prod_{n=m}^{\infty} \left(1 - \frac{1}{n}\right) = 0$$

Proof:

$$\lim_{m \to \infty} \prod_{n=m}^{\infty} \left(1 - \frac{1}{n}\right) = \lim_{m \to \infty} \lim_{k \to \infty} \prod_{n=m}^{k} \left(1 - \frac{1}{n}\right) = \lim_{m \to \infty} \lim_{k \to \infty} \prod_{n=m}^{k} \frac{n-1}{n}$$

$$= \lim_{m \to \infty} \lim_{k \to \infty} \frac{m-1}{m} \cdot \frac{m}{m+1} \cdot \frac{m+1}{m+2} \cdots \frac{k-2}{k-1} \cdot \frac{k-1}{k} = \lim_{m \to \infty} \lim_{k \to \infty} \frac{m-1}{k} = \lim_{m \to \infty} 0 = 0$$

$$\lim_{m \to \infty} \prod_{n=m}^{\infty} \left(1 - \frac{1}{n^2}\right) = 1$$

Proof:

$$\lim_{m \to \infty} \prod_{n=m}^{\infty} \left(1 - \frac{1}{n^2}\right) = \lim_{m \to \infty} \lim_{k \to \infty} \prod_{n=m}^{k} \left(1 - \frac{1}{n^2}\right) = \lim_{m \to \infty} \lim_{k \to \infty} \prod_{n=m}^{k} \frac{n^2 - 1}{n^2} = \lim_{m \to \infty} \lim_{k \to \infty} \prod_{n=m}^{k} \frac{(n-1)(n+1)}{n^2}$$

$$= \lim_{m \to \infty} \lim_{k \to \infty} \frac{(m-1)(m+1)}{m \cdot m} \cdot \frac{m(m+2)}{(m+1)(m+1)} \cdot \frac{(m+1)(m+3)}{(m+2)(m+2)} \cdots \frac{(k-1)(k+1)}{k \cdot k} = \lim_{m \to \infty} \lim_{k \to \infty} \frac{m-1}{m} \cdot \frac{k+1}{k} = \lim_{m \to \infty} \frac{m-1}{m} = 1$$

Memoryless Distributions

Consider a nonnegative random variable $X$ with PDF $f_X(x)$. We are interested in having the following property for nonnegative $t$ and $s$:

$$\mathbb{P}(X > t + s \mid X > s) = \mathbb{P}(X > t)$$

Define the complementary CDF $\psi(t) = \mathbb{P}(X > t)$.

$$\mathbb{P}(X > t + s \mid X > s) = \frac{\mathbb{P}(X > t + s)}{\mathbb{P}(X > s)} = \frac{\psi(t + s)}{\psi(s)}$$

Requiring that $\mathbb{P}(X > t + s \mid X > s) = \mathbb{P}(X > t)$ is equivalent to requiring that $\psi(t + s) = \psi(t)\,\psi(s)$ for all $t, s \ge 0$. That is, $\ln \psi(t + s) = \ln \psi(t) + \ln \psi(s)$, indicating that $\ln \psi(t)$ is a linear function. Therefore, when $\psi(0) = 1$, $\psi(t) = \exp(-Ct)$, where $C$ is a constant.

The random variables with such a complementary CDF are:

-Exponential random variable: $\psi(t) = \exp(-\mu t)$, where $\mu$ is the reciprocal of the mean.

-Geometric random variable: $\psi(t) = (1 - p)^t = \exp\left(t \ln(1 - p)\right)$, where $p$ is the success probability and $t$, in this case, is a positive integer.
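An added Python sketch: estimate $\mathbb{P}(X > t + s \mid X > s)$ and $\mathbb{P}(X > t)$ from exponential samples to illustrate the memoryless property; the rate and the values of $s$, $t$ are arbitrary.

```python
import random

# Minimal sketch: Monte Carlo check of the memoryless property for an exponential RV.
random.seed(4)
mu = 0.5                      # rate; mean = 1/mu = 2
n = 300_000
samples = [random.expovariate(mu) for _ in range(n)]

s, t = 1.0, 1.5
survivors = [x for x in samples if x > s]
p_cond = sum(x > t + s for x in survivors) / len(survivors)
p_uncond = sum(x > t for x in samples) / n
print(p_cond, p_uncond)       # both approximate exp(-mu * t)
```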

Zero Variance

If $\mathrm{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = 0$, then $X$ is a degenerate (constant) random variable, $X = \mathbb{E}[X]$. This can be seen from Chebyshev's inequality (to be studied later): $\forall \varepsilon > 0$,

$$\mathbb{P}\left(|X - \mathbb{E}[X]| \ge \varepsilon\right) \le \frac{\mathrm{Var}(X)}{\varepsilon^2} \quad \Longrightarrow \quad \mathbb{P}\left(|X - \mathbb{E}[X]| \ge \varepsilon\right) \le 0 \quad \Longrightarrow \quad \mathbb{P}\left(|X - \mathbb{E}[X]| \ge \varepsilon\right) = 0.$$

(The accurate thing to say is that $X$ is an almost surely constant random variable, but making the distinction will not be important for our purposes, at least for now.)

Sample Mean and Variance

Consider $N$ i.i.d. random variables $X_1, X_2, X_3, \ldots, X_N$ such that $\mathbb{E}[X_n] = \mu$ and $\mathrm{Var}(X_n) = \sigma^2$. Consider the following two estimators for the mean and the variance:

$$\hat{\mu} = \frac{1}{N} \sum_{n=1}^{N} X_n$$

$$\hat{V} = \frac{1}{N-1} \sum_{n=1}^{N} \left(X_n - \hat{\mu}\right)^2$$

Both estimators are unbiased:

$$\mathbb{E}[\hat{\mu}] = \frac{1}{N} \sum_{n=1}^{N} \mathbb{E}[X_n] = \frac{1}{N} \sum_{n=1}^{N} \mu = \frac{N\mu}{N} = \mu$$

$$\mathbb{E}[\hat{V}] = \frac{1}{N-1} \sum_{n=1}^{N} \left( \mathbb{E}[X_n^2] + \frac{1}{N^2} \sum_{j=1}^{N} \sum_{v=1}^{N} \mathbb{E}[X_j X_v] - \frac{2}{N} \sum_{j=1}^{N} \mathbb{E}[X_j X_n] \right)$$

$$= \frac{1}{N-1} \sum_{n=1}^{N} \left( \mu^2 + \sigma^2 + \frac{1}{N^2}\left(N(\mu^2 + \sigma^2) + (N^2 - N)\mu^2\right) - \frac{2}{N}\left(\mu^2 + \sigma^2 + (N-1)\mu^2\right) \right)$$

$$\Longrightarrow \quad \mathbb{E}[\hat{V}] = \frac{1}{N-1} \sum_{n=1}^{N} \left( \frac{N-1}{N}\left(\mu^2 + \sigma^2\right) - \frac{N-1}{N}\mu^2 \right) = \sigma^2$$

Note that if $\sigma^2$ is finite, $\mathrm{Var}(\hat{\mu})$ decreases with $N$. Specifically,

$$\mathrm{Var}(\hat{\mu}) = \frac{1}{N^2} \sum_{n=1}^{N} \mathrm{Var}(X_n) = \frac{\sigma^2}{N}$$
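An added sketch: estimate $\mathbb{E}[\hat{\mu}]$ and $\mathbb{E}[\hat{V}]$ by repeating the experiment many times and averaging, to illustrate unbiasedness; the Gaussian model and parameter values are hypothetical.

```python
import random

# Minimal sketch: check unbiasedness of the sample mean and the (N-1)-normalized
# sample variance for hypothetical i.i.d. Gaussian data with mu = 1.5, sigma^2 = 4.
random.seed(5)
mu, sigma, N, trials = 1.5, 2.0, 10, 20_000

mean_estimates, var_estimates = [], []
for _ in range(trials):
    x = [random.gauss(mu, sigma) for _ in range(N)]
    m = sum(x) / N
    v = sum((xi - m) ** 2 for xi in x) / (N - 1)
    mean_estimates.append(m)
    var_estimates.append(v)

print(sum(mean_estimates) / trials, mu)          # ~ mu
print(sum(var_estimates) / trials, sigma ** 2)   # ~ sigma^2
```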

Iterated Expectations and Total Variance

Consider two random variables $X$ and $Y$. The conditional expectation of $X$ given $Y$ is:

$$\mathbb{E}[X|Y] = \int_{-\infty}^{\infty} x\, f_{X|Y}(x|y)\,dx$$

Note that $\mathbb{E}[X|Y]$ is a function of the random variable $Y$. If we take the expectation of $\mathbb{E}[X|Y]$, i.e., $\mathbb{E}\left[\mathbb{E}[X|Y]\right]$, using the PDF of $Y$, we obtain:

$$\mathbb{E}\left[\mathbb{E}[X|Y]\right] = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} x\, f_{X|Y}(x|y)\,dx \right) f_Y(y)\,dy = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x\, f_{X|Y}(x|y)\, f_Y(y)\,dx\,dy$$

$$= \int_{-\infty}^{\infty} x \left( \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy \right) dx = \int_{-\infty}^{\infty} x\, f_X(x)\,dx = \mathbb{E}[X]$$

Hence, $\mathbb{E}[X] = \mathbb{E}\left[\mathbb{E}[X|Y]\right]$, where the inner expectation is carried out treating $Y$ as a constant and employing the conditional PDF of $X$ given $Y$.

Now we prove that $\mathrm{Var}(X) = \mathrm{Var}\left(\mathbb{E}[X|Y]\right) + \mathbb{E}\left[\mathrm{Var}(X|Y)\right]$. Subscripts are added to the expectation operators to make explicit the variable over which the expectation is taken.

$$\mathrm{Var}\left(\mathbb{E}[X|Y]\right) = \mathbb{E}_Y\!\left[\left(\mathbb{E}_{X|Y}[X|Y]\right)^2\right] - \left(\mathbb{E}_Y\!\left[\mathbb{E}_{X|Y}[X|Y]\right]\right)^2 = \mathbb{E}_Y\!\left[\left(\mathbb{E}_{X|Y}[X|Y]\right)^2\right] - \left(\mathbb{E}_X[X]\right)^2$$

$$\mathbb{E}\left[\mathrm{Var}(X|Y)\right] = \mathbb{E}_Y\!\left[\mathbb{E}_{X|Y}[X^2|Y] - \left(\mathbb{E}_{X|Y}[X|Y]\right)^2\right] = \mathbb{E}_X[X^2] - \mathbb{E}_Y\!\left[\left(\mathbb{E}_{X|Y}[X|Y]\right)^2\right]$$

Adding,

$$\mathrm{Var}\left(\mathbb{E}[X|Y]\right) + \mathbb{E}\left[\mathrm{Var}(X|Y)\right] = \mathbb{E}_X[X^2] - \left(\mathbb{E}_X[X]\right)^2 = \mathrm{Var}(X)$$

Iterated expectations and total variance are valid so long as $\mathbb{E}[X]$ and $\mathrm{Var}(X)$ are finite.

Example:

Let $S = X_1 + X_2 + \cdots + X_N$, where the $X_i$'s are i.i.d. with mean $\mu$ and variance $\sigma^2$, and where $N$ is also a random variable, independent of the $X_i$'s.

What is $\mathbb{E}[S]$?

$$\mathbb{E}[S] = \mathbb{E}\left[\mathbb{E}[S|N]\right] = \mathbb{E}[N\mu] = \mu\, \mathbb{E}[N]$$

What is $\mathrm{Var}(S)$?

$$\mathrm{Var}(S) = \mathrm{Var}\left(\mathbb{E}[S|N]\right) + \mathbb{E}\left[\mathrm{Var}(S|N)\right] = \mathrm{Var}(N\mu) + \mathbb{E}[N\sigma^2] = \mu^2\, \mathrm{Var}(N) + \sigma^2\, \mathbb{E}[N]$$
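An added sketch: simulate a random sum in which the number of terms is uniform on $\{1, \ldots, 10\}$ and the terms are Gaussian (both hypothetical choices), and compare the sample mean and variance of $S$ with $\mu\mathbb{E}[N]$ and $\mu^2\mathrm{Var}(N) + \sigma^2\mathbb{E}[N]$.

```python
import random

# Minimal sketch: check E[S] = mu*E[N] and Var(S) = mu^2*Var(N) + sigma^2*E[N]
# for a random sum; here N ~ Uniform{1,...,10} and X_i ~ N(mu, sigma^2).
random.seed(6)
mu, sigma = 2.0, 1.5
trials = 100_000

s_vals = []
for _ in range(trials):
    N = random.randint(1, 10)
    s_vals.append(sum(random.gauss(mu, sigma) for _ in range(N)))

mean_S = sum(s_vals) / trials
var_S = sum((s - mean_S) ** 2 for s in s_vals) / trials

E_N, Var_N = 5.5, (10 ** 2 - 1) / 12        # mean and variance of Uniform{1,...,10}
print(mean_S, mu * E_N)
print(var_S, mu ** 2 * Var_N + sigma ** 2 * E_N)
```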

Histogram and Empirical CDF

Assume that we have $n$ i.i.d. observations $\{x_k\}_{k=1}^{n}$ and we want to estimate the underlying PDF. One simple method for doing this is to divide the range of observations into $k$ intervals at points $c_0, c_1, c_2, \ldots, c_k$. Defining $\Delta_j = c_j - c_{j-1}$, $j = 1, 2, \ldots, k$, and $n_j$ as the number of observations that fall in the interval $(c_{j-1}, c_j]$, the estimated PDF is $\hat{f}(x) = \frac{n_j}{n \Delta_j}$, $x \in (c_{j-1}, c_j]$. (The function $h(x) = n_j$ is called the histogram.)

Note that $n_j = \sum_{k=1}^{n} \mathbb{I}\left(x_k \in (c_{j-1}, c_j]\right)$ and

$$\mathbb{E}\left[\hat{f}(x)\right] = \frac{\mathbb{E}[n_j]}{n \Delta_j} = \frac{n\, \mathbb{P}\left(x_k \in (c_{j-1}, c_j]\right)}{n \Delta_j} = \frac{F_X(c_j) - F_X(c_{j-1})}{\Delta_j}$$

$$\mathrm{Var}\left(\hat{f}(x)\right) = \frac{\mathrm{Var}(n_j)}{n^2 \Delta_j^2} = \frac{n\, \mathbb{P}\left(x_k \in (c_{j-1}, c_j]\right)\left(1 - \mathbb{P}\left(x_k \in (c_{j-1}, c_j]\right)\right)}{n^2 \Delta_j^2} = \frac{\left(F_X(c_j) - F_X(c_{j-1})\right)\left(1 - F_X(c_j) + F_X(c_{j-1})\right)}{n \Delta_j^2}$$

For sufficiently small $\Delta_j$, $F_X(c_j) - F_X(c_{j-1}) \approx f_X(c_j)\, \Delta_j$.

The empirical CDF (or cumulative histogram) is $\hat{F}(x) = \frac{1}{n} \sum_{k=1}^{n} \mathbb{I}(x_k \le x)$. The sum $\sum_{k=1}^{n} \mathbb{I}(x_k \le x)$ is equal to the number of observations less than or equal to $x$.

$$\mathbb{E}\left[\hat{F}(x)\right] = \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}\left[\mathbb{I}(x_k \le x)\right] = \frac{1}{n} \sum_{k=1}^{n} \mathbb{P}(x_k \le x) = \frac{1}{n} \sum_{k=1}^{n} F_X(x) = F_X(x)$$

$$\mathrm{Var}\left(\hat{F}(x)\right) = \frac{1}{n^2} \sum_{k=1}^{n} \mathrm{Var}\left(\mathbb{I}(x_k \le x)\right) = \frac{1}{n^2} \sum_{k=1}^{n} \left(\mathbb{P}(x_k \le x) - \left(\mathbb{P}(x_k \le x)\right)^2\right) = \frac{F_X(x) - \left(F_X(x)\right)^2}{n} \le \frac{1}{4n}$$
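An added sketch implementing the histogram-based PDF estimate and the empirical CDF for hypothetical standard-normal observations, compared with the true density and CDF at one point; the bin grid is an arbitrary choice.

```python
import random
import math

# Minimal sketch: histogram PDF estimate and empirical CDF for standard-normal data.
random.seed(7)
n = 50_000
obs = [random.gauss(0.0, 1.0) for _ in range(n)]

# histogram estimate f_hat on equal-width bins over [-4, 4]
k, lo, hi = 40, -4.0, 4.0
delta = (hi - lo) / k
counts = [0] * k
for x in obs:
    if lo <= x < hi:
        counts[int((x - lo) / delta)] += 1

def f_hat(x):
    j = int((x - lo) / delta)
    return counts[j] / (n * delta)

def F_hat(x):
    return sum(xk <= x for xk in obs) / n

x0 = 0.5
true_pdf = math.exp(-x0 * x0 / 2) / math.sqrt(2 * math.pi)
true_cdf = 0.5 * math.erfc(-x0 / math.sqrt(2))
print(f_hat(x0), true_pdf)
print(F_hat(x0), true_cdf)
```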

Rice and Rayleigh Distributions

Consider random variables

$$X_1 = A\cos\theta + w_1, \quad X_2 = A\sin\theta + w_2,$$

where $A$ and $\theta$ are constants, and $w_1$ and $w_2$ are i.i.d. zero-mean Gaussian random variables, each with variance $\sigma^2$. Consequently, $X_1 \sim \mathcal{N}(A\cos\theta, \sigma^2)$ and $X_2 \sim \mathcal{N}(A\sin\theta, \sigma^2)$. Random variables $X_1$ and $X_2$ are uncorrelated and, hence, independent. Their joint PDF is given by:

$$f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{(x_1 - A\cos\theta)^2 + (x_2 - A\sin\theta)^2}{2\sigma^2}\right)$$

We now apply the transformation $X_1 = R\cos\Phi$, $X_2 = R\sin\Phi$, $R \in [0, \infty)$, $\Phi \in [0, 2\pi)$.

$$f_{R,\Phi}(r, \varphi) = f_{X_1,X_2}(x_1, x_2)\, \left|\frac{\partial(x_1, x_2)}{\partial(r, \varphi)}\right|$$

where $\left|\frac{\partial(x_1, x_2)}{\partial(r, \varphi)}\right|$ is the absolute value of the determinant of the Jacobian matrix:

$$\begin{bmatrix} \frac{\partial x_1}{\partial r} & \frac{\partial x_1}{\partial \varphi} \\ \frac{\partial x_2}{\partial r} & \frac{\partial x_2}{\partial \varphi} \end{bmatrix} = \begin{bmatrix} \cos\varphi & -r\sin\varphi \\ \sin\varphi & r\cos\varphi \end{bmatrix}, \quad \left|\frac{\partial(x_1, x_2)}{\partial(r, \varphi)}\right| = r\cos^2\varphi + r\sin^2\varphi = r$$

$$f_{R,\Phi}(r, \varphi) = \frac{r}{2\pi\sigma^2} \exp\!\left(-\frac{x_1^2 + x_2^2 + A^2 - 2Ax_1\cos\theta - 2Ax_2\sin\theta}{2\sigma^2}\right) = \frac{r}{2\pi\sigma^2} \exp\!\left(-\frac{r^2 + A^2 - 2Ar\cos\varphi\cos\theta - 2Ar\sin\varphi\sin\theta}{2\sigma^2}\right) = \frac{r}{2\pi\sigma^2} \exp\!\left(-\frac{r^2 + A^2 - 2Ar\cos(\varphi - \theta)}{2\sigma^2}\right)$$

The marginal PDF of $R = \sqrt{X_1^2 + X_2^2}$ is the Rice distribution:

$$f_R(r) = \int_{0}^{2\pi} f_{R,\Phi}(r, \varphi)\,d\varphi = \frac{r}{2\pi\sigma^2} \exp\!\left(-\frac{r^2 + A^2}{2\sigma^2}\right) \int_{0}^{2\pi} \exp\!\left(\frac{2Ar\cos(\varphi - \theta)}{2\sigma^2}\right)d\varphi$$

Since the modified Bessel function of the first kind and zeroth order is $I_0(z) = \frac{1}{2\pi}\int_{0}^{2\pi} \exp\!\left(z\cos(\varphi - \theta)\right)d\varphi$ (for any $\theta \in \mathbb{R}$), for $r \ge 0$,

$$f_R(r) = \frac{r}{\sigma^2} \exp\!\left(-\frac{r^2 + A^2}{2\sigma^2}\right) I_0\!\left(\frac{Ar}{\sigma^2}\right)$$

To explicitly show the nonnegativity of $r$, we may write the Rice distribution as:

$$f_R(r) = \frac{r}{\sigma^2} \exp\!\left(-\frac{r^2 + A^2}{2\sigma^2}\right) I_0\!\left(\frac{Ar}{\sigma^2}\right) \mathbb{I}(r \ge 0)$$

If $A = 0$, then $X_1$ and $X_2$ have a mean of zero. In this case, $R = \sqrt{X_1^2 + X_2^2}$ has the Rayleigh distribution. Exploiting the fact that $I_0(0) = 1$, the Rayleigh distribution is given by:

$$f_R(r) = \frac{r}{\sigma^2} \exp\!\left(-\frac{r^2}{2\sigma^2}\right) \mathbb{I}(r \ge 0)$$

When $A = 0$, if $P = R^2$,

$$f_P(p) = f_R(r)\left|\frac{dr}{dp}\right| = \frac{\sqrt{p}}{\sigma^2} \exp\!\left(-\frac{p}{2\sigma^2}\right) \frac{1}{2\sqrt{p}}\, \mathbb{I}(p \ge 0) = \frac{1}{2\sigma^2} \exp\!\left(-\frac{p}{2\sigma^2}\right) \mathbb{I}(p \ge 0).$$

That is, $P$ is exponentially distributed and has a mean value of $2\sigma^2$.

When $A > 0$,

$$f_P(p) = f_R(r)\left|\frac{dr}{dp}\right| = \frac{\sqrt{p}}{\sigma^2} \exp\!\left(-\frac{p + A^2}{2\sigma^2}\right) I_0\!\left(\frac{A\sqrt{p}}{\sigma^2}\right) \frac{1}{2\sqrt{p}}\, \mathbb{I}(p \ge 0) = \frac{1}{2\sigma^2} \exp\!\left(-\frac{p + A^2}{2\sigma^2}\right) I_0\!\left(\sqrt{\frac{A^2 p}{\sigma^4}}\right) \mathbb{I}(p \ge 0).$$

That is, $P$ is proportional to a noncentral chi-squared distribution with two degrees of freedom and noncentrality parameter $\frac{A^2}{\sigma^2}$.

Note that when $A = 0$, $f_{R,\Phi}(r, \varphi) = \frac{r}{2\pi\sigma^2} \exp\!\left(-\frac{r^2}{2\sigma^2}\right) \mathbb{I}(r \ge 0)$.

Integrating with respect to $r$, $f_\Phi(\varphi) = \frac{1}{2\pi}$, i.e., $\Phi$ is uniformly distributed.
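An added sketch for the $A = 0$ case: simulate $R = \sqrt{w_1^2 + w_2^2}$ and compare bin-based frequency estimates with the Rayleigh density $\frac{r}{\sigma^2} e^{-r^2/(2\sigma^2)}$; the value of $\sigma$ and the bin width are arbitrary.

```python
import random
import math

# Minimal sketch: check the Rayleigh PDF by simulating R = sqrt(w1^2 + w2^2)
# with w1, w2 i.i.d. N(0, sigma^2) (the A = 0 case).
random.seed(8)
sigma = 1.5
n = 200_000
r_samples = [math.hypot(random.gauss(0, sigma), random.gauss(0, sigma)) for _ in range(n)]

def rayleigh_pdf(r):
    return (r / sigma**2) * math.exp(-r * r / (2 * sigma**2))

dr = 0.3
for k in range(1, 10):
    r = k * dr
    in_bin = sum(r - dr / 2 <= s < r + dr / 2 for s in r_samples)
    print(r, in_bin / (n * dr), rayleigh_pdf(r))
```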

Arithmetic Mean-Geometric Mean (AM-GM) Inequality

Consider positive real numbers $\{x_k\}_{k=1}^{n}$.

Their arithmetic mean is $\frac{1}{n}\sum_{k=1}^{n} x_k = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$.

Their geometric mean is $\left(\prod_{k=1}^{n} x_k\right)^{1/n} = \left(x_1 x_2 \cdots x_n\right)^{1/n}$.

The AM-GM inequality states that $\frac{1}{n}\sum_{k=1}^{n} x_k \ge \left(\prod_{k=1}^{n} x_k\right)^{1/n}$, with equality if and only if all the numbers are equal.

Proof.

We can make use of the inequality $\exp(z) \ge 1 + z$, with equality if and only if $z = 0$, which is equivalent to $\exp(z - 1) \ge z$ with equality if and only if $z = 1$.

Consider $y_k = \frac{x_k}{\eta}$, where $\eta$ is the arithmetic mean, i.e., $\eta = \frac{1}{n}\sum_{k=1}^{n} x_k$.

Since $z \le \exp(z - 1)$,

$$\left(\prod_{k=1}^{n} y_k\right)^{1/n} \le \left(\prod_{k=1}^{n} \exp(y_k - 1)\right)^{1/n} = \exp\!\left(\frac{1}{n}\sum_{k=1}^{n} y_k - 1\right).$$

$$\frac{1}{n}\sum_{k=1}^{n} y_k = \frac{1}{n}\sum_{k=1}^{n} \frac{x_k}{\eta} = \frac{\eta}{\eta} = 1.$$

$$\left(\prod_{k=1}^{n} y_k\right)^{1/n} = \frac{1}{\eta}\left(\prod_{k=1}^{n} x_k\right)^{1/n}$$

Therefore,

$$\frac{1}{\eta}\left(\prod_{k=1}^{n} x_k\right)^{1/n} \le \exp(1 - 1) = 1 \quad \Longrightarrow \quad \left(\prod_{k=1}^{n} x_k\right)^{1/n} \le \eta = \frac{1}{n}\sum_{k=1}^{n} x_k.$$

Note that $\exp(z - 1) = z$ iff $z = 1$. This means that the AM-GM inequality is satisfied with equality iff $y_k = 1$ for all $k$, which is equivalent to $x_k = \eta$ for all $k$.
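An added sketch: compute both means for a few random positive numbers and confirm AM $\ge$ GM, with equality when all numbers are equal; the sample values are arbitrary.

```python
import random
import math

# Minimal sketch: numeric check of the AM-GM inequality for random positive numbers.
random.seed(9)
xs = [random.uniform(0.1, 10.0) for _ in range(8)]

am = sum(xs) / len(xs)
gm = math.exp(sum(math.log(x) for x in xs) / len(xs))  # geometric mean via logs
print(am, ">=", gm, am >= gm)

equal = [3.7] * 8                                       # equality case: all numbers equal
am_eq = sum(equal) / len(equal)
gm_eq = math.exp(sum(math.log(x) for x in equal) / len(equal))
print(am_eq, gm_eq)                                     # both equal 3.7 (up to rounding)
```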
