Selective Review - Probability
Events $E_1, E_2, \ldots, E_k$ are said to form a partition of the sample space $\Omega$ if the events are mutually exclusive and $\Omega = E_1 \cup E_2 \cup \cdots \cup E_k$. For a partition, $\mathbb{P}(E_1) + \mathbb{P}(E_2) + \cdots + \mathbb{P}(E_k) = 1$. The law of total probability states that for any event $A$:
$$\mathbb{P}(A) = \mathbb{P}(A \cap E_1) + \mathbb{P}(A \cap E_2) + \cdots + \mathbb{P}(A \cap E_k) = \sum_{m=1}^{k} \mathbb{P}(A \cap E_m)$$
Since $\mathbb{P}(A \mid E_m) = \dfrac{\mathbb{P}(A \cap E_m)}{\mathbb{P}(E_m)}$, the law of total probability can be written as:
$$\mathbb{P}(A) = \sum_{m=1}^{k} \mathbb{P}(A \mid E_m)\,\mathbb{P}(E_m)$$
Note that $\mathbb{P}(A \cap E_m) = \mathbb{P}(A \mid E_m)\,\mathbb{P}(E_m) = \mathbb{P}(E_m \mid A)\,\mathbb{P}(A)$, which yields Bayes' rule:
$$\mathbb{P}(E_m \mid A) = \frac{\mathbb{P}(A \mid E_m)\,\mathbb{P}(E_m)}{\mathbb{P}(A)}$$
Finally, recall that events $A$ and $B$ are independent if and only if $\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B)$. This implies that $\mathbb{P}(A \mid B) = \mathbb{P}(A)$.
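As a quick numerical illustration (added here, not part of the original notes), the sketch below computes $\mathbb{P}(A)$ by total probability and $\mathbb{P}(E_m \mid A)$ by Bayes' rule for a hypothetical three-event partition; the numbers are made up for the example.

```python
import numpy as np

# Hypothetical partition E_1, E_2, E_3 with prior probabilities P(E_m)
prior = np.array([0.5, 0.3, 0.2])           # must sum to 1 for a partition
# Hypothetical conditional probabilities P(A | E_m)
likelihood = np.array([0.10, 0.40, 0.80])

# Law of total probability: P(A) = sum_m P(A | E_m) P(E_m)
p_A = np.sum(likelihood * prior)

# Bayes' rule: P(E_m | A) = P(A | E_m) P(E_m) / P(A)
posterior = likelihood * prior / p_A

print("P(A) =", p_A)
print("P(E_m | A) =", posterior, "sum =", posterior.sum())  # posterior sums to 1
```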
In the context of random variables, for random variables $X$ and $Y$ with joint PDF $f_{X,Y}(x,y)$ or PMF $p_{X,Y}(x,y)$, the law of total probability takes the forms:
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy = \int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\,f_Y(y)\,dy, \qquad p_X(x) = \sum_{y} p_{X,Y}(x,y) = \sum_{y} p_{X|Y}(x \mid y)\,p_Y(y)$$
Union Bound
The probability $\mathbb{P}\left(\bigcup_{n=1}^{N} A_n\right)$ is upper-bounded by $\sum_{n=1}^{N} \mathbb{P}(A_n)$.
$\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B \cap A^c)$ and $\mathbb{P}(B) = \mathbb{P}(B \cap A) + \mathbb{P}(B \cap A^c)$. Hence,
$$\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(B \cap A).$$
Since $\mathbb{P}(B \cap A) \ge 0$, $\mathbb{P}(A \cup B) \le \mathbb{P}(A) + \mathbb{P}(B)$.
We can generalize this result to $\mathbb{P}\left(\bigcup_{n=1}^{N} A_n\right) \le \sum_{n=1}^{N} \mathbb{P}(A_n)$ using induction.
Assume that $\mathbb{P}\left(\bigcup_{n=1}^{N-1} A_n\right) \le \sum_{n=1}^{N-1} \mathbb{P}(A_n)$.
$\mathbb{P}\left(\bigcup_{n=1}^{N} A_n\right) = \mathbb{P}\left(\left(\bigcup_{n=1}^{N-1} A_n\right) \cup A_N\right)$. Using the case of two events,
$\mathbb{P}\left(\left(\bigcup_{n=1}^{N-1} A_n\right) \cup A_N\right) \le \mathbb{P}\left(\bigcup_{n=1}^{N-1} A_n\right) + \mathbb{P}(A_N)$. But $\mathbb{P}\left(\bigcup_{n=1}^{N-1} A_n\right) \le \sum_{n=1}^{N-1} \mathbb{P}(A_n)$.
Hence,
$$\mathbb{P}\left(\left(\bigcup_{n=1}^{N-1} A_n\right) \cup A_N\right) = \mathbb{P}\left(\bigcup_{n=1}^{N} A_n\right) \le \sum_{n=1}^{N-1} \mathbb{P}(A_n) + \mathbb{P}(A_N) = \sum_{n=1}^{N} \mathbb{P}(A_n)$$
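To make the bound concrete, here is a small Monte Carlo sketch (added, not from the original notes) that compares $\mathbb{P}\left(\bigcup_n A_n\right)$ with $\sum_n \mathbb{P}(A_n)$ for hypothetical events $A_n = \{U > t_n\}$ defined on a common uniform random variable.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(size=1_000_000)            # one underlying experiment
thresholds = np.array([0.90, 0.85, 0.95])  # hypothetical events A_n = {U > t_n}

# Indicator matrix: row n marks the samples where A_n occurs
indicators = U[None, :] > thresholds[:, None]

p_union = np.mean(indicators.any(axis=0))     # P(A_1 U A_2 U A_3)
sum_of_probs = indicators.mean(axis=1).sum()  # sum_n P(A_n)

print(f"P(union)   ~ {p_union:.4f}")
print(f"sum P(A_n) ~ {sum_of_probs:.4f}  (upper bound)")
assert p_union <= sum_of_probs + 1e-9
```
These particular events strongly overlap (all require a large $U$), so the bound is loose here; for nearly disjoint events it becomes tight.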
Convex and Concave Functions
A set $S \subseteq \mathbb{R}^N$ is convex if for every $x, y \in S$, $\lambda x + (1 - \lambda)y \in S$ for every $\lambda \in [0, 1]$.
Let $g(x)$ be a function defined over the domain $\mathcal{D} \subseteq \mathbb{R}^N$. The function $g(x): \mathcal{D} \to \mathbb{R}$ is convex if its domain, $\mathcal{D}$, is a convex set and it satisfies:
$$g\!\left(\lambda x + (1 - \lambda)y\right) \le \lambda g(x) + (1 - \lambda)g(y) \quad \text{for all } x, y \in \mathcal{D} \text{ and } \lambda \in [0, 1].$$
It is concave if the inequality is reversed, i.e., if $-g$ is convex.
If $N = 1$ and $g(x)$ is twice differentiable, it is convex when $\dfrac{d^2 g(x)}{dx^2} \ge 0$ and concave when $\dfrac{d^2 g(x)}{dx^2} \le 0$ over $\mathcal{D}$. For general $N$, the function is convex if the Hessian matrix is positive semidefinite and concave if the Hessian matrix is negative semidefinite. Recall that if $x = [x_1, x_2, \ldots, x_N]^{\mathrm{T}}$, the Hessian matrix is given by:
$$\mathcal{H} = \begin{bmatrix}
\dfrac{\partial^2 g}{\partial x_1^2} & \dfrac{\partial^2 g}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 g}{\partial x_1 \partial x_N} \\
\dfrac{\partial^2 g}{\partial x_2 \partial x_1} & \dfrac{\partial^2 g}{\partial x_2^2} & \cdots & \dfrac{\partial^2 g}{\partial x_2 \partial x_N} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 g}{\partial x_N \partial x_1} & \dfrac{\partial^2 g}{\partial x_N \partial x_2} & \cdots & \dfrac{\partial^2 g}{\partial x_N^2}
\end{bmatrix}$$
Examples:
$g(x) = x^2$ is convex over $\mathbb{R}$.
$g(x) = \ln x$ is concave over $\mathbb{R}^+$.
$g(x) = \dfrac{1}{x}$ is convex over $\mathbb{R}^+$.
Important Inequality
For any random variable $X$, $\left|\mathbb{E}[g(X)]\right| \le \mathbb{E}\left[\left|g(X)\right|\right]$, since the magnitude of a sum (or integral) is at most the sum (or integral) of the magnitudes.
Examples:
$\left|\mathbb{E}[X^q]\right| \le \mathbb{E}\left[|X|^q\right]$ because $\left|\sum_n x_n^q\,\mathbb{P}(X = x_n)\right| \le \sum_n |x_n|^q\,\mathbb{P}(X = x_n)$ or $\left|\int x^q f_X(x)\,dx\right| \le \int |x|^q f_X(x)\,dx$.
If $\phi_X(u) = \int_{-\infty}^{\infty} e^{iux} f_X(x)\,dx$, $u \in \mathbb{R}$, then
$$\left|\phi_X(u)\right| \le \int_{-\infty}^{\infty} \left|e^{iux}\right| f_X(x)\,dx = \int_{-\infty}^{\infty} f_X(x)\,dx = 1.$$
That is, $\left|\phi_X(u)\right| \le 1$.
Existence of Moments
The $k$th moment of a random variable, $\mathbb{E}[X^k] = \sum_x x^k\,p_X(x)$ or $\mathbb{E}[X^k] = \int_{-\infty}^{\infty} x^k f_X(x)\,dx$, exists if $\mathbb{E}\left[|X|^k\right]$ is finite. Hence, for the mean $\mathbb{E}[X]$ to exist, $\mathbb{E}[|X|] < \infty$.
Theorem. If the $k$th moment exists and if $j < k$, then the $j$th moment exists.
Proof.
$$\mathbb{E}\left[|X|^j\right] = \int_{-\infty}^{\infty} |x|^j f_X(x)\,dx = \int_{|x| \le 1} |x|^j f_X(x)\,dx + \int_{|x| > 1} |x|^j f_X(x)\,dx$$
If $|x| \le 1$, then $|x|^j \le 1$, which yields $\int_{|x| \le 1} |x|^j f_X(x)\,dx \le \int_{|x| \le 1} f_X(x)\,dx \le \int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
If $|x| > 1$ and $j < k$, then $|x|^j < |x|^k$, which yields
$$\int_{|x| > 1} |x|^j f_X(x)\,dx \le \int_{|x| > 1} |x|^k f_X(x)\,dx \le \int_{-\infty}^{\infty} |x|^k f_X(x)\,dx$$
Therefore,
$$\mathbb{E}\left[|X|^j\right] \le 1 + \mathbb{E}\left[|X|^k\right]$$
If $\mathbb{E}\left[|X|^k\right]$ is finite, then $\mathbb{E}\left[|X|^j\right]$ is also finite for $j < k$.
Cauchy-Schwarz Inequality in the Context of Random Variables
Consider random variables $X$ and $Y$, which may be complex-valued. Consider the random variable $\alpha X - Y^*$, where $*$ denotes complex conjugation and $\alpha \in \mathbb{C}$.
$$\mathbb{E}\left[\left|\alpha X - Y^*\right|^2\right] \ge 0 \;\Longrightarrow\; |\alpha|^2\,\mathbb{E}\left[|X|^2\right] + \mathbb{E}\left[|Y|^2\right] - \alpha\,\mathbb{E}[XY] - \alpha^*\,\mathbb{E}\left[X^* Y^*\right] \ge 0$$
This inequality is valid for any $\alpha$. Set $\alpha = \dfrac{\mathbb{E}\left[X^* Y^*\right]}{\mathbb{E}\left[|X|^2\right]}$, assuming $\mathbb{E}\left[|X|^2\right] \ne 0$. Hence,
$$\frac{\left|\mathbb{E}[XY]\right|^2}{\mathbb{E}\left[|X|^2\right]} + \mathbb{E}\left[|Y|^2\right] - \frac{\left|\mathbb{E}[XY]\right|^2}{\mathbb{E}\left[|X|^2\right]} - \frac{\left|\mathbb{E}[XY]\right|^2}{\mathbb{E}\left[|X|^2\right]} \ge 0 \;\Longrightarrow\; \mathbb{E}\left[|Y|^2\right] \ge \frac{\left|\mathbb{E}[XY]\right|^2}{\mathbb{E}\left[|X|^2\right]}$$
Thus, $\left|\mathbb{E}[XY]\right|^2 \le \mathbb{E}\left[|X|^2\right]\mathbb{E}\left[|Y|^2\right]$, or $\left|\mathbb{E}[XY]\right| \le \sqrt{\mathbb{E}\left[|X|^2\right]\mathbb{E}\left[|Y|^2\right]}$. This means that when $X$ and $Y$ are real-valued, $\mathbb{E}[XY] \le \left|\mathbb{E}[XY]\right| \le \sqrt{\mathbb{E}\left[X^2\right]\mathbb{E}\left[Y^2\right]}$. If we replace $X$ by $X - \mathbb{E}[X]$ and $Y$ by $Y - \mathbb{E}[Y]$, we obtain $\left|\mathrm{Cov}(X, Y)\right| \le \sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}$.
An important point is that the Cauchy-Schwarz inequality is achieved with equality if $X = \beta Y^*$, where $\beta \in \mathbb{C}$. If $X = \beta Y^*$, then $\left|\mathbb{E}[XY]\right|^2 = |\beta|^2\left(\mathbb{E}\left[|Y|^2\right]\right)^2 = \mathbb{E}\left[|X|^2\right]\mathbb{E}\left[|Y|^2\right]$.
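The following short sketch (an added illustration, not from the notes) checks the inequality numerically for correlated real-valued samples.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
X = rng.normal(size=n)
Y = 0.7 * X + rng.normal(size=n)   # hypothetical correlated pair

lhs = np.mean(X * Y) ** 2                 # |E[XY]|^2 (real-valued case)
rhs = np.mean(X ** 2) * np.mean(Y ** 2)   # E[X^2] E[Y^2]
print(f"|E[XY]|^2 = {lhs:.4f} <= E[X^2]E[Y^2] = {rhs:.4f}")

# Equality case: Z proportional to X (beta real here, so conjugation is immaterial)
Z = 2.5 * X
print(np.isclose(np.mean(X * Z) ** 2, np.mean(X ** 2) * np.mean(Z ** 2)))
```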
Theorem Regarding CDF
If the $n$th moment of random variable $X$ exists, then $\lim_{x \to \infty} x^n\left(1 - F_X(x)\right) = 0$.
Proof. $\mathbb{E}[X^n] = \int_{-\infty}^{\infty} x^n f_X(x)\,dx$.
Consider positive $\alpha$. $\mathbb{E}[X^n] = \int_{-\infty}^{\alpha} x^n f_X(x)\,dx + \int_{\alpha}^{\infty} x^n f_X(x)\,dx$
$$\mathbb{E}[X^n] \ge \int_{-\infty}^{\alpha} x^n f_X(x)\,dx + \int_{\alpha}^{\infty} \alpha^n f_X(x)\,dx = \int_{-\infty}^{\alpha} x^n f_X(x)\,dx + \alpha^n \int_{\alpha}^{\infty} f_X(x)\,dx$$
$$\mathbb{E}[X^n] \ge \int_{-\infty}^{\alpha} x^n f_X(x)\,dx + \alpha^n\left(1 - F_X(\alpha)\right)$$
$$\alpha^n\left(1 - F_X(\alpha)\right) \le \mathbb{E}[X^n] - \int_{-\infty}^{\alpha} x^n f_X(x)\,dx$$
As $\alpha \to \infty$, the right-hand side tends to $\mathbb{E}[X^n] - \mathbb{E}[X^n] = 0$, which establishes the claim.
Mean of Nonnegative RV in terms of CDF & Markov's Inequality
Consider a nonnegative RV $X$ and suppose that $\mathbb{E}[X]$ exists. Then $\mathbb{E}[X] = \int_{0}^{\infty}\left(1 - F_X(x)\right)dx$, where $1 - F_X(x) = \mathbb{P}(X > x)$ is sometimes called the complementary CDF or tail distribution.
Proof.
$$\mathbb{E}[X] = \int_{0}^{\infty} x f_X(x)\,dx = -\int_{0}^{\infty} x\,d\!\left(1 - F_X(x)\right) = \left.-x\left(1 - F_X(x)\right)\right|_{0}^{\infty} + \int_{0}^{\infty}\left(1 - F_X(x)\right)dx.$$
If $\mathbb{E}[X]$ is finite, then $\lim_{x \to \infty} x\left(1 - F_X(x)\right) = 0$. Hence, $\mathbb{E}[X] = \int_{0}^{\infty}\left(1 - F_X(x)\right)dx$.
$1 - F_X(x)$ is a nonnegative function and, hence, $\int_{0}^{\infty}\left(1 - F_X(x)\right)dx \ge \int_{0}^{a}\left(1 - F_X(x)\right)dx$, where $a$ is a positive real constant. This means that
$$\mathbb{E}[X] \ge \int_{0}^{a}\left(1 - F_X(x)\right)dx \ge \int_{0}^{a}\left(1 - F_X(a)\right)dx = a\left(1 - F_X(a)\right) = a\,\mathbb{P}(X > a)$$
Actually, it can be shown that for $a > 0$ and a nonnegative continuous or discrete RV $X$ whose first moment exists, we have the following inequality:
$$\mathbb{P}(X \ge a) \le \frac{\mathbb{E}[X]}{a}$$
This is typically referred to as Markov's inequality.
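As an added illustration (not in the original notes), the sketch below checks Markov's inequality for an exponential random variable by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.exponential(scale=2.0, size=1_000_000)  # nonnegative RV with E[X] = 2

for a in (1.0, 2.0, 5.0, 10.0):
    p = np.mean(X >= a)          # P(X >= a), estimated
    bound = X.mean() / a         # Markov bound E[X]/a
    print(f"a = {a:5.1f}:  P(X >= a) ~ {p:.4f}  <=  E[X]/a ~ {bound:.4f}")
```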
Important Series
Consider the series $\sum_{n=1}^{\infty} \frac{1}{n^{\gamma}}$. It converges when $\gamma > 1$. For example, $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$.
One way to show this is to upper-bound $\sum_{n=1}^{\infty} \frac{1}{n^{\gamma}}$ by $1 + \int_{1}^{\infty} \frac{1}{x^{\gamma}}\,dx = 1 + \frac{1}{\gamma - 1} < \infty$ when $\gamma > 1$.
When $\gamma = 1$, we have the harmonic series, i.e., $\frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$, which can be shown to diverge.
Note that $\sum_{n=1}^{\infty} \frac{1}{n^{\gamma}}$ can be lower-bounded by $\int_{1}^{\infty} \frac{1}{x^{\gamma}}\,dx = \frac{1}{\gamma - 1}$. Hence,
$$\frac{1}{\gamma - 1} < \sum_{n=1}^{\infty} \frac{1}{n^{\gamma}} < 1 + \frac{1}{\gamma - 1}$$
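A quick numerical sketch (added here) compares large partial sums of $\sum 1/n^{\gamma}$ with the integral bounds above and shows the $\gamma = 2$ case approaching $\pi^2/6$.

```python
import numpy as np

n = np.arange(1, 1_000_001, dtype=float)

for gamma in (1.5, 2.0, 3.0):
    partial = np.sum(1.0 / n ** gamma)        # partial sum up to 10^6 terms
    lower = 1.0 / (gamma - 1.0)
    upper = 1.0 + 1.0 / (gamma - 1.0)
    print(f"gamma = {gamma}:  {lower:.4f} < {partial:.4f} < {upper:.4f}")

print("sum 1/n^2 ~", np.sum(1.0 / n ** 2), " vs  pi^2/6 =", np.pi ** 2 / 6)
```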
Taylor Series
Consider a function $g(x): \mathbb{R} \to \mathbb{R}$. Its Taylor series expansion about a point $x_0$ is given by:
$$g(x) = \sum_{k=0}^{\infty} \frac{g^{(k)}(x_0)}{k!}\left(x - x_0\right)^k, \quad \text{where } g^{(0)}(x) = g(x),\; g^{(1)}(x) = \frac{dg(x)}{dx},\; g^{(2)}(x) = \frac{d^2 g(x)}{dx^2},\;\text{and so on.}$$
Using $n + 1$ terms and a remainder term, we can write:
$$g(x) = \sum_{k=0}^{n} \frac{g^{(k)}(x_0)}{k!}\left(x - x_0\right)^k + \frac{g^{(n+1)}(\zeta)}{(n + 1)!}\left(x - x_0\right)^{n+1}$$
where $\zeta$ lies between $x_0$ and $x$. For example,
$$e^x = \sum_{k=0}^{n} \frac{x^k}{k!} + \frac{e^{\zeta}}{(n + 1)!}\,x^{n+1}, \quad \zeta \in (0, x) \text{ when } x > 0 \text{ and } \zeta \in (x, 0) \text{ when } x < 0.$$
Since the remainder is nonnegative when $x \ge 0$, $e^x \ge \sum_{k=0}^{n} \frac{x^k}{k!}$ for all $x \ge 0$. When $n = 1$, the remainder $\frac{e^{\zeta}}{2}x^2$ is nonnegative for every $x$, so $e^x \ge 1 + x$ for all $x$. This means that $e^{-x} \ge 1 - x$ for all $x$.
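A small numeric sketch (added for illustration) compares truncated Taylor expansions of $e^x$ with the exact value and checks the $e^x \ge 1 + x$ bound on a grid.

```python
import numpy as np
from math import factorial

x = np.linspace(-3.0, 3.0, 13)

# Truncated Taylor expansion of e^x about x0 = 0 using n + 1 terms
def taylor_exp(x, n):
    return sum(x ** k / factorial(k) for k in range(n + 1))

for n in (1, 3, 5):
    err = np.max(np.abs(np.exp(x) - taylor_exp(x, n)))
    print(f"n = {n}: max truncation error on [-3, 3] ~ {err:.4e}")

# e^x >= 1 + x for every x (the n = 1 case)
assert np.all(np.exp(x) >= 1.0 + x)
```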
Alternatively, with the integral form of the remainder,
$$g(x) = \sum_{k=0}^{n} \frac{g^{(k)}(x_0)}{k!}\left(x - x_0\right)^k + \frac{1}{n!}\int_{x_0}^{x}\left(x - t\right)^n g^{(n+1)}(t)\,dt$$
In the multidimensional case, when $g(x): \mathbb{R}^N \to \mathbb{R}$, the Taylor series expansion about a point $y$ is given by:
$$g(x) = g(y) + \nabla^{\mathrm{T}} g(y)\,(x - y) + \frac{1}{2}(x - y)^{\mathrm{T}}\,\mathcal{H}(y)\,(x - y) + \cdots$$
where $\nabla^{\mathrm{T}} g(y) = \left[\left.\dfrac{\partial g}{\partial x_1}\right|_{y} \;\; \left.\dfrac{\partial g}{\partial x_2}\right|_{y} \;\; \cdots \;\; \left.\dfrac{\partial g}{\partial x_N}\right|_{y}\right]$ and $\mathcal{H}(y)$ is the Hessian matrix evaluated at the point $y$.
That is,
$$g(x) = g(y) + \sum_{n=1}^{N}\left.\frac{\partial g}{\partial x_n}\right|_{y}\left(x_n - y_n\right) + \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N}\left.\frac{\partial^2 g}{\partial x_n\,\partial x_m}\right|_{y}\left(x_n - y_n\right)\left(x_m - y_m\right) + \cdots$$
Leibniz's Rule for Differentiation under the Integral Sign
Let $f(x, t)$ be a function such that the partial derivative of $f$ with respect to $t$ exists and is continuous. Then,
$$\frac{d}{dt}\int_{\alpha(t)}^{\beta(t)} f(x, t)\,dx = \int_{\alpha(t)}^{\beta(t)} \frac{\partial f}{\partial t}\,dx + f\!\left(\beta(t), t\right)\frac{d\beta}{dt} - f\!\left(\alpha(t), t\right)\frac{d\alpha}{dt}$$
Example:
$$\frac{d}{dt}\int_{0}^{t} \frac{\ln(1 + tx)}{1 + x^2}\,dx = \int_{0}^{t} \frac{x}{\left(1 + x^2\right)\left(1 + tx\right)}\,dx + \frac{\ln\!\left(1 + t^2\right)}{1 + t^2}$$
Another example:
$$\frac{d}{dt}\int_{t}^{\infty} e^{-\frac{x^2}{2}}\,dx = -e^{-\frac{t^2}{2}}$$
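The sketch below (added, not from the notes) verifies the first example numerically by comparing a finite-difference derivative of the integral with the Leibniz-rule expression; `scipy.integrate.quad` is assumed to be available.

```python
import numpy as np
from scipy.integrate import quad

# I(t) = int_0^t ln(1 + t*x) / (1 + x^2) dx
def I(t):
    return quad(lambda x: np.log(1.0 + t * x) / (1.0 + x ** 2), 0.0, t)[0]

t, h = 1.3, 1e-5
lhs = (I(t + h) - I(t - h)) / (2.0 * h)                       # numerical d/dt

# Right-hand side from Leibniz's rule
integral_term = quad(lambda x: x / ((1.0 + x ** 2) * (1.0 + t * x)), 0.0, t)[0]
rhs = integral_term + np.log(1.0 + t ** 2) / (1.0 + t ** 2)

print(f"finite-difference derivative ~ {lhs:.6f}")
print(f"Leibniz rule expression      ~ {rhs:.6f}")
```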
Q-Function
$$Q(x) \triangleq \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{v^2}{2}}\,dv$$
$$Q(-x) = 1 - Q(x)$$
If $X \sim \mathcal{N}(\mu, \sigma^2)$, $\mathbb{P}(X > x) = Q\!\left(\dfrac{x - \mu}{\sigma}\right)$.
If $x \ge 0$, $Q(x) \le \dfrac{1}{2}\,e^{-\frac{x^2}{2}}$.
If $x > 0$, $\dfrac{1}{\sqrt{2\pi}\,x}\left(1 - \dfrac{1}{x^2}\right)e^{-\frac{x^2}{2}} \le Q(x) \le \dfrac{1}{\sqrt{2\pi}\,x}\,e^{-\frac{x^2}{2}}$.
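The sketch below (added here) evaluates $Q(x)$ via the complementary error function and checks the bounds numerically; `scipy` is assumed to be available.

```python
import numpy as np
from scipy.special import erfc

def Q(x):
    # Q(x) = 0.5 * erfc(x / sqrt(2))
    return 0.5 * erfc(x / np.sqrt(2.0))

for x in (0.5, 1.0, 2.0, 4.0):
    q = Q(x)
    chernoff = 0.5 * np.exp(-x ** 2 / 2.0)
    upper = np.exp(-x ** 2 / 2.0) / (np.sqrt(2.0 * np.pi) * x)
    lower = (1.0 - 1.0 / x ** 2) * np.exp(-x ** 2 / 2.0) / (np.sqrt(2.0 * np.pi) * x)
    print(f"x = {x}:  {lower:.3e} <= Q(x) = {q:.3e} <= {upper:.3e},  "
          f"(1/2)exp(-x^2/2) = {chernoff:.3e}")
```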
Chi-Squared Distribution
The chi-squared distribution with $k$ degrees of freedom, $\chi^2(k)$, is the distribution of the sum of the squares of $k$ independent real-valued zero-mean unit-variance Gaussian random variables.
If $X \sim \chi^2(k)$,
$$f_X(x) = \frac{1}{2^{\frac{k}{2}}\,\Gamma\!\left(\frac{k}{2}\right)}\,x^{\frac{k}{2} - 1}\,e^{-\frac{x}{2}}\,\mathbb{I}\{x \ge 0\}$$
If $X_1 \sim \chi^2(k_1)$ and $X_2 \sim \chi^2(k_2)$ are independent, $\dfrac{X_1}{X_1 + X_2} \sim \mathrm{Beta}\!\left(\dfrac{k_1}{2}, \dfrac{k_2}{2}\right)$ (beta distribution).
If $X \sim \frac{1}{2}\chi^2(k)$ with $k = 2$, $f_X(x) = e^{-x}\,\mathbb{I}\{x \ge 0\}$, i.e., $X$ is a unit-mean exponential random variable.
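As a sanity check (added, not from the notes), the following sketch builds $\chi^2(k)$ samples from squared Gaussians, compares the sample mean with the theoretical mean $k$, and verifies the exponential special case.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

k = 4
X = np.sum(rng.normal(size=(n, k)) ** 2, axis=1)   # chi-squared with k degrees of freedom
print("sample mean", X.mean(), "vs theoretical mean k =", k)   # E[chi^2(k)] = k

# Special case: (1/2) * chi^2(2) is a unit-mean exponential
Y = 0.5 * np.sum(rng.normal(size=(n, 2)) ** 2, axis=1)
print("mean of (1/2)chi^2(2):", Y.mean(), "(unit-mean exponential)")
print("P(Y > 1) ~", np.mean(Y > 1.0), "vs e^-1 =", np.exp(-1.0))
```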
PDF of the Sum of Two Random Variables
Consider random variables $X$ and $Y$ with joint PDF $f_{X,Y}(x, y)$. We are interested in the PDF of $Z = X + Y$.
$$F_Z(z) = \mathbb{P}(Z \le z) = \int_{-\infty}^{\infty}\int_{-\infty}^{z - x} f_{X,Y}(x, y)\,dy\,dx$$
$$f_Z(z) = \frac{dF_Z(z)}{dz} = \int_{-\infty}^{\infty} f_{X,Y}(x, z - x)\,dx$$
If $X$ and $Y$ are independent,
$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx$$
That is, the PDF of the sum of two independent random variables $X$ and $Y$ is the convolution of the PDFs of these variables. The results here are also valid for the discrete case.
Example:
Let $X$ and $Y$ be independent exponential random variables with $f_X(x) = \mu_1 e^{-\mu_1 x}\,\mathbb{I}\{x \ge 0\}$ and $f_Y(y) = \mu_2 e^{-\mu_2 y}\,\mathbb{I}\{y \ge 0\}$. Then
$$f_Z(z) = \int_{0}^{z} \mu_1 e^{-\mu_1 x}\,\mu_2 e^{-\mu_2(z - x)}\,dx\;\mathbb{I}\{z \ge 0\} = \mathbb{I}\{z \ge 0\}\,\mu_1 \mu_2\,e^{-\mu_2 z}\int_{0}^{z} e^{x\left(\mu_2 - \mu_1\right)}\,dx$$
If $\mu_1 \ne \mu_2$, $f_Z(z) = \dfrac{\mu_1 \mu_2}{\mu_2 - \mu_1}\left(e^{-\mu_1 z} - e^{-\mu_2 z}\right)\mathbb{I}\{z \ge 0\}$.
If $\mu_1 = \mu_2 = \mu$, $f_Z(z) = \mu^2 z\,e^{-\mu z}\,\mathbb{I}\{z \ge 0\}$.
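The sketch below (added for illustration) simulates $Z = X + Y$ for two independent exponentials and compares the empirical density at a few points with the closed-form $f_Z$ above.

```python
import numpy as np

rng = np.random.default_rng(4)
mu1, mu2 = 1.0, 2.5
n = 1_000_000
# numpy's exponential takes the scale = 1/rate
Z = rng.exponential(1.0 / mu1, n) + rng.exponential(1.0 / mu2, n)

def f_Z(z):
    # Closed-form PDF of the sum for mu1 != mu2
    return mu1 * mu2 / (mu2 - mu1) * (np.exp(-mu1 * z) - np.exp(-mu2 * z))

hist, edges = np.histogram(Z, bins=200, range=(0.0, 8.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for z in (0.5, 1.0, 2.0, 4.0):
    i = np.argmin(np.abs(centers - z))
    print(f"z = {z}: empirical ~ {hist[i]:.4f}, formula ~ {f_Z(z):.4f}")
```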
PDF of the Ratio of Two Random Variables
Consider random variables $X$ and $Y$ with joint PDF $f_{X,Y}(x, y)$. We are interested in the PDF of $Z = \dfrac{Y}{X}$ when $\mathbb{P}(X = 0) = 0$.
$$F_Z(z) = \mathbb{P}\!\left(\frac{Y}{X} \le z\right) = \int_{0}^{\infty}\int_{-\infty}^{xz} f_{X,Y}(x, y)\,dy\,dx + \int_{-\infty}^{0}\int_{xz}^{\infty} f_{X,Y}(x, y)\,dy\,dx$$
$$f_Z(z) = \frac{dF_Z(z)}{dz} = \int_{0}^{\infty} x\,f_{X,Y}(x, zx)\,dx + \int_{-\infty}^{0}(-x)\,f_{X,Y}(x, zx)\,dx = \int_{-\infty}^{\infty} |x|\,f_{X,Y}(x, zx)\,dx$$
If $X$ and $Y$ are independent,
$$f_Z(z) = \int_{-\infty}^{\infty} |x|\,f_X(x)\,f_Y(zx)\,dx$$
Example:
Let $X$ and $Y$ be independent, $f_X(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$ and $f_Y(y) = \frac{1}{\sqrt{2\pi}}e^{-\frac{y^2}{2}}$.
$$f_Z(z) = \int_{-\infty}^{\infty} |x|\,\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\,\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2 x^2}{2}}\,dx = \frac{1}{2\pi}\int_{0}^{\infty} e^{-\frac{\left(1 + z^2\right)x^2}{2}}\,d\!\left(x^2\right) = \frac{1}{\pi\left(1 + z^2\right)} \quad \text{(standard Cauchy PDF)}$$
PDF of the Maximum and Minimum of Independent Random
Variables
Consider i.i.d. random variables $\{X_n\}_{n=1}^{N}$ with $f_{X_n}(x) = f(x)$ and $F_{X_n}(x) = F(x)$, and let $Y = \max_n X_n$ and $Z = \min_n X_n$.
$$F_Y(y) = \mathbb{P}(Y \le y) = \mathbb{P}(X_1 \le y, X_2 \le y, \ldots, X_N \le y) = \prod_{n=1}^{N}\mathbb{P}(X_n \le y) = \prod_{n=1}^{N} F_{X_n}(y) = \left[F(y)\right]^N$$
$$f_Y(y) = \frac{dF_Y(y)}{dy} = N\left[F(y)\right]^{N-1} f(y)$$
Similarly,
$$F_Z(z) = 1 - \left[1 - F(z)\right]^N$$
$$f_Z(z) = \frac{dF_Z(z)}{dz} = N\left[1 - F(z)\right]^{N-1} f(z)$$
We can also write down the PDFs directly. Assume that the random variables are independent, but not necessarily identically distributed.
$$f_Y(y) = \sum_{n=1}^{N} f_{X_n}(y)\prod_{\substack{v=1 \\ v \ne n}}^{N} F_{X_v}(y)$$
The interpretation of this expression is as follows. Any of the $N$ random variables can be the maximum. If the maximum is $X_n$, then all the other random variables are less than (or equal to) it. Similarly,
$$f_Z(z) = \sum_{n=1}^{N} f_{X_n}(z)\prod_{\substack{v=1 \\ v \ne n}}^{N}\left[1 - F_{X_v}(z)\right]$$
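To check these formulas (an added illustration), the sketch below simulates the maximum and minimum of $N$ i.i.d. uniform variables and compares empirical CDF values with $[F(y)]^N = y^N$ and $1 - [1 - F(z)]^N$.

```python
import numpy as np

rng = np.random.default_rng(5)
N, trials = 5, 200_000
X = rng.uniform(size=(trials, N))      # i.i.d. Uniform(0, 1)
Y = X.max(axis=1)                      # maximum of each row
Z = X.min(axis=1)                      # minimum of each row

for y in (0.5, 0.8, 0.95):
    print(f"P(max <= {y}) ~ {np.mean(Y <= y):.4f}  vs  y^N = {y ** N:.4f}")
for z in (0.05, 0.2, 0.5):
    print(f"P(min <= {z}) ~ {np.mean(Z <= z):.4f}  vs  1-(1-z)^N = {1 - (1 - z) ** N:.4f}")
```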
Joint PDF of the Maximum and Minimum of i.i.d. Random Variables
Consider i.i.d. random variables $\{X_n\}_{n=1}^{N}$, $N \ge 2$, with $f_{X_n}(x) = f(x)$ and $F_{X_n}(x) = F(x)$, and again let $Y = \max_n X_n$ and $Z = \min_n X_n$.
$$F_{Y,Z}(\alpha, \beta) = \mathbb{P}(Y \le \alpha, Z \le \beta) = \mathbb{P}(Y \le \alpha) - \mathbb{P}(Y \le \alpha, Z > \beta)$$
$$F_{Y,Z}(\alpha, \beta) = \mathbb{P}(X_1 \le \alpha, X_2 \le \alpha, \ldots, X_N \le \alpha) - \mathbb{P}(\beta < X_1 \le \alpha, \beta < X_2 \le \alpha, \ldots, \beta < X_N \le \alpha)\,\mathbb{I}\{\alpha \ge \beta\}$$
Therefore,
$$F_{Y,Z}(\alpha, \beta) = \left[F(\alpha)\right]^N - \left[F(\alpha) - F(\beta)\right]^N\mathbb{I}\{\alpha \ge \beta\}$$
$$f_{Y,Z}(\alpha, \beta) = \frac{\partial^2 F_{Y,Z}(\alpha, \beta)}{\partial\alpha\,\partial\beta} = N(N - 1)\,f(\alpha)\,f(\beta)\left[F(\alpha) - F(\beta)\right]^{N-2}\mathbb{I}\{\alpha \ge \beta\}$$
This expression can be interpreted as follows. If $\alpha < \beta$, then the PDF is zero, as the maximum cannot be less than the minimum. For $\alpha \ge \beta$, one of the $N$ random variables is the maximum, and one of the remaining $N - 1$ random variables is the minimum. The other $N - 2$ random variables are between the minimum and the maximum.
Important Limits
$$\lim_{m \to \infty}\prod_{n=m}^{\infty}\left(1 - \frac{1}{n}\right) = 0$$
Proof:
$$\lim_{m \to \infty}\prod_{n=m}^{\infty}\left(1 - \frac{1}{n}\right) = \lim_{m \to \infty}\lim_{k \to \infty}\prod_{n=m}^{k}\left(1 - \frac{1}{n}\right) = \lim_{m \to \infty}\lim_{k \to \infty}\prod_{n=m}^{k}\frac{n - 1}{n}$$
$$= \lim_{m \to \infty}\lim_{k \to \infty}\frac{m - 1}{m}\cdot\frac{m}{m + 1}\cdot\frac{m + 1}{m + 2}\cdots\frac{k - 2}{k - 1}\cdot\frac{k - 1}{k} = \lim_{m \to \infty}\lim_{k \to \infty}\frac{m - 1}{k} = \lim_{m \to \infty} 0 = 0$$
$$\lim_{m \to \infty}\prod_{n=m}^{\infty}\left(1 - \frac{1}{n^2}\right) = 1$$
Proof:
$$\lim_{m \to \infty}\prod_{n=m}^{\infty}\left(1 - \frac{1}{n^2}\right) = \lim_{m \to \infty}\lim_{k \to \infty}\prod_{n=m}^{k}\frac{n^2 - 1}{n^2} = \lim_{m \to \infty}\lim_{k \to \infty}\prod_{n=m}^{k}\frac{(n - 1)(n + 1)}{n^2}$$
$$= \lim_{m \to \infty}\lim_{k \to \infty}\frac{(m - 1)(m + 1)}{m\,m}\cdot\frac{m\,(m + 2)}{(m + 1)(m + 1)}\cdot\frac{(m + 1)(m + 3)}{(m + 2)(m + 2)}\cdots\frac{(k - 1)(k + 1)}{k\,k}$$
$$= \lim_{m \to \infty}\lim_{k \to \infty}\frac{m - 1}{m}\cdot\frac{k + 1}{k} = \lim_{m \to \infty}\frac{m - 1}{m} = 1$$
Memoryless Distributions
Consider a nonnegative random variable $X$ with PDF $f_X(x)$ and tail function $\psi(t) = \mathbb{P}(X > t)$. We are interested in having the following property for nonnegative $t$ and $s$:
$$\mathbb{P}(X > t + s \mid X > s) = \frac{\mathbb{P}(X > t + s)}{\mathbb{P}(X > s)} = \frac{\psi(t + s)}{\psi(s)}$$
Requiring that $\mathbb{P}(X > t + s \mid X > s) = \mathbb{P}(X > t)$ is equivalent to requiring that:
$$\psi(t + s) = \psi(t)\,\psi(s)$$
This is satisfied by the exponential random variable: $\psi(t) = \exp(-\mu t)$, where $\mu$ is the reciprocal of the mean.
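A brief Monte Carlo sketch (added here) illustrates the memoryless property of the exponential distribution by comparing the conditional and unconditional tail probabilities.

```python
import numpy as np

rng = np.random.default_rng(6)
mu = 0.5                                        # rate; mean = 1/mu = 2
X = rng.exponential(scale=1.0 / mu, size=2_000_000)

s, t = 1.0, 1.5
p_cond = np.mean(X > t + s) / np.mean(X > s)    # P(X > t+s | X > s)
p_uncond = np.mean(X > t)                       # P(X > t)
print(f"P(X > t+s | X > s) ~ {p_cond:.4f}")
print(f"P(X > t)           ~ {p_uncond:.4f}")
print(f"exp(-mu*t)         = {np.exp(-mu * t):.4f}")
```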
Zero Variance
If $\mathrm{Var}(X) = \mathbb{E}\left[X^2\right] - \left(\mathbb{E}[X]\right)^2 = 0$, then $X$ is a degenerate (constant) random variable, $X = \mathbb{E}[X]$.
This can be seen from Chebyshev's inequality (to be studied later): $\forall\varepsilon > 0$,
$$\mathbb{P}\!\left(\left|X - \mathbb{E}[X]\right| \ge \varepsilon\right) \le \frac{\mathrm{Var}(X)}{\varepsilon^2} = 0$$
(The accurate thing to say is that $X$ is an almost surely constant random variable, but making the distinction will not be important for our purposes, at least for now.)
Sample Mean and Variance
Consider $N$ i.i.d. random variables $X_1, X_2, X_3, \ldots, X_N$ such that $\mathbb{E}[X_n] = \mu$ and $\mathrm{Var}(X_n) = \sigma^2$. Consider the following two estimators for the mean and the variance:
$$\hat{\mu} = \frac{1}{N}\sum_{n=1}^{N} X_n, \qquad \hat{V} = \frac{1}{N - 1}\sum_{n=1}^{N}\left(X_n - \hat{\mu}\right)^2$$
$$\mathbb{E}[\hat{\mu}] = \frac{1}{N}\sum_{n=1}^{N}\mathbb{E}[X_n] = \frac{1}{N}\sum_{n=1}^{N}\mu = \frac{N\mu}{N} = \mu$$
$$\mathbb{E}[\hat{V}] = \frac{1}{N - 1}\sum_{n=1}^{N}\left(\mathbb{E}\left[X_n^2\right] + \frac{1}{N^2}\sum_{j=1}^{N}\sum_{v=1}^{N}\mathbb{E}\left[X_j X_v\right] - \frac{2}{N}\sum_{j=1}^{N}\mathbb{E}\left[X_j X_n\right]\right)$$
$$= \frac{1}{N - 1}\sum_{n=1}^{N}\left(\mu^2 + \sigma^2 + \frac{1}{N^2}\left[N\left(\mu^2 + \sigma^2\right) + \left(N^2 - N\right)\mu^2\right] - \frac{2}{N}\left[\mu^2 + \sigma^2 + (N - 1)\mu^2\right]\right)$$
$$\Longrightarrow\; \mathbb{E}[\hat{V}] = \frac{1}{N - 1}\sum_{n=1}^{N}\left(\frac{N - 1}{N}\left(\mu^2 + \sigma^2\right) - \frac{N - 1}{N}\mu^2\right) = \sigma^2$$
$$\mathrm{Var}(\hat{\mu}) = \frac{1}{N^2}\sum_{n=1}^{N}\mathrm{Var}(X_n) = \frac{\sigma^2}{N}$$
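The sketch below (added) estimates $\mathbb{E}[\hat{\mu}]$, $\mathbb{E}[\hat{V}]$, and $\mathrm{Var}(\hat{\mu})$ over many repeated samples, to be compared with $\mu$, $\sigma^2$, and $\sigma^2/N$.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, N, trials = 3.0, 2.0, 10, 100_000

X = rng.normal(mu, sigma, size=(trials, N))
mu_hat = X.mean(axis=1)                 # sample mean for each trial
V_hat = X.var(axis=1, ddof=1)           # unbiased sample variance (divide by N - 1)

print("E[mu_hat]   ~", mu_hat.mean(), "vs mu =", mu)
print("E[V_hat]    ~", V_hat.mean(), "vs sigma^2 =", sigma ** 2)
print("Var(mu_hat) ~", mu_hat.var(), "vs sigma^2/N =", sigma ** 2 / N)
```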
Iterated Expectations and Total Variance
Consider two random variables $X$ and $Y$. The conditional expectation of $X$ given $Y$ is:
$$\mathbb{E}[X \mid Y] = \int_{-\infty}^{\infty} x\,f_{X|Y}(x \mid Y)\,dx$$
Note that $\mathbb{E}[X \mid Y]$ is a function of the random variable $Y$. If we take the expectation of $\mathbb{E}[X \mid Y]$, i.e., $\mathbb{E}\left[\mathbb{E}[X \mid Y]\right]$, using the PDF of $Y$, we obtain:
$$\mathbb{E}\left[\mathbb{E}[X \mid Y]\right] = \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} x\,f_{X|Y}(x \mid y)\,dx\right)f_Y(y)\,dy = \int_{-\infty}^{\infty} x\left(\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy\right)dx = \int_{-\infty}^{\infty} x\,f_X(x)\,dx = \mathbb{E}[X]$$
Hence, $\mathbb{E}[X] = \mathbb{E}\left[\mathbb{E}[X \mid Y]\right]$, where the inner expectation is carried out treating $Y$ as a constant and employing the conditional PDF of $X$ given $Y$.
Now we prove that $\mathrm{Var}(X) = \mathrm{Var}\left(\mathbb{E}[X \mid Y]\right) + \mathbb{E}\left[\mathrm{Var}(X \mid Y)\right]$. Subscripts are added to the expectation operators to make explicit the variable over which the expectation is taken.
$$\mathrm{Var}\left(\mathbb{E}[X \mid Y]\right) = \mathbb{E}_Y\left[\left(\mathbb{E}_{X|Y}[X \mid Y]\right)^2\right] - \left(\mathbb{E}_Y\left[\mathbb{E}_{X|Y}[X \mid Y]\right]\right)^2 = \mathbb{E}_Y\left[\left(\mathbb{E}_{X|Y}[X \mid Y]\right)^2\right] - \left(\mathbb{E}_X[X]\right)^2$$
$$\mathbb{E}\left[\mathrm{Var}(X \mid Y)\right] = \mathbb{E}_Y\left[\mathbb{E}_{X|Y}\left[X^2 \mid Y\right] - \left(\mathbb{E}_{X|Y}[X \mid Y]\right)^2\right] = \mathbb{E}_X\left[X^2\right] - \mathbb{E}_Y\left[\left(\mathbb{E}_{X|Y}[X \mid Y]\right)^2\right]$$
Adding,
$$\mathrm{Var}\left(\mathbb{E}[X \mid Y]\right) + \mathbb{E}\left[\mathrm{Var}(X \mid Y)\right] = \mathbb{E}_X\left[X^2\right] - \left(\mathbb{E}_X[X]\right)^2 = \mathrm{Var}(X)$$
Iterated expectations and total variance are valid so long as $\mathbb{E}[X]$ and $\mathrm{Var}(X)$ are finite.
Example:
Let $S = X_1 + X_2 + \cdots + X_N$, where the $X_i$'s are i.i.d. with mean $\mu$ and variance $\sigma^2$, and where $N$ is also a random variable, independent of the $X_i$'s.
What is $\mathbb{E}[S]$?
$$\mathbb{E}[S] = \mathbb{E}\left[\mathbb{E}[S \mid N]\right] = \mathbb{E}[N\mu] = \mu\,\mathbb{E}[N]$$
What is $\mathrm{Var}(S)$? By the law of total variance,
$$\mathrm{Var}(S) = \mathbb{E}\left[\mathrm{Var}(S \mid N)\right] + \mathrm{Var}\left(\mathbb{E}[S \mid N]\right) = \mathbb{E}\left[N\sigma^2\right] + \mathrm{Var}(N\mu) = \sigma^2\,\mathbb{E}[N] + \mu^2\,\mathrm{Var}(N)$$
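To verify these formulas numerically (an added sketch), take $N$ Poisson and the $X_i$ exponential; both are hypothetical choices made only for this example.

```python
import numpy as np

rng = np.random.default_rng(8)
trials = 500_000
lam = 4.0                    # N ~ Poisson(lam): E[N] = Var(N) = lam
mu, sigma = 2.0, 2.0         # X_i exponential with mean 2, variance 4

N = rng.poisson(lam, size=trials)
# The sum of N i.i.d. exponentials with mean mu is Gamma(shape=N, scale=mu);
# a tiny positive shape stands in for N = 0, and those sums are then set to 0.
S = rng.gamma(shape=np.maximum(N, 1e-12), scale=mu)
S[N == 0] = 0.0

print("E[S]   ~", S.mean(), "vs mu*E[N] =", mu * lam)
print("Var(S) ~", S.var(), "vs sigma^2*E[N] + mu^2*Var(N) =",
      sigma ** 2 * lam + mu ** 2 * lam)
```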
Histogram and Empirical CDF
Assume that we have $n$ i.i.d. observations $\{x_k\}_{k=1}^{n}$ and we want to estimate the underlying PDF. One simple method for doing this is to divide the range of observations into $k$ intervals at points $c_0, c_1, c_2, \ldots, c_k$. Defining $\Delta_j = c_j - c_{j-1}$, $j = 1, 2, \ldots, k$, and $n_j$ as the number of observations that fall in the interval $(c_{j-1}, c_j]$, the estimated PDF is $\hat{f}(x) = \dfrac{n_j}{n\,\Delta_j}$, $x \in (c_{j-1}, c_j]$. (The function $h(x) = n_j$ is called the histogram.)
The empirical CDF (or cumulative histogram) is $\hat{F}(x) = \dfrac{1}{n}\sum_{k=1}^{n}\mathbb{I}\{x_k \le x\}$. The sum $\sum_{k=1}^{n}\mathbb{I}\{x_k \le x\}$ is equal to the number of observations less than or equal to $x$.
$$\mathbb{E}\left[\hat{F}(x)\right] = \frac{1}{n}\sum_{k=1}^{n}\mathbb{E}\left[\mathbb{I}\{x_k \le x\}\right] = \frac{1}{n}\sum_{k=1}^{n}\mathbb{P}(x_k \le x) = \frac{1}{n}\sum_{k=1}^{n} F_X(x) = F_X(x)$$
$$\mathrm{Var}\left(\hat{F}(x)\right) = \frac{1}{n^2}\sum_{k=1}^{n}\mathrm{Var}\left(\mathbb{I}\{x_k \le x\}\right) = \frac{1}{n^2}\sum_{k=1}^{n}\left[\mathbb{P}(x_k \le x) - \left(\mathbb{P}(x_k \le x)\right)^2\right] = \frac{F_X(x) - \left(F_X(x)\right)^2}{n} \le \frac{1}{4n}$$
Rice and Rayleigh Distributions
Consider random variables
$$X_1 = A\cos\theta + w_1, \qquad X_2 = A\sin\theta + w_2,$$
where $A$ and $\theta$ are constants, and $w_1$ and $w_2$ are i.i.d. zero-mean Gaussian random variables, each with variance $\sigma^2$. Consequently, $X_1 \sim \mathcal{N}(A\cos\theta, \sigma^2)$ and $X_2 \sim \mathcal{N}(A\sin\theta, \sigma^2)$. Random variables $X_1$ and $X_2$ are uncorrelated and, hence, independent. Their joint PDF is given by:
$$f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi\sigma^2}\exp\!\left(-\frac{\left(x_1 - A\cos\theta\right)^2 + \left(x_2 - A\sin\theta\right)^2}{2\sigma^2}\right)$$
Transforming to polar coordinates, write $X_1 = R\cos\Phi$ and $X_2 = R\sin\Phi$, i.e., $R = \sqrt{X_1^2 + X_2^2}$. Then
$$f_{R,\Phi}(r, \varphi) = f_{X_1,X_2}(x_1, x_2)\left|\frac{\partial(x_1, x_2)}{\partial(r, \varphi)}\right|$$
where $\left|\dfrac{\partial(x_1, x_2)}{\partial(r, \varphi)}\right|$ is the absolute value of the determinant of the Jacobian matrix:
$$\begin{bmatrix}\dfrac{\partial x_1}{\partial r} & \dfrac{\partial x_1}{\partial \varphi}\\[2mm] \dfrac{\partial x_2}{\partial r} & \dfrac{\partial x_2}{\partial \varphi}\end{bmatrix} = \begin{bmatrix}\cos\varphi & -r\sin\varphi\\ \sin\varphi & r\cos\varphi\end{bmatrix}, \qquad \left|\frac{\partial(x_1, x_2)}{\partial(r, \varphi)}\right| = r\cos^2\varphi + r\sin^2\varphi = r$$
$$f_R(r) = \int_{0}^{2\pi} f_{R,\Phi}(r, \varphi)\,d\varphi = \frac{r}{2\pi\sigma^2}\exp\!\left(-\frac{r^2 + A^2}{2\sigma^2}\right)\int_{0}^{2\pi}\exp\!\left(\frac{2Ar\cos(\varphi - \theta)}{2\sigma^2}\right)d\varphi$$
Since the modified Bessel function of the first kind and zeroth order is $I_0(z) = \dfrac{1}{2\pi}\int_{0}^{2\pi}\exp\!\left(z\cos(\varphi - \theta)\right)d\varphi$ (for any $\theta \in \mathbb{R}$), for $r \ge 0$,
$$f_R(r) = \frac{r}{\sigma^2}\exp\!\left(-\frac{r^2 + A^2}{2\sigma^2}\right) I_0\!\left(\frac{Ar}{\sigma^2}\right)$$
To explicitly show the nonnegativity of $r$, we may write the Rice distribution as:
$$f_R(r) = \frac{r}{\sigma^2}\exp\!\left(-\frac{r^2 + A^2}{2\sigma^2}\right) I_0\!\left(\frac{Ar}{\sigma^2}\right)\mathbb{I}\{r \ge 0\}$$
If $A = 0$, then $X_1$ and $X_2$ have a mean of zero. In this case $R = \sqrt{X_1^2 + X_2^2}$ has the Rayleigh distribution. Exploiting the fact that $I_0(0) = 1$, the Rayleigh distribution is given by:
$$f_R(r) = \frac{r}{\sigma^2}\exp\!\left(-\frac{r^2}{2\sigma^2}\right)\mathbb{I}\{r \ge 0\}$$
When $A = 0$, if $P = R^2$,
$$f_P(p) = f_R(r)\left|\frac{dr}{dp}\right| = \frac{\sqrt{p}}{\sigma^2}\exp\!\left(-\frac{p}{2\sigma^2}\right)\frac{1}{2\sqrt{p}}\,\mathbb{I}\{p \ge 0\} = \frac{1}{2\sigma^2}\exp\!\left(-\frac{p}{2\sigma^2}\right)\mathbb{I}\{p \ge 0\}.$$
When $A > 0$,
$$f_P(p) = f_R(r)\left|\frac{dr}{dp}\right| = \frac{\sqrt{p}}{\sigma^2}\exp\!\left(-\frac{p + A^2}{2\sigma^2}\right) I_0\!\left(\frac{A\sqrt{p}}{\sigma^2}\right)\frac{1}{2\sqrt{p}}\,\mathbb{I}\{p \ge 0\} = \frac{1}{2\sigma^2}\exp\!\left(-\frac{p + A^2}{2\sigma^2}\right) I_0\!\left(\sqrt{\frac{A^2 p}{\sigma^4}}\right)\mathbb{I}\{p \ge 0\}.$$
That is, $P$ is proportional to a noncentral chi-squared distribution with two degrees of freedom and noncentrality parameter $\dfrac{A^2}{\sigma^2}$.
Note that when $A = 0$, $f_{R,\Phi}(r, \varphi) = \dfrac{r}{2\pi\sigma^2}\exp\!\left(-\dfrac{r^2}{2\sigma^2}\right)\mathbb{I}\{r \ge 0\}$.
Integrating with respect to $r$, $f_{\Phi}(\varphi) = \dfrac{1}{2\pi}$, i.e., $\Phi$ is uniformly distributed.
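As a numerical check (added, not from the notes), the sketch below generates $R = \sqrt{X_1^2 + X_2^2}$ and compares its empirical density at a few points with the Rice formula; `scipy.special.i0` is used for $I_0$, and the parameter values are arbitrary.

```python
import numpy as np
from scipy.special import i0

rng = np.random.default_rng(10)
A, theta, sigma = 2.0, 0.7, 1.0
n = 1_000_000

X1 = A * np.cos(theta) + sigma * rng.normal(size=n)
X2 = A * np.sin(theta) + sigma * rng.normal(size=n)
R = np.hypot(X1, X2)

def rice_pdf(r):
    return (r / sigma**2) * np.exp(-(r**2 + A**2) / (2 * sigma**2)) * i0(A * r / sigma**2)

hist, edges = np.histogram(R, bins=200, range=(0.0, 6.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for r in (1.0, 2.0, 3.0):
    idx = np.argmin(np.abs(centers - r))
    print(f"r = {r}: empirical ~ {hist[idx]:.4f}, Rice PDF ~ {rice_pdf(r):.4f}")
```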
Arithmetic-Mean Geometric Mean (AM-GM) Inequality
Consider positive real numbers $\{x_k\}_{k=1}^{n}$.
Their arithmetic mean is $\frac{1}{n}\sum_{k=1}^{n} x_k = \frac{1}{n}\left(x_1 + x_2 + \cdots + x_n\right)$.
Their geometric mean is $\left(\prod_{k=1}^{n} x_k\right)^{\frac{1}{n}} = \left(x_1 x_2 \cdots x_n\right)^{\frac{1}{n}}$.
The AM-GM inequality states that $\frac{1}{n}\sum_{k=1}^{n} x_k \ge \left(\prod_{k=1}^{n} x_k\right)^{\frac{1}{n}}$, with equality if and only if all the numbers are equal.
Proof.
We can make use of the inequality $\exp(z) \ge 1 + z$, with equality if and only if $z = 0$, which is equivalent to $\exp(z - 1) \ge z$ with equality if and only if $z = 1$.
Consider $y_k = \dfrac{x_k}{\eta}$, where $\eta$ is the arithmetic mean, i.e., $\eta = \frac{1}{n}\sum_{k=1}^{n} x_k$.
Since $z \le \exp(z - 1)$,
$$\left(\prod_{k=1}^{n} y_k\right)^{\frac{1}{n}} \le \left(\prod_{k=1}^{n}\exp(y_k - 1)\right)^{\frac{1}{n}} = \exp\!\left(\frac{1}{n}\sum_{k=1}^{n} y_k - 1\right)$$
$$\frac{1}{n}\sum_{k=1}^{n} y_k = \frac{1}{n}\sum_{k=1}^{n}\frac{x_k}{\eta} = \frac{\eta}{\eta} = 1, \qquad \left(\prod_{k=1}^{n} y_k\right)^{\frac{1}{n}} = \frac{1}{\eta}\left(\prod_{k=1}^{n} x_k\right)^{\frac{1}{n}}$$
Therefore,
$$\frac{1}{\eta}\left(\prod_{k=1}^{n} x_k\right)^{\frac{1}{n}} \le \exp(1 - 1) = 1 \;\Longrightarrow\; \left(\prod_{k=1}^{n} x_k\right)^{\frac{1}{n}} \le \eta = \frac{1}{n}\sum_{k=1}^{n} x_k.$$
Note that $\exp(z - 1) = z$ iff $z = 1$. This means that the AM-GM inequality is satisfied with equality iff $\forall k,\ y_k = 1$, which is equivalent to $\forall k,\ x_k = \eta$.
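A brief numeric check (added) for a few random positive vectors, plus the equality case:

```python
import numpy as np

rng = np.random.default_rng(11)
for _ in range(3):
    x = rng.uniform(0.1, 10.0, size=8)           # positive numbers
    am = x.mean()
    gm = np.exp(np.mean(np.log(x)))              # geometric mean via logs
    print(f"AM = {am:.4f} >= GM = {gm:.4f}")

x_equal = np.full(8, 3.7)                        # equality case: all numbers equal
print(np.isclose(x_equal.mean(), np.exp(np.mean(np.log(x_equal)))))
```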