Lesson 1

Review of basic notions in probability theory.

• [σ-field] Let Ω be a set and consider a σ-field F of Ω, i.e. a collection of subsets
of Ω such that
(i) Ω ∈ F.
(ii) A1, A2, . . . ∈ F implies ∪_{n=1}^∞ An ∈ F.
(iii) A ∈ F implies A^c ∈ F.
• In this course the elements A of a σ-field will be called events; they correspond
to sets for which, after observing the outcome ω of an experiment, we can say
whether “ω ∈ A” is true or false.
• If we consider the set 2^Ω (the set of all subsets of Ω), then it is clear that 2^Ω
is a σ-field.

On the other hand, it is often not a good choice. Indeed,

• From the modeling point of view, if we consider, for instance, the experiment
of tossing a coin twice and we can only observe the result of the first toss,
then the “event” “both results are heads” cannot be included in our σ-field.

• From the mathematical point of view, if we consider 2^Ω as a σ-field, then it is
not always possible to define a probability measure on these events.

An important σ-field is the Borel σ-field of R, which is a subset of 2^R, that
we construct as follows. Consider the set

C := {I ⊂ R | I is an open interval} .

Clearly, C is not a σ-field. Indeed, for any a ∈ R, the singleton {a} does not
belong to C and

{a} = ∩_{n∈N*} (a − 1/n, a + 1/n).
The set of Borel sets B(R) of R is defined as the smallest σ -field containing C .
Equivalently, since the intersection of an arbitrary family of σ -fields is a σ -field
(exercise), B(R) is the intersection of all σ -fields in R containing C .
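
To make “smallest σ-field containing a collection” concrete, the following Python sketch (not part of the original notes; the function name generate_sigma_field and the coin-toss sets are illustrative assumptions) closes a collection of subsets of a finite Ω under complements and unions until nothing new appears, which on a finite set yields exactly the generated σ-field.

```python
from itertools import combinations

def generate_sigma_field(omega, collection):
    """Return the smallest sigma-field on the finite set `omega`
    containing every set in `collection` (closure under complement
    and union; on a finite set this is the generated sigma-field)."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(A) for A in collection}
    while True:
        new = set(sigma)
        new |= {omega - A for A in sigma}                      # close under complement
        new |= {A | B for A, B in combinations(sigma, 2)}      # close under pairwise union
        if new == sigma:
            return sigma
        sigma = new

# Example: tossing a coin twice but observing only the first toss.
omega = {"HH", "HT", "TH", "TT"}
observable = [{"HH", "HT"}]              # "first toss is heads"
F = generate_sigma_field(omega, observable)
print(sorted(map(sorted, F)))
# The event {"HH"} ("both results are heads") is NOT in F, as discussed above.
```
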

Similarly, in R^d we can define B(R^d) as the smallest σ-field containing the
“rectangles”

I = Π_{i=1}^d (ai, bi) for some a1, . . . , ad ∈ R and b1, . . . , bd ∈ R.

• [Probability measure] A probability measure on F is a function P : F → [0, 1]
that satisfies:
(i) P(Ω) = 1.
(ii) If A1, A2, . . . ∈ F are disjoint, then

P(∪_{n=1}^∞ An) = Σ_{n=1}^∞ P(An).

• [Probability space] A probability space is a triple (Ω, F, P), where Ω is a set, F
is a σ-algebra of subsets of Ω and P is a probability measure on F.

• [Basic properties] We have the following properties, whose proofs are left to the
reader.
(i) P(∅) = 0.
(ii) If A, B ∈ F, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
(iii) Consider a family (An)n∈N of elements in F. We say that An ↑ A if
A1 ⊆ A2 ⊆ A3 ⊆ . . . and A = ∪_{n≥1} An. In this case, we have

P(A) = lim_{n→∞} P(An).

(iv) Consider a family (An)n∈N of elements in F. We say that An ↓ A if
A1 ⊇ A2 ⊇ A3 ⊇ . . . and A = ∩_{n≥1} An. In this case, we have

P(A) = lim_{n→∞} P(An).

• [Independence] Two events A and B ∈ F are independent iff

P(A ∩ B) = P(A)P(B).

Let I be an arbitrary index set and {Ai}i∈I a family of events in F . Then, this
family is said to be independent iff for each finite set of distinct indices i1, . . . , ik
we have

P(Ai1 ∩ Ai2 ∩ . . . ∩ Aik) = P(Ai1)P(Ai2) . . . P(Aik).

• [Conditional probability] Let (Ω, F, P) be a probability space and A, B ∈ F
with P(A) ≠ 0. Then, the conditional probability of B given A is defined by

P(B|A) := P(A ∩ B) / P(A).
Theorem 1. Let B1, B2, . . . be a countable family of sets such that

Bi ∩ Bj = ∅ if i ≠ j and ∪_{i=1}^∞ Bi = Ω,

(i.e. the family (Bi)i∈N forms a partition of Ω). Then, for any A ∈ F,

P(A) = Σ_{i=1}^∞ P(A|Bi)P(Bi).

Proof:

P(A) = P(A ∩ (∪_{i∈N} Bi)) = P(∪_{i∈N} (A ∩ Bi)) = Σ_{i=1}^∞ P(A ∩ Bi) = Σ_{i=1}^∞ P(A|Bi)P(Bi).
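
As a quick sanity check of Theorem 1, here is a minimal Python sketch (illustrative only; the die example, the event A and the helper prob are assumptions) that verifies the law of total probability by exact arithmetic on a finite sample space.

```python
from fractions import Fraction

# Finite sample space: a fair six-sided die.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}          # uniform probability measure

def prob(event):
    return sum(P[w] for w in event)

A = {1, 2, 5}                                   # an arbitrary event
B = [{1, 3, 5}, {2, 4, 6}]                      # partition of Omega: odd / even outcomes

# Right-hand side of Theorem 1: sum over i of P(A | Bi) P(Bi).
rhs = sum(prob(A & Bi) / prob(Bi) * prob(Bi) for Bi in B)

assert rhs == prob(A)                           # both sides equal 1/2 exactly
print(prob(A), rhs)
```
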

• [Random variables and random vectors] Let (Ω, F, P) be a probability space. A
random variable (r.v.) is a function X : Ω → R that satisfies

(∀ A ∈ B(R)) {X ∈ A} := X^{−1}(A) = {ω ∈ Ω | X(ω) ∈ A} ∈ F.

Given a r.v. X , we define its distribution function FX : R → [0, 1] by

FX (x) := P(X ≤ x) = P(X ∈ (−∞, x]).

• We say that a random variable X is discrete if X(Ω) is countable.

• We say that a random variable X is absolutely continuous if there exists a Borel
measurable function fX : R → R, i.e. fX^{−1}(A) ∈ B(R) for all A ∈ B(R),
called the density of X, such that
a) fX(x) ≥ 0 for all x ∈ R.
b) For all a, b ∈ R, a ≤ b, we have

FX(b) − FX(a) = ∫_a^b fX(x) dx.
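
Property b) can be checked numerically; the sketch below (an illustration assuming SciPy is available, with an arbitrary rate λ = 2) compares ∫_a^b fX(x) dx with FX(b) − FX(a) for the exponential density introduced later in these notes.

```python
import numpy as np
from scipy import integrate, stats

lam = 2.0                                   # rate of an exponential density (arbitrary choice)

def f_X(x):
    """Density of an Exp(lam) random variable."""
    return lam * np.exp(-lam * x) if x >= 0 else 0.0

a, b = 0.3, 1.7
integral, _ = integrate.quad(f_X, a, b)     # numerical value of the integral of f_X over [a, b]
F = stats.expon(scale=1 / lam).cdf          # distribution function F_X of the same law

print(integral, F(b) - F(a))                # both ≈ 0.515
assert abs(integral - (F(b) - F(a))) < 1e-8
```
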
A random vector is a function

Ω ∋ ω ↦ X(ω) = (X1(ω), . . . , Xd(ω)) ∈ R^d (d ∈ N),

where, for every k = 1, . . . , d, Xk is a random variable.

Given a random vector X : Ω → R^d, its distribution function FX : R^d → [0, 1]
is defined by

(∀ x = (x1, . . . , xd) ∈ R^d) FX(x) := P(X1 ≤ x1, X2 ≤ x2, . . . , Xd ≤ xd).

• We say that a random vector X is discrete if X(Ω) is countable.

• We say that a random vector X is absolutely continuous if there exists a Borel
measurable function fX : R^d → R, i.e. fX^{−1}(A) ∈ B(R^d) for all A ∈ B(R),
called the density of X, such that

a) fX(x) ≥ 0 for all x ∈ R^d.

b) For all x = (x1, . . . , xd) ∈ R^d,

FX(x1, . . . , xd) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xd} fX(y1, . . . , yd) dyd . . . dy1.

It is easy to check that if X is absolutely continuous, then for all
k = 1, . . . , d, Xk is absolutely continuous and

(∀ xk ∈ R) fXk(xk) = ∫_{−∞}^∞ · · · ∫_{−∞}^∞ fX(y1, . . . , yk−1, xk, yk+1, . . . , yd) dyd . . . dyk+1 dyk−1 . . . dy1,

where the right-hand side involves d − 1 integrals.

However, the converse is false. Indeed, it suffices to take d = 2
and X1 = X2 = X, where X is absolutely continuous. Then (X1, X2)
takes values in D := {(x, y) ∈ R^2 | y = x}. Thus,

P((X1, X2) ∈ D) = P(X1 = X2) = 1,

but, if (X1, X2) were absolutely continuous,

P((X1, X2) ∈ D) = ∫∫_D f(X1,X2)(x1, x2) dx1 dx2 = 0,

because the “volume” of D in R^2 is 0.


• We say that a family of random variables X1, . . . , Xn is independent if for
every B1, . . . , Bn, with Bi ∈ B(R) (i = 1, . . . , n), we have

P(X1 ∈ B1, . . . , Xn ∈ Bn) = P(X1 ∈ B1) · · · P(Xn ∈ Bn).

If the r.v.'s are discrete, then this is equivalent to

(∀ x1 ∈ X1(Ω), . . . , xn ∈ Xn(Ω)) P(X1 = x1, . . . , Xn = xn) = P(X1 = x1) . . . P(Xn = xn).

If the r.v.'s are absolutely continuous, then independence is equivalent to X =
(X1, . . . , Xn) being absolutely continuous with

(∀ (x1, . . . , xn) ∈ R^n) fX(x1, . . . , xn) = fX1(x1) . . . fXn(xn).
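
The product formula can be illustrated by simulation. The following Python sketch (illustrative; the choice of distributions and of the Borel sets B1, B2 is arbitrary) estimates both sides of the factorization for two independent random variables.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two independent random variables: X1 ~ N(0, 1), X2 ~ Exp(1).
X1 = rng.normal(size=n)
X2 = rng.exponential(size=n)

# Borel sets B1 = (-inf, 0.5], B2 = [1, 2].
in_B1 = X1 <= 0.5
in_B2 = (X2 >= 1.0) & (X2 <= 2.0)

joint = np.mean(in_B1 & in_B2)             # estimate of P(X1 in B1, X2 in B2)
product = np.mean(in_B1) * np.mean(in_B2)  # estimate of P(X1 in B1) P(X2 in B2)

print(joint, product)  # the two estimates agree up to Monte Carlo error
```
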

• [Expectation and conditional expectation] Given a discrete or absolutely continuous
random variable X : Ω → R, its expectation is defined by

E(X) = Σ_{xk∈X(Ω)} xk P(X = xk)   if X is discrete,
E(X) = ∫_R x fX(x) dx             if X is absolutely continuous.

An associated notion is the variance. Given a discrete or absolutely continuous
random variable X : Ω → R, its variance is defined by

V (X) = E((X − E(X))^2) = E(X^2) − (E(X))^2,

where the second equality is left as an exercise.
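
For a concrete instance, here is a short Python sketch (the pmf is an arbitrary choice) that computes E(X) and V(X) for a discrete random variable exactly and confirms the identity E(X^2) − (E(X))^2 left as an exercise above.

```python
from fractions import Fraction

# A small discrete random variable: value -> probability.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}
assert sum(pmf.values()) == 1

E_X  = sum(x * p for x, p in pmf.items())        # expectation
E_X2 = sum(x**2 * p for x, p in pmf.items())     # second moment

var_def      = sum((x - E_X) ** 2 * p for x, p in pmf.items())
var_shortcut = E_X2 - E_X ** 2

print(E_X, var_def, var_shortcut)   # 5/4, 19/16, 19/16
assert var_def == var_shortcut      # the "exercise" identity, checked exactly
```
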

• Given A ∈ F such that P(A) > 0 and a discrete random variable X : Ω → R,
the conditional expectation of X given A is defined by

E(X|A) = Σ_{xk∈X(Ω)} xk P(X = xk|A),

provided that the sum on the r.h.s. exists. It is easy to check that if B1,
B2, . . . is a countable family of elements in F such that

Bi ∩ Bj = ∅ if i ≠ j and ∪_{i=1}^∞ Bi = Ω,

then

E(X) = Σ_{i=1}^∞ E(X|Bi) P(Bi).
• Given two discrete random variables X, Y and x ∈ X(Ω) such that P(X =
x) ≠ 0, we define the conditional expectation of Y given that X = x as

E(Y |X = x) = Σ_{yk∈Y(Ω)} yk P(Y = yk|X = x).

• Given an absolutely continuous random vector (X, Y ) and x ∈ R such that
fX(x) ≠ 0, we define the conditional density of Y given that X = x as

(∀ y ∈ R) fY|X(y|x) = f(X,Y)(x, y) / fX(x).

In this framework, for every B ∈ B(R), the conditional probability P(Y ∈ B|X = x)
is defined by

P(Y ∈ B|X = x) = ∫_B fY|X(y|x) dy.

Similarly, the conditional expectation of Y given that X = x is defined by

E(Y |X = x) = ∫_{−∞}^∞ y fY|X(y|x) dy.

In this context, it is easy to show that

E(Y ) = ∫_{−∞}^∞ E(Y |X = x) fX(x) dx.    (1)

Define the random variable E(Y |X) by

(∀ ω ∈ Ω) E(Y |X)(ω) = E(Y |X = X(ω)).

Then (1) can be written as

E(Y ) = E(E(Y |X)).
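
The identity E(Y) = E(E(Y|X)) is easy to probe by simulation. In the Python sketch below (illustrative; taking X uniform on (0, 1) and, given X = x, Y normal with mean x is an assumption chosen so that E(Y|X) = X is known in closed form), both sides are estimated by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# X ~ Uniform(0, 1); given X = x, Y ~ Normal(x, 1), so E(Y | X) = X.
X = rng.uniform(size=n)
Y = rng.normal(loc=X, scale=1.0)

lhs = Y.mean()                 # Monte Carlo estimate of E(Y)
rhs = X.mean()                 # Monte Carlo estimate of E(E(Y|X)) = E(X)

print(lhs, rhs)                # both ≈ 0.5
```
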

Classical random variables

• [Discrete random variables]

1. Consider a random variable X that models an experiment with only two possible
outcomes, let us say 0 and 1. Assume that there exists p ∈ (0, 1) such that
P(X = 1) = p (and so P(X = 0) = 1 − p). Then we say that X has a
Bernoulli distribution of parameter p. We have

E(X) = p and V (X) = p(1 − p).

2. Consider an experiment satisfying the following conditions:

a) It consists of n sub-experiments, which are independent, and each one of them
can have two outcomes: 1 (success) and 0 (failure).

b) At each sub-experiment, the probability of success is p ∈ (0, 1).

Let X be the random variable that counts the number of successes. Then,

P(X = k) = \binom{n}{k} p^k (1 − p)^{n−k}   ∀ k = 0, . . . , n,

where we recall that

\binom{n}{k} = n! / (k!(n − k)!).

In this case we say that X follows a binomial distribution of parameters (n, p)
and we denote X ∼ B(n, p). We have

E(X) = np and V (X) = np(1 − p).

If X ∼ B(n, p), then X has the same distribution as Σ_{k=1}^n Xk, where
X1, . . . , Xn are independent and identically distributed (iid) random variables,
with each Xk (k = 1, . . . , n) having a Bernoulli distribution of parameter p.
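
Both the representation of B(n, p) as a sum of n iid Bernoulli(p) variables and the formulas E(X) = np, V(X) = np(1 − p) can be checked by simulation; the Python sketch below (with arbitrary n and p) compares the empirical moments of the two constructions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, trials = 10, 0.3, 200_000

# Sum of n iid Bernoulli(p) variables, repeated `trials` times.
bernoulli_sums = rng.binomial(1, p, size=(trials, n)).sum(axis=1)

# Direct Binomial(n, p) samples for comparison.
binomial_draws = rng.binomial(n, p, size=trials)

print(bernoulli_sums.mean(), binomial_draws.mean(), n * p)            # ≈ 3.0
print(bernoulli_sums.var(),  binomial_draws.var(),  n * p * (1 - p))  # ≈ 2.1
```
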
3. We consider here the distribution of a random variable that is often used
to describe “rare events” on small time intervals, where occurrences on an
interval are independent of occurrences on another, disjoint interval.

We say that a random variable X has a Poisson distribution of parameter λ > 0,
which is denoted by X ∼ P(λ), if

P(X = k) = (λ^k / k!) e^{−λ},   k = 0, 1, 2, . . . .

We have

E(X) = λ and V (X) = λ.

• [Absolutely continuous random variables]

1. Let a, b ∈ R with a < b. We say that an absolutely continuous random variable
X is uniformly distributed in [a, b] if

fX(x) = 1/(b − a) if x ∈ [a, b],
fX(x) = 0 otherwise.

We have

E(X) = (a + b)/2 and V (X) = (b − a)^2 / 12.
2. We say that an absolutely continuous random variable X has an
exponential distribution with parameter λ > 0 if

fX(x) = λe^{−λx} if x ∈ [0, ∞),
fX(x) = 0 otherwise.

We have

E(X) = 1/λ and V (X) = 1/λ^2.
3. Let µ ∈ R and σ > 0. We say that an absolutely continuous random variable X
has a normal (or Gaussian) distribution N(µ, σ^2) if

fX(x) = (1/√(2πσ^2)) e^{−(x−µ)^2/(2σ^2)} ∀ x ∈ R.

We have

E(X) = µ and V (X) = σ^2.
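
The expectation and variance formulas for these three classical distributions can be cross-checked against SciPy's closed-form moments; the sketch below (illustrative, assuming SciPy is installed, with arbitrary parameter values) prints both side by side.

```python
from scipy import stats

a, b = 2.0, 5.0
lam = 0.5
mu, sigma = 1.0, 2.0

checks = {
    "uniform on [a, b]":   (stats.uniform(loc=a, scale=b - a), (a + b) / 2, (b - a) ** 2 / 12),
    "exponential(lambda)": (stats.expon(scale=1 / lam),        1 / lam,     1 / lam ** 2),
    "normal(mu, sigma^2)": (stats.norm(loc=mu, scale=sigma),   mu,          sigma ** 2),
}

for name, (dist, mean_formula, var_formula) in checks.items():
    # dist.mean() / dist.var() are SciPy's exact moments for each distribution.
    print(name, dist.mean(), mean_formula, dist.var(), var_formula)
```
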

Convergence notions for random variables

• [Some useful technical results] In this first short section we will prove some technical
results. The reader will notice that, despite their importance, the proofs are rather
straightforward.

Lemma 1. [First Borel-Cantelli lemma] Consider a sequence (An)n∈N ⊆ F such
that

Σ_{n=1}^∞ P(An) < ∞. (2)

Then,

P(∩_{n=1}^∞ ∪_{k=n}^∞ Ak) = 0.

Proof: Call A = ∩_{n=1}^∞ ∪_{k=n}^∞ Ak. Then, for every n ∈ N,

P(A) ≤ P(∪_{k=n}^∞ Ak) ≤ Σ_{k=n}^∞ P(Ak) → 0, as n → ∞.

Remark 1. The previous Borel-Cantelli lemma states that if (2) holds, then, with
probability one, for any realization ω there exists n0(ω) ∈ N such that ω ∉ Ak
for all k ≥ n0(ω). That is, with probability one the events (An)n∈N do not
occur “infinitely often”.
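
A simulation makes the conclusion of the first Borel-Cantelli lemma visible. In the Python sketch below (illustrative; the events An = {Un ≤ 1/n²} with independent uniform Un are an arbitrary choice satisfying (2)), only finitely many An occur in every simulated realization, and only for small n.

```python
import numpy as np

rng = np.random.default_rng(3)
n_events, n_realizations = 2_000, 1_000

# A_n = {U_n <= 1/n^2} with independent U_n ~ Uniform(0, 1);
# sum_n P(A_n) = sum_n 1/n^2 < infinity, so condition (2) holds.
n = np.arange(1, n_events + 1)
U = rng.uniform(size=(n_realizations, n_events))
occurs = U <= 1.0 / n**2                 # occurs[j, k]: does A_{k+1} occur in realization j?

counts = occurs.sum(axis=1)              # number of events occurring per realization
largest_n = n[occurs.any(axis=0)].max()  # largest n for which some A_n occurred at all

print("max #occurrences per realization:", counts.max())
print("largest n with an occurrence:", largest_n)
# Both numbers stay small: only finitely many A_n occur, and only for small n.
```
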

Lemma 2. [Second Borel-Cantelli lemma] Consider a sequence (An)n∈N ⊆ F of
independent events such that

Σ_{n=1}^∞ P(An) = ∞. (3)

Then,

P(∩_{n=1}^∞ ∪_{k=n}^∞ Ak) = 1.

Proof: We have

P(∩_{n=1}^∞ ∪_{k=n}^∞ Ak) = lim_{n→∞} P(∪_{k=n}^∞ Ak) = lim_{n→∞} lim_{m→∞} P(∪_{k=n}^m Ak),

and

P(∪_{k=n}^m Ak) = 1 − P(∩_{k=n}^m Ak^c) = 1 − Π_{k=n}^m P(Ak^c) = 1 − Π_{k=n}^m (1 − P(Ak)).

Using that 1 − x ≤ e^{−x} for all x ∈ R, we get

P(∪_{k=n}^m Ak) ≥ 1 − e^{−Σ_{k=n}^m P(Ak)}.

Letting m → ∞, we obtain

P(∪_{k=n}^∞ Ak) = 1,

from which the result follows.

Lemma 3. [Jensen inequality] Let X be a discrete or absolutely continuous random
variable and f : R → R be convex. Then,

f (E(X)) ≤ E (f (X)) .

Proof: Since f is convex, there exist α ∈ R and β ∈ R such that f (x) ≥ αx + β for all x ∈ R
and f (E(X)) = αE(X) + β. Thus, f (X) ≥ αX + β and the result follows by taking the
expectation in this inequality. □
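
Jensen's inequality is easy to observe numerically; the Python sketch below (illustrative, with an N(0, 1) variable and two convex functions chosen arbitrarily) estimates both sides by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=1_000_000)           # an arbitrary random variable, here N(0, 1)

for name, f in [("exp", np.exp), ("square", np.square)]:   # two convex functions
    lhs = f(X.mean())        # estimate of f(E(X))
    rhs = f(X).mean()        # estimate of E(f(X))
    print(name, lhs, "<=", rhs)
# exp:    1.0 <= ~1.65  (= e^{1/2} for N(0, 1))
# square: ~0  <= ~1.0   (= V(X) since E(X) = 0)
```
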

Lemma 4. [Chebyshev inequality] Let X be a discrete or absolutely continuous
random variable, a ∈ R, ε > 0 and p > 0. Then,

P (|X − a| ≥ ε) ≤ E (|X − a|^p) / ε^p.

Proof: Assume that X is absolutely continuous (the proof in the discrete case is analogous) and
set Y := |X − a|^p. Then,

E (Y ) = ∫_0^{ε^p} y fY(y) dy + ∫_{ε^p}^∞ y fY(y) dy ≥ 0 + ε^p P(Y ≥ ε^p).

Since P (|X − a| ≥ ε) = P(Y ≥ ε^p), the result follows from the previous inequality. □
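
As a numerical illustration of the Chebyshev inequality, the following Python sketch (with an arbitrary N(0, 1) variable, a = 0, ε = 1.5 and p = 2) estimates both sides of the bound by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(loc=0.0, scale=1.0, size=1_000_000)

a, eps, p = 0.0, 1.5, 2
lhs = np.mean(np.abs(X - a) >= eps)              # estimate of P(|X - a| >= eps)
rhs = np.mean(np.abs(X - a) ** p) / eps ** p     # estimate of E(|X - a|^p) / eps^p

print(lhs, "<=", rhs)   # roughly 0.134 <= 0.444 for N(0, 1), eps = 1.5, p = 2
```
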

• [Some convergence notions]

Consider a sequence (Xn)n∈N of discrete or absolutely continuous random variables
and let X be another random variable. For every m ∈ N, n ∈ N, set

Am,n := {ω ∈ Ω | |Xn(ω) − X(ω)| < 1/m}. (4)

Notice that

{ω ∈ Ω | X(ω) = lim_{n→∞} Xn(ω)} = ∩_{m=1}^∞ ∪_{n=1}^∞ ∩_{k=n}^∞ Am,k.

In particular, since the sets in (4) belong to F, we have that

{X = lim_{n→∞} Xn} ∈ F.

Definition 1. (i) We say that Xn converges to a random variable X almost surely
if

P ({X = lim_{n→∞} Xn}) = 1.

(ii) We say that Xn converges to a random variable X in probability if for every
ε > 0 we have

P (|Xn − X| ≥ ε) → 0 as n → ∞.

(iii) Let p ∈ [1, ∞). We say that Xn converges to a random variable X in Lp if
E (|Xn|^p) < +∞ for all n ∈ N*, E (|X|^p) < +∞ and

E (|Xn − X|^p) → 0 as n → ∞.
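
Convergence in probability can be seen numerically. The sketch below (illustrative; the sequence Xn = X + Zn/n with independent standard normal noise Zn is an assumption chosen so that the limit is obvious) estimates P(|Xn − X| ≥ ε) for increasing n.

```python
import numpy as np

rng = np.random.default_rng(6)
trials, eps = 100_000, 0.1

X = rng.normal(size=trials)                       # the limit random variable
for n in [1, 5, 25, 125]:
    Xn = X + rng.normal(size=trials) / n          # X_n = X + Z_n / n, Z_n ~ N(0, 1)
    print(n, np.mean(np.abs(Xn - X) >= eps))      # estimate of P(|X_n - X| >= eps) -> 0
```
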

Lemma 5. Let 1 ≤ p ≤ q < ∞. Then, convergence in Lq implies convergence
in Lp.

Proof: From Jensen's inequality applied to the convex function t ↦ t^{q/p},

E (|Xn − X|^p) ≤ (E ((|Xn − X|^p)^{q/p}))^{p/q} = (E (|Xn − X|^q))^{p/q}.

The following result characterizes almost sure convergence.

Proposition 1. The following assertions are equivalent:

(i) The sequence (Xn)n∈N converges to X almost surely.

(ii) For all m ∈ N we have lim_{n→∞} P(∩_{k=n}^∞ Am,k) = 1.

(iii) For every ε > 0,

lim_{n→∞} P ({∃ k ≥ n | |Xk − X| ≥ ε}) = 0.

Proof. (i)⇔(ii). From the definition of Am,n and the fact that the countable intersection of sets
with probability one has probability one (exercise), we get that P ({X = lim_{n→∞} Xn}) = 1 iff
P(∩_{m=1}^∞ ∪_{n=1}^∞ ∩_{k=n}^∞ Am,k) = 1. But, if n1 ≤ n2,

∩_{k=n1}^∞ Am,k ⊆ ∩_{k=n2}^∞ Am,k,

and hence, P ({X = lim_{n→∞} Xn}) = 1 iff lim_{n→∞} P(∩_{k=n}^∞ Am,k) = 1 for all m ∈ N.

(ii)⇔(iii). Condition (ii) holds iff

lim_{n→∞} P ({∃ k ≥ n | |Xk − X| ≥ 1/m}) = 0 ∀ m ∈ N.

Then, given ε > 0 and choosing m ∈ N such that 1/m ≤ ε, we get the equivalence with (iii).

Let us now provide some sufficient conditions for almost sure convergence.

Proposition 2. [Sufficient conditions for almost sure convergence] Almost sure
convergence of (Xn)n∈N to X is ensured by either of the following two conditions:

(i) For every ε > 0 we have

Σ_{n=1}^∞ P (|Xn − X| ≥ ε) < ∞.

(ii) There exists k > 0 such that

Σ_{n=1}^∞ E (|Xn − X|^k) < ∞.

Proof: (i) Let m ∈ N. By taking ε = 1/m, the first Borel-Cantelli lemma implies that
P(∩_{n=1}^∞ ∪_{k=n}^∞ A^c_{m,k}) = 0, i.e. P(∪_{n=1}^∞ ∩_{k=n}^∞ Am,k) = 1, which is equivalent to
lim_{n→∞} P(∩_{k=n}^∞ Am,k) = 1. The result follows from Proposition 1.

(ii) By the Chebyshev inequality,

P (|Xn − X| ≥ ε) ≤ E (|Xn − X|^k) / ε^k,

and, hence, the result follows from (i). □

Exercise 1. Let (Xn)n∈N be a sequence of discrete or absolutely continuous independent random
variables and let c ∈ R.
(i) Show that Xn → c almost surely iff for every ε > 0, Σ_{n=1}^∞ P (|Xn − c| ≥ ε) < ∞.
(ii) Show that if Xn does not converge to c almost surely, then P (lim_{n→∞} Xn = c) = 0.

Hint: In (i), one implication is straightforward. For the other one, argue by contradiction and
use the second Borel-Cantelli lemma.

Exercise 2. Let (Xn)n∈N be a sequence of discrete or absolutely continuous independent random
variables. Assume that E(Xn) = 0 for all n ∈ N, and that there exists C > 0 such that
V (Xn) ≤ C/n for all n ∈ N. Show that (Σ_{i=1}^n Xi)/n → 0 almost surely.

We will see in Proposition 4 below that almost sure convergence and convergence
in Lp imply convergence in probability. The next result, stated without proof,
shows that almost sure convergence implies convergence in Lp if the sequence Xn
is dominated almost surely by a random variable with a finite p-moment.

Proposition 3. [Dominated convergence] Assume that Xn → X almost surely and
that P (|Xn| ≤ Z) = 1 for all n ∈ N, for some random variable Z satisfying E (|Z|^p) < ∞.
Then, E (|X|^p) < ∞ and Xn → X in Lp.

The following result shows that convergence in probability is the weakest convergence
among the ones introduced above.

Proposition 4. Convergence in probability is implied by both almost sure convergence
and convergence in Lp.

Proof: Convergence in probability follows from almost sure convergence as a direct consequence
of assertion (iii) in Proposition 1. Now, for every ε > 0 and p > 0, the Chebyshev inequality yields

P (|Xn − X| ≥ ε) ≤ E (|Xn − X|^p) / ε^p,

and, hence, convergence in Lp implies convergence in probability. □

Remark 2. It is interesting to notice that convergence in L2, and so convergence
in probability, is implied by

E(Xn) → E(X) and V (Xn − X) → 0.

This follows directly from the equality

E ((Xn − X)^2) = V (Xn − X) + (E(Xn) − E(X))^2.

We have seen that almost sure convergence implies convergence in probability.
The next result is a sort of “converse” of this assertion.

Proposition 5. [Convergence in probability implies almost sure convergence for
a subsequence] Assume that Xn → X in probability. Then, there exists a
subsequence (Xnk)k∈N of (Xn)n∈N such that Xnk → X almost surely.

Proof: Set n1 = 1 and, for k > 1, set

nk = inf{ n > nk−1 | P (|Xm − Xℓ| > 1/2^k) < 1/2^k ∀ m, ℓ ≥ n }.

Notice that the infimum above is finite. Indeed,

P (|Xm − Xℓ| > 1/2^k) ≤ P (|Xm − X| + |Xℓ − X| ≥ 1/2^k)
                      ≤ P (|Xm − X| ≥ 1/2^{k+1}) + P (|Xℓ − X| ≥ 1/2^{k+1})
                      ≤ 1/2^k

if m and ℓ are larger than some n0 ∈ N. Since, by construction,

P (|Xn_{k+1} − Xn_k| ≥ 1/2^k) ≤ 1/2^k ∀ k ∈ N,

the Borel-Cantelli lemma yields that, for all ω outside an event with zero probability, there
exists k0(ω) such that

|Xn_{k+1}(ω) − Xn_k(ω)| < 1/2^k ∀ k ≥ k0(ω).

Thus, (Xn_k(ω)) is a Cauchy sequence and converges to some X̃(ω). Since Xn_k → X̃ almost
surely, it also converges in probability, from which we deduce that X = X̃ almost surely (it
can be shown that the limit in probability is almost surely unique, see the exercise below). The
result follows. □

Remark 3. Since convergence in Lp implies convergence in probability, if Xn →
X in Lp, then there exists a subsequence (Xnk)k∈N of (Xn)n∈N such that
Xnk → X almost surely.
Exercise 3. Show that if Xn → X in probability and Xn → Y in probability, then X = Y
almost surely (i.e. P(X = Y ) = 1).

Exercise 4. Consider a sequence of independent Bernoulli random variables (Xn )n∈N such that
Xn = 1 with probability 1/n.
(i) Check that Xn → 0 in probability and in L1 but Xn does not converge to 0 almost surely.
(ii) Consider the subsequence Yn := Xn2 . Show that Yn → 0 almost surely.

Exercise 5. Consider a sequence of independent Bernoulli random variables (Xn)n∈N such that
Xn = 1 with probability pn .
(i) Provide a necessary and sufficient condition on pn in order to have Xn → 0 in probability.
(ii) Provide a necessary and sufficient condition on pn in order to have Xn → 0 almost surely.

Compare with the result in (i).

Remark 4. The first part of the previous exercise shows, in particular, that the
converse of Proposition 3 does not hold.

