Lesson 1
• From the modeling point of view: if we consider, for instance, the experiment
of tossing a coin twice and we can only observe the result of the first toss,
then the “event” “both results are heads” cannot be included in our σ-field.
An important σ-field is the Borel σ-field of R, which is a subset of 2^R and which
we construct as follows. Consider the set
C := {I ⊂ R | I is an open interval}.
Clearly, C is not a σ-field. Indeed, for any a ∈ R, the singleton {a} does not
belong to C and yet
{a} = ⋂_{n∈N*} (a − 1/n, a + 1/n),
so C is not closed under countable intersections.
The set of Borel sets B(R) of R is defined as the smallest σ-field containing C.
Equivalently, since the intersection of an arbitrary family of σ-fields is a σ-field
(exercise), B(R) is the intersection of all σ-fields on R containing C.
Similarly, in R^d we can define B(R^d) as the smallest σ-field containing the
“rectangles”
I = ∏_{i=1}^d (ai, bi) for some a1, . . . , ad ∈ R and b1, . . . , bd ∈ R.
Given a σ-field F on Ω, a probability measure P on (Ω, F) is σ-additive: for every
sequence (An)_{n∈N} of pairwise disjoint elements of F,
P(⋃_{n=1}^∞ An) = Σ_{n=1}^∞ P(An).
• [Basic properties] We have the following properties, whose proofs are left to the
reader.
(i) P(∅) = 0.
(ii) If A, B ∈ F , P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
(iii) Consider a family (An)_{n∈N} of elements in F. We say that An ↑ A if
A1 ⊆ A2 ⊆ A3 ⊆ . . . and A = ⋃_{n≥1} An. In this case, we have
lim_{n→∞} P(An) = P(A).
• [Independence] Two events A, B ∈ F are said to be independent if
P(A ∩ B) = P(A)P(B).
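Both property (ii) and this independence identity can be checked by exact enumeration; the following is a minimal Python sketch (the two-coin experiment is an arbitrary illustration):

# Exact check of P(A ∪ B) = P(A) + P(B) − P(A ∩ B) and of independence
# on the sample space of two fair coin tosses.
from itertools import product
from fractions import Fraction

omega = list(product("HT", repeat=2))               # the four outcomes
prob = {w: Fraction(1, len(omega)) for w in omega}  # uniform probability
P = lambda E: sum(prob[w] for w in E)

A = {w for w in omega if w[0] == "H"}               # first toss is a head
B = {w for w in omega if w[1] == "H"}               # second toss is a head

assert P(A | B) == P(A) + P(B) - P(A & B)           # property (ii)
assert P(A & B) == P(A) * P(B)                      # A and B are independent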
Let I be an arbitrary index set and {Ai}_{i∈I} a family of events in F. Then, this
family is said to be independent iff for each finite set of distinct indices i1, . . . , ik
we have
P(A_{i1} ∩ · · · ∩ A_{ik}) = P(A_{i1}) · · · P(A_{ik}).
• [Conditional probability] Given A, B ∈ F with P(A) > 0, the conditional
probability of B given A is defined by
P(B|A) := P(A ∩ B)/P(A).
Theorem 1. Let B1, B2, . . . be a countable family of sets in F such that
Bi ∩ Bj = ∅ if i ≠ j and ⋃_{i=1}^∞ Bi = Ω,
(i.e. the family (Bi)_{i∈N} forms a partition of Ω), with P(Bi) > 0 for all i. Then,
for any A ∈ F,
P(A) = Σ_{i=1}^∞ P(A|Bi) P(Bi).
Proof:
P(A) = P(A ∩ (⋃_{i∈N} Bi)) = P(⋃_{i∈N} (A ∩ Bi)) = Σ_{i=1}^∞ P(A ∩ Bi) = Σ_{i=1}^∞ P(A|Bi) P(Bi). □
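Theorem 1 can be verified exactly on a small example; here is a minimal Python sketch (the fair die, the partition Bi = {the die shows i} and the event A = {the result is even} are arbitrary choices):

# Exact check of the law of total probability on a fair die.
from fractions import Fraction

omega = range(1, 7)
prob = {w: Fraction(1, 6) for w in omega}
P = lambda E: sum(prob[w] for w in E)

B = [{i} for i in omega]                  # a partition B_1, ..., B_6 of Omega
A = {w for w in omega if w % 2 == 0}      # the result is even

# P(A) = sum_i P(A|B_i) P(B_i), with P(A|B_i) = P(A ∩ B_i)/P(B_i)
total = sum(P(A & Bi) / P(Bi) * P(Bi) for Bi in B)
assert total == P(A) == Fraction(1, 2)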
b) For all a, b ∈ R, a ≤ b, we have
FX(b) − FX(a) = ∫_a^b fX(x) dx.
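This relation can be checked numerically; a minimal sketch, assuming scipy is available (the exponential density introduced later in these notes, with λ = 2, is an arbitrary choice):

# Numerical check of F_X(b) − F_X(a) = ∫_a^b f_X(x) dx for X ~ Exp(2).
import math
from scipy.integrate import quad

lam = 2.0
f = lambda x: lam * math.exp(-lam * x)    # density of X
F = lambda x: 1.0 - math.exp(-lam * x)    # distribution function of X

a, b = 0.5, 3.0
integral, _ = quad(f, a, b)
assert abs((F(b) - F(a)) - integral) < 1e-9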
A random vector is a function
Ω ∋ ω ↦ X(ω) = (X1(ω), . . . , Xd(ω)) ∈ R^d (d ∈ N).
It is said to be absolutely continuous if there exists a function fX : R^d → [0, ∞),
called the density of X, such that
P(X ∈ B) = ∫_B fX(x1, . . . , xd) dx1 . . . dxd for every B ∈ B(R^d).
but, if (X1, X2) is absolutely continuous,
P((X1, X2) ∈ D) = ∫_D f_{(X1,X2)}(x1, x2) dx1 dx2 = 0.
Absolutely continuous random variables X1, . . . , Xn, with joint density fX for
X = (X1, . . . , Xn), are independent iff
(∀ (x1, . . . , xn) ∈ R^n) fX(x1, . . . , xn) = f_{X1}(x1) . . . f_{Xn}(xn).
• [Expectation and conditional expectation] Given a discrete or absolutely continuous
random variable X : Ω → R, its expectation is defined by
E(X) = Σ_{xk ∈ X(Ω)} xk P(X = xk) if X is discrete,
E(X) = ∫_R x fX(x) dx if X is absolutely continuous,
provided that the sum on the r.h.s. exists. It is easy to check that if B1,
B2, . . . is a countable family of elements in F such that
Bi ∩ Bj = ∅ if i ≠ j and ⋃_{i=1}^∞ Bi = Ω,
then
E(X) = Σ_{i=1}^∞ E(X|Bi) P(Bi).
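This identity can be checked exactly on a small example; a minimal Python sketch (the fair die and the even/odd partition are arbitrary choices):

# Exact check of E(X) = E(X|B_1)P(B_1) + E(X|B_2)P(B_2) for a fair die.
from fractions import Fraction

omega = range(1, 7)
prob = {w: Fraction(1, 6) for w in omega}
P = lambda E: sum(prob[w] for w in E)
X = lambda w: w                                  # X is the face value

B1 = {w for w in omega if w % 2 == 0}            # even faces
B2 = {w for w in omega if w % 2 == 1}            # odd faces

E_X = sum(X(w) * prob[w] for w in omega)                   # = 7/2
E_given = lambda B: sum(X(w) * prob[w] for w in B) / P(B)  # E(X|B)
assert E_X == E_given(B1) * P(B1) + E_given(B2) * P(B2)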
• Given two discrete random variables X, Y and x ∈ X(Ω) such that P(X =
x) ≠ 0, we define the conditional expectation of Y given that X = x as
E(Y |X = x) = Σ_{yk ∈ Y(Ω)} yk P(Y = yk |X = x).
Similarly, if (X, Y) is absolutely continuous with joint density f_{(X,Y)} and x ∈ R
is such that fX(x) > 0, the conditional density of Y given X = x is defined by
(∀ y ∈ R) f_{Y|X}(y|x) = f_{(X,Y)}(x, y) / fX(x).
In this framework, for every B ∈ B(R), the conditional probability P(Y ∈ B|X = x)
is defined by
P(Y ∈ B|X = x) = ∫_B f_{Y|X}(y|x) dy.
Similarly, the conditional expectation of Y given that X = x is defined by
E(Y |X = x) = ∫_{−∞}^{∞} y f_{Y|X}(y|x) dy.
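As a numerical illustration (a sketch assuming scipy is available; the joint density is an arbitrary choice): take f_{(X,Y)}(x, y) = 2 on the triangle 0 < y < x < 1, so that fX(x) = 2x, f_{Y|X}(y|x) = 1/x on (0, x) and E(Y |X = x) = x/2.

# Numerical check of f_X, f_{Y|X} and E(Y|X = x) for a triangular joint density.
from scipy.integrate import quad

f_joint = lambda x, y: 2.0 if 0.0 < y < x < 1.0 else 0.0

x = 0.6
f_X, _ = quad(lambda y: f_joint(x, y), 0.0, 1.0, points=[x])    # marginal at x
f_cond = lambda y: f_joint(x, y) / f_X                          # f_{Y|X}(y|x)
E_cond, _ = quad(lambda y: y * f_cond(y), 0.0, 1.0, points=[x]) # E(Y|X = x)

assert abs(f_X - 2 * x) < 1e-7 and abs(E_cond - x / 2) < 1e-7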
Classical random variables
1. Consider a random variable X that models an experiment with only two possible
outcomes, let us say 0 and 1. Assume that there exists p ∈ (0, 1) such that
P(X = 1) = p (and so P(X = 0) = 1 − p). Then we say that X has a
Bernoulli distribution of parameter p. We have
E(X) = p and V(X) = p(1 − p).
Consider now n independent repetitions of the previous experiment and let X be
the random variable that counts the number of successes. Then,
P(X = k) = (n choose k) p^k (1 − p)^{n−k} ∀ k = 0, . . . , n,
and we say that X has a binomial distribution B(n, p). We have
E(X) = np and V(X) = np(1 − p).
We say that a random variable X has a Poisson distribution of parameter λ > 0,
which is denoted by X ∼ P(λ), if
P(X = k) = (λ^k / k!) e^{−λ}, k = 0, 1, 2, . . . .
We have
E(X) = λ and V(X) = λ.
1. We say that an absolutely continuous random variable X has a uniform
distribution on [a, b] (a < b) if fX(x) = 1/(b − a) for x ∈ [a, b] and fX(x) = 0
otherwise. We have
E(X) = (a + b)/2 and V(X) = (b − a)^2/12.
2. We say that an absolutely continuous random variable X has an
exponential distribution with parameter λ > 0 if
fX(x) = λe^{−λx} if x ∈ [0, ∞), and fX(x) = 0 otherwise.
We have
E(X) = 1/λ and V(X) = 1/λ^2.
3. Let µ ∈ R and σ > 0. We say that an absolutely continuous random variable X
has a normal (or Gaussian) distribution N(µ, σ^2) if
fX(x) = (1/√(2πσ^2)) e^{−(x−µ)^2/(2σ^2)} ∀ x ∈ R.
We have
E(X) = µ and V(X) = σ^2.
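All the expectation and variance formulas above can be checked by simulation; a minimal sketch with numpy (sample size and parameter values are arbitrary):

# Monte Carlo check of E(X) and V(X) for the classical distributions.
import numpy as np

rng = np.random.default_rng(0)
N = 10**6
p, n, lam, a, b, mu, sig = 0.3, 10, 2.0, -1.0, 3.0, 1.0, 2.0

cases = [  # (name, sample, theoretical mean, theoretical variance)
    ("Bernoulli",   rng.binomial(1, p, N),       p,           p * (1 - p)),
    ("Binomial",    rng.binomial(n, p, N),       n * p,       n * p * (1 - p)),
    ("Poisson",     rng.poisson(lam, N),         lam,         lam),
    ("Uniform",     rng.uniform(a, b, N),        (a + b) / 2, (b - a) ** 2 / 12),
    ("Exponential", rng.exponential(1 / lam, N), 1 / lam,     1 / lam**2),
    ("Normal",      rng.normal(mu, sig, N),      mu,          sig**2),
]
for name, s, m, v in cases:
    print(f"{name:12s} mean {s.mean():.4f} (theory {m:.4f})"
          f"  var {s.var():.4f} (theory {v:.4f})")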
Convergence notions for random variables
• [Some useful technical results] In this first short section we will prove some technical
results. The reader will notice that despite their importance, the proofs are rather
straightforward.
• [Borel–Cantelli lemma] Let (An)_{n∈N} be a family of events in F such that
Σ_{n=1}^∞ P(An) < ∞. (3)
Then,
P(⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak) = 0.
Proof: Call A = ⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak. Then, for every n ∈ N,
P(A) ≤ P(⋃_{k=n}^∞ Ak) ≤ Σ_{k=n}^∞ P(Ak) → 0, as n → ∞. □
Remark 1. The previous Borel–Cantelli lemma states that if (3) holds, then with
probability one, for any realization ω, there exists n0(ω) ∈ N such that ω ∉ Ak
for all k ≥ n0(ω). That is, with probability one the events (An)_{n∈N} do not
occur “infinitely often”.
• [Second Borel–Cantelli lemma] Let (An)_{n∈N} be a family of independent events
in F such that
Σ_{n=1}^∞ P(An) = ∞.
Then,
P(⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak) = 1.
Proof: We have
P(⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak) = lim_{n→∞} P(⋃_{k=n}^∞ Ak) = lim_{n→∞} lim_{m→∞} P(⋃_{k=n}^m Ak),
and
P(⋃_{k=n}^m Ak) = 1 − P(⋂_{k=n}^m Ak^c) = 1 − ∏_{k=n}^m P(Ak^c) = 1 − ∏_{k=n}^m (1 − P(Ak)).
Since 1 − x ≤ e^{−x} for every x ∈ R, we have ∏_{k=n}^m (1 − P(Ak)) ≤
exp(−Σ_{k=n}^m P(Ak)) → 0 as m → ∞, because Σ_{n=1}^∞ P(An) = ∞.
Letting m → ∞, we obtain
P(⋃_{k=n}^∞ Ak) = 1,
and the conclusion follows by letting n → ∞. □
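Both Borel–Cantelli lemmas can be illustrated by simulation; a minimal sketch (the choices P(An) = 1/n², whose sum is finite, and P(An) = 1/n, whose sum diverges, are arbitrary, and "infinitely often" is approximated by occurrences close to the horizon N):

# Independent events A_n: with sum P(A_n) < ∞ the occurrences stop early;
# with sum P(A_n) = ∞ they keep appearing up to the horizon.
import numpy as np

rng = np.random.default_rng(1)
N = 10**5
n = np.arange(1, N + 1)

for label, probs in [("P(A_n) = 1/n^2 (sum finite)  ", 1.0 / n**2),
                     ("P(A_n) = 1/n   (sum infinite)", 1.0 / n)]:
    occurs = rng.random(N) < probs
    print(label, "last occurrence at n =", np.flatnonzero(occurs).max() + 1)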
• [Jensen's inequality] Let X be a random variable with finite expectation and
f : R → R a convex function. Then,
f(E(X)) ≤ E(f(X)).
Proof: Since f is convex, there exist α ∈ R and β ∈ R such that f(x) ≥ αx + β for all x ∈ R
and f(E(X)) = αE(X) + β. Thus, f(X) ≥ αX + β and the result follows by taking the
expectation in this inequality. □
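A quick numerical sanity check of Jensen's inequality (an illustration; f(x) = e^x and a standard normal X are arbitrary choices):

# f convex: f(E X) <= E f(X); here E exp(X) = e^{1/2} while exp(E X) ≈ 1.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(0.0, 1.0, 10**6)
f = np.exp
assert f(X.mean()) <= f(X).mean()
print(f(X.mean()), f(X).mean())   # ≈ 1.0 versus ≈ 1.6487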
• [Chebyshev's inequality] Let X be a discrete or absolutely continuous
random variable, a ∈ R, ε > 0 and p > 0. Then,
P(|X − a| ≥ ε) ≤ E(|X − a|^p) / ε^p.
Proof: Assume that X is absolutely continuous (the proof in the discrete case is analogous) and
set Y := |X − a|^p. Then,
E(Y) = ∫_0^{ε^p} y f_Y(y) dy + ∫_{ε^p}^∞ y f_Y(y) dy ≥ 0 + ε^p P(Y ≥ ε^p).
Since P(|X − a| ≥ ε) = P(Y ≥ ε^p), the result follows from the previous inequality. □
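The bound can be compared with an exact tail probability in a simple case; a minimal sketch (X ~ Exp(1), a = 0, p = 2 are arbitrary choices; recall that E(X²) = 2 for X ~ Exp(1)):

# Chebyshev bound P(X >= eps) <= E(X^2)/eps^2 versus the exact value e^{-eps}.
import math

eps, p = 3.0, 2
exact = math.exp(-eps)        # P(X >= eps) for X ~ Exp(1)
bound = 2.0 / eps**p          # E(|X − 0|^p) / eps^p
assert exact <= bound
print(f"P(X >= {eps}) = {exact:.4f} <= {bound:.4f}")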
Notice that
{ω ∈ Ω | X(ω) = lim_{n→∞} Xn(ω)} = ⋂_{m=1}^∞ ⋃_{n=1}^∞ ⋂_{k=n}^∞ A_{m,k},
where, for m, k ∈ N*, A_{m,k} := {ω ∈ Ω | |Xk(ω) − X(ω)| < 1/m}, and hence
{X = lim_{n→∞} Xn} ∈ F.
(iii) Let p ∈ [1, ∞). We say that Xn converges to a random variable X in L^p if
E(|Xn|^p) < +∞ for all n ∈ N*, E(|X|^p) < +∞ and
E(|Xn − X|^p) → 0 as n → ∞.
(ii) For all m ∈ N we have lim_{n→∞} P(⋂_{k=n}^∞ A_{m,k}) = 1.
(iii) For every ε > 0, lim_{n→∞} P(⋂_{k=n}^∞ {|Xk − X| < ε}) = 1.
Proof. (i)⇔(ii). From the definition of A_{m,n} and the fact that a countable intersection of sets
with probability one has probability one (exercise), we get that P({X = lim_{n→∞} Xn}) = 1 iff,
for every m ∈ N,
P(⋃_{n=1}^∞ ⋂_{k=n}^∞ A_{m,k}) = 1. But, if n1 ≤ n2,
⋂_{k=n1}^∞ A_{m,k} ⊆ ⋂_{k=n2}^∞ A_{m,k},
and hence, P({X = lim_{n→∞} Xn}) = 1 iff lim_{n→∞} P(⋂_{k=n}^∞ A_{m,k}) = 1 for all m ∈ N.
Let us now provide some sufficient conditions for almost sure convergence. Each of
the following conditions implies that Xn → X almost surely:
(i) For every ε > 0,
Σ_{n=1}^∞ P(|Xn − X| ≥ ε) < ∞.
(ii) There exists k > 0 such that
Σ_{n=1}^∞ E(|Xn − X|^k) < ∞.
(ii) By Chebyshev's inequality,
P(|Xn − X| ≥ ε) ≤ E(|Xn − X|^k) / ε^k,
and the result follows from (i). □
Exercise 1. Let (Xn)_{n∈N} be a sequence of discrete or absolutely continuous independent random
variables and let c ∈ R.
(i) Show that Xn → c almost surely iff for every ε > 0, Σ_{n=1}^∞ P(|Xn − c| ≥ ε) < ∞.
(ii) Show that if Xn does not converge to c almost surely, then P(lim_{n→∞} Xn = c) = 0.
Hint: In (i) one implication is straightforward. For the other one, argue by contradiction and
use the second Borel–Cantelli lemma.
Exercise 2. Let (Xn)_{n∈N} be a sequence of discrete or absolutely continuous independent random
variables. Assume that E(Xn) = 0 for all n ∈ N, and that there exists C > 0 such that
V(Xn) ≤ C/n for all n ∈ N. Show that (Σ_{i=1}^n Xi)/n → 0 almost surely.
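The conclusion of Exercise 2 can be observed numerically (an illustration, not a proof; the choice Xn ~ N(0, 1/n), which satisfies E(Xn) = 0 and V(Xn) = 1/n, is arbitrary):

# Sample paths of (X_1 + ... + X_n)/n with V(X_n) = 1/n shrink towards 0.
import numpy as np

rng = np.random.default_rng(3)
N, paths = 10**4, 5
n = np.arange(1, N + 1)

X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(paths, N))  # independent columns
avg = X.cumsum(axis=1) / n                               # running averages
print(avg[:, [99, 999, 9999]])                           # n = 100, 1000, 10000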
We will see in Proposition 4 below that almost sure convergence and convergence
in L^p imply convergence in probability. The next result, stated without proof,
shows that almost sure convergence implies convergence in L^p if the sequence Xn
is dominated almost surely by a random variable with a finite p-moment.
The following result shows that convergence in probability is the weakest conver-
gence among the ones introduced above. Indeed, by Chebyshev's inequality, for
every ε > 0,
P(|Xn − X| ≥ ε) ≤ E(|Xn − X|^p) / ε^p,
and, hence, convergence in L^p implies convergence in probability. □
Proof: Set n1 = 1 and for k > 1 set
nk = inf{ n > n_{k−1} | P(|Xm − Xℓ| > 1/2^k) < 1/2^k ∀ m, ℓ ≥ n }.
Notice that the infimum above is finite. Indeed,
P(|Xm − Xℓ| > 1/2^k) ≤ P(|Xm − X| + |Xℓ − X| > 1/2^k)
≤ P(|Xm − X| ≥ 1/2^{k+1}) + P(|Xℓ − X| ≥ 1/2^{k+1})
≤ 1/2^k
for all m, ℓ large enough, because Xn → X in probability.
surely, then it also converges in probability, from which we deduce that X = X̂ almost surely (it
can be shown that the limit in probability is almost surely unique, see the exercise below). The
result follows. □
Exercise 4. Consider a sequence of independent Bernoulli random variables (Xn)_{n∈N} such that
Xn = 1 with probability 1/n.
(i) Check that Xn → 0 in probability and in L^1 but that Xn does not converge to 0 almost surely.
(ii) Consider the subsequence Yn := X_{n²}. Show that Yn → 0 almost surely.
Exercise 5. Consider a sequence of independent Bernoulli random variables (Xn)_{n∈N} such that
Xn = 1 with probability pn.
(i) Provide a necessary and sufficient condition on pn in order to have Xn → 0 in probability.
(ii) Provide a necessary and sufficient condition on pn in order to have Xn → 0 almost surely.
Compare with the result in (i).
Remark 4. The first part of the previous exercise shows, in particular, that the
converse of Proposition 3 does not hold.
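The contrast in Exercise 4 (and hence Remark 4) can be visualized by simulation; a minimal sketch (the horizon N is arbitrary; "Xn = 1 for arbitrarily large n" is approximated by occurrences close to N):

# X_n = 1 with probability 1/n: ones persist along the full sequence,
# but die out along the subsequence Y_n = X_{n^2}.
import numpy as np

rng = np.random.default_rng(4)
N = 10**5
n = np.arange(1, N + 1)

X = (rng.random(N) < 1.0 / n).astype(int)
print("last n with X_n = 1:", (np.flatnonzero(X) + 1).max())   # close to N

sq = np.arange(1, int(N**0.5) + 1) ** 2    # indices n^2 <= N
Y = X[sq - 1]                              # Y_n = X_{n^2}
print("last n^2 with Y_n = 1:", sq[np.flatnonzero(Y)].max())   # typically small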