If $A$ is an event (particular $\omega$'s being realized), then $P(A)$ is called the probability of event $A$.

Definition (Support)
A support of $P$ is any set $A \in \mathcal{F}$ s.t. $P(A) = 1$.
Note: suppose you have a coin, i.e., $\Omega = \{\text{heads}, \text{tails}\}$, which only ever lands `heads'. Then $P(\{\text{heads}\}) = 1$ and $A = \{\text{heads}\}$ is a support of $P$.
1.1.2 Properties of probability measures
1. Monotonicity: Let $P$ be a probability measure and $A, B \in \mathcal{F}$. If $A \subseteq B$ then $P(A) \le P(B)$.
2. Inclusion/exclusion (see the numerical check after this list):
$$P\left(\cup_{k=1}^{n} A_k\right) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) + \ldots + (-1)^{n+1} P(A_1 \cap A_2 \cap \ldots \cap A_n)$$
3. Continuity: If $\{A_n\} \subseteq \mathcal{F}$, $A \in \mathcal{F}$ and $A_n \uparrow A$, then $P(A_n) \uparrow P(A)$ (this means the sequence of sets $A_n$, subsets of $A$, converges to the set $A$). The same is true for convergence from above.
4. Countable sub-additivity: If $A_1, A_2, \ldots$ and $\cup_{k=1}^{\infty} A_k \in \mathcal{F}$, then $P(\cup_k A_k) \le \sum_k P(A_k)$ (note: here the $A_k$ are not necessarily disjoint).
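A minimal numerical sketch of the inclusion/exclusion formula for $n = 3$; the finite sample space, the three events, and the equal-likelihood assumption are purely illustrative and not part of the notes:

```python
from itertools import combinations

# Illustrative finite sample space with equally likely outcomes (an assumption).
omega = set(range(1, 7))
P = lambda A: len(A) / len(omega)

A = [{1, 2, 3}, {2, 3, 4}, {3, 5}]          # three (non-disjoint) events
n = len(A)

# Right-hand side of the inclusion/exclusion formula.
rhs = sum(
    (-1) ** (k + 1) * sum(P(set.intersection(*c)) for c in combinations(A, k))
    for k in range(1, n + 1)
)

print(P(set.union(*A)), rhs)   # both should equal 5/6
```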
Note: the reason we require $X(\omega)$ to be $\mathcal{F}$-measurable is that we want to be able to assign a probability to any set $\{\omega \in \Omega : X(\omega) < x\}$.
Example: Consider throwing a die. The event space is $\Omega = \{1, 2, \ldots, 6\}$. A possible $\sigma$-algebra is $\mathcal{F} = \{\{1, \ldots, 6\}, \emptyset, \{1, \ldots, 4\}, \{5, 6\}\}$. A possible probability measure is to assign $P = 1/6$ to each possible outcome $\omega_i$, $i = 1, \ldots, 6$ and extend it (using the rules above) to all sets in $\mathcal{F}$. Notice that the probability measure needs to be defined on all sets in $\mathcal{F}$!
Now, consider the function:
$$X(\omega) = \begin{cases} 0 & \text{if } \omega \in \{1, 2, 3\} \\ 1 & \text{otherwise} \end{cases}$$
Is $X(\omega)$ a random variable? (Verify this as an exercise using the above definition; a sketch of such a check follows below.) What if we had $\omega \in \{1, 2, 3, 4\}$ instead in the first row above?
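A minimal Python sketch of the measurability check suggested in the exercise (the helper name and representation of $\mathcal{F}$ as a list of sets are just illustrative assumptions): $X$ is a random variable with respect to $\mathcal{F}$ only if every preimage $\{\omega : X(\omega) \le x\}$ belongs to $\mathcal{F}$.

```python
# The sigma-algebra from the example and the two candidate functions.
omega = {1, 2, 3, 4, 5, 6}
F = [set(), {1, 2, 3, 4, 5, 6}, {1, 2, 3, 4}, {5, 6}]

def is_measurable(X, F, omega):
    """Check that {w : X(w) <= x} is in F for every attained threshold x."""
    thresholds = sorted({X(w) for w in omega})
    return all({w for w in omega if X(w) <= x} in F for x in thresholds)

X1 = lambda w: 0 if w in {1, 2, 3} else 1      # preimage {1,2,3} is NOT in F
X2 = lambda w: 0 if w in {1, 2, 3, 4} else 1   # preimage {1,2,3,4} IS in F

print(is_measurable(X1, F, omega))   # False -> not a RV w.r.t. F
print(is_measurable(X2, F, omega))   # True  -> a RV w.r.t. F
```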
Let $\Omega = \mathbb{R}$ and assume we are interested in defining probabilities over open intervals in $\mathbb{R}$, including $(-\infty, \infty)$. It turns out that we can construct a $\sigma$-algebra containing all open intervals on the real line, which is called the Borel algebra on $\mathbb{R}$ and denoted by $\mathcal{B}^1$. Note that by the definition of a $\sigma$-algebra, $\mathcal{B}^1$ must also contain all closed intervals, all semi-open intervals, and all singletons in $\mathbb{R}$ (think why!).
The cumulative distribution function (cdf), $F(x)$, of the random variable $X(\omega)$ is defined as:
$$F(x) = P(\{\omega : X(\omega) \le x\})$$
We often write $P(X \le x)$ as a shorthand for $P(\{\omega : X(\omega) \le x\})$.
1. $F(-\infty) = 0$ and $F(\infty) = 1$.
2. $F$ is non-decreasing and continuous from the right.
3. $F(x^-) \equiv \lim_{t \uparrow x} F(t)$ and $F(x^+) \equiv \lim_{t \downarrow x} F(t)$ exist and are finite.
4. $F$ is continuous at $x$ iff $F(x^-) = F(x^+) = F(x)$.
5. The only possible discontinuities in $F$ are upward jumps.
6. The set of discontinuity points of $F$ is countable.
We have the following relationships between the probability measure $P$ and the cdf $F$ of a RV:
1. $P(X > x) = 1 - F(x)$
2. $P(x_1 < X \le x_2) = F(x_2) - F(x_1)$
3. $P(X < x) = F(x^-)$
4. $P(X = x) = F(x^+) - F(x^-)$ (it is positive only if there is a jump at $x$ and zero otherwise).
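A quick numerical illustration of relationships 1-4. The choice of a Binomial(4, 0.5) distribution and the availability of scipy are assumptions made purely for the sketch:

```python
from scipy.stats import binom

X = binom(4, 0.5)
F = X.cdf
x = 2
eps = 1e-9

print(1 - F(x))                 # P(X > 2)
print(F(3) - F(1))              # P(1 < X <= 3)
print(F(x - eps))               # P(X < 2) = F(2^-)
print(F(x) - F(x - eps))        # P(X = 2) = F(2^+) - F(2^-), the jump at 2
print(X.pmf(x))                 # agrees with the size of the jump
```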
A discrete RV $X$ is a RV which can take only finitely or countably infinitely many values $x_1, x_2, \ldots$ with probabilities $p_1, p_2, \ldots$, $p_j > 0$ and $\sum_j p_j = 1$. Its cdf is called a discrete distribution function and is a step function with jumps at $x_1, x_2, \ldots$. We can also define the probability function for $X$ as $f(x) = P(X = x)$. Clearly, $f(x_i) \ge 0$ and $\sum_i f(x_i) = 1$.
A continuous RV is one that has a continuous cdf. Then there exists a function $f$ s.t. $\forall x, y$ with $x < y$, $F(y) - F(x) = \int_x^y f(t)\,dt$, and $F$ has a derivative equal to $f$ almost everywhere (except on a set of measure 0). The function $f$ is called the probability density function (pdf) of $X$ and, as we see, it is only defined up to a set of measure zero, i.e., it is not unique. Unlike for discrete RVs, there is no direct connection between $f(x)$ and $P(X = x)$, as the latter is always zero for continuous RVs.
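A short sketch checking numerically that $F(y) - F(x) = \int_x^y f(t)\,dt$ and that points carry no mass; the standard normal is just an assumed example, and scipy is assumed to be available:

```python
from scipy.stats import norm
from scipy.integrate import quad

x, y = -0.5, 1.2
integral, _ = quad(norm.pdf, x, y)          # integrate the pdf over (x, y)
print(integral, norm.cdf(y) - norm.cdf(x))  # the two numbers should agree

# For a continuous RV, P(X = x) is always zero: the pdf assigns no mass to points.
print(quad(norm.pdf, x, x)[0])              # 0.0
```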
Finally, a mixed random variable is, for example, one that has a continuous part and a point mass at some value.
Let $X$ be a RV and let us look at the random variable $Y = g(X)$. Suppose we are interested in the distribution of $Y$, i.e., we want to find $f_Y(y)$.
Case 1: $X$ is discrete with pf $f_X(x)$. We have:
$$f_Y(y) = P(g(X) = y) = \sum_{x : g(x) = y} f_X(x)$$
Case 2: $X$ is a continuous RV with pdf $f_X(x)$ and $P(a < X < b) = 1$. Let $g$ be a continuous and strictly monotonic function and let $a < x < b \iff \alpha < y < \beta$. Then the pdf of $Y = g(X)$ is:
$$f_Y(y) = \begin{cases} f_X(g^{-1}(y)) \left| \dfrac{d}{dy} g^{-1}(y) \right| & \text{for } \alpha < y < \beta \\ 0 & \text{otherwise} \end{cases}$$
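A sketch of Case 2 for an assumed example (not from the notes): $X \sim U(0,1)$ and $Y = g(X) = -\ln X$, strictly decreasing on $(0,1)$, so $g^{-1}(y) = e^{-y}$ and the formula gives $f_Y(y) = e^{-y}$ on $(0, \infty)$, the Exponential(1) density. The Monte Carlo comparison below is only an illustration (numpy assumed available):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100_000)
y = -np.log(x)                               # Y = g(X)

# Change-of-variables formula: f_Y(y) = f_X(g^{-1}(y)) |d/dy g^{-1}(y)| = 1 * e^{-y}, y > 0.
hist, edges = np.histogram(y, bins=200, range=(0, 8), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for y0 in [0.1, 1.0, 2.0, 3.0, 4.0]:
    i = np.argmin(np.abs(centers - y0))
    print(f"y={y0:.1f}  empirical {hist[i]:.3f}   formula {np.exp(-y0):.3f}")
```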
1.1.5 Expectation
Definition (Expectation of a continuous random variable)
The expectation of a continuous RV $X$ with pdf $f$ is $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$, provided the integral exists.
Notice that the above implies that in some cases the expectation of a RV may not exist. For example, take a Cauchy distributed random variable, with pdf $f(x) = \dfrac{1}{\pi(1 + x^2)}$. We have $\int_0^{\infty} x f(x)\,dx = \dfrac{1}{2\pi} \ln(1 + x^2) \Big|_0^{\infty} = \infty$, so the expectation does not exist.
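A small simulation illustrating the consequence: because $E(X)$ does not exist for the Cauchy distribution, sample means do not settle down as the sample size grows. The seed and sample sizes are arbitrary, and numpy is assumed available:

```python
import numpy as np

rng = np.random.default_rng(1)
for n in [10**3, 10**4, 10**5, 10**6]:
    sample = rng.standard_cauchy(n)
    print(n, sample.mean())   # means keep jumping around instead of converging
```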
Example: compute the expectation of the uniformly distributed random variable on $[a, b]$, with $F(x) = \dfrac{x - a}{b - a}$ for any $x \in [a, b]$.
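One way to carry out the computation (a sketch using the definition of expectation above): the pdf is $f(x) = F'(x) = \frac{1}{b - a}$ on $[a, b]$, so
$$E(X) = \int_a^b \frac{x}{b - a}\,dx = \frac{b^2 - a^2}{2(b - a)} = \frac{a + b}{2},$$
the midpoint of the interval.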
Properties of expectations
(1) Integrability: $\int x f(x)\,dx$ is well defined and finite iff $\int |x| f(x)\,dx < \infty$.
(2) Linearity: $E(aX + bY) = aE(X) + bE(Y)$ (see the sketch after this list).
(3) If $\exists$ a constant $a$ s.t. $P(X \ge a) = 1$, then $E(X) \ge a$.
(4) If $P(X \ge Y) = 1$, then $E(X) \ge E(Y)$.
(5) If $P(X = Y) = 1$, then $E(X) = E(Y)$.
(6) If $P(X \ge 0) = 1$ and $E(X) = 0$, then $P(X = 0) = 1$.
(7) If $P(X = 0) = 1$, then $E(X) = 0$.
(8) If $c$ is a constant, $E(c) = c$.
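A quick Monte Carlo sketch of property (2); the constants, distributions, and sample size are arbitrary assumptions (numpy assumed available):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.exponential(2.0, size=1_000_000)     # E(X) = 2
Y = rng.normal(1.0, 3.0, size=1_000_000)     # E(Y) = 1
a, b = 3.0, -5.0

print((a * X + b * Y).mean())                # approximately a*E(X) + b*E(Y)
print(a * X.mean() + b * Y.mean())
```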
Expectation of a function of a RV
Let $X$ be a RV with pdf $f(x)$. Suppose we want to find $E(Y)$, where $Y = g(X)$ is a random variable which is a function of $X$. Then the following is true:
$$E(Y) = E(g(X)) = \int g(x) f(x)\,dx$$
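A sketch of the formula for an assumed example: $X \sim U(0,1)$ and $g(x) = x^2$, so $E(g(X)) = \int_0^1 x^2\,dx = 1/3$. Numerical integration and a Monte Carlo check (scipy and numpy assumed available):

```python
import numpy as np
from scipy.integrate import quad

g = lambda x: x ** 2
f = lambda x: 1.0                       # pdf of U(0,1) on [0, 1]

value, _ = quad(lambda x: g(x) * f(x), 0.0, 1.0)
print(value)                            # 1/3, from E(g(X)) = integral of g(x) f(x) dx

x = np.random.default_rng(3).uniform(size=1_000_000)
print(g(x).mean())                      # Monte Carlo estimate, close to 1/3
```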
A related concept is the mean squared error of a RV, defined as $MSE(c) = E(X - c)^2$ for some constant $c$. The interpretation is that it `measures' the average deviation from $c$. Clearly the MSE is minimized at $c = E(X)$.
(Jensen's inequality) If $h$ is a convex function, then $E(h(X)) \ge h(E(X))$.
(Chebyshev's inequality) For any constant $c$ and any $d > 0$, $P(|X - c| \ge d) \le \dfrac{E(X - c)^2}{d^2}$.
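A numerical sketch of the two facts above, using an assumed Exponential(1) sample (numpy assumed available): the empirical MSE is smallest near $c = E(X)$, and the tail frequency stays below the Chebyshev bound.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.exponential(1.0, size=500_000)       # E(X) = 1

# MSE(c) = E(X - c)^2 is minimized at c = E(X).
for c in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(c, ((X - c) ** 2).mean())

# Chebyshev: P(|X - c| >= d) <= E(X - c)^2 / d^2.
c, d = 1.0, 2.0
print((np.abs(X - c) >= d).mean(), ((X - c) ** 2).mean() / d ** 2)
```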
Let $(X, Y)$ be a random vector. The joint distribution function (jdf) $F(x, y)$ of $(X, Y)$ is:
$$F(x, y) = P(X \le x, Y \le y)$$
If $X$ and $Y$ are both discrete RVs then they also have a discrete jdf and we can define their joint probability function as
$$f(x, y) = P(X = x \wedge Y = y)$$
with $f(x_i, y_j) \ge 0$ and $\sum_{i,j} f(x_i, y_j) = 1$.
The jdf is just the "overall" distribution function of the vector $(X, Y)$ and is thus analogous to the cdf in the univariate case. If instead we want to look at the components of the random vector one at a time, we need to define the following: the marginal distribution function (mdf) of $X$ is $F_X(x) = \lim_{y \to \infty} F(x, y) = P(X \le x)$, and analogously for $Y$. The interpretation is that the mdf of $X$ is its distribution function (summed) over all possible values for $Y$.
For discrete RVs $X$ and $Y$ taking values $\{x_i\}$ and $\{y_j\}$ respectively, the above limit can be computed as simply adding up the joint probabilities $f(x_i, y_j)$ over all possible $y_j$ for any given $x_i$, i.e.,
$$f(x_i) = \sum_j f(x_i, y_j) \quad \text{for any } i$$
$(X, Y)$ have a continuous jdf if there exists a non-negative function $f(x, y)$, called the joint probability density function (joint pdf) of $(X, Y)$, such that for any set $A \in \mathcal{B}^2$: $P((X, Y) \in A) = \iint_A f(s, t)\,ds\,dt$. The joint pdf satisfies the following:
$$f(x, y) \ge 0; \quad \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(s, t)\,ds\,dt = 1; \quad f(x, y) = \frac{\partial^2 F(x, y)}{\partial x \partial y} \text{ whenever the derivative exists.}$$
The marginal probability density function of $X$ is $f(x) = \int_{-\infty}^{\infty} f(x, t)\,dt$. In the discrete case it is called a marginal probability function and is given by $f(x) = P(X = x) = \sum_y f(x, y)$.
Exercise: Suppose that a point $(X, Y)$ is chosen at random from the rectangle $S = \{(x, y) : 0 \le x \le 2, \ 1 \le y \le 4\}$. Determine the joint cdf and pdf of $X$ and $Y$, the marginal cdf and pdf of $X$ and the marginal cdf and pdf of $Y$.
Answers: $F(x, y) = \frac{x}{2} \cdot \frac{y - 1}{3}$ on $[0, 2] \times [1, 4]$ (think how you'd define it outside that area). The joint pdf is then $\frac{\partial^2 F(x, y)}{\partial x \partial y} = \frac{1}{6}$. We also have $F_X(x) = \frac{x}{2}$ and $F_Y(y) = \frac{y - 1}{3}$.
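A symbolic check of the answers (sympy assumed available; just a sketch of the differentiation and marginalization steps):

```python
import sympy as sp

x, y = sp.symbols("x y")
F = (x / 2) * ((y - 1) / 3)            # joint cdf on [0,2] x [1,4]

pdf = sp.diff(F, x, y)                 # joint pdf = d^2 F / dx dy
print(pdf)                             # 1/6

print(sp.integrate(pdf, (y, 1, 4)))    # marginal pdf of X: 1/2, so F_X(x) = x/2
print(sp.integrate(pdf, (x, 0, 2)))    # marginal pdf of Y: 1/3, so F_Y(y) = (y-1)/3
```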
1.1.9 Conditional distribution
The marginal distributions defined above dealt with the distribution of one of the variables in a random vector for all possible values of the other. Next we study the distribution of one of the variables given some fixed value for the other. The conditional pdf of $X$ given $Y = y$ is
$$f(x \mid y) = \frac{f(x, y)}{f(y)}$$
where $f(x, y)$ is the joint pdf of $(X, Y)$ and $f(y)$ is the marginal pdf of $Y$.
Example: suppose we have a fair coin, which we toss twice. Assign 0 if tails occurs and 1 if heads. Define the RV $X$ as the sum of the outcomes of the two throws and $Y$ as the difference between the first and second throw outcomes. The joint probability function of $(X, Y)$ is:

              X = 0    X = 1    X = 2
    Y = -1      0       1/4       0
    Y =  0     1/4       0       1/4
    Y =  1      0       1/4       0
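A tiny enumeration reproducing the table above (plain Python, purely illustrative):

```python
from itertools import product
from collections import Counter

# Two tosses of a fair coin: 0 = tails, 1 = heads, each outcome pair has probability 1/4.
joint = Counter()
for t1, t2 in product([0, 1], repeat=2):
    X, Y = t1 + t2, t1 - t2            # X = sum, Y = first minus second
    joint[(X, Y)] += 0.25

for (x, y), p in sorted(joint.items()):
    print(f"P(X={x}, Y={y}) = {p}")
```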