Random Variables
Fall 2017
Instructor: Ajit Rajwade
Topic Overview
Random variable: definition
Discrete and continuous random variables
Probability density function (pdf) and cumulative
distribution function (cdf)
Joint and conditional pdfs
Expectation and its properties
Variance and covariance
Markov’s and Chebyshev’s inequality
Weak law of large numbers
Moment generating functions
Random variable
In many random experiments, we are not always
interested in the observed values, but in some
numerical quantity determined by the observed values.
Random variable
This is called the probability mass function (pmf) table of the random variable X, where X = the sum of 2 dice throws. If S is the sample space, then P(S) = P(union of all events of the form X = x) = 1 (verify from the table).

Value of X (denoted as x)   P(X = x)
 2    1/36
 3    2/36
 4    3/36
 5    4/36
 6    5/36
 7    6/36
 8    5/36
 9    4/36
10    3/36
11    2/36
12    1/36
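As a quick sanity check (not part of the original slides), this pmf can be reproduced by enumerating all 36 equally likely outcomes of two fair dice in Python:

```python
from fractions import Fraction
from collections import Counter

# Enumerate all 36 equally likely outcomes of two fair dice and count each sum.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

# pmf: P(X = x) = (number of outcomes summing to x) / 36
pmf = {x: Fraction(c, 36) for x, c in sorted(counts.items())}
for x, p in pmf.items():
    print(x, p)                 # 2 1/36, 3 1/18, ..., 7 1/6, ..., 12 1/36
print(sum(pmf.values()))        # 1, i.e. P(S) = 1
```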
Random variable: Notation
A random variable is usually denoted by an uppercase letter (e.g. X), and a particular value it takes by the corresponding lowercase letter (e.g. x).
Random variable: discrete
Random variables whose values can be written as a
finite or infinite sequence are called discrete random
variables.
Random variable: continuous
Random variables that can take on values within a
continuum are called continuous random variables.
Random variable: continuous
For a continuous random variable, the probability that
it takes on any particular value within a continuum is
zero!
Random variable: continuous
Hence for a continuous random variable X, we
consider the cumulative distribution function (cdf)
FX(x) defined as P{X ≤ x}.
Random variable: continuous - example
Consider a cdf of the form:
$F_X(x) = 0$ for $x \le 0$, and
$F_X(x) = 1 - e^{-x^2}$ otherwise.
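Differentiating this cdf gives the pdf $f_X(x) = 2x\,e^{-x^2}$ for $x > 0$. A small numerical sketch (not from the slides; assumes SciPy is available) confirms that this pdf integrates to 1 and that $P\{X \le 1\} = F_X(1) = 1 - e^{-1}$:

```python
import math
from scipy.integrate import quad   # assumes SciPy is available

pdf = lambda x: 2 * x * math.exp(-x**2)        # derivative of F_X(x) = 1 - exp(-x^2)

total, _ = quad(pdf, 0, math.inf)              # should be ~1.0
p_le_1, _ = quad(pdf, 0, 1)                    # should equal F_X(1) = 1 - e^{-1}
print(total, p_le_1, 1 - math.exp(-1))
```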
Probability Density Function (pdf)
The pdf of a random variable X at a value x is the
derivative of its cumulative distribution function (cdf)
at that value x.
Properties:
$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
$P(a \le X \le b) = \int_a^b f_X(x)\,dx = F_X(b) - F_X(a)$
$P(X = a) = \int_a^a f_X(x)\,dx = 0$

[Figure: plot of $f_X(x)$ versus x; the area under the curve between $x = a$ and $x = b$ equals $P(a \le X \le b)$.]
Probability Density Function
Another way of looking at this concept:
$P\{a - \epsilon/2 \le X \le a + \epsilon/2\} = \int_{a-\epsilon/2}^{a+\epsilon/2} f_X(x)\,dx \approx \epsilon\, f_X(a)$ for small $\epsilon$
$f_X(a) = \lim_{\epsilon \to 0} \frac{P\{a - \epsilon/2 \le X \le a + \epsilon/2\}}{\epsilon}$
Examples: Popular families of PDFs
Gaussian (normal) pdf with mean $\mu$ and standard deviation $\sigma$:
$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}$
Examples: Popular families of PDFs
Bounded uniform pdf:
$f_X(x) = \frac{1}{b-a}$ for $a \le x \le b$, and $0$ otherwise.
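A quick simulation sketch (not in the slides; assumes NumPy and Matplotlib are available) comparing histograms of random samples against these two pdfs:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
mu, sigma, a, b = 0.0, 1.0, 2.0, 5.0

gauss = rng.normal(mu, sigma, 100_000)          # samples from N(mu, sigma^2)
unif = rng.uniform(a, b, 100_000)               # samples from Uniform(a, b)

xs = np.linspace(-4, 4, 400)
plt.hist(gauss, bins=100, density=True, alpha=0.4)
plt.plot(xs, np.exp(-(xs - mu)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma))

xs = np.linspace(a - 1, b + 1, 400)
plt.hist(unif, bins=100, density=True, alpha=0.4)
plt.plot(xs, np.where((xs >= a) & (xs <= b), 1.0 / (b - a), 0.0))
plt.show()
```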
Expected Value (Expectation) of a
random variable
It is also called the mean value of the random variable.
$E(X) = \sum_i x_i\, P(X = x_i)$
Example:
$P(X = x) = k/x^2$ for $x \ge 1,\ x \in \mathbb{Z}$
$E(X) = \sum_{x \ge 1} x\, P(X = x) = k \sum_{x \ge 1} 1/x = \infty$ (the harmonic series diverges)
Note: $\sum_{x \ge 1} P(X = x) = k \sum_{x \ge 1} 1/x^2 = 1$ if $k = 6/\pi^2$, since $\sum_{x \ge 1} 1/x^2 = \pi^2/6$.
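A short numerical sketch of this example (assumes NumPy): the probabilities sum to 1, but the partial sums of x P(X = x) keep growing with the truncation point, so the expectation is infinite:

```python
import numpy as np

k = 6 / np.pi**2                             # makes sum_{x>=1} k/x^2 equal to 1
x = np.arange(1, 10_000_001, dtype=np.float64)

print((k / x**2).sum())                      # ~ 1.0: the pmf is valid
print((x * (k / x**2)).sum())                # ~ k * ln(1e7) ~ 9.8, and keeps growing
                                             # as the upper limit grows, so E(X) = infinity
```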
Expected Value: examples
The expected value of the number that shows up when you throw a fair die is (1/6)(1+2+3+4+5+6) = 3.5.
https://en.wikipedia.org/wiki/Roulette#/media/File:Roulette_casino.JPG
Expected value of a function of
random variable
Consider a function g(X) of a discrete random variable
X. The expected value of g(X) is defined as:
$E(g(X)) = \sum_i g(x_i)\, P(X = x_i)$
Properties of expected value
$E(ag(X) + b) = \int_{-\infty}^{\infty} (a\,g(x) + b)\, f_X(x)\,dx$
$= a\int_{-\infty}^{\infty} g(x) f_X(x)\,dx + b\int_{-\infty}^{\infty} f_X(x)\,dx$
$= a\,E(g(X)) + b$ (why?)
Properties of expected value
Suppose you want to predict the value of a random variable with a known mean. On average, what value will yield the least squared error?
Let X be the random variable and c be its predicted value.
We want to find c such that $E((X - c)^2)$ is minimized.
Let $\mu$ be the mean of X. Then
$E((X - c)^2) = E((X - \mu + \mu - c)^2)$
$= E((X - \mu)^2 + (\mu - c)^2 + 2(X - \mu)(\mu - c))$
$= E((X - \mu)^2) + E((\mu - c)^2) + 2E((X - \mu)(\mu - c))$
$= E((X - \mu)^2) + (\mu - c)^2 + 0$
$\ge E((X - \mu)^2)$, with equality when $c = \mu$.
The expected value is the value that yields the least mean squared prediction error!
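A minimal simulation sketch (assumes NumPy) confirming that, among candidate predictions c, the empirical mean squared error is smallest near c = μ:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200_000)       # any distribution with mean 2

candidates = np.linspace(0.0, 4.0, 401)
mse = [((x - c)**2).mean() for c in candidates]    # empirical E[(X - c)^2]

best = candidates[int(np.argmin(mse))]
print(best, x.mean())                              # both ~ 2.0: the mean minimizes MSE
```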
The median
What minimizes the following quantity?
$J(c) = \int_{-\infty}^{\infty} |x - c|\, f_X(x)\,dx$
$J(c) = \int_{-\infty}^{c} |x - c|\, f_X(x)\,dx + \int_{c}^{\infty} |x - c|\, f_X(x)\,dx$
$= \int_{-\infty}^{c} (c - x)\, f_X(x)\,dx + \int_{c}^{\infty} (x - c)\, f_X(x)\,dx$
$= \int_{-\infty}^{c} c\, f_X(x)\,dx - \int_{-\infty}^{c} x f_X(x)\,dx + \int_{c}^{\infty} x f_X(x)\,dx - \int_{c}^{\infty} c\, f_X(x)\,dx$
$= c\,F_X(c) - \int_{-\infty}^{c} x f_X(x)\,dx + \int_{c}^{\infty} x f_X(x)\,dx - c\,(1 - F_X(c))$
The median
$J(c) = c\,F_X(c) - \int_{-\infty}^{c} x f_X(x)\,dx + \int_{c}^{\infty} x f_X(x)\,dx - c\,(1 - F_X(c))$
$J(c) = c\,F_X(c) - \int_{-\infty}^{c} q(x)\,dx + \int_{c}^{\infty} q(x)\,dx - c\,(1 - F_X(c))$, where $q(x) = x f_X(x)$.
In this derivation, we are assuming that the two definite integrals of q(x) exist! This proof won't go through otherwise.
The median
Let Q be an antiderivative of q. Then
$J(c) = 2c\,F_X(c) - c - 2Q(c) + Q(\infty) + Q(-\infty)$
Setting $J'(c) = 0$:
$2c\,f_X(c) + 2F_X(c) - 1 - 2q(c) = 0$
$2c\,f_X(c) + 2F_X(c) - 1 - 2c\,f_X(c) = 0$
$2F_X(c) - 1 = 0$
$F_X(c) = 1/2$
This is the median – by definition – and it minimizes J(c). We can double-check that $J''(c) \ge 0$. Notice the peculiar definition of the median for the continuous case here! This definition is not conceptually different from the discrete case, though. Also, note that the median will not be unique if $F_X$ is not strictly increasing in some interval around a median – say K = [c, c+ε] or [c−ε, c] – where $F_X$ stays at 1/2. In such cases, all y ϵ K will qualify as medians and all of them will produce the same value of J(y). This is because $f_X(y) = 0$ for y ϵ K.
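A similar sketch (assumes NumPy) for the mean absolute error: the minimizer is near the median, not the mean, for a skewed distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200_000)       # skewed, so mean != median

candidates = np.linspace(0.0, 4.0, 401)
mae = [np.abs(x - c).mean() for c in candidates]   # empirical E[|X - c|]

best = candidates[int(np.argmin(mae))]
print(best, np.median(x), x.mean())                # best ~ median (2 ln 2 ~ 1.39), not the mean (~2)
```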
Variance
The variance of a random variable X tells you how much its values deviate from the mean, on average.
Existence?
For some distributions, the variance (and hence
standard deviation) may not be defined, because the
integral may not have a finite value.
Variance: Alternative expression
The definition of variance is:
$Var(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\,dx$
Alternative expression:
$Var(X) = E[(X - \mu)^2] = E[X^2 + \mu^2 - 2\mu X]$
$= E[X^2] + \mu^2 - 2\mu E[X]$
$= E[X^2] + \mu^2 - 2\mu^2$ (why?)
$= E[X^2] - \mu^2$
$= E[X^2] - (E[X])^2$
Variance: properties
Property:
$Var(aX + b) = E[(aX + b - E(aX + b))^2]$
$= E[(aX + b - (a\mu + b))^2]$
$= E[a^2 (X - \mu)^2]$
$= a^2 E[(X - \mu)^2] = a^2\, Var(X)$
Probabilistic inequalities
Sometimes we know the mean or variance of a random variable, and want to bound the probability that the random variable takes on values in a certain range.
Probabilistic inequalities
Example: Let’s say the average annual salary offered to a
CSE Btech-4 student at IITB is $100,000. What’s the
probability that you (i.e. a randomly chosen student) will
get an offer of $110,000 or more? Additionally, if you
were told that the variance of the salary was 50,000,
what’s the probability that your package is between
$90,000 and $110,000?
Markov’s inequality
Let X be a random variable that takes only non-
negative values. For any a > 0, we have
$P\{X \ge a\} \le E[X]/a$
Markov’s inequality
Proof:
$E[X] = \int_0^{\infty} x f_X(x)\,dx$
$= \int_0^{a} x f_X(x)\,dx + \int_a^{\infty} x f_X(x)\,dx$
$\ge \int_a^{\infty} x f_X(x)\,dx$
$\ge \int_a^{\infty} a f_X(x)\,dx$
$= a\int_a^{\infty} f_X(x)\,dx$
$= a\,P\{X \ge a\}$
Hence $P\{X \ge a\} \le E[X]/a$.
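A small simulation sketch (assumes NumPy) comparing the empirical probability P{X ≥ a} with the Markov bound E[X]/a for a non-negative random variable:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=3.0, size=500_000)   # non-negative RV with E[X] = 3

for a in (3.0, 6.0, 12.0):
    empirical = (x >= a).mean()                # P{X >= a} estimated from samples
    bound = x.mean() / a                       # Markov bound E[X]/a
    print(a, empirical, bound)                 # empirical probability never exceeds the bound
```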
Chebyshev’s inequality
For a random variable X with mean μ and variance σ², we have for any value k > 0,
$P\{|X - \mu| \ge k\} \le \frac{\sigma^2}{k^2}$
Proof: follows from Markov's inequality, applied to the non-negative random variable $(X - \mu)^2$ with threshold $a = k^2$.
Chebyshev’s inequality: another form
For a random variable X with mean μ and variance σ², we have for any value k > 0,
$P\{|X - \mu| \ge k\} \le \frac{\sigma^2}{k^2}$
If I replace k by kσ, I get the following:
$P\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^2}$
Back to counting money!
Let X be the random variable indicating the annual
salary offered to you when you reach Btech-4
Then
$P\{X \ge 110K\} \le \frac{100K}{110K} = 0.9090 \approx 90\%$
$P\{|X - 100K| \ge 10K\} \le \frac{50{,}000}{10K \times 10K} = 0.0005 = 0.05\%$
$P\{|X - 100K| < 10K\} \ge 1 - 0.05\% = 99.95\%$
Back to the expected value
When I tell you that the expected value of a random
die variable is 3.5, what does this mean?
https://en.wikipedia.org/wiki/Law_of_large_numbers
Back to the expected value: weak
law of large numbers
This intuition has a rigorous theoretical justification in
a theorem known as the weak law of large numbers.
Back to the expected value: weak
law of large numbers
Let X1, X2, …, Xn be a sequence of independent and identically distributed random variables, each having mean μ (and, for the proof below, finite variance σ²). Then for any ε > 0, we have:
$P\left\{\left|\frac{X_1 + X_2 + \dots + X_n}{n} - \mu\right| \ge \epsilon\right\} \to 0$ as $n \to \infty$
(The quantity $(X_1 + X_2 + \dots + X_n)/n$ is the empirical, or sample, mean.)
Proof: follows immediately from Chebyshev's inequality:
$E\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right) = \mu, \quad Var\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n},$
$P\left\{\left|\frac{X_1 + X_2 + \dots + X_n}{n} - \mu\right| \ge \epsilon\right\} \le \frac{\sigma^2}{n\epsilon^2}$
$\lim_{n\to\infty} P\left\{\left|\frac{X_1 + X_2 + \dots + X_n}{n} - \mu\right| \ge \epsilon\right\} = 0$
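A simulation sketch of the weak law for fair-die rolls (assumes NumPy): the probability that the sample mean deviates from μ = 3.5 by at least ε shrinks as n grows:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, eps = 3.5, 0.1                                   # fair-die mean, tolerance

for n in (10, 100, 1000, 10_000):
    rolls = rng.integers(1, 7, size=(2000, n))       # 2000 independent experiments
    sample_means = rolls.mean(axis=1)                # (X1 + ... + Xn)/n for each experiment
    prob = (np.abs(sample_means - mu) >= eps).mean() # estimate of P{|sample mean - mu| >= eps}
    print(n, prob)                                   # shrinks towards 0 as n grows
```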
The strong law of large numbers
The strong law of large numbers states the following:
$P\left(\lim_{n\to\infty} \frac{X_1 + X_2 + \dots + X_n}{n} = \mu\right) = 1$
This is stronger than the weak law because it states that the probability of the desired event (that the empirical mean converges to the actual mean) is equal to 1 given enough samples. The weak law states that the probability of the empirical mean lying within ε of μ tends to 1.
(The incorrect) Law of averages
Let’s say a gambler independently tosses an unbiased
coin 20 times, and gets a head each time. He now
applies the “law of averages” and believes that it is
more likely that the next coin toss will yield a tail.
Joint distributions/pdfs/pmfs
Jointly distributed random variables
Many times in statistics, one needs to model
relationships between two or more random variables –
for example, your CPI at IITB and the annual salary
offered to you during placements!
Joint CDFs
Given continuous random variables X and Y, their joint
cumulative distribution function (cdf) is defined as:
$F_{XY}(x, y) = P(X \le x, Y \le y)$
The distribution of either random variable (called the marginal cdf) can be obtained from the joint distribution as follows:
$F_X(x) = P(X \le x, Y \le \infty) = F_{XY}(x, \infty)$
$F_Y(y) = P(X \le \infty, Y \le y) = F_{XY}(\infty, y)$
(I'll explain this a few slides further down.)
Joint PMFs
Given two discrete random variables X and Y, their
joint probability mass function (pmf) is defined as:
$p_{XY}(x_i, y_j) = P(X = x_i, Y = y_j)$
$P\{X = x_i\} = P\left(\bigcup_j \{X = x_i, Y = y_j\}\right)$ (Why?)
$= \sum_j P\{X = x_i, Y = y_j\} = \sum_j p_{XY}(x_i, y_j)$
Joint PMFs: Example
Consider that in a city 15% of the families are childless, 20% have only one child, 35% have two children and 30% have three children. Let us suppose that male and female children are equally likely, and that children's genders are independent. Let B and G be the number of boys and girls in a randomly chosen family.
What is the probability that a randomly chosen family has no children?
P(B = 0, G = 0) = 0.15 = P(no children)
Has 1 girl child (and no boys)?
P(B = 0, G = 1) = P(1 child) P(G = 1 | 1 child) = 0.2 x 0.5 = 0.1
Has 3 girls?
P(B = 0, G = 3) = P(3 children) P(G = 3 | 3 children) = 0.3 x (0.5)^3 = 0.0375
Has 2 boys and 1 girl?
P(B = 2, G = 1) = P(3 children) P(B = 2, G = 1 | 3 children) = 0.3 x (1/8) x 3 = 0.1125 (all 8 combinations of 3 children are equally likely; out of these, there are 3 of the form 2 boys + 1 girl).
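This joint pmf can also be enumerated directly; a short Python sketch (not from the slides) reproduces the numbers above:

```python
from itertools import product

p_children = {0: 0.15, 1: 0.20, 2: 0.35, 3: 0.30}    # number of children in a family

joint = {}                                           # joint pmf of (boys, girls)
for n, p_n in p_children.items():
    for genders in product("BG", repeat=n):          # each of the 2^n gender sequences has prob 0.5^n
        b, g = genders.count("B"), genders.count("G")
        joint[(b, g)] = joint.get((b, g), 0.0) + p_n * 0.5**n

print(joint[(0, 0)], joint[(0, 1)], joint[(0, 3)], joint[(2, 1)])
# 0.15, 0.1, 0.0375, 0.1125  (matches the slide's calculations)
print(sum(joint.values()))   # 1.0
```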
Joint PDFs
For two jointly continuous random variables X and Y,
the joint pdf is a non-negative function fXY(x,y) such
that for any set C in the two-dimensional plane, we
have:
$P\{(X, Y) \in C\} = \iint_{(x,y)\in C} f_{XY}(x, y)\,dx\,dy$
$f_{XY}(a, b) = \frac{\partial^2}{\partial x\,\partial y} F_{XY}(x, y)\Big|_{x=a,\, y=b}$
[Figure: an arbitrarily shaped region C in the XY-plane.]
The joint probability that (X, Y) belongs to any arbitrarily shaped region in the XY-plane is obtained by integrating the joint pdf of (X, Y) over that region (e.g. region C).
Joint and marginal PDFs
The marginal pdf of a random variable can be
obtained by integrating the joint pdf w.r.t. the other
random variable(s):
$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy$
$f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx$
This also explains the earlier claim about marginal cdfs:
$F_X(a) = \int_{-\infty}^{a} f_X(x)\,dx = \int_{-\infty}^{a} \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy\,dx = F_{XY}(a, \infty)$
Independent random variables
Two continuous random variables are said to be
independent if and only if:
$\forall x, y:\; f_{XY}(x, y) = f_X(x)\, f_Y(y)$
i.e., the joint pdf is equal to the product of the marginal pdfs.
Independent random variables
n continuous random variables X1, X2, …, Xn are said to be mutually independent if and only if for any finite subset of k random variables Xi1, Xi2, …, Xik and any finite sequence of numbers x1, x2, …, xk, the events Xi1 ≤ x1, Xi2 ≤ x2, …, Xik ≤ xk are mutually independent.
As a consequence,
$\forall x_1, x_2, \dots, x_n:\; f_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n) = f_{X_1}(x_1)\, f_{X_2}(x_2) \cdots f_{X_n}(x_n)$
i.e., the joint pdf is equal to the product of all n marginal pdfs. In particular, for any pair,
$f_{X_i, X_j}(x_i, x_j) = f_{X_i}(x_i)\, f_{X_j}(x_j)$
Independent random variables
Mutual independence between n random variables
implies that they are pairwise independent, or in fact,
k-wise independent for any k < n.
Independent random variables
Consider a sample space with four equally likely outcomes {1, 2, 3, 4}, and the events A = {1,2}, B = {1,3}, C = {1,4}. Then P(A) = P(B) = P(C) = 1/2 and P(A∩B) = P(B∩C) = P(A∩C) = P({1}) = 1/4, so the three events are pairwise independent; but P(A∩B∩C) = 1/4 ≠ 1/8 = P(A)P(B)P(C), so they are not mutually independent.
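A small enumeration sketch in Python checking these probabilities:

```python
from fractions import Fraction
from itertools import combinations

S = {1, 2, 3, 4}                               # equally likely outcomes
A, B, C = {1, 2}, {1, 3}, {1, 4}
P = lambda E: Fraction(len(E & S), len(S))     # probability of an event E

for E, F in combinations((A, B, C), 2):
    print(P(E & F) == P(E) * P(F))             # True, True, True: pairwise independent

print(P(A & B & C) == P(A) * P(B) * P(C))      # False: not mutually independent
```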
Concept of covariance
The covariance of two random variables X and Y is
defined as follows:
$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
Further expansion:
$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
$= E[XY - X\mu_Y - Y\mu_X + \mu_X \mu_Y]$
$= E[XY] - \mu_X \mu_Y - \mu_Y \mu_X + \mu_X \mu_Y$ (why?)
$= E[XY] - \mu_X \mu_Y$
$= E[XY] - E[X]E[Y]$
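A quick empirical sketch (assumes NumPy) checking that the two expressions for covariance agree:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500_000)
y = 2.0 * x + rng.normal(size=500_000)               # Y depends on X, so Cov(X, Y) != 0

cov_def = ((x - x.mean()) * (y - y.mean())).mean()   # E[(X - mu_X)(Y - mu_Y)]
cov_alt = (x * y).mean() - x.mean() * y.mean()       # E[XY] - E[X]E[Y]
print(cov_def, cov_alt)                              # both ~ 2.0; the two forms agree
```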
Concept of covariance: properties
Cov(X,Y) = Cov(Y, X)
Concept of covariance: properties
$Cov(X + Z, Y) = Cov(X, Y) + Cov(Z, Y)$
Proof:
$Cov(X + Z, Y) = E[(X + Z)Y] - E[X + Z]E[Y]$
$= E[XY + ZY] - E[X]E[Y] - E[Z]E[Y]$
$= E[XY] - E[X]E[Y] + E[ZY] - E[Z]E[Y]$
$= Cov(X, Y) + Cov(Z, Y)$
Concept of covariance: properties
$Cov\left(\sum_i X_i,\, Y\right) = \sum_i Cov(X_i, Y)$
$Cov\left(\sum_i X_i,\, \sum_j Y_j\right) = \sum_i \sum_j Cov(X_i, Y_j)$
$Var\left(\sum_i X_i\right) = \sum_i \sum_j Cov(X_i, X_j) = \sum_i Var(X_i) + \sum_i \sum_{j \ne i} Cov(X_i, X_j)$
Concept of covariance: properties
For independent random variables X and Y, Cov(X,Y) =
0, i.e. E[XY] = E[X]E[Y].
Proof:
$E[XY] = \int\!\!\int xy\, f_{XY}(x, y)\,dx\,dy = \int\!\!\int xy\, f_X(x) f_Y(y)\,dx\,dy = \left(\int x f_X(x)\,dx\right)\left(\int y f_Y(y)\,dy\right) = E[X]E[Y]$
Concept of covariance: properties
Given random variables X and Y, Cov(X,Y) = 0 does not necessarily imply that X and Y are independent! For example, if X takes the values −1, 0, 1 with equal probability and Y = X², then E[XY] = E[X³] = 0 = E[X]E[Y], so Cov(X, Y) = 0, yet Y is completely determined by X.
Conditional pdf/cdf/pmf
Given random variables X and Y with joint pdf fXY(x,y),
then the conditional pdf of X given Y = y is defined as
follows:
$f_{X|Y}(x | y) = \frac{f_{XY}(x, y)}{f_Y(y)}$
$F_{X|Y}(x | y) = \int_{-\infty}^{x} f_{X|Y}(z | y)\,dz = \int_{-\infty}^{x} \frac{f_{XY}(z, y)}{f_Y(y)}\,dz$
http://math.arizona.edu/~jwatkins/m-conddist.pdf
Conditional pdf/cdf/pmf
Conditional cdf FX|Y(x|y): condition on the event y ≤ Y ≤ y + ε and let ε → 0:
$P(X \le x \mid y \le Y \le y + \epsilon) = \frac{P(X \le x,\; y \le Y \le y + \epsilon)}{P(y \le Y \le y + \epsilon)} = \frac{F_{XY}(x, y + \epsilon) - F_{XY}(x, y)}{F_Y(y + \epsilon) - F_Y(y)}$
Dividing numerator and denominator by ε and letting ε → 0:
$F_{X|Y}(x | y) = \frac{\partial F_{XY}(x, y)/\partial y}{f_Y(y)}$
Differentiating with respect to x gives the conditional pdf:
$f_{X|Y}(x | y) = \frac{\partial}{\partial x} F_{X|Y}(x | y) = \frac{\partial^2 F_{XY}(x, y)/(\partial x\, \partial y)}{f_Y(y)} = \frac{f_{XY}(x, y)}{f_Y(y)}$
This is a valid pdf in x, since
$\int_x f_{X|Y}(x | y)\,dx = \int_x \frac{f_{XY}(x, y)}{f_Y(y)}\,dx = \frac{\int_x f_{XY}(x, y)\,dx}{f_Y(y)} = \frac{f_Y(y)}{f_Y(y)} = 1$
http://math.arizona.edu/~jwatkins/m-conddist.pdf
Conditional mean and variance
Conditional densities or distributions can be used to
define the conditional mean (also called conditional
expectation) or conditional variance as follows:
$E(X | Y = y) = \int_{-\infty}^{\infty} x\, f_{X|Y}(x | y)\,dx$
$Var(X | Y = y) = \int_{-\infty}^{\infty} (x - E(X | Y = y))^2\, f_{X|Y}(x | y)\,dx$
Example
$f_{XY}(x, y) = 2.4\,x\,(2 - x - y)$ for $0 \le x \le 1,\ 0 \le y \le 1$, and $0$ otherwise.
Find the conditional density of X given Y = y.
Find the conditional mean of X given Y = y.
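A symbolic sketch of the two computations (not part of the slides; assumes SymPy is available, and the printed forms may differ by algebraic rearrangement):

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f_xy = sp.Rational(12, 5) * x * (2 - x - y)          # 2.4 x (2 - x - y) on the unit square

f_y = sp.integrate(f_xy, (x, 0, 1))                  # marginal pdf of Y
f_x_given_y = sp.simplify(f_xy / f_y)                # conditional density of X given Y = y
cond_mean = sp.simplify(sp.integrate(x * f_x_given_y, (x, 0, 1)))   # E(X | Y = y)

print(f_y)            # 8/5 - 6*y/5
print(f_x_given_y)    # 12*x*(2 - x - y) / (8 - 6*y), up to rearrangement
print(cond_mean)      # (5 - 4*y) / (8 - 6*y), up to rearrangement
```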
Moment Generating Functions
Definition
The moment of a random variable X of order n is defined as $m_n = E(X^n)$.
The moment generating function (MGF) of X is defined as $\phi_X(t) = E(e^{tX})$.
Why is it so called?
Because of:
$m_i = E[X^i],\ i \ge 1$
(the MGF generates all these moments, as shown on the next slide).
Key property
Differentiating the MGF w.r.t. the parameter t yields
the different moments of X.
$\phi_X'(t) = \frac{d}{dt} E\left[e^{tX}\right] = E\left[\frac{d}{dt} e^{tX}\right] = E(X e^{tX})$
$\phi_X'(0) = E(X)$
$\phi_X^{(2)}(t) = \frac{d}{dt} E\left[X e^{tX}\right] = E(X^2 e^{tX})$
$\phi_X^{(2)}(0) = E(X^2)$
....
$\phi_X^{(n)}(0) = E(X^n)$
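A symbolic sketch (assumes SymPy) verifying this key property for the MGF of a fair die:

```python
import sympy as sp

t = sp.symbols('t')
phi = sp.Rational(1, 6) * sum(sp.exp(t * x) for x in range(1, 7))   # MGF of a fair die

m1 = sp.diff(phi, t, 1).subs(t, 0)   # phi'(0)  = E(X)   = 7/2
m2 = sp.diff(phi, t, 2).subs(t, 0)   # phi''(0) = E(X^2) = 91/6
print(m1, m2, m2 - m1**2)            # 7/2, 91/6, 35/12 (the variance of a die roll)
```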
Other properties
If Y = aX + b, then we have: $\phi_Y(t) = e^{tb}\, \phi_X(at)$
Uniqueness
For a discrete random variable with finite range, the MGF and PMF uniquely determine each other.
Proof:
$\phi_X(t) = E(e^{tX}) = \sum_x p(X = x)\, e^{tx}$, so the PMF uniquely determines the MGF.
To prove the converse, consider that X takes on some n values $x_1, \dots, x_n$. Consider some n distinct values $t_1, \dots, t_n$ as well. Then we have:
$\phi_X(t_k) = \sum_{i=1}^{n} e^{t_k x_i}\, P(X = x_i)$
In matrix-vector form this is $\boldsymbol{\phi}_X = M\mathbf{p}$, where $\boldsymbol{\phi}_X$ (the MGF values) and $\mathbf{p}$ (the PMF values) are vectors with n elements, and M is a matrix of size n x n with entries $M_{ki} = e^{t_k x_i}$. Solving this linear system recovers the PMF from the MGF values.
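A numerical sketch of this idea (assumes NumPy; the particular choice of the t_k values here is arbitrary): evaluate the MGF of a fair die at n distinct points and solve the linear system φ_X = Mp to recover the pmf:

```python
import numpy as np

xs = np.array([1, 2, 3, 4, 5, 6])                 # values a fair die takes
p_true = np.full(6, 1 / 6)                        # its pmf

ts = np.linspace(-1.0, 1.0, 6)                    # six distinct values of t
M = np.exp(np.outer(ts, xs))                      # M[k, i] = exp(t_k * x_i)
phi = M @ p_true                                  # MGF evaluated at the chosen t_k

p_recovered = np.linalg.solve(M, phi)             # invert the linear system phi = M p
print(np.round(p_recovered, 6))                   # ~ [1/6, ..., 1/6]: pmf recovered from the MGF
```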
Uniqueness: Another proof
If two discrete random variables X and Y have MGFs $\phi_X(t)$ and $\phi_Y(t)$ that both exist and $\phi_X(t) = \phi_Y(t)$ for all t, then X and Y have the same probability mass function.
Proof for discrete random variables:
$\phi_X(t) = \phi_Y(t)$
$\Rightarrow \sum_x e^{tx}\, p(X = x) = \sum_y e^{ty}\, p(Y = y) = \sum_x e^{tx}\, p(Y = x)$
$\Rightarrow \sum_x s^x c_x = 0$, where $s = e^t$, $c_x = p(X = x) - p(Y = x)$
Since this holds for every $s > 0$, all the coefficients $c_x$ must be zero, i.e. $p(X = x) = p(Y = x)$ for all x.
Uniqueness: Continuous case
The uniqueness theorem is also applicable to
continuous random variables, although we do not
prove it here.