
Random Variables

Fall 2017
Instructor:
Ajit Rajwade

1
Topic Overview
 Random variable: definition
 Discrete and continuous random variables
 Probability density function (pdf) and cumulative
distribution function (cdf)
 Joint and conditional pdfs
 Expectation and its properties
 Variance and covariance
 Markov’s and Chebyshev’s inequality
 Weak law of large numbers
 Moment generating functions

2
Random variable
 In many random experiments, we are not always
interested in the observed values, but in some
numerical quantity determined by the observed values.

 Example: we may be interested in the sum of the
values of two dice throws, or the number of heads
appearing in n consecutive coin tosses.

 Any such quantities determined by the results of
random experiments are called random variables
(they may also be the observations themselves).

3
Random variable
Value of X (denoted as x), where X = sum of 2 dice throws, and P(X = x):

x : P(X = x)
2 : 1/36
3 : 2/36
4 : 3/36
5 : 4/36
6 : 5/36
7 : 6/36
8 : 5/36
9 : 4/36
10 : 3/36
11 : 2/36
12 : 1/36

This is called the probability mass function (pmf) table of
the random variable X. If S is the sample space, then
P(S) = P(union of all events of the form X = x) = 1 (verify
from table).
4
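As a quick check of this table, here is a minimal Python sketch (not part of the original slides) that enumerates the 36 equally likely outcomes of two dice throws, tabulates the pmf of their sum, and verifies that the probabilities add up to 1:

```python
from collections import Counter
from fractions import Fraction

# Enumerate all 36 equally likely outcomes of two dice and count each sum.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
pmf = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)              # e.g. "2 1/36"; Fraction reduces 2/36 to 1/18, 6/36 to 1/6, etc.

print(sum(pmf.values()))     # 1, i.e. P(S) = 1
```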
Random variable: Notation
 A random variable is usually denoted by an upper-case
letter.

 Individual values the random variable can acquire are
denoted by lower-case letters.

5
Random variable: discrete
 Random variables whose values can be written as a
finite or infinite sequence are called discrete random
variables.

 Example: results of coin toss or random dice


experiments

 The probability that a random variable X takes on
value x, i.e. P(X=x), is called the probability mass
function.

6
Random variable: continuous
 Random variables that can take on values within a
continuum are called continuous random variables.

 Example: the dimensions (length, height, width,
weight) of an object are usually continuous quantities.
So are the direction of a vector and the amount of water
stored in a 4 litre jar; the latter is a continuous random
variable in the interval [0,4].
7
Random variable: continuous
 For a continuous random variable, the probability that
it takes on any particular value within a continuum is
zero!

 Why? Because there are infinitely many values – say


in the interval [0,4] in the example on the previous
slide. Each value will be equally likely.

 Note: Zero probability in case of continuous random


variables does not mean the event will never occur!
This differs from the discrete case.

8
Random variable: continuous
 Hence for a continuous random variable X, we
consider the cumulative distribution function (cdf)
FX(x) defined as P{X ≤ x}.

 The cdf is basically the probability that X takes on a


value less than or equal to x.

 The cdf can be used to compute cumulative interval
measures, that is the probability that X takes on a
value greater than a and less than or equal to b, i.e.
P(a < X ≤ b) = F_X(b) - F_X(a).

9
Random variable: continuous -
example
 Consider a cdf of the form:
F_X(x) = 0 for x ≤ 0, and
F_X(x) = 1 - exp(-x^2) otherwise

 To find: the probability that X exceeds 1

 P(X > 1) = 1 - P(X ≤ 1) = 1 - F_X(1) = e^{-1}

10
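A small numerical check of this example (my own sketch, not from the slides):

```python
import math

# cdf from the example: F_X(x) = 1 - exp(-x^2) for x > 0, and 0 for x <= 0
def F(x):
    return 1.0 - math.exp(-x ** 2) if x > 0 else 0.0

print(1.0 - F(1.0))    # P(X > 1) = 1 - F_X(1)
print(math.exp(-1))    # e^{-1} = 0.3678..., the same value
```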
Probability Density Function (pdf)
 The pdf of a random variable X at a value x is the
derivative of its cumulative distribution function (cdf)
at that value x.

 It is a non-negative function f_X(x) such that for any set
B of real numbers, we have

P\{X \in B\} = \int_B f_X(x)\,dx

\int_{-\infty}^{\infty} f_X(x)\,dx = 1

 Properties:

P(a < X \leq b) = \int_a^b f_X(x)\,dx = F_X(b) - F_X(a)

P(X = a) = \int_a^a f_X(x)\,dx = 0
11
[Figure: pdf curve f_X(x), with the interval from x = a to x = b marked on the x-axis,
and a thin strip of width dx at x = a.]

The area beneath the blue curve in between the
lines x = a and x = b is the cumulative interval
measure P(a < X ≤ b) = F_X(b) - F_X(a).

f_X(a)dx = probability that the random variable
X takes on values between a and a+dx.

12
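The following sketch illustrates these two facts numerically. The choice of a standard normal pdf is my own (scipy's stats.norm); any pdf would do:

```python
from scipy import integrate, stats

a, b = -1.0, 2.0

# P(a < X <= b) as the area under the pdf between a and b ...
area, _ = integrate.quad(stats.norm.pdf, a, b)
# ... equals the difference of cdf values F_X(b) - F_X(a)
print(area, stats.norm.cdf(b) - stats.norm.cdf(a))     # both ~0.8186

# f_X(a)*dx approximates P(a < X <= a + dx) for a small dx
dx = 1e-4
print(stats.norm.pdf(a) * dx, stats.norm.cdf(a + dx) - stats.norm.cdf(a))
```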
Probability Density Function
 Another way of looking at this concept:
P\{a - \epsilon/2 \leq X \leq a + \epsilon/2\} = \int_{a-\epsilon/2}^{a+\epsilon/2} f_X(x)\,dx \approx \epsilon f_X(a)

f_X(a) = \lim_{\epsilon \to 0} \frac{P\{a - \epsilon/2 \leq X \leq a + \epsilon/2\}}{\epsilon}
13
Examples: Popular families of PDFs
 Gaussian (normal) pdf:

f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}
14
Examples: Popular families of PDFs
 Bounded uniform pdf:

f_X(x) = \frac{1}{b-a}, \quad a \leq x \leq b
       = 0 \quad \text{otherwise}

15
Expected Value (Expectation) of a
random variable
 It is also called the mean value of the random variable.

 For a discrete random variable X, it is defined as:

E(X) = \sum_i x_i P(X = x_i)

 For a continuous random variable X, it is defined as:

E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx

 The expected value should not be (mis)interpreted to be the


value that X usually takes on – it’s the average value, not
the “most frequently occurring value”.
16
Expected Value (Expectation) of a
random variable
 For some pdfs, the expected value is not always
defined, i.e. the integral below may not have a finite
value.

E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx

 One example is the pdf for the Pareto distribution
(under some parameters) given as:

f_X(x \mid \alpha, x_m) = \frac{\alpha x_m^{\alpha}}{x^{\alpha+1}} \text{ for } x \geq x_m, \text{ otherwise } 0

(x_m and \alpha are parameters of the pdf for the Pareto distribution.)

E(X) = x_m \left(\frac{\alpha}{1-\alpha}\right)\left[\left(\frac{x}{x_m}\right)^{1-\alpha}\right]_{x_m}^{\infty} = \infty \text{ if } \alpha \leq 1

Verify this result for E(X) on your own.
17
Expected Value (Expectation) of a
random variable
 Likewise for some discrete random variables which
take on infinitely many values, the expected value may
not be defined, i.e. we may have
E(X) = \sum_i x_i P(X = x_i) = \infty

 Example:

P(X = x) = k/x^2 \text{ for } x \geq 1, \ x \in \mathbb{Z}^+

E(X) = \sum_{x=1}^{\infty} x P(X = x) = k \sum_{x=1}^{\infty} \frac{1}{x} = \infty \quad \text{(the harmonic series diverges)}

Note: \sum_{x=1}^{\infty} P(X = x) = \sum_{x=1}^{\infty} k/x^2 = 1 \text{ if } k = 6/\pi^2 \quad \text{(the Basel problem)}

18
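A small numerical illustration (my own, not from the slides): with k = 6/π² the probabilities sum to 1, while the partial sums of E(X) keep growing with N (they behave like k·ln N):

```python
import math

k = 6 / math.pi ** 2

for N in (10**3, 10**4, 10**5, 10**6):
    total_prob = sum(k / x**2 for x in range(1, N + 1))           # -> 1 as N grows
    partial_mean = sum(x * (k / x**2) for x in range(1, N + 1))   # = k * sum(1/x), unbounded
    print(N, round(total_prob, 6), round(partial_mean, 3))
```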
Expected Value: examples
 The expected value of the number that shows up when you throw a
die is (1/6)(1+2+3+4+5+6) = 3.5.

 The game of roulette consists of a ball and wheel with
38 numbered pockets on its side. The ball rolls and
settles on one of the pockets. If the number in the
pocket is the same as the one you guessed, you win
$35 (probability 1/38), otherwise you lose $1
(probability 37/38). The expected value of the amount
you earn after one trial is: (-1)(37/38) + (35)(1/38) =
-$0.0526
19
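A short Monte Carlo sketch of the roulette example (my own illustration): guessing pocket 0 on every spin, the average earning per trial settles near the exact value of -2/38:

```python
import random

exact = (-1) * (37 / 38) + 35 * (1 / 38)
print(exact)                                  # -0.0526...

random.seed(0)
n = 1_000_000
earnings = sum(35 if random.randrange(38) == 0 else -1 for _ in range(n))
print(earnings / n)                           # close to -0.0526 for large n
```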
A Game of Roulette

https://en.wikipedia.org/wiki/Roulette#/media/File:Roulette_casino.JPG
20
Expected value of a function of
random variable
 Consider a function g(X) of a discrete random variable
X. The expected value of g(X) is defined as:
E(g(X)) = \sum_i g(x_i) P(X = x_i)

 For a continuous random variable, the expected value
of g(X) is defined as:

E(g(X)) = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx

21
Properties of expected value

E(a g(X) + b) = \int_{-\infty}^{\infty} (a g(x) + b) f_X(x)\,dx

              = \int_{-\infty}^{\infty} a g(x) f_X(x)\,dx + \int_{-\infty}^{\infty} b f_X(x)\,dx

              = a E(g(X)) + b \quad \text{(why?)}

This property is called the linearity of the expected value. In general, a
function f(x) is said to be linear in x if f(ax+b) = af(x)+b, where a and b are
constants. In this case, the expected value is not a function but an operator
(it takes a function as input). An operator E is said to be linear if
E(af(x) + b) = a E(f(x)) + b.

22
Properties of expected value
Suppose you want to predict the value of a random variable with a known
mean. On an average, what value will yield the least squared error?
Let X be the random variable and c be its predicted value.
We want to find c such that E(( X - c)2 ) is minimized.
Let  be the mean of X .
Then
E((X-c)^2) = E((X - \mu + \mu - c)^2)
           = E((X-\mu)^2 + (\mu-c)^2 + 2(X-\mu)(\mu-c))
           = E((X-\mu)^2) + E((\mu-c)^2) + 2 E((X-\mu)(\mu-c))
           = E((X-\mu)^2) + (\mu-c)^2 + 0
           \geq E((X-\mu)^2)
The expected value is the value that yields the least
mean squared prediction error!
23
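A numerical illustration of this fact (my own sketch; the exponential distribution is an arbitrary choice): the value of c that minimizes the empirical mean squared error is the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)       # samples of X, true mean = 2

cs = np.linspace(0.0, 4.0, 401)
mse = [np.mean((x - c) ** 2) for c in cs]           # empirical E[(X - c)^2]
print(cs[int(np.argmin(mse))], x.mean())            # both ~2: the minimiser is the mean
```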
The median
 What minimizes the following quantity?

J(c) = \int_{-\infty}^{\infty} |x - c| f_X(x)\,dx

J(c) = \int_{-\infty}^{c} |x - c| f_X(x)\,dx + \int_{c}^{\infty} |x - c| f_X(x)\,dx

     = \int_{-\infty}^{c} (c - x) f_X(x)\,dx + \int_{c}^{\infty} (x - c) f_X(x)\,dx

     = \int_{-\infty}^{c} c f_X(x)\,dx - \int_{-\infty}^{c} x f_X(x)\,dx + \int_{c}^{\infty} x f_X(x)\,dx - \int_{c}^{\infty} c f_X(x)\,dx

     = c F_X(c) - \int_{-\infty}^{c} x f_X(x)\,dx + \int_{c}^{\infty} x f_X(x)\,dx - c(1 - F_X(c))
24
The median
J(c) = c F_X(c) - \int_{-\infty}^{c} x f_X(x)\,dx + \int_{c}^{\infty} x f_X(x)\,dx - c(1 - F_X(c))

Define q(x) = x f_X(x) and let Q(x) = \int q(x)\,dx be an antiderivative of q. In this
derivation, we are assuming that the two definite integrals of q(x) exist! This proof
won't go through otherwise.

J(c) = c F_X(c) - (Q(c) - Q(-\infty)) + (Q(\infty) - Q(c)) - c(1 - F_X(c))

     = 2 c F_X(c) - c - 2 Q(c) + Q(\infty) + Q(-\infty)
25
The median
J(c) = 2 c F_X(c) - c - 2 Q(c) + Q(\infty) + Q(-\infty)

Setting J'(c) = 0:

2 c f_X(c) + 2 F_X(c) - 1 - 2 q(c) = 0
\Rightarrow 2 c f_X(c) + 2 F_X(c) - 1 - 2 c f_X(c) = 0
\Rightarrow 2 F_X(c) - 1 = 0
\Rightarrow F_X(c) = 1/2

This is the median – by definition – and it minimizes J(c). We can double check that J''(c)
>= 0. Notice the peculiar definition of the median for the continuous case here! This
definition is not conceptually different from the discrete case, though. Also, note that the
median will not be unique if F_X is not strictly increasing in some interval – say
K = [c,c+ε] or [c-ε,c]. In such cases, all y ϵ K will qualify as medians and all of them will
produce the same value of J(y). This is because f_X(y) = 0 for y ϵ K.
26
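The analogous numerical check for the absolute error (same setup as the earlier sketch for the mean; the exponential distribution is again an arbitrary choice, picked because its mean and median differ):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)       # skewed, so mean != median

cs = np.linspace(0.0, 4.0, 401)
mae = [np.mean(np.abs(x - c)) for c in cs]          # empirical E[|X - c|]
print(cs[int(np.argmin(mae))], np.median(x))        # both ~2*ln(2) ~ 1.39: the median
```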
Variance
 The variance of a random variable X tells you how much
its values deviate from the mean – on an average.

 The definition of variance is:



Var(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\,dx


 The positive square-root of the variance is called the


standard deviation.

 Low-variance probability mass functions or probability


densities tend to be concentrated around one point. High
variance densities are spread out.

27
Existence?
 For some distributions, the variance (and hence
standard deviation) may not be defined, because the
integral may not have a finite value.

 Example: Pareto distribution (see slides on expectation


for definition) for α < 2.

 Note in some cases the mean is defined, but the


variance is not. In some cases both are undefined.
However, if the mean is undefined, then the variance
will be undefined too (why?).

28
Variance: Alternative expression
 The definition of variance is:

Var(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\,dx

 Alternative expression:

Var(X) = E[(X - \mu)^2] = E[X^2 + \mu^2 - 2\mu X]
       = E[X^2] + \mu^2 - 2\mu E[X]
       = E[X^2] + \mu^2 - 2\mu \cdot \mu \quad \text{(why?)}
       = E[X^2] - \mu^2
       = E[X^2] - (E[X])^2

29
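A quick numerical check that the two expressions agree (my own sketch; Uniform(0,4) is an arbitrary choice, with variance 16/12 ≈ 1.333):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 4.0, size=1_000_000)           # X ~ Uniform(0, 4)

var_def = np.mean((x - x.mean()) ** 2)              # E[(X - mu)^2]
var_alt = np.mean(x ** 2) - x.mean() ** 2           # E[X^2] - (E[X])^2
print(var_def, var_alt, 16 / 12)                    # all ~1.333
```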
Variance: properties
 Property:
Var(aX + b) = E[(aX + b - E(aX + b))^2]
            = E[(aX + b - (a\mu + b))^2]
            = E[a^2 (X - \mu)^2]
            = a^2 E[(X - \mu)^2] = a^2 Var(X)

30
Probabilistic inequalities
 Sometimes we know the mean or variance of a random
variable, and want to guess the probability that the
random variable can take on a certain value.

 The exact probability can usually not be computed as
we have too little information. But we can get upper or
lower bounds on this probability, which can influence
our decision-making processes.

31
Probabilistic inequalities
 Example: Let’s say the average annual salary offered to a
CSE Btech-4 student at IITB is $100,000. What’s the
probability that you (i.e. a randomly chosen student) will
get an offer of $110,000 or more? Additionally, if you
were told that the variance of the salary was 50,000,
what’s the probability that your package is between
$90,000 and $110,000?

32
Markov’s inequality
 Let X be a random variable that takes only non-
negative values. For any a > 0, we have
P\{X \geq a\} \leq E[X]/a

 Proof: next slide

33
Markov’s inequality

 Proof:

E[X] = \int_0^{\infty} x f_X(x)\,dx

     = \int_0^{a} x f_X(x)\,dx + \int_a^{\infty} x f_X(x)\,dx

     \geq \int_a^{\infty} x f_X(x)\,dx

     \geq \int_a^{\infty} a f_X(x)\,dx

     = a \int_a^{\infty} f_X(x)\,dx

     = a P\{X \geq a\} \quad \Rightarrow \quad P\{X \geq a\} \leq E[X]/a
34
Chebyshev’s inequality
 For a random variable X with mean μ and variance σ2,
we have for any value k > 0,
2
P{| X   | k} 
k2
 Proof: follows from Markov’s inequality

( X   )2 is a non - negative random variable


 P{( X   )2  k 2 }  E[( X   )2 ] / k 2   2 / k 2
 P{| X   | k}   2 / k 2

35
Chebyshev’s inequality: another form
 For a random variable X with mean μ and variance σ2,
we have for any value k > 0,
2
P{| X   | k} 
k2
 If I replace k by kσ, I get the following:
1
P{| X   | k }  2
k

36
Back to counting money! 
 Let X be the random variable indicating the annual
salary offered to you when you reach Btech-4 

 Then

P\{X \geq 110K\} \leq \frac{100K}{110K} \approx 0.909 \approx 90\%

P\{|X - 100K| \geq 10K\} \leq \frac{50K}{10K \times 10K} = 0.0005 = 0.05\%

\Rightarrow P\{|X - 100K| < 10K\} \geq 1 - 0.05\% = 99.95\%

37
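The two bounds above in a couple of lines of Python (the mean and variance figures are the ones assumed on the earlier slide):

```python
mean = 100_000.0       # E[X], annual salary
var = 50_000.0         # Var(X), as assumed on the earlier slide

markov = mean / 110_000.0            # Markov bound on P(X >= 110K)
cheby = var / 10_000.0 ** 2          # Chebyshev bound on P(|X - 100K| >= 10K)
print(markov)                        # ~0.909
print(cheby, 1 - cheby)              # 0.0005 and 0.9995, i.e. 99.95%
```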
Back to the expected value
 When I tell you that the expected value of a die throw
is 3.5, what does this mean?

 If I throw the die n times, and average the results, I


should get a value close to 3.5 provided n is very large
(not valid if n is small).

 As n increases, the average value should move closer


and closer towards 3.5.

 That’s our basic intuition!

38
https://en.wikipedia.org/wiki/Law_of_large_numbers

39
Back to the expected value: weak
law of large numbers
 This intuition has a rigorous theoretical justification in
a theorem known as the weak law of large numbers.

 Let X1, X2,…,Xn be a sequence of independent and


identically distributed random variables each having
mean μ. Then for any ε > 0, we have:
P\left\{\left|\frac{X_1 + X_2 + ... + X_n}{n} - \mu\right| > \epsilon\right\} \to 0 \text{ as } n \to \infty

40
Back to the expected value: weak
law of large numbers
 Let X1, X2,…,Xn be a sequence of independent and
identically distributed random variables each having
mean μ. Then for any ε > 0, we have:
P\left\{\left|\frac{X_1 + X_2 + ... + X_n}{n} - \mu\right| > \epsilon\right\} \to 0 \text{ as } n \to \infty

(The quantity (X_1 + X_2 + ... + X_n)/n is the empirical, or sample, mean.)

 Proof: follows immediately from Chebyshev's
inequality (it additionally assumes each X_i has finite variance \sigma^2):

E\left(\frac{X_1 + X_2 + ... + X_n}{n}\right) = \mu, \quad Var\left(\frac{X_1 + X_2 + ... + X_n}{n}\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n},

\Rightarrow P\left\{\left|\frac{X_1 + X_2 + ... + X_n}{n} - \mu\right| > \epsilon\right\} \leq \frac{\sigma^2}{n\epsilon^2}

\Rightarrow \lim_{n \to \infty} P\left\{\left|\frac{X_1 + X_2 + ... + X_n}{n} - \mu\right| > \epsilon\right\} = 0
41
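A simulation of this statement for the fair-die example (my own sketch): the sample mean of n throws drifts towards μ = 3.5 as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (10, 100, 10_000, 1_000_000):
    throws = rng.integers(1, 7, size=n)     # n independent fair die throws
    print(n, throws.mean())                 # sample mean approaches 3.5
```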
The strong law of large numbers
 The strong law of large numbers states the following:

P\left(\lim_{n \to \infty} \frac{X_1 + X_2 + ... + X_n}{n} = \mu\right) = 1

 This is stronger than the weak law because it states that the
probability of the desired event (that the empirical mean converges
to the actual mean μ) is exactly equal to 1. The weak law only states
that, for any ε > 0, the probability of a deviation larger than ε tends to 0 as n grows.

 The proof of the strong law is formidable and beyond the


scope of our course.
42
(The incorrect) Law of averages
 As laymen we tend to believe that if something has
been going wrong for quite some time, it will suddenly
turn right – using the law of averages.

 This supposed law is actually a fallacy – it reflects
wishful thinking, and the core mistake is that we
take the distribution observed in a small set of
outcomes to stand in for the distribution of a much larger set.

 This is also called the Gambler's fallacy.

43
(The incorrect) Law of averages
 Let’s say a gambler independently tosses an unbiased
coin 20 times, and gets a head each time. He now
applies the “law of averages” and believes that it is
more likely that the next coin toss will yield a tail.

 The mistake is as follows: The probability of getting
all 21 heads = (1/2)^21. The probability of getting 20
heads followed by 1 tail is also (1/2)^21, so given the first
20 heads, a head and a tail are equally likely on the next toss.

44
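A small sketch of this point (mine, not from the slides). The two exact probabilities are trivially equal; for the empirical part I condition on a streak of 5 heads rather than 20, only so that the conditioning event is not too rare to simulate:

```python
import numpy as np

# Exact: P(21 heads) and P(20 heads then a tail) are the same number.
print(0.5 ** 21, 0.5 ** 20 * 0.5)

# Empirical: after a streak of heads, the next toss is still heads ~50% of the time.
rng = np.random.default_rng(0)
tosses = rng.integers(0, 2, size=(1_000_000, 6))     # 1 = head, 0 = tail
streak = tosses[:, :5].all(axis=1)                   # runs whose first 5 tosses are heads
print(streak.sum(), tosses[streak, 5].mean())        # ~31000 such runs, fraction of heads ~0.5
```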
Joint distributions/pdfs/pmfs

45
Jointly distributed random variables
 Many times in statistics, one needs to model
relationships between two or more random variables –
for example, your CPI at IITB and the annual salary
offered to you during placements!

 Another example: average amount of sugar consumed


per day and blood sugar level recorded in a blood test.

 Another example: literacy level and crime rate.

46
Joint CDFs
 Given continuous random variables X and Y, their joint
cumulative distribution function (cdf) is defined as:
F_{XY}(x, y) = P(X \leq x, Y \leq y)

 The distribution of either random variable (called the
marginal cdf) can be obtained from the joint distribution as
follows:

F_X(x) = P(X \leq x, Y \leq \infty) = F_{XY}(x, \infty)
F_Y(y) = P(X \leq \infty, Y \leq y) = F_{XY}(\infty, y)

(I'll explain this a few slides further down.)

 These definitions can extended to handle more than two


random variables as well.

47
Joint PMFs
 Given two discrete random variables X and Y, their
joint probability mass function (pmf) is defined as:
p_{XY}(x_i, y_j) = P(X = x_i, Y = y_j)

 The pmf of either random variable (called the marginal
pmf) can be obtained from the joint distribution as
follows:

P\{X = x_i\} = P\left(\bigcup_j \{X = x_i, Y = y_j\}\right) \quad \text{(Why?)}

             = \sum_j P\{X = x_i, Y = y_j\} = \sum_j p_{XY}(x_i, y_j)
48
Joint PMFs: Example
 Consider that in a city 15% of the families are childless, 20% have
only one child, 35% have two children and 30% have three
children. Let us suppose that male and female child are equally
likely and independent.
 What is the probability that a randomly chosen family has no
children?
 P(B = 0, G = 0) = 0.15 = P(no children)
 Has 1 girl child?
 P(B=0,G=1)=P(1 child) P(G=1|1 child) = 0.2 x 0.5 = 0.1
 Has 3 girls?
 P(B = 0, G = 3) = P(3 children) P(G=3 | 3 Children) = 0.3 x (0.5)3
 Has 2 boys and 1 girl?
 P(B = 2, G = 1) = P(3 children) P(B = 2, G = 1 | 3 children) = 0.3 x
(1/8) x 3 = 0.1125 (all 8 combinations of 3 children are equally
likely. Out of these there are 3 of the form 2 boys + 1 girl)
49
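The whole joint pmf of (B, G) = (number of boys, number of girls) can be tabulated in a few lines (my own sketch; it uses the fact that, given n children, B is Binomial(n, 1/2)):

```python
from fractions import Fraction as F
from math import comb

p_children = {0: F(15, 100), 1: F(20, 100), 2: F(35, 100), 3: F(30, 100)}

joint = {}
for n, pn in p_children.items():
    for b in range(n + 1):                       # b boys, n - b girls
        joint[(b, n - b)] = pn * comb(n, b) * F(1, 2) ** n

print(joint[(0, 0)])          # 3/20  = 0.15
print(joint[(0, 1)])          # 1/10  = 0.2 * 0.5
print(joint[(0, 3)])          # 3/80  = 0.3 * (1/2)^3
print(joint[(2, 1)])          # 9/80  = 0.1125
print(sum(joint.values()))    # 1
```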
Joint PDFs
 For two jointly continuous random variables X and Y,
the joint pdf is a non-negative function fXY(x,y) such
that for any set C in the two-dimensional plane, we
have:
P\{(X, Y) \in C\} = \iint_{(x,y) \in C} f_{XY}(x, y)\,dx\,dy

 The joint CDF can be obtained from the joint PDF as
follows:

F_{XY}(a, b) = \int_{-\infty}^{a} \int_{-\infty}^{b} f_{XY}(x, y)\,dy\,dx

f_{XY}(a, b) = \frac{\partial^2 F_{XY}(x, y)}{\partial x \partial y}\Big|_{x=a, y=b}
50
[Figure: an arbitrarily shaped region C in the XY-plane.]

The joint probability that (X,Y) belongs to any
arbitrary-shaped region in the XY-plane is
obtained by integrating the joint pdf of (X,Y)
over that region (eg: region C)

51
Joint and marginal PDFs
 The marginal pdf of a random variable can be
obtained by integrating the joint pdf w.r.t. the other
random variable(s):

f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy

f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx

Consistency with the marginal cdf:

\int_{-\infty}^{a} f_X(x)\,dx = \int_{-\infty}^{a} \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy\,dx
\Rightarrow F_X(a) = F_{XY}(a, \infty)
52
Independent random variables
 Two continuous random variables are said to be
independent if and only if:
x, y, f XY ( x, y )  f X ( x) fY ( y )
i.e., the joint pdf is equal to the product of the
marginal pdfs.

 For independent random variables, the joint CDF is


also equal to the product of the marginal CDFs:
F_{XY}(x, y) = F_X(x) F_Y(y) \quad \text{(Try proving this yourself!)}

53
Independent random variables
 Some n continuous random variables X1, X2, …, Xn are said to be
mutually independent if and only if for any finite subset of k
random variables Xi1, Xi2,…, Xik and finite sequence of number
x1, x2,…, xk , the events Xi1 ≤ x1, Xi2 ≤ x2,…, Xik ≤ xk are mutually
independent.

 As a consequence
\forall x_1, x_2, ..., x_n: \quad f_{X_1, X_2, ..., X_n}(x_1, x_2, ..., x_n) = f_{X_1}(x_1) f_{X_2}(x_2) ... f_{X_n}(x_n)

i.e., the joint pdf is equal to the product of all n marginal pdfs.

 Note that this condition is stronger than pairwise independence!

\forall (x_i, x_j), 1 \leq i \leq n, 1 \leq j \leq n, i \neq j: \quad f_{X_i, X_j}(x_i, x_j) = f_{X_i}(x_i) f_{X_j}(x_j)
54
Independent random variables
 Mutual independence between n random variables
implies that they are pairwise independent, or in fact,
k-wise independent for any k < n.

 But pairwise independence does not necessarily imply


mutual independence.

 Example: Consider a sample space {1,2,3,4} where


each singleton element is equally likely to be chosen.

55
Independent random variables
 Consider A = {1,2}, B = {1,3}, C = {1,4}.

 Then P(A) = P(B) = P(C) = 1/2. P(A∩B∩C) = P({1}) = ¼
≠ P(A)P(B)P(C) = 1/8, implying that A, B, C are not mutually
independent.

 But P(A∩B) = ¼ = P(A)P(B), and likewise for A∩C and B∩C.

56
Concept of covariance
 The covariance of two random variables X and Y is
defined as follows:
Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]

 Further expansion:

Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]
          = E[XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y]
          = E[XY] - \mu_X \mu_Y - \mu_Y \mu_X + \mu_X \mu_Y \quad \text{(why?)}
          = E[XY] - \mu_X \mu_Y
          = E[XY] - E[X]E[Y]

57
Concept of covariance: properties
 Cov(X,Y) = Cov(Y, X)

 Cov(X, X) = Var(X) [verify this yourself!]

 Cov(aX,Y) = aCov(X,Y) [prove this!]

 Relationship with correlation coefficient:


r(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}}

58
Concept of covariance: properties
Cov( X  Z , Y )  Cov( X , Y )  Cov( Z , Y )
Pr oof :
Cov( X  Z , Y )  E[( X  Z )Y ]  E[ X  Z ]E[Y ]
 E[ XY  ZY ]  E[ X ]E[Y ]  E[ Z ]E[Y ]
 E[ XY ]  E[ X ]E[Y ]  E[ ZY ]  E[ Z ]E[Y ]
 Cov( X , Y )  Cov( Z , Y )

Cov(  X i , Y )   Cov( X i , Y ) Try proving this


i i
yourself! Along
Cov(  X i , Y j )   Cov( X i , Y j )
similar lines as the
previous one.
i j i j

59
Concept of covariance: properties
Cov(  X i , Y )   Cov( X i , Y )
i i

Cov(  X i , Y j )   Cov( X i , Y j )
i j i j

Var (  X i )  Cov(  X i ,  X i ) Notice that the variance of the


i i i sum of random variables is not
  Cov( X i , X j ) equal to the sum of their
i j individual variances. This is quite
  Cov( X i , X i )   Cov( X i , X j )
unlike the mean!
i i j i

 Var ( X i )   Cov( X i , X j )
i i j i

60
Concept of covariance: properties
 For independent random variables X and Y, Cov(X,Y) =
0, i.e. E[XY] = E[X]E[Y].

 Proof:

E[XY] = \sum_i \sum_j x_i y_j P\{X = x_i, Y = y_j\}
      = \sum_i \sum_j x_i y_j P\{X = x_i\} P\{Y = y_j\}
      = \left(\sum_i x_i P\{X = x_i\}\right)\left(\sum_j y_j P\{Y = y_j\}\right)
      = E[X]E[Y]

Hence:
Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]
          = E[XY] - \mu_X E[Y] - \mu_Y E[X] + \mu_X \mu_Y
          = E[XY] - E[X]E[Y] = 0
61
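A numerical illustration (my own sketch) with two independently drawn samples: the empirical covariance is close to 0, up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)               # X and Y drawn independently of each other
y = rng.uniform(0.0, 4.0, size=1_000_000)

cov = np.mean(x * y) - x.mean() * y.mean()   # E[XY] - E[X]E[Y]
print(cov)                                   # ~0 (sampling noise only)
print(np.cov(x, y)[0, 1])                    # numpy's covariance estimate agrees
```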
Concept of covariance: properties
 Given random variables X and Y, Cov(X,Y) = 0 does
not necessarily imply that X and Y are independent!

 Proof: Construct a counter-example yourself!

62
Conditional pdf/cdf/pmf
 Given random variables X and Y with joint pdf fXY(x,y),
then the conditional pdf of X given Y = y is defined as
follows:
f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)} = \frac{\partial}{\partial x} F_{X|Y}(x \mid y)

 Conditional cdf F_{X|Y}(x \mid y):

F_{X|Y}(x \mid y) = \lim_{\epsilon \to 0} P(X \leq x \mid y \leq Y \leq y + \epsilon) = \int_{-\infty}^{x} f_{X|Y}(z \mid y)\,dz

                  = \int_{-\infty}^{x} \frac{f_{XY}(z, y)}{f_Y(y)}\,dz
http://math.arizona.edu/~jwatkins/m-conddist.pdf
63
Conditional pdf/cdf/pmf
 Conditional cdf F_{X|Y}(x \mid y):

P(X \leq x \mid y \leq Y \leq y + \epsilon) = \frac{P(X \leq x, y \leq Y \leq y + \epsilon)}{P(y \leq Y \leq y + \epsilon)}

                                            = \frac{F_{XY}(x, y + \epsilon) - F_{XY}(x, y)}{F_Y(y + \epsilon) - F_Y(y)}

                                            \to \frac{\partial F_{XY}(x, y)/\partial y}{f_Y(y)} \quad \text{as } \epsilon \to 0

Hence F_{X|Y}(x \mid y) = \frac{\partial F_{XY}(x, y)/\partial y}{f_Y(y)}, and

f_{X|Y}(x \mid y) = \frac{\partial}{\partial x} F_{X|Y}(x \mid y) = \frac{\partial^2 F_{XY}(x, y)/\partial x \partial y}{f_Y(y)} = \frac{f_{XY}(x, y)}{f_Y(y)}

Also, \int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\,dx = \int_{-\infty}^{\infty} \frac{f_{XY}(x, y)}{f_Y(y)}\,dx = \frac{f_Y(y)}{f_Y(y)} = 1
http://math.arizona.edu/~jwatkins/m-conddist.pdf
64
Conditional mean and variance
 Conditional densities or distributions can be used to
define the conditional mean (also called conditional
expectation) or conditional variance as follows:

E(X \mid Y = y) = \int_{-\infty}^{\infty} x f_{X|Y}(x \mid y)\,dx

Var(X \mid Y = y) = \int_{-\infty}^{\infty} (x - E(X \mid Y = y))^2 f_{X|Y}(x \mid y)\,dx

65
Example
f(x, y) = 2.4\,x\,(2 - x - y), \quad 0 < x < 1, \ 0 < y < 1
        = 0 \quad \text{otherwise}

Find the conditional density of X given Y = y.
Find the conditional mean of X given Y = y.

66
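A sympy sketch of this example (my own; the printed expressions may appear in an equivalent simplified form):

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f_xy = sp.Rational(12, 5) * x * (2 - x - y)          # 2.4*x*(2 - x - y) on (0,1)x(0,1)

f_y = sp.integrate(f_xy, (x, 0, 1))                  # marginal density of Y
f_x_given_y = sp.simplify(f_xy / f_y)                # conditional density of X given Y = y
cond_mean = sp.simplify(sp.integrate(x * f_x_given_y, (x, 0, 1)))

print(f_y)             # 8/5 - 6*y/5, i.e. 1.6 - 1.2*y
print(f_x_given_y)     # equivalent to 2.4*x*(2 - x - y)/(1.6 - 1.2*y)
print(cond_mean)       # equivalent to (1 - 0.8*y)/(1.6 - 1.2*y)
```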
Moment Generating Functions

67
Definition
 The moment of a random variable X of order n is
defined as m_n = E(X^n).

 The moment generating function (MGF) of a random


variable X is defined as follows:

\phi_X(t) = E[e^{tX}] = \sum_x e^{tx} P(X = x) \quad \text{(discrete r.v.)}

          = \int_{-\infty}^{\infty} f_X(x) e^{tx}\,dx \quad \text{(continuous r.v.)}

68
Why is it so called?
 Because of:

e^{tX} = 1 + tX + (tX)^2/2! + (tX)^3/3! + ....

\phi_X(t) = E[e^{tX}] = 1 + t m_1 + t^2 m_2/2! + t^3 m_3/3! + ...

where m_i = E[X^i], \ i \geq 1

69
Key property
 Differentiating the MGF w.r.t. the parameter t yields
the different moments of X.

\phi_X'(t) = \frac{d}{dt} E[e^{tX}] = E\left[\frac{d}{dt} e^{tX}\right] = E(X e^{tX})

\phi_X'(0) = E(X)

\phi_X^{(2)}(t) = \frac{d}{dt} E[X e^{tX}] = E(X^2 e^{tX})

\phi_X^{(2)}(0) = E(X^2)
....
\phi_X^{(n)}(0) = E(X^n)
70
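As an illustration (my own sketch), the MGF of a single fair die throw and its first two moments obtained by differentiating at t = 0:

```python
import sympy as sp

t = sp.symbols('t')

# MGF of a fair die throw: phi(t) = E[e^{tX}] = (1/6) * (e^t + e^{2t} + ... + e^{6t})
phi = sp.Rational(1, 6) * sum(sp.exp(t * k) for k in range(1, 7))

m1 = sp.diff(phi, t).subs(t, 0)       # phi'(0)  = E[X]   = 7/2
m2 = sp.diff(phi, t, 2).subs(t, 0)    # phi''(0) = E[X^2] = 91/6
print(m1, m2, m2 - m1 ** 2)           # 7/2, 91/6, and Var(X) = 35/12
```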
Other properties

 If Y = aX + b, then we have: \phi_Y(t) = e^{tb} \phi_X(at)

 If X and Y are independent, then: \phi_{X+Y}(t) = \phi_X(t) \phi_Y(t)

 Let X and Y be random variables. Let Z be a third r.v.
which is equal to X with probability p, and equal to Y
with probability 1-p. Then we have:
\phi_Z(t) = p\,\phi_X(t) + (1-p)\,\phi_Y(t)

71
Uniqueness
 For a discrete random variable with finite range, the
MGF and PMF uniquely determine each other.
 Proof:
\phi_X(t) = E(e^{tX}) = \sum_x p(X = x) e^{tx}, so the PMF uniquely determines the MGF.

To prove the converse, consider that X takes on some n values x_1, ..., x_n.
Consider some n values of t as well, t_1, ..., t_n. Then we have:

\phi_X(t_k) = \sum_{i=1}^{n} e^{t_k x_i} P(X = x_i)

In matrix-vector form, \phi_X = M p, where \phi_X and p are vectors with n elements
(the MGF values and the pmf values respectively), and M is a matrix of size n x n
with entries M_{ki} = e^{t_k x_i}.

The matrix M has a special form that makes it invertible.
Hence p = M^{-1} \phi_X is uniquely determined. Proof here.

72
Uniqueness: Another proof
 If two discrete random variables X and Y have MGFs
\phi_X(t) and \phi_Y(t) that both exist and \phi_X(t) = \phi_Y(t) for all t,
then X and Y have the same probability mass function.
 Proof for discrete random variables:

\phi_X(t) = \phi_Y(t)

\Rightarrow \sum_x e^{tx} p(X = x) = \sum_y e^{ty} p(Y = y) = \sum_x e^{tx} p(Y = x)

\Rightarrow \sum_x e^{tx} (p(X = x) - p(Y = x)) = 0

\Rightarrow \sum_x s^x c_x = 0, \text{ where } s = e^t, \ c_x = p(X = x) - p(Y = x)

This is a polynomial in s with coefficients \{c_x\}. The polynomial
can be 0 for all values of s iff every c_x = 0. Hence p(X = x) = p(Y = x) for all x.

73
Uniqueness: Continuous case
 The uniqueness theorem is also applicable to
continuous random variables, although we do not
prove it here.

74
