
Chapter 6

Mathematical Expectation

6.1 Mathematical Expectation


6.2 Some Special Expectations
6.2.1 The Mean of a Random Variable
6.2.2 Variance of a Random Variable
6.2.3 Linear Functions of a Random Variable
6.2.4 Linear Combinations of Random Variables
6.2.5 Expectations for Jointly Distributed Random Variables
6.2.6 Moments, Covariance and Correlation
6.3 Conditional Expectation and Conditional Variance
6.4 A Few Inequalities
6.4.1 Markov’s Inequality
6.4.2 Chebyshev’s Inequality
6.4.3 Cauchy-Schwarz Inequality
Chapter 6
Mathematical Expectation

The probability density function or probability function f(x) provides an overview of the behavior of a random variable X and its distribution. To comprehend and analyze the information contained in f(x), mathematical expectation or expected value plays an important role. It is one of the most important concepts in probability theory. Expected value is a key aspect of how one characterizes a probability distribution.
This chapter discusses important properties and rules of mathematical expectation. Some special expectations such as moments, variance, covariance, the correlation coefficient and conditional expectations are discussed. Some important inequalities are also discussed with applications.

6.1 Mathematical Expectation


Although the concept of mathematical expectation originated in games of chance, where the main concern was to select the strategy with the most promising mathematical expectation, it now plays important roles in a variety of contexts. In decision theory, and in particular under uncertainty, it is used to make an optimal choice, maximizing the value of some objective function and reaching optimal decisions. Contractors and business people use it for estimating expected profits, minimizing cost and expected loss, and for maximizing tax advantages, and so forth. In regression analysis, one can use it for finding good estimates of the parameters. It is now extensively used as a tool for solving various problems of mathematical statistics.
Mathematical Expectation: If X is a random variable having a probability function or probability density function f(x), then the mathematical expectation or the expected value of X, denoted by E(X), is given by

(6.1) E(X) = ∑_x x f(x), if X is discrete

           = ∫_{−∞}^{∞} x f(x) dx, if X is continuous,

provided that the sum or the integral in (6.1) converges absolutely. If it is not true that

∑_x |x| f(x) < ∞ (discrete case) or ∫_{−∞}^{∞} |x| f(x) dx < ∞ (continuous case),

we say that X has no finite expected value.
The mathematical expectation of X is often referred to as the expected value, or simply the expectation of X. It follows from the above definition that the expected value of a discrete random variable X is a weighted average of the possible values that X can take on, each value being weighted by the probability that X assumes it. The same principle applies to a continuous random variable, except that an integral of the variable with respect to its probability density replaces the sum.
The expected value of X is essentially the mean of X, the mean of the population it represents. Expectation is a single number which summarizes certain features of a random variable. It is obvious that the expected value can be positive, zero, or negative depending on the possible values that X can take on. A more general definition of expectation is as follows.
A General Definition of Expectation: If X is a random variable, either discrete or continuous, with probability function or probability density function f(x), and g(X) is a function of X, then the expected value of the random variable g(X) is given by

(6.2) E[g(X)] = ∑_x g(x) f(x), if X is discrete

             = ∫_{−∞}^{∞} g(x) f(x) dx, if X is continuous.

In particular, if g(X) = X, (6.2) reduces to (6.1), and then we see that

E[g(X)] = E(X) = μ = Mean of X.

If g(X) = C, where C is a constant, it is readily verified that E(C) = C. It also follows that E(CX + D) = CE(X) + D.

Example 6.1. If = ; = and , then

Example 6.2. If X is the outcome when a fair die is tossed, then f(x) = 1/6 for x = 1, 2, . . . , 6, and the expectation of X is given by

E(X) = ∑_{x=1}^{6} x (1/6) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 7/2 = 3.5.
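As a quick numerical illustration, this weighted average can be computed directly from the definition E(X) = ∑ x f(x); the following is a minimal Python sketch (the dictionary simply encodes the probability function of the fair die):

from fractions import Fraction

# probability function of a fair die: f(x) = 1/6 for x = 1, ..., 6
f = {x: Fraction(1, 6) for x in range(1, 7)}

# E(X) = sum of x * f(x) over all possible values x
expectation = sum(x * p for x, p in f.items())

print(expectation)         # 7/2
print(float(expectation))  # 3.5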

Example 6.3. If f(x) = 2x; 0 < x < 1, then E(X) = ∫₀¹ x (2x) dx = 2/3.

Example 6.4. Let the probability function of X be given by

f(2^k) = P(X = 2^k) = 1/2^k, k = 1, 2, 3, . . .

Then

E(X) = ∑_{k=1}^{∞} 2^k (1/2^k) = 1 + 1 + . . . . . . . = ∞.

Therefore, the expected value does not exist.

Example 6.5. Let the pdf of X be given by

f(x) = 2(1 − x); 0 < x < 1
     = 0;        elsewhere.

Find E(3X² + 6X).

Solution. We may write E(3X² + 6X) = 3E(X²) + 6E(X). Also,

E(X) = 2∫₀¹ x(1 − x) dx = 1/3,   E(X²) = 2∫₀¹ x²(1 − x) dx = 1/6.

Thus,

E(3X² + 6X) = 3(1/6) + 6(1/3) = 5/2.
Example 6.6. A contractor intends to prepare a proposal for a building contract. Suppose the cost of preparing the proposal is Tk. 10,000 and the probabilities for potential gross profits of Tk. 80,000, Tk. 50,000, Tk. 30,000 or Tk. 0 are 0.20, 0.50, 0.20 and 0.10, provided the proposal is accepted. If the probability is 0.40 that the proposal will be accepted, what is the expected net profit?
Solution: The contractor will make a net profit of Tk. 70,000 (Tk. 80,000 minus the cost of the proposal) with probability 0.20(0.40) = 0.08. Similarly, the net profits of Tk. 40,000 and Tk. 20,000 will occur with probabilities 0.50(0.40) = 0.20 and 0.20(0.40) = 0.08 respectively. The probability of a Tk. 10,000 loss is 0.10(0.40) + 0.60 = 0.64. Thus, the expected net profit in Tk. is given by
70,000(0.08) + 40,000(0.20) + 20,000(0.08) − 10,000(0.64) = 8,800.
It is up to the contractor whether he should take a risk of Tk. 10,000 for an expected profit of Tk. 8,800. Besides, the assigned probabilities are essentially subjective probabilities based on intuition or past experience.
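A minimal Python sketch of this calculation (the dictionary below simply encodes the net profits and probabilities derived in the solution):

# net profit (Tk.) -> probability, as derived in Example 6.6
outcomes = {
    70000: 0.20 * 0.40,          # gross 80,000 and proposal accepted
    40000: 0.50 * 0.40,          # gross 50,000 and proposal accepted
    20000: 0.20 * 0.40,          # gross 30,000 and proposal accepted
    -10000: 0.10 * 0.40 + 0.60,  # zero gross profit, or proposal rejected
}

assert abs(sum(outcomes.values()) - 1.0) < 1e-12  # probabilities sum to one

expected_net_profit = sum(x * p for x, p in outcomes.items())
print(round(expected_net_profit, 2))  # 8800.0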
Example 6.7. A contractor has to choose between two jobs. The first job promises a profit of $240,000 with a probability of 0.75, or a loss of $60,000 (due to strikes, hartals, and other delays) with a probability of 0.25. The second promises a profit of $360,000 with a probability of 0.50 or a loss of $90,000 with a probability of 0.50. (i) Which job should the contractor choose if s/he wants to maximize her/his expected profit? (ii) Which job should the contractor probably choose if s/he desperately wants to make a profit of at least $300,000 for the survival of her/his business?
Solution. The expected profit for the first job is 240,000(0.75) − 60,000(0.25) = 165,000 dollars, and the expected profit for the second job is 360,000(0.50) − 90,000(0.50) = 135,000 dollars. Therefore, the contractor should choose the first job to maximize the expected profit.
The contractor will possibly take the risk of choosing the second job for making a profit of at least $300,000.

Example 6.8. Shakib wants to insure his vehicle for a year for Tk. 50,000. The insurance company estimates that a total loss may occur with probability 0.002, a 50% loss with probability 0.01, and a 25% loss with probability 0.1. Ignoring all other partial losses, what premium should the insurance company charge each year to ensure an average profit of Tk. 2,000?
Solution. The expected claim per year is given by
E(claim) = 50,000(0.002) + 25,000(0.01) + 12,500(0.1) = Tk. 1,600,
which is the expected loss/risk of the company. To make an average profit of Tk. 2,000, the premium should be at least 1,600 + 2,000 = Tk. 3,600.

6.2 Some Special Expectations


In the following, some special expectations, and important properties and rules of
expectation are given. For proving certain properties, we shall use integral signs
assuming the variables are continuous. Replacing integral sign by summation,
similar proofs or results can be obtained for the discrete cases.

6.2.1 The Mean of a Random Variable


The mean of a random variable X is defined by

E(X) = ∑_x x f(x), if X is discrete

     = ∫_{−∞}^{∞} x f(x) dx, if X is continuous.

The expected value is essentially the mean of X. Thus,

E(X) = μ,

where μ denotes the population mean or the mean of X.

Example 6.9. Let X be a random variable having pdf

f(x) = 1/(β − α); α < x < β.

Then E(X) = ∫_α^β x/(β − α) dx = (α + β)/2.

6.2.2 Variance of a Random Variable


Variance: Let X be a random variable and g(X) a function of X. Then the variance of the function g(X) is defined by

(6.4) Var[g(X)] = E{[g(X) − E(g(X))]²}.

In the special case when g(X) = X, the variance of X is given by

(6.5) Var(X) = E[X − E(X)]².

The variance of a variable is usually denoted by σ². It is evident that the variance cannot be negative, but it can be infinitely large.
A more convenient formula for variance of X can be derived as follows:

σ² = Var(X) = E[X − E(X)]²
   = E[X² − 2X E(X) + (E(X))²]
   = E(X²) − 2(E(X))² + (E(X))²
   = E(X²) − [E(X)]²
   = E(X²) − μ².

Thus,

(6.6) Var(X) = σ 2 = E(X2) – (E(X))2.

This formula is frequently used for ease in computation. The standard deviation is
nothing but the positive square root of variance, σ.

Note that Var(X) = 0, if X is degenerate.

Remark: If X is measured in cm, then Var(X) is expressed in (cm)². However, the unit of measurement of the variable X and the unit of measurement of the standard deviation are the same, and thus the standard deviation is the preferred measure of dispersion.
Example 6.10. If X is a continuous variable with pdf

f(x) = 1 + x; −1 ≤ x ≤ 0
     = 1 − x; 0 ≤ x < 1,

then E(X) = ∫_{−1}^{0} x(1 + x) dx + ∫_{0}^{1} x(1 − x) dx = 0

and E(X²) = ∫_{−1}^{0} x²(1 + x) dx + ∫_{0}^{1} x²(1 − x) dx = 1/6.

Thus, Var(X) = σ² = E(X²) − (E(X))² = 1/6.
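A numerical cross-check of Example 6.10 is easy to set up; the sketch below approximates E(X) and E(X²) for this triangular density with a simple Riemann sum (the grid size n is an arbitrary choice):

# triangular pdf of Example 6.10: f(x) = 1 + x on [-1, 0], f(x) = 1 - x on [0, 1)
def f(x):
    return 1 + x if x < 0 else 1 - x

n = 20_000                  # number of grid points (arbitrary)
h = 2.0 / n                 # grid spacing on [-1, 1]
xs = [-1 + i * h for i in range(n)]

mean = sum(x * f(x) * h for x in xs)
second_moment = sum(x * x * f(x) * h for x in xs)
variance = second_moment - mean ** 2

print(round(mean, 3))       # ~0.0
print(round(variance, 3))   # ~0.167 (= 1/6)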

Minimal Property of Variance:


By definition, we have

Var(X) = E(X − μ)².

If b ≠ μ is any arbitrary constant, then

E(X − b)² = E(X − μ + μ − b)²
          = E(X − μ)² + 2E[(X − μ)(μ − b)] + E(μ − b)²
          = E(X − μ)² + (μ − b)²,

since E(X − μ) = E(X) − μ = 0 and (μ − b) is a constant.

This gives

E(X − μ)² = Var(X) = E(X − b)² − (μ − b)².

Hence, Var(X) ≤ E(X − b)², where the equality holds when b = μ. Thus, the variance may be regarded as the minimum value of E(X − b)², which is attained when b = μ, the mean of X.

6.2.3 Linear Function of a Random Variable


Theorem 6.1: Let a and b be any two constants, and X be a random variable. Then the linear function Y = a + bX has mean, variance, and standard deviation

(6.7) E(Y) = μ_y = a + bE(X) = a + bμ_x

(6.8) Var(Y) = σ_y² = b²Var(X) = b²σ_x², and σ_y = |b|σ_x.

Proof: By definition, we get

μ_y = E(a + bX) = ∫ (a + bx) f(x) dx
    = a∫ f(x) dx + b∫ x f(x) dx
    = a + bE(X) = a + bμ_x.

Again, σ_y² = E[Y − E(Y)]² = E[a + bX − a − bE(X)]²
            = b²E[X − E(X)]² = b²σ_x²,

and therefore

σ_y = |b|σ_x.
Corollary 1. It follows that E(bX) = bE(X), which we can see by setting a = 0 in
(6.7).
Theorem 6.2: Let X be a random variable and C a constant. Then Var(CX) = C²σ_x².

Proof: By definition, we may write

Var(CX) = E(CX)² − [E(CX)]²
        = C²E(X²) − C²(E(X))²
        = C²[E(X²) − (E(X))²]
        = C²Var(X) = C²σ_x².

Theorem 6.3: If C is a constant, Var(X + C) = Var(X).

Proof: By definition,

Var(X + C) = E[(X + C) − E(X + C)]²
           = E[X + C − E(X) − C]²
           = E[X − E(X)]²
           = σ² = Var(X).

Note that the variance of a constant is 0.

6.2.4 Linear Combination of Random Variables


Theorem 6.4: Let X and Y be independent or dependent random variables, and a and b be any two constants. Then the new random variable W = aX ± bY, a linear combination of X and Y, has mean and variance given by

(6.9) E(W) = μ_w = aE(X) ± bE(Y)

(6.10) Var(W) = σ_w² = a²Var(X) + b²Var(Y), if X and Y are independent, and

(6.11) Var(W) = σ_w² = a²Var(X) + b²Var(Y) ± 2ab Cov(X,Y), if X and Y are dependent,

where μ_x and μ_y and σ_x and σ_y are the respective means and standard deviations of X and Y, and Cov(X,Y) = E[(X − E(X))(Y − E(Y))] = covariance of X and Y.

Proof: By definition,

E(W) = E(aX ± bY) = aE(X) ± bE(Y), and

Var(W) = Var(aX ± bY) = E[aX ± bY − E(aX ± bY)]²
       = E[(aX − aE(X)) ± (bY − bE(Y))]²
       = a²E[X − E(X)]² + b²E[Y − E(Y)]² ± 2abE[(X − E(X))(Y − E(Y))]
       = a²Var(X) + b²Var(Y) ± 2ab Cov(X,Y).

If X and Y are mutually independent, then E(XY) = E(X)·E(Y) and therefore Cov(X,Y) = 0. Thus, for independent X and Y,

Var(aX ± bY) = a²Var(X) + b²Var(Y).

If X and Y are dependent, Cov(X,Y) will not be zero and the third term will be either +2ab Cov(X,Y) or −2ab Cov(X,Y), depending on whether W = aX + bY or W = aX − bY. Thus, if X and Y are dependent,

Var(aX ± bY) = a²Var(X) + b²Var(Y) ± 2ab Cov(X,Y).

Similar results also hold for linear combinations of more than two variables. In general, the variance of ∑_{i=1}^{k} a_i X_i is given by

Var(∑_{i=1}^{k} a_i X_i) = ∑_{i=1}^{k} a_i² Var(X_i) + 2 ∑∑_{i<j} a_i a_j Cov(X_i, X_j).

If X₁, X₂, . . . , X_k are independent or uncorrelated random variables,

Var(∑_{i=1}^{k} X_i) = ∑_{i=1}^{k} Var(X_i).

Corollary 1. If X and Y are independent random variables, then


Var(X + Y) = Var(X) + Var(Y).
Example 6.11. Let X and Y be independent random variables with respective means E(X) = 70 and E(Y) = 50 and variances Var(X) = 16 and Var(Y) = 9. Find the mean and the variance of (i) W = 3 + 2X, (ii) W = X + Y, (iii) W = X − Y, and (iv) W = 3X − 2Y.
Solution:
(i) E(W) = 3 + 2E(X) = 3 + 2(70) = 143
Var(W) = 2²Var(X) = 4(16) = 64
(ii) E(W) = E(X + Y) = E(X) + E(Y) = 70 + 50 = 120
Var(W) = Var(X + Y) = Var(X) + Var(Y) = 16 + 9 = 25
(iii) E(W) = E(X − Y) = E(X) − E(Y) = 70 − 50 = 20
Var(W) = Var(X − Y) = Var(X) + Var(Y) = 25
(iv) E(W) = E(3X − 2Y) = 3E(X) − 2E(Y) = 3(70) − 2(50) = 110
Var(W) = Var(3X − 2Y) = 3²Var(X) + 2²Var(Y) = 180.
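The rules (6.9)-(6.11) are easy to wrap in a small helper; the following Python sketch (the function name is ours, chosen only for illustration) reproduces part (iv) of Example 6.11 by writing W = aX + bY with b = −2:

def lin_comb_mean_var(a, b, mean_x, mean_y, var_x, var_y, cov_xy=0.0):
    """Mean and variance of W = a*X + b*Y, using (6.9) and (6.11).

    For independent X and Y the covariance term is zero, giving (6.10).
    """
    mean_w = a * mean_x + b * mean_y
    var_w = a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov_xy
    return mean_w, var_w

# Example 6.11 (iv): W = 3X - 2Y with independent X and Y
print(lin_comb_mean_var(3, -2, 70, 50, 16, 9))  # (110, 180)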
Example 6.12. Let T = 2X + Y − Z, where X, Y and Z are random variables. Then

Var(T) = 2²Var(X) + Var(Y) + Var(Z) + 2(2)Cov(X,Y) − 2(2)Cov(X,Z) − 2Cov(Y,Z)
       = 4Var(X) + Var(Y) + Var(Z) + 4Cov(X,Y) − 4Cov(X,Z) − 2Cov(Y,Z).

Example 6.13. Let X and Y be independent random variables with variances σ_x² = 1 and σ_y² = 2. Find the variance of Z, where Z = 3X − 2Y + 5.

Solution:

Var(Z) = 3²Var(X) + 2²Var(Y) + Var(5)
       = 9σ_x² + 4σ_y², since Var(5) = 0
       = 9(1) + 4(2) = 17.


Example 6.14. The number of cars X that pass through a car wash in the afternoon of any Friday has the following distribution:

x:      4      5      6      7      8      9
f(x):   1/12   1/12   1/4    1/4    1/6    1/6

Let g(X) = 2X − 1 represent the amount of money in dollars paid to the attendant. Find the attendant's expected earnings for this time period.
Solution: Expected earnings = E[g(X)] = 2E(X) − 1. Here E(X) is obtained as

E(X) = 4(1/12) + 5(1/12) + 6(1/4) + 7(1/4) + 8(1/6) + 9(1/6) = 6.833.

Therefore, the attendant's expected earnings are 2(6.833) − 1 ≈ $12.67.
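Working in exact fractions avoids the rounding in the hand computation; a small Python check:

from fractions import Fraction as F

# distribution of the number of cars X on a Friday afternoon
f = {4: F(1, 12), 5: F(1, 12), 6: F(1, 4), 7: F(1, 4), 8: F(1, 6), 9: F(1, 6)}

mean_x = sum(x * p for x, p in f.items())
earnings = 2 * mean_x - 1          # E[g(X)] = 2 E(X) - 1 for g(X) = 2X - 1

print(mean_x, float(mean_x))       # 41/6 ~ 6.833
print(earnings, float(earnings))   # 38/3 ~ 12.67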
Example 6.15. Suppose X is a random variable such that E(X) = 3 and Var(X) =
5. Let g(X) = 2X – 7. Then
E[g(X)] = E(2X – 7) = 2E(X) – 7 = -1
and
Var[g(X)] = Var(2X – 7) = 4Var(X) = 20.

6.2.5 Expectation for Jointly Distributed Random Variables


Let (X₁, X₂, . . . , X_k) be a k-dimensional continuous random variable with joint probability density function (pdf) f(x₁, x₂, . . . , x_k). Then the expected value of X_i is defined by

(6.12) E(X_i) = ∫ . . . ∫ x_i f(x₁, x₂, . . . , x_k) dx₁ dx₂ . . . dx_k,

provided that the multiple integral exists. For the discrete case, the integral signs have to be replaced by summation signs. We shall limit our discussion by considering at most two-dimensional random variables.
Let (X, Y) be a two-dimensional random variable with joint pdf f(x, y). Then the expectation of a function g(X, Y) is given by

(6.13) E[g(X,Y)] = ∫∫ g(x, y) f(x, y) dx dy.
Theorem 6.5: If X₁, X₂, . . . , X_k are random variables with finite expectations, then the expectation of their sum is equal to the sum of their individual expectations, that is,

E(X₁ + X₂ + . . . + X_k) = E(X₁) + E(X₂) + . . . + E(X_k).

Proof: For simplicity, we give a proof for the two-variable case of X and Y. If X and Y are random variables with joint pdf f(x, y), then the expectation of their sum is given by

E(X + Y) = ∫∫ (x + y) f(x, y) dx dy
         = ∫∫ x f(x, y) dx dy + ∫∫ y f(x, y) dx dy
         = E(X) + E(Y).

Using (6.12), the proof is straightforward for the k-variable case. The mean of the difference of two or more random variables is equal to the difference of the means of these variables, i.e. E(X − Y) = E(X) − E(Y).
It can easily be proved that

E(aX ± bY) = aE(X) ± bE(Y),

and

E(a₁X₁ ± a₂X₂ ± . . . ) = a₁E(X₁) ± a₂E(X₂) ± . . . .

Theorem 6.6: If X and Y are random variables, their product is a random variable with expectation

(6.14) E(XY) = ∫∫ xy f(x, y) dx dy.

When X and Y are mutually independent,

(6.15) E(XY) = E(X) · E(Y),

which follows from f(x, y) = f(x) · f(y) for independent X and Y.
Example 6.16. Let the joint pdf of X and Y be given by

f(x, y) = x + y; 0 < x < 1, 0 < y < 1.

Then E(X²Y) = ∫₀¹ ∫₀¹ x²y(x + y) dx dy = 17/72.
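The double integral in Example 6.16 can be verified numerically with a crude midpoint rule (the grid size n below is an arbitrary choice); a minimal Python sketch:

n = 400                      # number of grid cells per axis
h = 1.0 / n

total = 0.0
for i in range(n):
    for j in range(n):
        x = (i + 0.5) * h    # midpoint of the cell in x
        y = (j + 0.5) * h    # midpoint of the cell in y
        total += x * x * y * (x + y) * h * h   # integrand x^2 y (x + y)

print(total)        # ~0.23611
print(17 / 72)      # 0.2361...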

Example 6.17. Consider the joint distribution f(x, y) given by

f(x, y) = p^(x+y) (1 − p)^(2−(x+y)); x = 0, 1; y = 0, 1 and 0 < p < 1.

Find E(X + Y), E(XY) and E(X).

Solution. E(X + Y) = ∑_{x=0}^{1} ∑_{y=0}^{1} (x + y) p^(x+y) (1 − p)^(2−(x+y))
= ∑∑ x p^(x+y) (1 − p)^(2−(x+y)) + ∑∑ y p^(x+y) (1 − p)^(2−(x+y))
= 2p.

E(XY) = ∑_{x=0}^{1} ∑_{y=0}^{1} xy p^(x+y) (1 − p)^(2−(x+y))
      = [∑_{x=0}^{1} x p^x (1 − p)^(1−x)] [∑_{y=0}^{1} y p^y (1 − p)^(1−y)]
      = p · p = p².

E(X) = ∑_{x=0}^{1} ∑_{y=0}^{1} x p^(x+y) (1 − p)^(2−(x+y))
     = [∑_{x=0}^{1} x p^x (1 − p)^(1−x)] [∑_{y=0}^{1} p^y (1 − p)^(1−y)] = p · 1 = p.

6.2.6 Moments, Covariance and Correlation


Moments, covariance and correlation are important characteristics of random variables and their distributions. These characteristics are defined here as special expectations.

Moments:
Moments are shape characteristics of a distribution (or a random variable). Central moments (or moments about the mean) and raw moments (or moments about the origin or about an arbitrary number) can be defined in terms of expectation.
The rth raw moment (moment about origin) is defined by

(6.16) μ′_r = E(X^r), r = 1, 2, 3, . . . .

Then μ′₁ = E(X) = μ = Mean
μ′₂ = E(X²) = second raw moment
μ′₃ = E(X³) = third raw moment
μ′₄ = E(X⁴) = fourth raw moment,
and so forth.

The rth central moment (or rth moment about the mean μ) is defined by

(6.17) μ_r = E[(X − μ)^r] = E[X − E(X)]^r, r = 1, 2, 3, . . . , where μ = μ′₁.


Then, we find

μ₁ = E(X − μ) = E(X) − μ = 0.

μ₂ = E[(X − μ)²]
   = E[X² − 2μX + μ²]
   = E(X²) − 2μ′₁E(X) + μ′₁², since μ = μ′₁ = E(X)
   = μ′₂ − μ′₁²
   = σ_x² = Variance.

μ₃ = E[(X − μ′₁)³]
   = μ′₃ − 3μ′₁μ′₂ + 2μ′₁³,

after some simplification. Similarly, μ₄ can also be expressed in terms of raw moments as

μ₄ = μ′₄ − 4μ′₃μ′₁ + 6μ′₂μ′₁² − 3μ′₁⁴.

Factorial Moments:
The rth factorial moment of the random variable X is defined by

(6.18) μ_(r) = E[X^(r)] = E[X(X − 1)(X − 2) . . . (X − r + 1)].

For r = 1,

μ_(1) = E(X) = μ′₁ = μ.

For r = 2,

μ_(2) = E[X(X − 1)] = E(X²) − E(X) = μ′₂ − μ′₁,

and so forth. These show that moments can also be obtained from factorial moments.

Product Moments and Covariance:


Let (X, Y) be a two-dimensional random variable with probability density f(x, y). Then the expected value of X^r Y^s is defined as the (r, s)th product moment about the origin, and it is given by

(6.19) μ′_{r,s} = E(X^r Y^s) = ∑_x ∑_y x^r y^s f(x, y), when X and Y are discrete, and

       μ′_{r,s} = E(X^r Y^s) = ∫∫ x^r y^s f(x, y) dx dy, when X and Y are continuous.

The (r, s)th product moment about the means is given by

(6.20) μ_{r,s} = E[(X − E(X))^r (Y − E(Y))^s]

for r = 0, 1, 2, . . . and s = 0, 1, 2, . . . .

Clearly, μ_{1,0} = μ_{0,1} = 0, μ_{2,0} = Var(X) and μ_{0,2} = Var(Y), and μ_{1,1}, the first-order product moment about the means, is given by

μ_{1,1} = E[(X − E(X))(Y − E(Y))],

which is called the covariance of X and Y, and it is denoted by Cov(X,Y), or σ_xy. The quantity μ_{1,1} = Cov(X,Y) has special importance as it indicates the relationship, if any, between X and Y.
If the variables X and Y increase or decrease together, the covariance is positive, whereas it is negative if one variable increases while the other decreases. A more convenient formula for the covariance of X and Y is given below.
Covariance: If X and Y are random variables with respective means E(X) and E(Y), then the covariance of X and Y is given by

(6.21) Cov(X,Y) = E[(X − E(X))(Y − E(Y))]
              = E(XY) − E(X)E(Y)
              = E(XY) − μ_x μ_y = σ_xy.

If X and Y are independent, E(XY) = E(X)·E(Y), and then Cov(X,Y) = 0. It is important to note that the covariance can be zero even though the random variables are not independent. If Cov(X,Y) = 0, we say X and Y are uncorrelated.
Consider the following example of the joint distribution of X and Y:

            X = −1   X = 0   X = 1   f(y)
  Y = −1     1/6      1/3     1/6     2/3
  Y =  0      0        0       0       0
  Y =  1     1/6       0      1/6     1/3
  f(x)       1/3      1/3     1/3      1

Here μ_x = 0, μ_y = −1/3 and μ′_{1,1} = E(XY) = 0, so Cov(X,Y) = 0. Since f(x, y) ≠ f(x)·f(y) for X = −1 and Y = −1, X and Y are not independent.

Example 6.18. Let the joint pdf of X and Y be given by

f(x, y) = 2; x > 0, y > 0 and x + y < 1
        = 0; elsewhere.

Then μ_x = ∫₀¹ ∫₀^(1−x) 2x dy dx = 1/3 = E(X),

μ_y = ∫₀¹ ∫₀^(1−y) 2y dx dy = 1/3 = E(Y),

and μ′_{1,1} = E(XY) = ∫₀¹ ∫₀^(1−x) 2xy dy dx = 1/12.

Hence μ_{1,1} = σ_xy = Cov(X,Y) = μ′_{1,1} − μ_x μ_y = 1/12 − 1/9 = −1/36.
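Since the density in Example 6.18 is uniform over the triangle x > 0, y > 0, x + y < 1, the covariance can be checked quickly by Monte Carlo with rejection sampling; a minimal Python sketch (the sample size N and seed are arbitrary):

import random

random.seed(1)
N = 200_000
xs, ys = [], []
while len(xs) < N:
    x, y = random.random(), random.random()
    if x + y < 1:                      # keep only points inside the triangle
        xs.append(x)
        ys.append(y)

mean_x = sum(xs) / N
mean_y = sum(ys) / N
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / N

print(mean_x, mean_y)   # both ~1/3
print(cov, -1 / 36)     # ~-0.0278 vs -0.02777...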

Correlation and Correlation Coefficient:

The relationship between two or more variables is called correlation. Correlation analysis is an important topic in statistical analysis. The correlation coefficient is a measure of the linear relationship between two random variables.
Correlation coefficient: If X and Y are jointly distributed random variables on a sample space, the correlation coefficient between X and Y, denoted by ρ, is defined as

(6.22) ρ = E[(X − E(X))(Y − E(Y))] / √(Var(X) Var(Y)).

Since E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y) = Cov(X,Y), the correlation coefficient ρ can also be expressed as

(6.23) ρ = [E(XY) − E(X)E(Y)] / (σ_x σ_y) = Cov(X,Y) / (σ_x σ_y).

If X and Y are independent, ρ = 0. However, ρ = 0 does not necessarily indicate that X and Y are independent. We say that the random variables X and Y are uncorrelated if ρ = 0, or Cov(X,Y) = 0. It should be noted that the correlation coefficient is a measure of the linear relationship between X and Y.

Theorem 6.7: The correlation coefficient ρ is a number that satisfies

|ρ| ≤ 1, or −1 ≤ ρ ≤ 1.

Proof: Consider the function of the real variable t,

φ(t) = E[(u + tv)²],

where u = X − E(X) and v = Y − E(Y). Since (u + tv)² ≥ 0, we have

φ(t) = E[u² + 2tuv + t²v²] ≥ 0

for all t. This is a quadratic expression in t, and its discriminant must be ≤ 0, since a positive discriminant would mean that φ(t) = 0 has two distinct real roots, and the quadratic would then take negative values between them. Applying this, we obtain

4[E(uv)]² − 4E(u²)E(v²) ≤ 0,

which leads to

[E(uv)]² / [E(u²)E(v²)] ≤ 1,

and hence

{E[(X − E(X))(Y − E(Y))]}² / [Var(X) Var(Y)] = ρ² ≤ 1.

Therefore,

|ρ| ≤ 1, or −1 ≤ ρ ≤ 1.

It can be shown that equality (|ρ| = 1) holds if Y = aX + b for some constants a ≠ 0 and b.
Example 6.19. Suppose the joint pdf of X and Y is given by

f(x, y) = x + y; 0 < x < 1, 0 < y < 1
        = 0; elsewhere.

Then E(X) = ∫₀¹ ∫₀¹ x(x + y) dx dy = 7/12, and

σ_x² = E(X²) − (E(X))² = ∫₀¹ ∫₀¹ x²(x + y) dx dy − (7/12)² = 11/144.

Similarly, E(Y) = 7/12 and σ_y² = 11/144.

The covariance of X and Y is

Cov(X,Y) = E(XY) − E(X)E(Y) = 1/3 − (7/12)² = −1/144.

Therefore, ρ = Cov(X,Y)/(σ_x σ_y) = −1/11,

which shows the existence of a negative linear relationship between X and Y.
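The moments behind ρ = −1/11 can be reproduced with a midpoint-rule double integral; a minimal Python sketch (the grid size n is an arbitrary choice):

import math

n = 300
h = 1.0 / n

ex = ey = exx = eyy = exy = 0.0
for i in range(n):
    for j in range(n):
        x, y = (i + 0.5) * h, (j + 0.5) * h
        w = (x + y) * h * h            # f(x, y) dx dy on this grid cell
        ex += x * w
        ey += y * w
        exx += x * x * w
        eyy += y * y * w
        exy += x * y * w

var_x = exx - ex ** 2
var_y = eyy - ey ** 2
cov = exy - ex * ey
rho = cov / math.sqrt(var_x * var_y)

print(rho)        # ~ -0.0909 = -1/11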

6.3 Conditional Expectation and Conditional Variance


Conditional Expectation: Consider a two-dimensional random variable (X, Y) with joint pdf f(x, y). Let g(X) be a function of X, and f(x|y) be the conditional pdf of X, given Y. Then the conditional expectation of g(X), given Y, is defined by

(6.24) E(g(X)|Y) = ∑_x g(x) f(x|y), if X and Y are discrete

                 = ∫ g(x) f(x|y) dx, if the variables are continuous,

where f(x|y) is the conditional pdf of X for given Y.
Similarly, if h(Y) is a function of Y, then the conditional expectation of h(Y), given X, is given by

(6.25) E(h(Y)|X) = ∑_y h(y) f(y|x), if X and Y are discrete, and

                 = ∫ h(y) f(y|x) dy, if X and Y are continuous,

where f(y|x) is the conditional pdf of Y for given X.
If g(X) = X and h(Y) = Y, we get the conditional expectation of X for given Y as

(6.26) E(X|Y) = ∑_x x f(x|y), for the discrete case, and

              = ∫ x f(x|y) dx, if X and Y are continuous.

The conditional expectation of Y for given X can be defined analogously.

It should be noted that E[g(X)|Y] is a function of Y only and E[h(Y)|X] is a function of X only, and they are random variables. This can easily be seen from

E(Y|X = x) = ∫ y f(y|x) dy,

which is a function of x alone.

When X and Y are independent, it can easily be shown that

E(X|Y) = E(X) and E(Y|X) = E(Y).
We close our discussion on conditional expectation by stating an important property of conditional expectation, as below.
Theorem 6.8: The expectation of the conditional expectation is the unconditional expectation. That is,

(6.27) E[E(X|Y)] = E(X) and E[E(Y|X)] = E(Y).

Briefly, a proof is given below.

Proof: Since E(X|Y) is a function of Y only, we may write

E[E(X|Y)] = ∫ E(X|Y = y) f(y) dy
          = ∫ [∫ x f(x|y) dx] f(y) dy
          = ∫∫ x f(x, y) dx dy = E(X).

Thus, the expectation of the conditional expectation is the unconditional expectation.

Conditional Variance:
The conditional variance of X, given Y, is defined by

(6.28) σ²_{x|y} = Var(X|Y) = E[(X − E(X|Y))² | Y]
                           = E(X²|Y) − (E(X|Y))².

Theorem 6.9: The variance of a random variable can be expressed as the sum of the expected value of its conditional variance and the variance of its conditional expectation; that is,

(6.29) Var(Y) = E[Var(Y|X)] + Var[E(Y|X)].

Proof: By definition,

Var(Y|X) = E(Y²|X) − [E(Y|X)]².

Taking expectations on both sides, we get

E[Var(Y|X)] = E[E(Y²|X)] − E{[E(Y|X)]²}
            = E(Y²) − E{[E(Y|X)]²} − (E(Y))² + (E(Y))²
            = {E(Y²) − (E(Y))²} − {E[(E(Y|X))²] − [E(E(Y|X))]²}, since E[E(Y|X)] = E(Y),
            = Var(Y) − Var[E(Y|X)],

which gives

Var(Y) = E[Var(Y|X)] + Var[E(Y|X)].

Thus, the variance of Y is the expectation of the conditional variance of Y plus the variance of the conditional expectation of Y. Similar results can be established for X.

Example 6.20: Given the joint distribution of X and Y,

            X = 0    X = 1    f(y)
  Y = 0      2/20     6/20     2/5
  Y = 1      6/20     6/20     3/5
  f(x)       2/5      3/5       1

the conditional distribution of X, given Y, is given by

f(x|y) = P(X = x, Y = y)/P(Y = y) = f(x, y)/f(y); x = 0, 1 and y = 0, 1.

Then

E(X|Y = 0) = ∑_x x f(x|0) = ∑_x x f(x, 0)/f(y = 0)
           = [0(2/20) + 1(6/20)] / (2/5) = 3/4.
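The same conditional expectation can be read off a joint table programmatically; a small Python sketch using the table of Example 6.20 (fractions keep the arithmetic exact, and the function name is ours):

from fractions import Fraction as F

# joint pf f(x, y) of Example 6.20, keyed by (x, y)
joint = {(0, 0): F(2, 20), (1, 0): F(6, 20),
         (0, 1): F(6, 20), (1, 1): F(6, 20)}

def cond_expect_x_given_y(joint, y):
    f_y = sum(p for (x, yy), p in joint.items() if yy == y)        # marginal f(y)
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / f_y

print(cond_expect_x_given_y(joint, 0))  # 3/4
print(cond_expect_x_given_y(joint, 1))  # 1/2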

Example 6.21: Let

f(x, y) = 2; 0 < x < y < 1
        = 0; elsewhere.

Then f(x) = ∫_x^1 2 dy = 2(1 − x); 0 < x < 1, zero elsewhere,

and f(y) = ∫_0^y 2 dx = 2y; 0 < y < 1, zero elsewhere.

Hence

f(x|y) = f(x, y)/f(y) = 1/y; 0 < x < y, 0 < y < 1.

Then the conditional mean and conditional variance of X, given Y, are given respectively by

E(X|Y = y) = ∫_0^y x f(x|y) dx = ∫_0^y x (1/y) dx = y/2; 0 < y < 1,

and

Var(X|Y = y) = E[(X − E(X|Y))² | Y = y]
             = ∫_0^y (x − y/2)² (1/y) dx
             = y²/12; 0 < y < 1.

6.4 A Few Important Inequalities

We consider here a few inequalities which are useful in theoretical discussions as well as for finding bounds for tail probabilities in terms of moments of the random variables. Markov's inequality, Chebyshev's inequality, and the Cauchy-Schwarz inequality are some such inequalities.

6.4.1 Markov’s Inequality


In probability theory, Markov's inequality gives an upper bound for the probability that a non-negative function of a random variable is greater than or equal to a positive constant. It is named after the Russian mathematician Andrey Andreyevich Markov (1856 - 1922).
Markov's Inequality: Let g(X) be a non-negative function of a random variable X. If E[g(X)] exists, then for any positive constant C,

(6.30) P[g(X) ≥ C] ≤ E[g(X)]/C.
Proof: To prove Markov's inequality as stated above, suppose that the variable X is of continuous type with pdf f(x). The proof for discrete X is identical and can be done by simply replacing integration by summation.

Let A = {x : g(x) ≥ C}. Write

E[g(X)] = ∫_A g(x) f(x) dx + ∫_{A′} g(x) f(x) dx ≥ ∫_A g(x) f(x) dx,

because each of the integrals on the extreme right-hand member of the above equation is non-negative. Here, A and A′ are complements of each other. Hence

E[g(X)] ≥ ∫_A g(x) f(x) dx ≥ C ∫_A f(x) dx, since g(x) ≥ C on A.

Again, since ∫_A f(x) dx = P[g(X) ≥ C], it follows that

P[g(X) ≥ C] ≤ E[g(X)]/C,

which is the desired result.
Corollary 1. If g(X) = X, where X is non-negative, the above inequality reduces to

(6.31) P[X ≥ C] ≤ E(X)/C.

Assuming no income is negative, Markov's inequality shows that no more than 1/5 of the population can have more than 5 times the average income.
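A simulation makes the income statement concrete; the sketch below assumes, purely for illustration, exponentially distributed incomes (the mean income and sample size are arbitrary) and compares the observed tail fraction with the Markov bound of 1/5:

import random

random.seed(0)
N = 1_000_000
mean_income = 50_000.0

# hypothetical non-negative incomes; the exponential choice is only an assumption
incomes = [random.expovariate(1.0 / mean_income) for _ in range(N)]

avg = sum(incomes) / N
frac_above = sum(1 for x in incomes if x >= 5 * avg) / N

print(frac_above)   # ~exp(-5) ~ 0.0067, well below the bound
print(1 / 5)        # Markov upper bound: 0.2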
Corollary 2. Let g(X) = |X|^r and C = K^r, where r > 0 and K > 0. Then

(6.32) P[|X| ≥ K] ≤ E|X|^r / K^r.

In particular, if g(X) = (X − μ)² and C = (Kσ)², we get the Chebyshev-Bienaymé inequality.

6.4.2 Chebyshev’s Inequality


Chebyshev's inequality, due to the Russian mathematician Chebyshev, is a well-known inequality which plays an important role in the theory of mathematical statistics. It uses the variance to bound the probability that a random variable deviates far from the mean. It provides upper or lower bounds for certain probabilities. It is stated as follows.
Chebyshev's Inequality: Let X be a random variable, either discrete or continuous, with finite mean μ and variance σ². Then for every k > 0,

(6.33) P[|X − μ| ≥ kσ] ≤ 1/k²,

or, equivalently,

(6.34) P[|X − μ| < kσ] ≥ 1 − 1/k².

This may also be written as

(6.35) P[|X − μ| ≥ ε] ≤ σ²/ε².
Proof: Let Y be a continuous random variable with pdf f(y) and finite second moment. Write

E(Y²) = ∫_{−∞}^{∞} y² f(y) dy ≥ ∫_{|y| ≥ ε} y² f(y) dy,

where ε > 0 is a constant. Hence,

E(Y²) ≥ ε² ∫_{|y| ≥ ε} f(y) dy, since y² ≥ ε² on the region of integration,
      = ε² P[|Y| ≥ ε].

Taking Y = X − μ, where μ is the mean of X, we get

E(X − μ)² ≥ ε² P[|X − μ| ≥ ε],

which gives

P[|X − μ| ≥ ε] ≤ E(X − μ)²/ε² = σ²/ε².

Taking ε = kσ, we easily get

P[|X − μ| ≥ kσ] ≤ 1/k².
Equation (6.34) is the complement of (6.33). Chebyshev's inequality (6.33) can also be proved by taking g(X) = (X − μ)² and C = k²σ² in Markov's inequality.
Chebyshev's inequality is also cited as the Chebyshev-Bienaymé inequality. Recent literature suggests that I. J. Bienaymé also contributed to it.
The inequality becomes interesting if k > 1. It is seen that Chebyshev's inequality provides upper and lower bounds for probabilities: the number 1/k² is an upper bound for P[|X − μ| ≥ kσ], and 1 − 1/k² is a lower bound for P[|X − μ| < kσ]. The bounds, however, do not provide a close approximation to the exact probability.
The remarkable benefit of Chebyshev's inequality is that it enables us to find upper and lower bounds of probabilities without knowing the distributional form (pdf/pf) of a random variable. If the distributional form of a random variable is known, we can easily find the mean and variance of X, if these exist. However, we cannot find probabilities like P[|X − μ| ≤ C] from the knowledge of the mean and variance alone; Chebyshev's inequality gives useful upper or lower bounds for such probabilities.
Chebyshev's inequality is a very useful theoretical tool for deriving or providing proofs of important results in probability theory.
It also throws light on the connection between the standard deviation and dispersion in a distribution. It shows precisely how the standard deviation measures variability about the mean of a random variable; this can easily be seen by assigning different values to k. It is evident that σ = 0 implies P(X = μ) = 1. Also, P[|X − μ| ≥ k] = 0 implies σ ≤ k.

Remarks: We have P[|X − μ| ≥ kσ] ≤ 1/k². This means that the probability that a random variable will take on a value which deviates from the mean by at least two standard deviations is at most 1/4, the probability that it will take on a value which deviates from the mean by at least 5 (five) standard deviations is at most 1/25, and the probability that it will take on a value which deviates from the mean by 10 standard deviations or more is at most 1/100.

Example 6.22: The number of customers who visit a showroom on a Saturday morning is a random variable with μ = 18 and σ = 2.5. With what probability can we assert that there will be between 8 and 28 customers?
Solution: P(8 < X < 28) = P(−10 < X − 18 < 10)
= P(|X − 18| < 10) ≥ 1 − 1/k², by Chebyshev's inequality,
where kσ = 10 and σ = 2.5, so that k = 4. Then

P(|X − 18| < 10) ≥ 1 − 1/16 = 15/16.

Thus, the probability is at least 15/16 that there will be between 8 and 28 customers.
Example 6.23: In a throw of a true die, let X denote the outcome. Then

E(X) = (1 + 2 + 3 + 4 + 5 + 6)/6 = (1/6)(6(6 + 1)/2) = 21/6,

E(X²) = ∑ x² f(x) = (1/6)(6(6 + 1)(12 + 1)/6) = 91/6, and then σ² = 91/6 − (21/6)² = 35/12.

By Chebyshev's inequality, we get

P[|X − μ| > 2.5] ≤ σ²/(2.5)² ≈ 0.47.

The exact probability is given by

P[|X − μ| > 2.5] = P[(X − μ) > 2.5] + P[(X − μ) < −2.5]
                 = P[X > 6] + P[X < 1] = 0 + 0 = 0.

Thus, the bound provided by Chebyshev's inequality is a very poor approximation of the exact probability.
Example 6.24. Let X be the number of heads in 20 tosses of an unbiased coin. Find P[|X − μ| > 6].

Solution: X follows a binomial distribution with parameters n = 20 and p = 1/2, such that the mean is μ = np = 20(1/2) = 10 and σ² = npq = 20(1/2)(1/2) = 5. The exact probability is

P[|X − μ| > 6] = P[(X − μ) > 6] + P[(X − μ) < −6]
              = P[X > 16] + P[X < 4]
              = 2 ∑_{x=0}^{3} C(20, x) (1/2)^20, by symmetry,
              = 0.0026.

By Chebyshev's inequality, the upper bound of the probability is given by

P[|X − μ| > 6] ≤ σ²/6² = 5/36 ≈ 0.14,

which is not precise at all.
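The exact tail probability and the Chebyshev bound in Example 6.24 can be computed as follows; a minimal Python sketch:

from math import comb

n, p = 20, 0.5
mu, var = n * p, n * p * (1 - p)          # 10 and 5

# exact P(|X - mu| > 6) = P(X <= 3) + P(X >= 17)
exact = sum(comb(n, x) * p ** x * (1 - p) ** (n - x)
            for x in range(n + 1) if abs(x - mu) > 6)

chebyshev_bound = var / 6 ** 2

print(round(exact, 4))            # 0.0026
print(round(chebyshev_bound, 4))  # 0.1389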

Example 6.25. Given that f(x) = e^(−x); x > 0, and P[|X| ≥ b] ≤ E[|X|^m]/b^m, find for which value of m the bound is better, when b = 3 or b = √2.

Solution: The exact probability is

P[|X| ≥ b] = P[X ≥ b] = ∫_b^∞ e^(−x) dx = e^(−b)
           = 0.0498, when b = 3,
           = 0.2431, when b = √2.

When f(x) = e^(−x); x > 0, E(X) = μ = 1 and E(X²) = 2, so that σ² = 1.

Taking m = 1 and m = 2 and applying Markov's inequality (Corollary 2) gives

(i) for m = 1 and b = 3, P[|X| ≥ 3] ≤ E(X)/3 = 1/3;
    for m = 2 and b = 3, P[|X| ≥ 3] ≤ E(X²)/3² = 2/9, which is better but not precise.

(ii) for m = 1 and b = √2, P[|X| ≥ √2] ≤ E(X)/√2 = 0.7071, which is better, though again not precise;
     for m = 2 and b = √2, P[|X| ≥ √2] ≤ E(X²)/2 = 1.
Example 6.26. Let X have a Poisson distribution with mean 100. Find a lower bound for P(75 < X < 125).
Solution:
For the Poisson distribution, mean = variance = 100. By Chebyshev's inequality, the lower bound is given by

P[|X − μ| < kσ] ≥ 1 − 1/k²,

which can also be written as

P[|X − μ| < ε] ≥ 1 − σ²/ε².

Now,
P[75 < X < 125] = P[−25 < X − 100 < 25]
                = P[|X − 100| < 25] ≥ 1 − 100/(25)² = 0.84, by Chebyshev's inequality.

Thus, the lower bound of the probability is 0.84.
Example 6.27. How many times do we have to flip a balanced coin to be able to assert with a probability of at most 0.01 that the difference between the proportion of tails and 0.50 will be at least 0.04?

Solution: Using the Chebyshev formula P[|X − μ| ≥ ε] ≤ σ²/ε² and the given condition, we have

P[|p − 0.50| ≥ 0.04] ≤ Var(p)/(0.04)² = 0.01,

where p = x/n is the proportion of tails in n flips. Here Var(p) = Var(X)/n² = npq/n² = pq/n, where p = q = probability of getting a tail = 1/2. Thus, we have to flip a balanced coin

n = 1/[4(0.04)(0.04)(0.01)] = 15,625 times.
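The sample-size calculation can be written out directly; a minimal Python sketch:

# P(|p_hat - 0.5| >= eps) <= Var(p_hat) / eps^2 = (p*q/n) / eps^2 <= alpha
p = q = 0.5
eps = 0.04        # allowed deviation of the proportion of tails
alpha = 0.01      # required bound on the probability

n = p * q / (eps ** 2 * alpha)
print(n)          # 15625.0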

6.4.3 Cauchy-Schwarz Inequality


Cauchy-Schwarz Inequality: If the random variables X and Y have finite second moments, then

(6.36) [E(XY)]² = |E(XY)|² ≤ E(X²)E(Y²),

with equality only if P[Y = CX] = 1 for some constant C.

Proof: Since E(X²) and E(Y²) exist, E(X), E(Y) and E(XY) also exist. Define

0 ≤ h(t) = E[(tX − Y)²] = t²E(X²) − 2tE(XY) + E(Y²).

If h(t) > 0 for all t, the quadratic function h(t) has no real roots, so its discriminant is negative:

4[E(XY)]² − 4E(X²)E(Y²) < 0,

or [E(XY)]² < E(X²)E(Y²).

If h(t) = 0 for some t, say t₀, then

E[(t₀X − Y)²] = 0,

which implies that

P[t₀X = Y] = 1.

Hence proved.

Example 6.28. Show that |ρ_xy| ≤ 1, with equality if and only if one variable is a linear function of the other with probability one.
Solution: From the Cauchy-Schwarz inequality, we get

[E(XY)]² / [E(X²)E(Y²)] ≤ 1.

Writing X − μ_x in place of X and Y − μ_y in place of Y, we get

[Cov(X,Y)]² / (σ_x² σ_y²) ≤ 1,

which leads to

|Cov(X,Y)/(σ_x σ_y)| ≤ 1, i.e. |ρ_xy| ≤ 1.

Also |ρ_xy| = 1 if and only if P[t₀(X − μ_x) = Y − μ_y] = 1, i.e. one variable is a linear function of the other with probability one.

EXERCISES 6:
1. In a gambling game, a man is paid $5 if he gets all heads or all tails when
three coins are tossed, and he pays out $3 if either 1 or 2 heads shows up.
What is his expected gain?
2. A fair coin is tossed until a head appears. What is the expected number of re-
quired tosses? Ans. 2
3. If a club sells 5,000 raffle tickets for a cash prize of $1,000, what is the math-
ematical expectation of a person who buys one of the tickets?
4. Let X have a pf that is positive at x = −1, 0, 1 and zero elsewhere. (i) If f(0) = 1/2, find E(X²). (ii) If f(0) = 1/2 and E(X) = 1/6, determine f(−1) and f(1).
Ans. (i) 1/2, (ii) 1/6, 1/3
5. Let the joint pdf of X and Y be f(x, y) = e^(−x−y); 0 < x < ∞, 0 < y < ∞, zero elsewhere. Find E(X), E(Y), and E(XY).
6. Let X and Y have the pf f(x, y) = 1/3; x = 0, 1 and y = 0, 1. Find E[(X − 1/3)(Y − 2/3)]. Ans. −1/27
7. A lot of twelve TV sets includes two that are defective. If three of the sets are
randomly shipped to a hotel, how many defective sets can they expect?
8. Let a continuous random variable X have the pdf f(x) = 1/(2a); −a < x < a, zero elsewhere. Find the variance. Ans. a²/3
9. A game is considered fair if each player’s expectation is equal to zero. If a
player pays us $10 each time we roll a 3 or a 4 with a balanced die, how much
should we pay that person when we roll a 1, 2, 5, or 6 to make the game fair?

10. The pdf of X is given by

f(x) = e^(−x); x > 0
     = 0; elsewhere.

Find the expected value of the random variable g(X) = e^(3x/4). Ans. 4
11. If a student answers the 144 questions of a true-false test by flipping a balanced coin (heads is "true" and tails is "false"), what does Chebyshev's theorem with k = 4 tell us about the number of correct answers s/he will get?
(The probability of getting between 48 and 96 correct answers is at least 15/16.)
12. If X is a random variable such that E(X) = 3 and E(X²) = 13, use Chebyshev's inequality to determine a lower bound for the probability P(−2 < X < 8).
13. The daily number of car repairs that a workshop can complete is a random variable with mean μ = 142 and standard deviation σ = 12. Using Chebyshev's inequality, with what probability can we assert that on any day it will repair between 82 and 202 cars?
14. For the following distributions, compute the exact probabilities P[|X − μ| ≥ kσ] for k = 1, 2, 3 and compare them with the Chebyshev bounds.
a. X ~ N(0, σ²)
b. X ~ b(n = 10, p = 0.5)
c. X is Poisson with parameter λ = 4.

d. for all x.

15. The probability that Ms Jenifer will sell a land property at a profit of $3,000 is 3/20, the probability that she will sell it at a profit of $1,500 is 7/20, the probability that she will break even is 7/20, and the probability that she will lose $1,500 is 3/20. What is her expected profit? Ans. $750

16. If X has the pdf f(x) = 3x²; 0 < x < 1, zero elsewhere, compute E(X³).

17. An urn contains 4 red chips and 3 white chips. Two are drawn at random
without replacement. Let X be the number of red chips in the sample. Find the
probability distribution of Y = 2X – 1. Also find mean and variance.
18. A committee of 3 is selected at random from 4 men and 3 women. Write
down the formula for probability distribution of the random variable X repre-
senting the number of men on the committee. Find expected number of men
on the committee.

19. The content of magnesium in an alloy is a random variable given by the pdf
f(x) = x/18; 0 < x < 6
= 0, elsewhere
The profit obtained from this alloy is P = 10 + 2X. What is the expected profit?
Find the probability distribution of P.
20. By investing in a stock, John can make a profit in a year of $10,000 with probability 0.3 or take a loss of $5,000 with probability 0.7. What is his expected gain?
21. Consider the joint density given by

f(x, y) = x² + xy/3; 0 < x < 1, 0 < y < 2.

Find the conditional densities of X and Y and then show that

E(Y|X = x) = (9x + 4)/(9x + 3). Also E(Y|X = 1/2) = 17/15.
22. Let the joint pdf of X and Y be given by

f(x, y) = 6x; 0 < x < y < 1.

Show that the marginal pdfs of X and Y are given by

f(x) = 6x(1 − x); 0 < x < 1, and zero elsewhere, and
f(y) = 3y²; 0 < y < 1, and zero elsewhere.

Also find the expected values and variances of X and Y.
23. If
f (x, y) = x + y ; 0 < x < 1, 0 < y < 1
= 0; elsewhere
Find (i) E(X|Y), (ii) E(X), (iii) E(Y).
24. The joint distribution of X and Y is given by

            X = 0    X = 1    X = 2
  Y = 0      1/6      1/3      1/12
  Y = 1      2/9      1/6      1/36

(i) Find the marginal distributions of X and Y


(ii) Find the covariance of X and Y
(iii) Find E(XY²)
(iv) Find the conditional distributions
(v) Also find the correlation co-efficient between X and Y
25. Given the following probability distribution of X:

x:      −3     6     9
f(x):   1/6    1/2   1/3
a. Find μ x and σ x .
b. Find E[g(X)], where g(X) = (2X + 1)2.
26. Let X and Y be independent random variables with means μ_x = 10 and μ_y = 7, and variances σ_x² = 5 and σ_y² = 3. Find the mean and variance of
a. 2X – Y
b. X + 3Y – 5
c. Z = - 2X +4Y - 3
27. Siddiqur and Chang have played the local golf tournament many times.
Their scores are random variables with the following means and standard de-
viations:
Siddiqur, X: μ1 = 115, σ 1 = 12 Chang, Y: μ2= 100, σ 2 = 8.
Assume that their scores vary independently of each other.
a. Compute the mean and standard deviation for the difference of
their scores W = X – Y.
b. Compute the mean and variance of the average of their scores W =
0.5 X + 0.5 Y
c. Compute the mean and variance of Siddiqur’s handicap formula L
= 0.8 X – 2.

28. Show that .


29. Given the pdf

a. Determine

b. Find Correlation Coefficient between X & Y and Var(2X - 3Y + 8).

30. The pdf of X is given by

If Find a and b. Also find , and Var(X).
