
RV COLLEGE OF ENGINEERING

(An autonomous institution affiliated to VTU, Belgaum)


Department of Mathematics
MA241T: Probability Theory and Linear Programming
Unit 1: Random Variables

Topic Learning Objectives:


• To apply statistical analysis and the theory of probability to the study of uncertainty.
• To define the degree of dependence between two random variables and to measure it.

Prerequisites:
If an experiment is repeated under essentially homogeneous and similar conditions, one
generally comes across two types of situations:
(i) The result, usually known as the 'outcome', is unique or certain.
(ii) The result is not unique but may be one of several possible outcomes.
The phenomena covered by (i) are known as 'deterministic' or 'predictable' phenomena. In a
deterministic phenomenon the result can be predicted with certainty.

For example:
(a) The velocity 'v' of a particle after time 't' is given by v = u + at, where 'u' is the initial velocity
and 'a' is the acceleration. This equation uniquely determines 'v' if the right-hand quantities
are known.
(b) Ohm's Law, viz., C = E/R, where C is the flow of current, E the potential difference between
the two ends of the conductor and R the resistance, uniquely determines the value of C as soon
as E and R are given.
A deterministic model is one which stipulates that the conditions under which an experiment
is performed determine the outcome of the experiment. For a number of situations the
deterministic model suffices. However, there are phenomena (as covered by (ii) above) which
do not lend themselves to a deterministic approach and are known as 'unpredictable' or
'probabilistic' phenomena. For example:
(a) In tossing a coin one is not sure if a head or a tail will be obtained.
(b) If a light tube has lasted for t hours, nothing can be said about its further life; it may fail to
function at any moment.
In such cases chance or probability comes into the picture, and it is taken to be a quantitative
measure of uncertainty.



Some basic definitions:
Trial and Event: Consider an experiment which, though repeated under essentially identical
conditions, does not give unique results but may result in any one of several possible
outcomes. The experiment is known as a trial and the outcomes are known as events or cases.
For example:
(i) Throwing a die is a trial and getting 1 (or 2 or 3, ..., or 6) is an event.
(ii) Tossing a coin is a trial and getting head (H) or tail (T) is an event.
(iii) Drawing two cards from a pack of well-shuffled cards is a trial and getting a king and a
queen are events.
Exhaustive Events: The total number of possible outcomes in any trial is known as the
exhaustive events or exhaustive cases. For example:
(i) In tossing a coin there are two exhaustive cases, viz., head and tail.
(ii) In throwing a die there are six exhaustive cases, since any one of the 6 faces 1, 2, ...,
6 may come uppermost.
(iii) In drawing two cards from a pack of cards the exhaustive number of cases is 52C2, since 2
cards can be drawn out of 52 cards in 52C2 ways.
Favourable Events or Cases: The number of cases favourable to an event in a trial is the
number of outcomes which entail the happening of the event. For example:
(i) In drawing a card from a pack of cards the number of cases favourable to drawing of an
ace is 4, for drawing a spade 13 and for drawing a red card is 26.
(ii) In throwing of two dice, the number of cases favourable to getting the sum 5 is : (1,4)
(4,1) (2,3) (3,2), i.e., 4.
Mutually exclusive events: Events are said to be mutually exclusive or incompatible if the
happening of any one of them precludes the happening of all the others, i.e., if no two or
more of them can happen simultaneously in the same trial. For example:
(i) In throwing a die all the 6 faces numbered 1 to 6 are mutually exclusive since if any one
of these faces comes, the possibility of others, in the same trial, is ruled out.
(ii) Similarly in tossing a coin the events head and tail are mutually exclusive.
Equally likely events: Outcomes of a trial are said to be equally likely if, taking into
consideration all the relevant evidence, there is no reason to expect one in preference to the
others. For example:
(i) In tossing an unbiased or uniform coin, head and tail are equally likely events.
(ii) In throwing an unbiased die, all the six faces are equally likely to come.



Independent events: Several events are said to be independent if the happening (or non-
happening) of an event is not affected by the supplementary knowledge concerning the
occurrence of any number of the remaining events. For example:
(i) In tossing an unbiased coin the event of getting a head in the first toss is independent of
getting a head in the second, third and subsequent throws.
(ii) If one draws a card from a pack of well-shuffled cards and replaces it before drawing the
second card, the result of the second draw is independent of the first draw. However, if
the first card drawn is not replaced, then the second draw is dependent on the first draw.
There are three systematic approaches to the study of probability as mentioned below.
Mathematical or Classical or ‘a priori’ Probability: If a trial results in n exhaustive,
mutually exclusive and equally likely cases and m of them are favourable to the happening of
an event E then the probability 'p' of happening of E is given by
p = P(E) = (favourable number of cases)/(exhaustive number of cases) = m/n

Since the number of cases favourable to the 'non-happening' of the event E is (n − m), the
probability 'q' that E will not happen is given by
q = (n − m)/n = 1 − m/n = 1 − p, which gives p + q = 1.
Obviously p as well as q are non-negative and cannot exceed unity, i.e.,
0 ≤ p ≤ 1, 0 ≤ q ≤ 1.
Statistical or Empirical Probability: If a trial is repeated a number of times under
essentially homogeneous and identical conditions, then the limiting value of the ratio of the
number of times the event happens to the number of trials, as the number of trials become
indefinitely large, is called the probability of happening of the event. (It is assumed that the
limit is finite and unique).
Symbolically, if in n trials an event E happens m times, then the probability 'p' of the
happening of E is given by p = P(E) = lim_{n→∞} (m/n).
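This limiting-frequency idea can be illustrated with a short simulation. The sketch below is a Python illustration (the fair coin, the seed and the trial counts are arbitrary choices, not part of the course material): the relative frequency m/n of heads settles near 0.5 as n grows.

```python
import random

# Relative-frequency estimate of P(head) for a fair coin.
# As the number of trials n grows, m/n settles near the true value 0.5.
random.seed(42)  # fixed seed so the run is reproducible

for n in (100, 10_000, 1_000_000):
    m = sum(1 for _ in range(n) if random.random() < 0.5)  # m = number of heads
    print(f"n = {n:>9}: m/n = {m / n:.4f}")
```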

Axiomatic Probability: Let 𝐴 be any event in the sample space 𝑆, then 𝑃(𝐴) is called the
probability of event 𝐴, if the following axioms are satisfied.
Axiom 1: 𝑃(𝐴) ≥ 0
Axiom 2: 𝑃(𝑆) = 1, S being the sure event
Axiom 3: For two mutually exclusive events 𝐴 & 𝐵, 𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵)



Some important results:
1. The probability of an event always lies between 0 and 1, i.e., 0 ≤ P(A) ≤ 1.
2. If A and A′ are complementary events to each other defined on a random experiment then
P(A) + P(A′ ) = 1.
3. Addition Theorem: If A and B are any two events with respective probabilities P(A) and
P(B), then the probability of occurrence of at least one of the events is given by
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
4. The probability of null event is zero i.e., P(∅) = 0.
5. For any two events A and B of a sample space S
(i) P(A − B) = P(A) − P(A ∩ B)
(ii) P(B − A) = P(B) − P(A ∩ B)
(iii) P(A̅ ∩ B) = P(B) − P(A ∩ B) = P(A ∪ B) − P(A)
(iv) P[(A − B) ∪ (B − A)] = P(A) + P(B) − 2P(A ∩ B)
6. Addition Theorem for three events: If A, B and C are any three events with respective
probabilities P(A), P(B) and P(C), then the probability of occurrence of at least one of the
events is given by
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(A ∩ C) + P(A ∩ B ∩ C).

Random variable
A random variable is a real number X connected with the outcome of a random experiment E.
For example, if E consists of three tosses of a coin, one can consider the random variable
which is the number of heads (0, 1, 2 or 3).

Outcome      HHH  HHT  HTH  THH  TTH  THT  HTT  TTT
Value of X   3    2    2    2    1    1    1    0

Let S denote the sample space of a random experiment. A random variable is a rule which
assigns a numerical value to each and every outcome of the experiment. Thus, a random
variable is a function X(ω) with domain S and range (−∞, ∞) such that for every real number
a, the event {ω : X(ω) ≤ a} ∈ B, the field of subsets of S. It is denoted as f: S → R.
Note that all the outcomes of the experiment are associated with a unique number. Therefore,
f is an example of a random variable. Usually, a random variable is denoted by letters such as
X, Y, Z etc. In the coin-tossing example above, the image set of the random variable may be
written as f(S) = {0, 1, 2, 3}.
There are two types of random variables. They are;
1. Discrete Random Variable (DRV)
2. Continuous Random Variable (CRV).
Discrete Random Variable: A discrete random variable is one which takes only a countable
number of distinct values such as 0, 1, 2, 3, … . Discrete random variables are usually (but
not necessarily) counts. If a random variable takes at most a countable number of values, it is
called a discrete random variable. In other words, a real valued function defined on a
discrete sample space is called a discrete random variable.
Examples of Discrete Random Variable:
(i) In the experiment of throwing a die, define X as the number that is obtained. Then X takes
any of the values 1 – 6. Thus, X(S) = {1, 2, 3…6} which is a finite set and hence X is a DRV.
(ii) If X be the random variable denoting the number of marks scored by a student in a subject of
an examination, then X(S) = {0, 1, 2, 3…100}. Then, X is a DRV.
(iii) The number of children in a family is a DRV.
(iv) The number of defective light bulbs in a box of ten is a DRV.
Probability Mass Function: Suppose X is a one-dimensional discrete random variable
taking at most a countably infinite number of values x1, x2, …. With each possible outcome
xi, one can associate a number pi = P(X = xi) = p(xi), called the probability of xi.
The numbers p(xi); i = 1, 2, … must satisfy the following conditions:
(i) p(xᵢ) ≥ 0 ∀ i,
(ii) ∑ᵢ p(xᵢ) = 1, the sum running over all i.

This function 𝑝 is called the probability mass function of the random variable X and the
set {𝑥𝑖 , 𝑝(𝑥𝑖 )} is called the probability distribution of the random variable X.
Remarks:
1. The set of values which X takes is called the spectrum of the random variable.
2. For a discrete random variable, knowledge of the probability mass function enables us to
compute probabilities of arbitrary events. In fact, if E is a set of real numbers,
P(X ∈ E) = ∑_{x ∈ E∩S} p(x), where S is the sample space.
Discrete Distribution Function: For a random variable X taking the values {x₁, x₂, x₃, …},
the cumulative distribution function is given by F(x) = P(X ≤ x) = ∑_{xᵢ ≤ x} p(xᵢ).
Mean/Expected Value, Variance and Standard Deviation of DRV:
The mean or expected value of a DRV X is defined as
𝐸(𝑋) = 𝜇 = ∑ 𝑥𝑖 𝑃(𝑋 = 𝑥𝑖 ) = ∑ 𝑝(𝑥𝑖 )𝑥𝑖 .
The variance of a DRV X is defined as
Var(X) = σ² = ∑ P(X = xᵢ)(xᵢ − μ)² = ∑ pᵢ(xᵢ − μ)² = ∑ pᵢ xᵢ² − μ².
The standard deviation of a DRV X is defined as
SD(X) = σ = √σ² = √Var(X).
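These three formulas translate directly into a few lines of code. The following is a minimal Python sketch (the PMF used is an arbitrary illustrative one, not taken from the text):

```python
import math

# Mean, variance and standard deviation of a DRV given its PMF as a
# table {x_i: p(x_i)}.  The PMF below is an arbitrary illustrative one.
pmf = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

assert abs(sum(pmf.values()) - 1.0) < 1e-12             # condition (ii): sum of p(x_i) is 1

mean = sum(x * p for x, p in pmf.items())               # E(X) = sum x_i p(x_i)
var = sum(p * (x - mean) ** 2 for x, p in pmf.items())  # Var(X) = sum p_i (x_i - mu)^2
sd = math.sqrt(var)                                     # SD(X) = sqrt(Var(X))
print(mean, var, sd)                                    # 1.7 0.81 0.9
```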
Continuous Random Variable: A continuous random variable is not defined at specific
values. Instead, it is defined over an interval of values, and is represented by the area under a
curve. Thus, a random variable X is said to be continuous if it can take all possible values
between certain limits. In other words, a random variable is said to be continuous when its
different values cannot be put in 1-1 correspondence with a set of positive integers. Here, the
probability of observing any single value is equal to zero, since the number of values which
may be assumed by the random variable is infinite.
A continuous random variable is a random variable that (at least conceptually) can be
measured to any desired degree of accuracy.
Examples of Continuous Random Variable:
(i) Rainfall in a particular area can be treated as CRV.
(ii) Age, height and weight related problems can be included under CRV.
(iii) The amount of sugar in an orange is a CRV.
(iv) The time required to run a mile is a CRV.
Important Remark: In the case of a DRV, the probability at a point, i.e., P(X = c), is not zero
for some fixed c. However, in the case of a CRV the probability at a point is always zero,
i.e., P(X = c) = 0 for all possible values of c.

Probability Density Function: The probability density function (p.d.f) of a random variable
X usually denoted by 𝑓𝑥 (𝑥) or simply by 𝑓(𝑥) has the following obvious properties:
i) f(x) ≥ 0, −∞ < x < ∞
ii) ∫_{−∞}^{∞} f(x) dx = 1
iii) The probability P(E) given by P(E) = ∫_E f(x) dx is well defined for any event E.
If f(x) is the p.d.f of X, then the probability that X belongs to A, where A is some interval
(a, b), is given by the integral of f(x) over that interval,
i.e., P(X ∈ A) = ∫_{a}^{b} f(x) dx.
Cumulative Distribution Function: The cumulative distribution function of a continuous
random variable is defined as F(x) = ∫_{−∞}^{x} f(t) dt for −∞ < x < ∞.
Mean/Expectation, Variance and Standard deviation of CRV:
The mean or expected value of a CRV X is defined as μ = E(X) = ∫_{−∞}^{∞} x f(x) dx.
The variance of a CRV X is defined as Var(X) = σ² = ∫_{−∞}^{∞} x² f(x) dx − μ².
The standard deviation of a CRV X is given by σ = √Var(X).

Examples:
1. The probability density function of a discrete random variable X is given below:
x:               0   1    2    3    4    5     6
P(X = x) = f(x): k   3k   5k   7k   9k   11k   13k

Find (i) k; (ii) F (4); (iii) 𝑃(𝑋 ≥ 5); (iv) 𝑃(2 ≤ 𝑋 < 5); (v) E(X) and (vi) Var (X).
Solution: To find the value of k, consider the sum of all the probabilities, which equals
49k. Equating this to 1, k = 1/49. Therefore, the distribution of X may now be written as

x:               0     1     2     3     4     5      6
P(X = x) = f(x): 1/49  3/49  5/49  7/49  9/49  11/49  13/49

F(4) = P[X ≤ 4] = P[X = 0] + P[X = 1] + P[X = 2] + P[X = 3] + P[X = 4] = 25/49.
P[X ≥ 5] = P[X = 5] + P[X = 6] = 24/49.
P[2 ≤ X < 5] = P[X = 2] + P[X = 3] + P[X = 4] = 21/49.
Next, to find E(X), consider
E(X) = ∑ᵢ xᵢ f(xᵢ) = 203/49.

To obtain the variance, it is necessary to compute
E(X²) = ∑ᵢ xᵢ² f(xᵢ) = 973/49.
Thus, the variance of X is obtained by using the relation
Var(X) = E(X²) − [E(X)]² = 973/49 − (203/49)².
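For cross-checking such tabular computations, here is a short Python sketch (an illustrative check using exact fractions from the standard library) that reproduces the quantities of Example 1:

```python
from fractions import Fraction

# Check of Example 1 with exact rational arithmetic: the weights
# 1, 3, ..., 13 sum to 49, so k = 1/49.
weights = [1, 3, 5, 7, 9, 11, 13]
k = Fraction(1, sum(weights))                         # 49k = 1  =>  k = 1/49
pmf = {x: w * k for x, w in enumerate(weights)}

F4 = sum(p for x, p in pmf.items() if x <= 4)         # F(4)       = 25/49
P_ge5 = sum(p for x, p in pmf.items() if x >= 5)      # P(X >= 5)  = 24/49
P_mid = sum(p for x, p in pmf.items() if 2 <= x < 5)  # P(2<=X<5)  = 21/49
EX = sum(x * p for x, p in pmf.items())               # E(X)       = 203/49
EX2 = sum(x * x * p for x, p in pmf.items())          # E(X^2)     = 973/49
print(F4, P_ge5, P_mid, EX, EX2 - EX ** 2)            # last value is Var(X)
```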

2. A random variable, X, has the following probability distribution.


X -2 -1 0 1 2 3
f (xi) 0.1 k 0.2 2k 0.3 k
Find (i) k; (ii) F (2); (iii) 𝑃(−2 < 𝑋 < 2); (iv) 𝑃(−1 < 𝑋 ≤ 2); (v) E(X) and
(vi) Variance.
Solution: Since the sum of all the probabilities equals 1, we have
0.1 + k + 0.2 + 2k + 0.3 + k = 1, which yields k = 0.1. In view of this, the distribution of
X may be written as

X -2 -1 0 1 2 3
f (xi) 0.1 0.1 0.2 0.2 0.3 0.1

Note that F(2) = P[X ≤ 2]
= P[X = −2] + P[X = −1] + P[X = 0] + P[X = 1] + P[X = 2] = 0.9.
The same can also be obtained using the result
F(2) = P[X ≤ 2] = 1 − P[X > 2] = 1 − P[X = 3] = 1 − 0.1 = 0.9.
Next, P(−2 < X < 2) = P[X = −1] + P[X = 0] + P[X = 1] = 0.5.
Clearly, P(−1 < X ≤ 2) = 0.7.
Now, consider E(X) = ∑ᵢ xᵢ f(xᵢ) = 0.8.
Then E(X²) = ∑ᵢ xᵢ² f(xᵢ) = 2.8, so Var(X) = E(X²) − {E(X)}² = 2.8 − 0.64 = 2.16.
3. A shipment of 20 similar laptop computers to a retail outlet contains 3 that are defective. If
a school makes a random purchase of 2 of these computers, find the probability distribution
for the number of defectives.
Solution: Let X be a random variable whose values x are the possible numbers of defective
computers purchased by the school. Then x can only take the numbers 0, 1, and 2. Now
f(0) = P(X = 0) = (3C0 × 17C2)/(20C2) = 68/95,
f(1) = P(X = 1) = (3C1 × 17C1)/(20C2) = 51/190,
f(2) = P(X = 2) = (3C2 × 17C0)/(20C2) = 3/190.

Thus, the probability distribution of X is


x 0 1 2
f (x) 68/95 51/190 3/190
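The pattern in this example is the hypergeometric distribution, and the three probabilities can be reproduced with a few lines of Python (an illustrative check using the standard library):

```python
from fractions import Fraction
from math import comb

# Example 3 is a hypergeometric distribution: a sample of n = 2 from
# N = 20 computers, of which D = 3 are defective.
N, D, n = 20, 3, 2
for x in range(n + 1):
    p = Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))
    print(x, p)   # 0 -> 68/95, 1 -> 51/190, 2 -> 3/190
```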

4. If a car agency sells 50% of its inventory of a certain foreign car equipped with side
airbags, find a formula for the probability distribution of the number of cars with side airbags
among the next 4 cars sold by the agency.
Solution: Since the probability of selling an automobile with side airbags is 0.5, the 24 = 16
points in the sample space are equally likely to occur. Therefore, the denominator for all
probabilities, and also for our function, is 16. To obtain the number of ways of selling 3 cars
with side airbags, it is required to consider the number of ways of partitioning 4 outcomes

into two cells, with 3 cars with side airbags assigned to one cell and the model without side
airbags assigned to the other. This can be done in 4C3 = 4 ways. In general, the event of
selling x models with side airbags and 4 − x models without side airbags can occur in 4Cx
ways, where x can be 0, 1, 2, 3, or 4. Thus, the probability distribution f(x) = P(X = x) is
f(x) = (1/16) 4Cx for x = 0, 1, 2, 3, 4.
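This f(x) is exactly the binomial distribution with n = 4 trials and p = 1/2; a small illustrative Python check confirms that the sixteen equally likely sample points are distributed as claimed:

```python
from fractions import Fraction
from math import comb

# Example 4 is the binomial distribution with n = 4 and p = 1/2:
# f(x) = C(4, x) / 16 for x = 0, 1, 2, 3, 4.
f = {x: Fraction(comb(4, x), 16) for x in range(5)}
print(f)                     # {0: 1/16, 1: 1/4, 2: 3/8, 3: 1/4, 4: 1/16}
assert sum(f.values()) == 1  # the probabilities sum to 1
```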
5. The diameter of an electric cable, say X, is assumed to be a continuous random variable
with p.d.f. f(x) = 6x(1 − x) for 0 ≤ x ≤ 1, and f(x) = 0 otherwise.
(i) Check that the above is a p.d.f.
(ii) Find P(2/3 < X < 1).
(iii) Determine a number b such that P(X < b) = P(X > b).
Solution: (i) f(x) ≥ 0 in the given interval, and
∫_{−∞}^{∞} f(x) dx = ∫_{−∞}^{0} f(x) dx + ∫_{0}^{1} f(x) dx + ∫_{1}^{∞} f(x) dx
= 0 + ∫_{0}^{1} 6x(1 − x) dx + 0
= [3x² − 2x³] evaluated from x = 0 to x = 1
= 1.
(ii) P(2/3 < X < 1) = ∫_{2/3}^{1} f(x) dx = ∫_{2/3}^{1} (6x − 6x²) dx = 7/27.

(iii) P(X < b) = P(X > b) requires
∫_{0}^{b} f(x) dx = ∫_{b}^{1} f(x) dx
6∫_{0}^{b} x(1 − x) dx = 6∫_{b}^{1} x(1 − x) dx
(b²/2 − b³/3) = (1/2 − 1/3) − (b²/2 − b³/3)
3b² − 2b³ = 1 − 3b² + 2b³
4b³ − 6b² + 1 = 0
(2b − 1)(2b² − 2b − 1) = 0.
From this, b = 1/2 is the only real root lying between 0 and 1 and satisfying the given
condition.
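For readers who wish to verify the integrals symbolically, the following sketch (assuming the SymPy library is available) reproduces parts (i)–(iii):

```python
import sympy as sp

# Symbolic check of Example 5 for f(x) = 6x(1 - x) on [0, 1].
x, b = sp.symbols('x b', real=True)
f = 6 * x * (1 - x)

print(sp.integrate(f, (x, 0, 1)))                  # 1, so f is a valid p.d.f.
print(sp.integrate(f, (x, sp.Rational(2, 3), 1)))  # 7/27

# P(X < b) = P(X > b) reduces to 4b^3 - 6b^2 + 1 = 0; keep the root in (0, 1).
eq = sp.Eq(sp.integrate(f, (x, 0, b)), sp.integrate(f, (x, b, 1)))
print([r for r in sp.solve(eq, b) if 0 < r < 1])   # [1/2]
```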
6. Suppose that the error in the reaction temperature, in ◦C, for a controlled laboratory
experiment is a continuous random variable X having the probability density function

f(x) = x²/3 for −1 < x < 2, and f(x) = 0 elsewhere.
(i) Verify that f (x) is a probability density function.
(ii) Find P(0 < X ≤ 1).
Solution: (i) ∫_{−∞}^{∞} f(x) dx = ∫_{−1}^{2} (x²/3) dx = [x³/9]_{−1}^{2} = 1. Hence the
given function is a p.d.f.
(ii) P(0 < X ≤ 1) = ∫_{0}^{1} (x²/3) dx = 1/9.

7. The length of time (in minutes) that a certain lady speaks on the telephone is found to be a
random variable with probability function f(x) = A e^{−x/5} for x ≥ 0, and f(x) = 0 otherwise.
(i) Find A.
(ii) Find the probability that she will speak on the phone
(a) more than 10 min, (b) less than 5 min, (c) between 5 and 10 min.

Solution: (i) Given f (x) is p.d.f. i.e., ∫−∞ 𝑓(𝑥)𝑑𝑥 = 1
0 ∞
∫−∞ 𝑓(𝑥)𝑑𝑥 + ∫0 𝑓(𝑥)𝑑𝑥 = 1
𝑦𝑖𝑒𝑙𝑑𝑠 −𝑥

→ 0 + ∫0 𝐴𝑒 5 𝑑𝑥 = 1
𝑦𝑖𝑒𝑙𝑑𝑠 1
→ 𝐴=5
∞ ∞1 −𝑥
(ii) (a) 𝑃(𝑥 > 10) = ∫10 𝑓(𝑥)𝑑𝑥 = ∫10 5 𝑒 5 𝑑𝑥 = 𝑒 −2 = 0.1353
−𝑥
5 51
(b) 𝑃(𝑥 < 5) = ∫−∞ 𝑓(𝑥)𝑑𝑥 = ∫0 5 𝑒 5 𝑑𝑥 = −𝑒 −1 + 1 = 0.6322
−𝑥
10 10 1
(c) 𝑃(5 < 𝑥 < 10) = ∫5 𝑓(𝑥)𝑑𝑥 = ∫5 𝑒 5 𝑑𝑥 = −𝑒 −2 + 𝑒 −1 = 0.2325 .
5

8. Suppose X is a continuous random variable with the following probability density function
𝑓(𝑥) = 3𝑥 2 for 0 < 𝑥 < 1 . Find the mean and variance of X.

Solution: Mean = μ = ∫_{−∞}^{∞} x f(x) dx
= ∫_{−∞}^{0} x f(x) dx + ∫_{0}^{1} x f(x) dx + ∫_{1}^{∞} x f(x) dx
= 0 + ∫_{0}^{1} x · 3x² dx + 0 = ∫_{0}^{1} 3x³ dx = 3/4.
Variance = σ² = ∫_{−∞}^{∞} x² f(x) dx − μ²
= ∫_{0}^{1} x² f(x) dx − μ²
= ∫_{0}^{1} x² · 3x² dx − (3/4)²
= ∫_{0}^{1} 3x⁴ dx − (3/4)² = 3/80.
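The same two integrals can be checked symbolically; a minimal sketch, assuming SymPy is available:

```python
import sympy as sp

# Mean and variance of Example 8, f(x) = 3x^2 on (0, 1), using
# mu = integral of x f(x) dx and sigma^2 = integral of x^2 f(x) dx - mu^2.
x = sp.symbols('x')
f = 3 * x**2

mu = sp.integrate(x * f, (x, 0, 1))              # 3/4
var = sp.integrate(x**2 * f, (x, 0, 1)) - mu**2  # 3/5 - 9/16 = 3/80
print(mu, var)
```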

Exercise:
1. Two cards are drawn randomly, simultaneously from a well shuffled deck of 52 cards. Find
the variance for the number of aces.
2. If X is a discrete random variable taking the values 1, 2, 3, … with P(x) = (1/2)(2/3)ˣ,
find P(X is an odd number) after first establishing that P(x) is a probability function.


3. The probability mass function of a random variable X is zero except at the points x = 0, 1, 2.
At these points it has the values p(0) = 3c³, p(1) = 4c − 10c² and
p(2) = 5c − 1 for some c > 0.
a) Determine the value of c.
b) Compute the probabilities P(X < 2) and P(1 < X ≤ 2).
c) Find the largest x such that F(x) < 1/2.
d) Find the smallest x such that F(x) ≥ 1/3.
1
4. If X is a random variable with P(X = x) = 1/2ˣ, where x = 1, 2, 3, …, find
(i) ∑ₓ P(X = x), (ii) P(X is even), (iii) P(X is divisible by 3).


5. A continuous random variable has the density function
f(x) = kx² for −3 < x < 3, and f(x) = 0 otherwise.
Find k and hence find P(X < 3) and P(X > 1).

6. Let X be a continuous random variable with p.d.f.
f(x) = ax for 0 ≤ x ≤ 1,
f(x) = a for 1 ≤ x ≤ 2,
f(x) = −ax + 3a for 2 ≤ x ≤ 3,
f(x) = 0 otherwise.
(i) Determine the constant a. (ii) Compute P(X ≤ 1.5).

7. Find the mean and variance of the probability density function f(x) = (1/2) e^{−|x|}.

8. A continuous distribution of a variable X in the range (−3, 3) is defined by
f(x) = (1/16)(3 + x)² for −3 ≤ x ≤ −1,
f(x) = (1/16)(6 − 2x²) for −1 ≤ x ≤ 1,
f(x) = (1/16)(3 − x)² for 1 ≤ x ≤ 3.
(i) Verify that the area under the curve is unity.
(ii) Find the mean and variance of the above distribution.



Answers: 1) 0.1392 2) 3/5 3) 1/3, 1/3, 2/3, 1, 1 4) 1, 1/3, 1/7 5) 1/18, 1, 13/27
6) 1/2, 1/2 7) Mean = 0 and Variance = 2 8) Unit area and 0, 1.

JOINT PROBABILITY

Two or more random variables:


So far, only single random variables were considered. If one chooses a person at random and
measures his or her height and weight, each measurement is a random variable; but taller
people also tend to be heavier than shorter people, so the outcomes will be related. In order to
deal with such probabilities, the joint probability distribution of two random variables is
studied in detail.

Joint Probability distribution for discrete random variables

Joint Probability Mass Function:


Let X and Y be random variables on the same sample space S with respective range spaces
R X = {x1 , x2 , … , xn } and R Y = {y1 , y2 , … , ym }. The joint distribution or joint probability
function of X and Y is the function h on the product space RX × RY defined by
ℎ(𝑥𝑖 , 𝑦𝑗 ) ≡ 𝑃(𝑋 = 𝑥𝑖 , 𝑌 = 𝑦𝑗 ) ≡ 𝑃({𝑠 ∈ 𝑆 ∶ 𝑋(𝑠) = 𝑥𝑖 , 𝑌(𝑠) = 𝑦𝑗 })
The function ℎ has the following properties:
(i) ℎ(𝑥𝑖 , 𝑦𝑗 ) ≥ 0
(ii) ∑𝑖 ∑𝑗 ℎ(𝑥𝑖 , 𝑦𝑗 ) = 1
Thus, ℎ defines a probability space on the product space RX × RY .

X \ Y    y₁          y₂          …   yⱼ          …   yₘ          f(xᵢ)
x₁       h(x₁, y₁)   h(x₁, y₂)   …   h(x₁, yⱼ)   …   h(x₁, yₘ)   f(x₁)
x₂       h(x₂, y₁)   h(x₂, y₂)   …   h(x₂, yⱼ)   …   h(x₂, yₘ)   f(x₂)
⋮        ⋮           ⋮               ⋮               ⋮           ⋮
xᵢ       h(xᵢ, y₁)   h(xᵢ, y₂)   …   h(xᵢ, yⱼ)   …   h(xᵢ, yₘ)   f(xᵢ)
⋮        ⋮           ⋮               ⋮               ⋮           ⋮
xₙ       h(xₙ, y₁)   h(xₙ, y₂)   …   h(xₙ, yⱼ)   …   h(xₙ, yₘ)   f(xₙ)
g(yⱼ)    g(y₁)       g(y₂)       …   g(yⱼ)       …   g(yₘ)

The functions f and g on the right side and the bottom side, respectively, of the joint
distribution table are defined by
f(xᵢ) = ∑ⱼ h(xᵢ, yⱼ) and g(yⱼ) = ∑ᵢ h(xᵢ, yⱼ).
That is, f(xᵢ) is the sum of the entries in the i-th row and g(yⱼ) is the sum of the entries in
the j-th column. They are called the marginal distributions of X and Y, respectively.

Expectation: Consider a function 𝜑(𝑋, 𝑌) of 𝑋 and 𝑌. Then the function


𝐸{𝜑(𝑋, 𝑌)} = ∑𝑖 ∑𝑗 ℎ(𝑥𝑖 , 𝑦𝑗 ) 𝜑(𝑥𝑖 , 𝑦𝑗 )
is called the mathematical expectation of 𝜑(𝑋, 𝑌) in the joint distribution of 𝑋 and 𝑌.
Co-variance and Correlation: Let X and Y be random variables with the joint distribution
ℎ(𝑥, 𝑦), and respective means 𝜇𝑋 and 𝜇𝑌 . The covariance of X and Y, is denoted by
𝑐𝑜𝑣(𝑋, 𝑌) and is defined as

cov(X, Y) = ∑ᵢ,ⱼ (xᵢ − μX)(yⱼ − μY) h(xᵢ, yⱼ)
or, equivalently,
cov(X, Y) = ∑ᵢ,ⱼ xᵢ yⱼ h(xᵢ, yⱼ) − μX μY.

The correlation of X and Y is defined by


𝑐𝑜𝑣(𝑋, 𝑌)
𝜌(𝑋, 𝑌) =
𝜎𝑋 𝜎𝑌
The correlation 𝜌 is dimensionless and has the following properties:
(i) 𝜌(𝑋, 𝑌) = 𝜌(𝑌, 𝑋),
(ii) −1 ≤ 𝜌 ≤ 1,
(iii) 𝜌(𝑋, 𝑋) = 1, 𝜌(𝑋, −𝑋) = −1,
(iv) 𝜌(𝑎𝑋 + 𝑏, 𝑐𝑌 + 𝑑) = 𝜌(𝑋, 𝑌) if 𝑎, 𝑐 ≠ 0.
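In code, the marginals are simply row and column sums of the joint table, and the covariance and correlation follow from the formulas above. A minimal sketch, assuming NumPy is available; it uses the joint table of Problem 2 below, where the covariance turns out to be zero because X and Y are independent:

```python
import numpy as np

# Marginals, covariance and correlation straight from a joint table
# h[i, j] = P(X = x_i, Y = y_j).
x = np.array([1, 2])
y = np.array([2, 3, 4])
h = np.array([[0.06, 0.15, 0.09],
              [0.14, 0.35, 0.21]])

f = h.sum(axis=1)                  # marginal of X (row sums):    [0.3, 0.7]
g = h.sum(axis=0)                  # marginal of Y (column sums): [0.2, 0.5, 0.3]
mu_x, mu_y = f @ x, g @ y          # means of the marginal distributions
cov = x @ h @ y - mu_x * mu_y      # sum_ij x_i y_j h_ij - mu_X mu_Y
sigma_x = np.sqrt(f @ x**2 - mu_x**2)
sigma_y = np.sqrt(g @ y**2 - mu_y**2)
print(cov, cov / (sigma_x * sigma_y))   # both ~0: X and Y are uncorrelated
```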

Conditional probability distribution:



We know that the value x of the random variable X represents an event that is a subset of the
sample space. If we use the definition of conditional probability,
P(B|A) = P(A ∩ B)/P(A), provided P(A) > 0,
where A and B are now the events defined by X = x and Y = y, respectively, then
P(Y = y|X = x) = P(X = x, Y = y)/P(X = x) = h(x, y)/f(x), provided f(x) > 0,
where X and Y are discrete random variables.

Problem 1. A coin is tossed three times. Let 𝑋 be equal to 0 or 1 according as a head or a tail
occurs on the first toss. Let 𝑌 be equal to the total number of heads which occurs. Determine
(i) the marginal distributions of 𝑋 and 𝑌, and (ii) the joint distribution of 𝑋 and 𝑌, (iii)
expected values of 𝑋, 𝑌, 𝑋 + 𝑌 and 𝑋𝑌, (iv) 𝜎𝑋 and 𝜎𝑌 , (v) 𝐶𝑜𝑣(𝑋, 𝑌) and 𝜌(𝑋, 𝑌).
Solution: Here the sample space is given by
𝑆 = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝐻𝑇𝑇, 𝑇𝐻𝐻, 𝑇𝐻𝑇, 𝑇𝑇𝐻, 𝑇𝑇𝑇}
(i) The distribution of the random variable X is given by the following table:

X:     0 (first toss head)   1 (first toss tail)
P(X):  4/8                   4/8

which is the marginal distribution of the random variable X.

The distribution of the random variable Y (the total number of heads) is given by the
following table:

Y:     0 (zero heads)   1 (one head)   2 (two heads)   3 (three heads)
P(Y):  1/8              3/8            3/8             1/8

which is the marginal distribution of the random variable Y.

(ii) The joint distribution of the random variables X and Y is given by the following table:

X \ Y                      0     1     2     3
0 (first toss head)        0     1/8   2/8   1/8
1 (first toss tail)        1/8   2/8   1/8   0
(iii) E[X] = μX = ∑ xᵢ P(xᵢ) = 0 × 4/8 + 1 × 4/8 = 1/2.
E[Y] = μY = ∑ yⱼ P(yⱼ) = 0 × 1/8 + 1 × 3/8 + 2 × 3/8 + 3 × 1/8 = 12/8 = 3/2.
E[X + Y] = ∑ᵢ ∑ⱼ h(xᵢ, yⱼ)(xᵢ + yⱼ)
= 0(0 + 0) + (1/8)(0 + 1) + (2/8)(0 + 2) + (1/8)(0 + 3) + (1/8)(1 + 0) + (2/8)(1 + 1)
+ (1/8)(1 + 2) + 0(1 + 3)
= 16/8 = 2.
E[XY] = ∑ᵢ ∑ⱼ h(xᵢ, yⱼ)(xᵢ yⱼ) = (2/8)(1 × 1) + (1/8)(1 × 2) = 4/8 = 1/2,
since every term with xᵢ = 0 or yⱼ = 0 vanishes.
(iv) σX² = E[X²] − μX² = (0² × 4/8 + 1² × 4/8) − (1/2)² = 1/2 − 1/4 = 1/4.
σY² = E[Y²] − μY² = (0² × 1/8 + 1² × 3/8 + 2² × 3/8 + 3² × 1/8) − (3/2)² = 3 − 9/4 = 3/4.
(v) Cov(X, Y) = E(XY) − μX μY = 1/2 − (1/2)(3/2) = −1/4.
ρ(X, Y) = Cov(X, Y)/(σX σY) = (−1/4)/((1/2)(√3/2)) = −1/√3.
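Since the eight outcomes are equally likely, all of these quantities can also be obtained by brute-force enumeration. A short illustrative Python sketch using exact fractions:

```python
from fractions import Fraction
from itertools import product

# Re-derivation of Problem 1 by enumerating the 8 equally likely
# outcomes of three tosses: X = 0 if the first toss is a head, 1 if a
# tail; Y = the total number of heads.
outcomes = list(product('HT', repeat=3))
p = Fraction(1, len(outcomes))                 # each outcome has probability 1/8

EX = sum(p * (s[0] == 'T') for s in outcomes)                  # 1/2
EY = sum(p * s.count('H') for s in outcomes)                   # 3/2
EXY = sum(p * (s[0] == 'T') * s.count('H') for s in outcomes)  # 1/2
print(EX, EY, EXY, EXY - EX * EY)              # Cov(X, Y) = -1/4
```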

Problem 2: The joint distribution of two random variables X and Y is given by the following
table:



X \ Y   2      3      4
1       0.06   0.15   0.09
2       0.14   0.35   0.21
Determine the individual distributions of X and Y. Also, verify that X and Y are stochastically
independent.
Solution: X takes the values 1, 2 and Y takes the values 2, 3, 4. Also, h₁₁ = 0.06, h₁₂ = 0.15,
h₁₃ = 0.09, h₂₁ = 0.14, h₂₂ = 0.35, h₂₃ = 0.21.
Therefore, f₁ = h₁₁ + h₁₂ + h₁₃ = 0.3, f₂ = h₂₁ + h₂₂ + h₂₃ = 0.7,
g₁ = h₁₁ + h₂₁ = 0.2, g₂ = h₁₂ + h₂₂ = 0.5, g₃ = h₁₃ + h₂₃ = 0.3.
The distribution of X is given by
xi 1 2
fi 0.3 0.7

The distribution of 𝑌 is given by


yj 2 3 4
gj 0.2 0.5 0.3
f1 g1 = 0.06 = h11 , f1 g 2 = 0.15 = h12, f1 g 3 = 0.09 = h13,
f2 g1 = 0.14 = h21 , f2 g 2 = 0.35 = h22, f2 g 3 = 0.21 = h23,
Thus, fi g j = hij for all values of i and j so, X and Y are stochastically independent.
Problem 3.The joint distribution of two random variables 𝑋 and 𝑌 is given by the following
table:
X \ Y   0     1
0       0.1   0.2
1       0.4   0.2
2       0.1   0
(a) Find P(X + Y > 1)
(b) Determine the individual (marginal) probability distributions of X and Y and verify that X
and 𝑌 are not independent.
(c) Find 𝑃(𝑋 = 2|𝑌 = 0).
(d) Find the conditional distribution of 𝑋 given 𝑌 = 1.



Solution: Note that 𝑋 takes the values 0, 1, 2 and 𝑌 takes the values 0, 1
ℎ11 = 0.1, ℎ12 = 0.2, ℎ21 = 0.4, ℎ22 = 0.2, ℎ31 = 0.1, ℎ32 = 0,
(a) The event 𝑋 + 𝑌 > 1 occurs only when the pair (𝑋, 𝑌) takes the values (1,1), (2,0) and
(2,1). The probability that this event occurs is therefore
P(𝑋 + 𝑌 > 1) = ℎ22 + ℎ31 + ℎ32 = 0.2 + 0.1 + 0 = 0.3.

(b) 𝑓1 = ℎ11 + ℎ12 = 0.1 + 0.2 = 0.3.


𝑓2 = ℎ21 + ℎ22 = 0.4 + 0.2 = 0.6.

𝑓 3 = ℎ31 + ℎ32 = 0.1 + 0 = 0.1.


𝑔1 = ℎ11 + ℎ21 + ℎ31 = 0.6
𝑔 2 = ℎ12 + ℎ22 + ℎ32 = 0.4.
The distribution of X is given by
xᵢ:  0     1     2
fᵢ:  0.3   0.6   0.1
The distribution of 𝑌 is given by
𝑦𝑗 0 1
𝑔𝑗 0.6 0.4
It is verified that 𝑓1 𝑔1 = 0.18 ≠ ℎ11 .
Therefore, 𝑋 and 𝑌 are not stochastically independent.
(c) P(X = 2|Y = 0) = h(2, 0)/g(0) = h₃₁/g₁ = 0.1/0.6 = 1/6.

(d) The conditional distribution of X given Y = 1 is
P(X = x|Y = 1) = h(x, 1)/g(1) = hᵢ₂/g₂:
P(X = 0|Y = 1) = h₁₂/g₂ = 0.2/0.4 = 0.5,
P(X = 1|Y = 1) = h₂₂/g₂ = 0.2/0.4 = 0.5,
P(X = 2|Y = 1) = h₃₂/g₂ = 0/0.4 = 0.



Problem 4. The joint distribution of two random variables 𝑋 and 𝑌 is given by 𝑝𝑖𝑗 =
𝑘(𝑖 + 𝑗), 𝑖 = 1, 2, 3, 4; 𝑗 = 1, 2, 3. Find (i) 𝑘 and (ii) the marginal distributions of 𝑋 and 𝑌.
Show that 𝑋 and 𝑌 are not independent.
Solution: For the given pᵢⱼ,
∑ᵢ ∑ⱼ hᵢⱼ = k ∑ᵢ₌₁⁴ ∑ⱼ₌₁³ (i + j)
= k ∑ᵢ₌₁⁴ {(i + 1) + (i + 2) + (i + 3)} = k ∑ᵢ₌₁⁴ (3i + 6)
= k {(3 + 6) + (3 × 2 + 6) + (3 × 3 + 6) + (3 × 4 + 6)} = 54k.
Since ∑ᵢ ∑ⱼ hᵢⱼ = 1, 54k = 1, or k = 1/54.
fᵢ = ∑ⱼ hᵢⱼ = k ∑ⱼ₌₁³ (i + j) = k(3i + 6) = (i + 2)/18,
gⱼ = ∑ᵢ hᵢⱼ = k ∑ᵢ₌₁⁴ (i + j) = k(10 + 4j) = (2j + 5)/27.
Therefore, the marginal distributions of X and Y are
{fᵢ} = {(i + 2)/18}, i = 1, 2, 3, 4, and {gⱼ} = {(2j + 5)/27}, j = 1, 2, 3.

Finally, note that fᵢ gⱼ ≠ hᵢⱼ in general (for instance, f₁g₁ = (3/18)(7/27) = 7/162, whereas
h₁₁ = 2/54 = 6/162), so X and Y are not independent.


Problem 5. The joint probability distribution of two random variables X and Y is given by
the following table.
Y
1 3 9
X
2 1/8 1/24 1/12
4 1/4 1/4 0
6 1/8 1/24 1/12
Find the marginal distribution of 𝑋 and 𝑌, and evaluate 𝑐𝑜𝑣(𝑋, 𝑌). Find 𝑃(𝑋 = 4|𝑌 = 3) and
𝑃(𝑌 = 3|𝑋 = 4)
Solution: From the table, note that
f₁ = 1/8 + 1/24 + 1/12 = 1/4,
f₂ = 1/4 + 1/4 + 0 = 1/2,
f₃ = 1/8 + 1/24 + 1/12 = 1/4,
g₁ = 1/8 + 1/4 + 1/8 = 1/2,
g₂ = 1/24 + 1/4 + 1/24 = 1/3,
g₃ = 1/12 + 0 + 1/12 = 1/6.

The marginal distribution of X is given by the table:

xᵢ:  2     4     6
fᵢ:  1/4   1/2   1/4

and the marginal distribution of Y is given by the table:

yⱼ:  1     3     9
gⱼ:  1/2   1/3   1/6

Therefore, the means of these distributions are, respectively,
μX = ∑ xᵢ P(xᵢ) = (2 × 1/4) + (4 × 1/2) + (6 × 1/4) = 4,
μY = ∑ yⱼ P(yⱼ) = (1 × 1/2) + (3 × 1/3) + (9 × 1/6) = 3.
E[XY] = ∑ᵢ ∑ⱼ hᵢⱼ xᵢ yⱼ
= (h₁₁x₁y₁ + h₁₂x₁y₂ + h₁₃x₁y₃) + (h₂₁x₂y₁ + h₂₂x₂y₂ + h₂₃x₂y₃) + (h₃₁x₃y₁ + h₃₂x₃y₂ + h₃₃x₃y₃)
= (2 × 1/8) + (6 × 1/24) + (18 × 1/12) + (4 × 1/4) + (12 × 1/4) + (36 × 0)
+ (6 × 1/8) + (18 × 1/24) + (54 × 1/12)
= 2 + 4 + 6 = 12.
Cov(X, Y) = E[XY] − μX μY = 12 − 12 = 0, and hence ρ(X, Y) = 0.

P(X = 4|Y = 3) = h(4, 3)/g(3) = h₂₂/g₂ = (1/4)/(1/3) = 3/4.
P(Y = 3|X = 4) = h(4, 3)/f(4) = h₂₂/f₂ = (1/4)/(1/2) = 1/2 = 0.5.

Problems to practice:
1) The joint probability distribution of two random variables X and Y is given by the
following table.

X \ Y   −2    −1    4     5
1       0.1   0.2   0     0.3
2       0.2   0.1   0.1   0
(a) Find the marginal distribution of 𝑋 and 𝑌, and evaluate 𝑐𝑜𝑣(𝑋, 𝑌).
(b) Also determine μX and μY.
(c) Find 𝑃(𝑌 = −1|𝑋 = 1) and 𝑃(𝑋 = 2|𝑌 = 4)

2) Two textbooks are selected at random from a shelf containing three statistics texts,
two mathematics texts and three engineering texts. Denoting the number of books
selected in each subject by S, M and E respectively, find (a) the joint distribution of S
and M, (b) the marginal distributions of S, M and E, and (c) Find the correlation of the
random variables S and M.
3) Consider an experiment that consists of 2 throws of a fair die. Let 𝑋 be the number of
4s and 𝑌 be the number of 5s obtained in the two throws. Find the joint probability
distribution of 𝑋 and 𝑌. Also evaluate 𝑃(2𝑋 + 𝑌 < 3).

Joint Probability distribution for continuous random variables:


Let 𝑥 and 𝑦 be two continuous random variables. Suppose there exists a real valued function
ℎ(𝑥, 𝑦) of 𝑥 and 𝑦 such that the following conditions hold:
(i) h(x, y) ≥ 0 for all x, y
(ii) ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x, y) dx dy exists and is equal to 1.
Then, h(x, y) is called joint probability density function.



If [a, b] and [c, d] are any two intervals, then the probability that x ∈ [a, b] and y ∈ [c, d],
denoted by P(a ≤ x ≤ b, c ≤ y ≤ d), is defined by the formula
P(a ≤ x ≤ b, c ≤ y ≤ d) = ∫_{a}^{b} ∫_{c}^{d} h(x, y) dy dx.

For any specified real numbers u, v, the function
F(u, v) = ∫_{−∞}^{u} ∫_{−∞}^{v} h(x, y) dy dx
is called the joint or the compound cumulative distribution function, where
F(u, v) = P(−∞ < x ≤ u, −∞ < y ≤ v) and ∂²F/∂u∂v = h(u, v).

Further, the function ℎ1 (𝑥) = ∫−∞ ℎ(𝑥, 𝑦) 𝑑𝑦 is called marginal density function of 𝑥, and

the function ℎ2 (𝑦) = ∫−∞ ℎ(𝑥, 𝑦) 𝑑𝑥 is called marginal density function of 𝑦. ℎ1 (𝑥) is the
density function of 𝑥 and ℎ2 (𝑦) is the density function of 𝑦.
The variables x and y are said to stochastically independent if ℎ1 (𝑥)ℎ2 (𝑦) = ℎ(𝑥, 𝑦).
If φ(x, y) is a function of x and y, then the expectation of φ(x, y) is defined by
E{φ(x, y)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} φ(x, y) h(x, y) dx dy.
The covariance between 𝑥 and 𝑦 is defined as
𝐶𝑜𝑣(𝑥, 𝑦) = 𝐸{𝑥𝑦} − 𝐸{𝑥} 𝐸{𝑦}.
Conditional Probability:
The idea of the conditional probability function for discrete random variables is extended to
the case of continuous random variables.
If X and Y are continuous random variables, then the conditional probability distribution of Y
given X is
f(y|x) = h(x, y)/h₁(x),
where h(x, y) is the joint density function of X and Y, and h₁(x) is the marginal density
function of X. Similarly,
P(c < Y < d | a < X < b) = P(a < X < b, c < Y < d)/P(a < X < b).

Problem 6: Find the constant 'k' so that
h(x, y) = k(x + 1)e^{−y} for 0 < x < 1, y > 0, and h(x, y) = 0 elsewhere,
is a joint probability density function. Are x and y independent?
is a joint probability density function. Are x and y independent?
Solution: Observe that ℎ(𝑥, 𝑦) ≥ 0 for 𝑥, 𝑦, if 𝑘 ≥ 0



∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x, y) dx dy = ∫_{y=0}^{∞} ∫_{x=0}^{1} k(x + 1)e^{−y} dx dy
= k {∫_{0}^{1} (x + 1) dx} {∫_{0}^{∞} e^{−y} dy}
= k {(2² − 1²)/2} {0 + 1} = (3/2)k.
Hence ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x, y) dx dy = 1 if k = 2/3.
Therefore, h(x, y) is a joint probability density function if k = 2/3.
With k = 2/3, the marginal density functions are
h₁(x) = ∫_{−∞}^{∞} h(x, y) dy = (2/3)(x + 1) ∫_{0}^{∞} e^{−y} dy = (2/3)(x + 1), 0 < x < 1,
and
h₂(y) = ∫_{−∞}^{∞} h(x, y) dx = (2/3)e^{−y} ∫_{0}^{1} (x + 1) dx = (2/3)e^{−y}(3/2) = e^{−y}, y > 0.
Therefore, h₁(x)h₂(y) = h(x, y), and hence x and y are stochastically independent.
Problem 7: The life time 𝑥 and brightness 𝑦 of a light bulb are modeled as continuous
random variables with joint density function
h(x, y) = αβ e^{−(αx + βy)}, 0 < x < ∞, 0 < y < ∞,
where α and β are appropriate constants. Find (i) the marginal density functions of x and y,
and (ii) the compound cumulative distribution function.
Solution: For the given distribution, the marginal density function of x is
h₁(x) = ∫_{−∞}^{∞} h(x, y) dy = ∫_{0}^{∞} αβ e^{−(αx + βy)} dy
= αβ e^{−αx} ∫_{0}^{∞} e^{−βy} dy = α e^{−αx}, 0 < x < ∞,



and the marginal density function of y is
h₂(y) = ∫_{−∞}^{∞} h(x, y) dx = β e^{−βy}, 0 < y < ∞.
Further, the compound cumulative distribution function is
F(u, v) = ∫_{−∞}^{u} ∫_{−∞}^{v} h(x, y) dy dx = ∫_{0}^{u} ∫_{0}^{v} αβ e^{−(αx + βy)} dy dx
= αβ {∫_{0}^{u} e^{−αx} dx} {∫_{0}^{v} e^{−βy} dy}
= αβ {(1/α)(1 − e^{−αu})} {(1/β)(1 − e^{−βv})}
= (1 − e^{−αu})(1 − e^{−βv}), 0 < u < ∞, 0 < v < ∞.
Problem 8: The joint probability density function of two random variables x and y is given by
h(x, y) = 2 for 0 < x < y < 1, and h(x, y) = 0 elsewhere.
Find the covariance between x and y.
Solution: The marginal density function of x is
h₁(x) = ∫_{−∞}^{∞} h(x, y) dy = ∫_{x}^{1} 2 dy = 2(1 − x) for 0 < x < 1, and 0 elsewhere.
The marginal density function of y is
h₂(y) = ∫_{−∞}^{∞} h(x, y) dx = ∫_{0}^{y} 2 dx = 2y for 0 < y < 1, and 0 elsewhere.
E[x] = ∫_{−∞}^{∞} x h₁(x) dx = ∫_{0}^{1} 2x(1 − x) dx = 2(1/2 − 1/3) = 1/3,
E[y] = ∫_{−∞}^{∞} y h₂(y) dy = ∫_{0}^{1} 2y² dy = 2/3,
E[xy] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy h(x, y) dx dy = ∫_{0}^{1} 2y {∫_{0}^{y} x dx} dy = ∫_{0}^{1} y³ dy = 1/4.
Therefore,
Cov(x, y) = E[xy] − E[x]E[y] = 1/4 − (1/3)(2/3) = 1/36.
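The marginal densities and the covariance can be verified symbolically; a minimal sketch, assuming SymPy is available:

```python
import sympy as sp

# Symbolic check of Problem 8: h(x, y) = 2 on the triangle 0 < x < y < 1.
x, y = sp.symbols('x y')
h = sp.Integer(2)

h1 = sp.integrate(h, (y, x, 1))          # marginal of x: 2(1 - x)
h2 = sp.integrate(h, (x, 0, y))          # marginal of y: 2y
Ex = sp.integrate(x * h1, (x, 0, 1))     # 1/3
Ey = sp.integrate(y * h2, (y, 0, 1))     # 2/3
Exy = sp.integrate(sp.integrate(x * y * h, (x, 0, y)), (y, 0, 1))  # 1/4
print(Exy - Ex * Ey)                     # Cov(x, y) = 1/36
```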

Problem 9: Verify that f(x, y) = e^{−(x+y)} for x ≥ 0, y ≥ 0 (and 0 elsewhere) is the density
function of a joint probability distribution. Then evaluate the following:



(i) P(1/2 < x < 2, 0 < y < 4) (ii) P(x < 1) (iii) P(x > y) (iv) P(x + y ≤ 1)
(v) P(0 < x < 1 | y = 2).


Solution: Given f(x, y) ≥ 0, and
∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_{0}^{∞} ∫_{0}^{∞} e^{−(x+y)} dx dy
= {∫_{0}^{∞} e^{−x} dx} {∫_{0}^{∞} e^{−y} dy} = (0 + 1)(0 + 1) = 1.
Therefore, f(x, y) is a density function.
(i) P(1/2 < x < 2, 0 < y < 4) = ∫_{1/2}^{2} ∫_{0}^{4} e^{−(x+y)} dy dx
= {∫_{1/2}^{2} e^{−x} dx} {∫_{0}^{4} e^{−y} dy} = (e^{−1/2} − e^{−2})(1 − e^{−4}).

(ii) The marginal density function of x is
h₁(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{0}^{∞} e^{−(x+y)} dy = e^{−x} ∫_{0}^{∞} e^{−y} dy = e^{−x}.
Therefore, P(x < 1) = ∫_{0}^{1} h₁(x) dx = ∫_{0}^{1} e^{−x} dx = 1 − 1/e.
(iii) P(x ≤ y) = ∫_{0}^{∞} {∫_{0}^{y} e^{−(x+y)} dx} dy = ∫_{0}^{∞} e^{−y} (∫_{0}^{y} e^{−x} dx) dy
= ∫_{0}^{∞} e^{−y}(1 − e^{−y}) dy = ∫_{0}^{∞} (e^{−y} − e^{−2y}) dy = 1 − 1/2 = 1/2.
Therefore, P(x > y) = 1 − P(x ≤ y) = 1 − 1/2 = 1/2.

(iv) P(x + y ≤ 1) = ∬_A f(x, y) dA
= ∫_{x=0}^{1} ∫_{y=0}^{1−x} e^{−(x+y)} dy dx = ∫_{0}^{1} e^{−x} {∫_{0}^{1−x} e^{−y} dy} dx
= ∫_{0}^{1} e^{−x} {1 − e^{−(1−x)}} dx
= ∫_{0}^{1} (e^{−x} − e^{−1}) dx = 1 − 2/e.
(v) Putting y = 2 and using the conditional density of x given y = 2,
P(0 < x < 1 | y = 2) = {∫_{0}^{1} e^{−(x+2)} dx} / {∫_{0}^{∞} e^{−(x+2)} dx} = 1 − 1/e ≈ 0.63.
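Parts of Problem 9 can also be spot-checked numerically; a minimal sketch, assuming SciPy and NumPy are available (note that scipy.integrate.dblquad integrates a function written as func(y, x)):

```python
import numpy as np
from scipy.integrate import dblquad

# Numeric spot-checks of Problem 9, f(x, y) = exp(-(x + y)) on x, y >= 0.
f = lambda y, x: np.exp(-(x + y))        # dblquad expects func(y, x)

# (i)  P(1/2 < x < 2, 0 < y < 4)
p1, _ = dblquad(f, 0.5, 2, lambda x: 0, lambda x: 4)
# (iv) P(x + y <= 1): for each x in (0, 1), y runs from 0 to 1 - x
p4, _ = dblquad(f, 0, 1, lambda x: 0, lambda x: 1 - x)

print(p1, (np.exp(-0.5) - np.exp(-2)) * (1 - np.exp(-4)))  # should agree
print(p4, 1 - 2 / np.e)                                    # should agree
```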

Problems to practice:
1) If the joint probability function
f(x, y) = c(x² + y²) for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 (with c ≥ 0), and f(x, y) = 0 elsewhere,
is the density function of a joint probability distribution, evaluate the following:
(i) the value of the constant c; (ii) the marginal density functions of x and y;
(iii) P(x < 1/2, y > 1/2); (iv) P(1/4 < x < 3/4); (v) P(y < 1/2).

2) For the distribution given by the density function
f(x, y) = (1/96)xy for 0 < x < 4, 1 < y < 5, and f(x, y) = 0 elsewhere,
evaluate (i) P(1 < x < 2, 2 < y < 3), (ii) P(x > 3, y ≤ 2), (iii) P(y ≤ x),
(iv) P(x + y ≤ 3).

3) For the distribution defined by the density function
f(x, y) = 3xy(x + y) for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and f(x, y) = 0 elsewhere,
find the covariance between x and y.

4) For the distribution defined by the density function
f(x, y) = (1/8)(6 − x − y) for 0 < x < 2, 0 < y < 4, and f(x, y) = 0 elsewhere,
evaluate (i) P(x < 1, y < 3), (ii) P(x + y < 3), (iii) the covariance between x and y,
and (iv) P(x < 1 | y < 3).
Video Links:

https://www.youtube.com/watch?v=82Ad1orN-NA
https://www.youtube.com/watch?v=eYthpvmqcf0
https://www.youtube.com/watch?v=L0zWnBrjhng
https://www.youtube.com/watch?v=Om68Hkd7pfw
https://www.youtube.com/watch?v=RYIb1u3C13I
