
STA 2200: PROBABILITY AND STATISTICS II

Course Purpose
At the end of the course, students will be able to handle probability distributions of both discrete and continuous random variables, and in addition perform some simple hypothesis tests.

Course description
1. Random variables: discrete and continuous, probability mass, density and distribution functions, expectation, variance, percentiles and mode. Moments and moment generating functions. Moment generating function and transformation of variable technique for univariate distributions.

2. Probability distributions: Binomial, Poisson, Hypergeometric, Geometric, Negative binomial, Beta, Gamma and Normal.

3. Statistical inference including one sample normal and t-tests

Prerequisite: STA 2100 Probability and Statistics I

Learning outcomes
At the end of the course the student should be able to:

1. Define a probability generating function,

2. Derive a moment generating function, a cumulant generating function and cumulants,

3. Derive them in simple cases, and use them to evaluate moments,

4. Define basic discrete and continuous distributions, be able to apply them and simulate them in simple cases,

5. Outline an introduction to statistical inference.

Instruction methodology
• Online Tutorials

• Case studies

• Self Reading

• Discussions

Core References:
1. Uppal, S. M., Odhiambo, R. O. & Humphreys, H. M. Introduction to Probability and Statistics. JKUAT Press, 2005, ISBN 9966923950

2. Miller, I. & Miller, M. John E. Freund's Mathematical Statistics with Applications, 7th ed., Pearson Education, Prentice Hall, New Jersey, 2003, ISBN-10: 0131427067

3. Larson, H. J. Introduction to Probability Theory and Statistical Inference, 3rd ed., Wiley, 1982, ISBN-13: 978-0471059097

Additional References:
• Hogg, R. V., McKean, J. W. & Craig, A. T. Introduction to Mathematical Statistics, 6th ed., Prentice Hall, 2003, ISBN 0-13-177698-3

• Crawshaw, J. & Chambers, J. A Concise Course in A-Level Statistics, with Worked Examples, 3rd ed., Stanley Thornes, 1994, ISBN 0-534-42362-0

Assessment information
The module will be assessed as follows:

• 10% of marks from two (2) assignments to be submitted online

• 20% of marks from one written CAT to be administered at JKUAT main campus or one of the approved centres

• 70% of marks from a written examination to be administered at JKUAT main campus or one of the approved centres

Contents

1 Gentle Introduction
  1.1 Random Variables
  1.2 Introduction
  1.3 Definitions and Motivational Examples

2 Random Variables Continued
  2.1 Continuous Random Variable
  2.2 Probability Density Function (PDF)
  2.3 Probability Distribution of a Random Variable
  2.4 Derived Random Variables

3 Measures of Central Tendency
  3.1 Introduction
  3.2 Median
  3.3 Mode
  3.4 Expectation of a Random Variable
    3.4.1 Properties of Expectation

4 Variance of a Random Variable, Moments and Moment Generating Functions (MGF)
  4.1 Variance of a Random Variable
  4.2 Moments
  4.3 Relationship between Raw and Central Moments
  4.4 Factorial Moments
  4.5 Moment Generating Functions (MGF)
  4.6 Derivation of Moments from MGF
  4.7 Summary

5 Theoretical Probability Distributions
  5.1 Introduction
    5.1.1 Discrete probability distributions
      • Bernoulli Distribution
      • Binomial Probability Distribution
      • Negative Binomial Distribution
      • Geometric Distribution
  5.2 Summary

6 Discrete Probability Distributions cont...
  6.1 Poisson Distribution
    6.1.1 Mean and variance of the Poisson distribution
    6.1.2 The MGF of a Poisson Distribution
    6.1.3 Poisson approximation to the Binomial distribution
  6.2 Hypergeometric Distribution
    6.2.1 Mean and Variance of the Hypergeometric distribution
  6.3 Summary

7 Continuous Probability Distributions
  7.1 Uniform (Rectangular) Distribution
    7.1.1 The mgf of a uniformly distributed random variable
  7.2 The Gamma, Chi-square, Exponential and Beta distributions
    7.2.1 The Gamma and Beta Distributions
    7.2.2 mgf of a Gamma Distribution
    7.2.3 Chi-square distribution
    7.2.4 Exponential distribution
    7.2.5 Beta Distribution
  7.3 Summary
  7.4 Revision questions

8 Change of Variable and Distribution Function Techniques
  8.1 Distribution Functions of Random Variables
    8.1.1 Functions of one random variable
  8.2 Distribution Function Technique
  8.3 Change of Variable Technique
    8.3.1 Generalization for an Increasing Function
    8.3.2 Generalization for a Decreasing Function
  8.4 Learning Activities

9 Normal Distribution
  9.1 Introduction
  9.2 Use of Standard Normal Tables
  9.3 Properties of the Normal distribution
  9.4 The mgf of a Normal distribution
  9.5 Normal Approximation to the Binomial
  9.6 Learning Activities

10 Statistical Inference
  10.1 Introduction
  10.2 Hypothesis Testing
  10.3 Steps for a classical hypothesis test
  10.4 Central Limit Theorem
  10.5 Estimation of µ and σ based on a sample of size n
  10.6 Student's t-Distribution
  10.7 Properties of the t-distribution
  10.8 Hypothesis test (t-Test)
  10.9 Revision Questions
  Solutions to Exercises

LESSON 1
Gentle Introduction

Learning outcomes
Upon completion of this section, you should be able to:

• Be familiar with some of the terminology that shall be used in this course

• Define a random variable, both discrete and continuous

• Prove that a function is a probability distribution

• Appreciate the application of the set theory, permutation and combination skills covered in STA 2100


1.1. Random Variables


1.2. Introduction
In applications of probability, we are often interested in a number associated with the outcome of a random experiment. Such a quantity, whose value is determined by the outcome of the random experiment, is called a random variable.

1.3. Definitions and Motivational examples


A random variable is a real-valued function defined on the sample space; its range may be finite, countably infinite or continuously infinite. Similarly, a random variable is any quantity or attribute whose value varies from one unit of a population to another; or, a random variable X is a rule that assigns a numerical value to each outcome in the sample space of an experiment.
Suppose we let X have the following properties:

• It is a discrete variable

• It can take or assume values x1, x2, . . . , xn only

• The probabilities associated with these values are P(X = x1) = p1, P(X = x2) = p2, . . . , P(X = xn) = pn

• Then X is a discrete random variable if ∑ pi = 1, i = 1, 2, 3, . . . , n (that is, the sum of the probabilities is 1)

We recall from Probability and Statistics I that a variable X was said to be discrete whenever it takes:
Either:

• A finite set of distinct values, for instance the set of single digit whole numbers S = {0, 1, . . . , 9}

OR

• A countably infinite set of values, for instance the set of natural numbers N = {1, 2, . . .}

Therefore, a discrete random variable is a quantity which may take up any value
within a discrete set of numbers having a given probability which may vary accord-
ing to the value of the variable. To specify the probability distribution of a discrete

2
STA 2200 Probability and Statistics II

random variable, we need to know both the set of values of the random variable and
the probability associated with each of these values. The probability distribution of
a discrete random variable, say Y, may be specified using a table or a formula.

NOTE: Random variables are denoted by capital letters while their particular (specified) values are denoted by lowercase letters.

Probability Distribution

For a discrete random variable, a probability distribution is a listing of all possible outcomes of a random variable, say X, and all the associated probabilities. There are two conditions necessary for a probability distribution:

1. 0 ≤ P(X = x) ≤ 1

2. ∑_{all x} P(X = x) = 1

To specify a probability distribution, we can use either:

1. A table

2. A formula

Example. The table below illustrates how we can specify a probability distribution:

y         1    2    3    4
P(Y=y)   0.2  0.5  0.2  0.1

From this table, for example, P(Y = 2) = p(2) = p2 = 0.5.
On the other hand, we can define a probability distribution using a formula as follows:

P(X = x) = p(x) = C(10, x)(0.6)^x (0.4)^{10−x}, for x = 0, 1, 2, . . . , 10

We recall that for a random variable X with particular values x that the variable may take (the variate), the associated probability is P(X = x) = p(x) and ∑_{all x} p(x) = 1.

Example. A balanced coin is tossed 4 times. Show that the number of tails realized is a random variable.
Solution

Let X be the number of tails realized. This implies that X takes the values 0, 1, 2, 3, 4. The number of possible outcomes is 2^n = 2^4 = 16. This is because there are two possible outcomes in any single trial and the experiment is carried out four times, hence the number of outcomes per trial is raised to the power four to obtain the total sample space.
Consider the following table illustrating the possible outcomes in the four tosses of a coin. A single toss of a coin gives H or T, and two tosses give HH, HT, TH and TT. We pair the first two tosses (rows, in bold) with the next two tosses (columns, in bold) to list all four-toss outcomes:

                       Next two tosses
                   HH     HT     TH     TT
First    HH      HHHH   HHHT   HHTH   HHTT
two      HT      HTHH   HTHT   HTTH   HTTT
tosses   TH      THHH   THHT   THTH   THTT
         TT      TTHH   TTHT   TTTH   TTTT

Therefore, if we denote the probability of zero tails with the subscript 0, one tail with subscript 1 and so on, then the probabilities will be

P0 = 1/16, P1 = 4/16, P2 = 6/16, P3 = 4/16, P4 = 1/16

Hence, looking at these probabilities, ∑_{all i} Pi = 1, implying that X is indeed a random variable.
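This claim is easy to verify by brute-force enumeration; the following is a minimal Python sketch (the variable names are arbitrary choices for illustration):

    from itertools import product
    from collections import Counter

    # Enumerate all 16 outcomes of four tosses of a balanced coin
    outcomes = list(product("HT", repeat=4))

    # Tally how many outcomes give 0, 1, ..., 4 tails
    tail_counts = Counter(seq.count("T") for seq in outcomes)

    for k in sorted(tail_counts):
        print(k, "tails:", tail_counts[k], "/", len(outcomes))
    # Prints 1/16, 4/16, 6/16, 4/16, 1/16; the probabilities sum to 1,
    # so X (the number of tails) is a valid random variable.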

Example. Let X be a discrete random variable representing the number of fours observed when two dice are rolled together once. Show that X is a random variable.

Solution

Consider the following table illustrating the outcomes of two dice rolled once/together. In each cell we have, for instance, 3,2 (indicated in bold), meaning we get a 3 on Die 1 and a 2 on Die 2.

                   Die 2
          1    2    3    4    5    6
     1   1,1  1,2  1,3  1,4  1,5  1,6
     2   2,1  2,2  2,3  2,4  2,5  2,6
Die1 3   3,1  3,2  3,3  3,4  3,5  3,6
     4   4,1  4,2  4,3  4,4  4,5  4,6
     5   5,1  5,2  5,3  5,4  5,5  5,6
     6   6,1  6,2  6,3  6,4  6,5  6,6

Since the random variable X represents the number of fours, it implies that from this table we can get:

• No fours (for instance where we have 1,1; 1,2; 3,2 e.t.c.)

• One four (for instance where we have 2,4; 4,3; 4,5; e.t.c.)

• Two fours (4,4, and it occurs only once)

These three scenarios can be represented by X = {0, 1, 2} respectively.

Hence, P0 = 25/36, P1 = 10/36, P2 = 1/36.
Using the conditions of a probability distribution, ∑_{all x} P(X = x) = 1, we observe that

P0 + P1 + P2 = 25/36 + 10/36 + 1/36 = 1

Thus, X is a random variable and the probability distribution associated with it is

x          0      1      2
P(X=x)   25/36  10/36   1/36

or the same distribution can be represented by a formula as follows:

P(X = x) = C(2, x)(1/6)^x (5/6)^{2−x}, for x = 0, 1, 2

Example. Two tetrahedral dice are rolled and the sum of the scores facing up is noted. Find the Probability Mass Function (pmf) of the random variable, the sum facing up.
Solution
Before solving this problem, let us first define a pmf.

Definition

A pmf is the function that allocates probabilities; e.g. P(X = x), as shown in the previous example, is the pmf of the random variable X. For continuous random variables, we refer to the corresponding function as a probability density function (pdf).
Back to our problem of the two tetrahedral dice rolled together.
Let X be the random variable, the sum facing up. This can be illustrated by the results obtained in the following table. Note that a tetrahedral die has 4 sides, and the numbers on the dice are indicated in bold. Thus:

      1  2  3  4
   1  2  3  4  5
   2  3  4  5  6
   3  4  5  6  7
   4  5  6  7  8

From the table, we see that the sums can be {2, 3, 4, 5, 6, 7, 8}, therefore the random variable X = {2, 3, 4, 5, 6, 7, 8}.
The pmf is therefore given by

x           2     3     4     5     6     7     8
P(X = x)   1/16  2/16  3/16  4/16  3/16  2/16  1/16

This can also be written as a function to get the pmf as follows:

P(X = x) = (x − 1)/16, for x = 2, 3, 4, 5

P(X = x) = (9 − x)/16, for x = 6, 7, 8

or more simply as

P(X = x) = { (x − 1)/16, for x = 2, 3, 4, 5
           { (9 − x)/16, for x = 6, 7, 8
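A quick cross-check of this pmf by enumeration; a small Python sketch of our own (not a prescribed method in the notes):

    from itertools import product
    from fractions import Fraction
    from collections import Counter

    # Tabulate the sum of two fair tetrahedral (4-sided) dice
    counts = Counter(a + b for a, b in product(range(1, 5), repeat=2))

    pmf = {s: Fraction(c, 16) for s, c in counts.items()}
    for s, p in sorted(pmf.items()):
        print(s, p)
    # 2 1/16, 3 1/8, 4 3/16, 5 1/4, 6 3/16, 7 1/8, 8 1/16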

Example. The pmf of a discrete random variable is given by P(X = x) = Cx² for x = 0, 1, 2, 3, 4, where C is a constant. Find the value of C.
Solution:

∑_{all x} P(X = x) = 1

This implies that 0 + C + 4C + 9C + 16C = 1.
So, how do we get the values above?
Since P(X = x) = Cx², it implies that when x = 0, P(X = 0) = C · 0² = 0. Similarly, when x = 1, P(X = 1) = C · 1² = C, and so on until we reach x = 4, P(X = 4) = C · 4² = 16C.
Therefore, 30C = 1 =⇒ C = 1/30

Learning Activities

1. Given a fair coin, perform an experiment and then define a variable from the results of your experiment. Show that the variable defined is a random variable. If the variable is a random variable, then find the pmf of the random variable.

2. Carry out a similar exercise to the one described in part (1) above with an unfair coin.

3. Carry out the exercise in part (1) using a six-sided die and a four-sided die.

Summary
In this lesson, we have focused mainly on the definitions of the random variables-
both discrete and continuous. We have also tried to show what a probability dis-
tribution is and conditions necessary for a random variable to have a probability
distribution. The lesson has mainly focused on the discrete random variables.

Revision Questions
List the following sets; N denotes the set of natural numbers while Z denotes the
set of integers.
EXERCISE 1.  A discrete random variable W has the pmf shown in the following table:

w           −3    −2    −1     0     1
P(W = w)   0.1   0.25   0.3   0.15   d

Find: (a) the value of d; (b) P(−3 ≤ W ≤ 0); (c) P(W > −1); (d) P(−1 < W < 1); (e) the mode.
EXERCISE 2.  The pmf of a discrete random variable is given by P(X = x) = Cx for x = 1, 2, 3, 4, 5, 6. Find the value of C, and hence find P(X < 4) and P(3 ≤ X < 6).


Assignment:

1. Verify that P(X = x) = 2x / (k(k + 1)), for x = 1, 2, 3, . . . , k, can serve as the pmf of a random variable X.

2. The pmf of a random variable is given by P(X = x) = a(3/4)^x for x = 0, 1, 2. Find the value of the constant a.


LESSON 2
Random Variables Continued

Learning outcomes
Upon completing this topic, you should be able to:

• Clearly define the term continuous random variable

• Show whether a given function is a pdf or not

• Define the cdf of a random variable

• Find the pdf of a random variable given the cdf of the random variable

• Find the probability functions of derived random variables


2.1. Continuous Random Variable


A random variable is said to be continuous if it can assume any value in an interval of the real number line. Therefore a continuous random variable assumes an uncountable number of values.

For instance

Suppose an experiment involves observing the arrival of cars over a certain period of time along a highway on a particular day. Let T denote the time which lapses before the first arrival. Then T is a continuous random variable which assumes values in the interval (0, ∞).

2.2. Probability Density Function (PDF)


Let X be a continuous random variable assuming values in the real number line, ℜ. Then f(x) is said to be a pdf of the random variable X if it satisfies the following conditions:

1. f(x) ≥ 0 for all x

2. P(a ≤ X ≤ b) = ∫_a^b f(x) dx

3. ∫_{−∞}^{∞} f(x) dx = 1

where f(x) is a function which is continuous and differentiable.


Example. Given the above definition of a density function, we can derive probabilities for a continuous random variable.
Suppose

f(x) = { 3x², 0 ≤ x ≤ 1
       { 0,   elsewhere

then:

1. Find a such that P(X ≤ a) = P(X ≥ a).

P(X ≤ a) = ∫_0^a 3x² dx = x³ |_0^a = a³
P(X ≥ a) = ∫_a^1 3x² dx = x³ |_a^1 = 1 − a³

Setting the two equal: a³ = 1 − a³, so 2a³ = 1 and

a = (0.5)^{1/3} ≈ 0.794

2. Find b such that P(X > b) = 0.05.

P(X > b) = ∫_b^1 3x² dx = x³ |_b^1 = 1 − b³ = 0.05
=⇒ b³ = 0.95
=⇒ b = (0.95)^{1/3} ≈ 0.983
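These integrals are simple enough to verify symbolically; a small sketch of our own using sympy (assuming it is installed):

    import sympy as sp

    x, a, b = sp.symbols("x a b", positive=True)
    f = 3 * x**2  # the pdf on [0, 1]

    # P(X <= a) = P(X >= a)  =>  a^3 = 1 - a^3
    eq_a = sp.Eq(sp.integrate(f, (x, 0, a)), sp.integrate(f, (x, a, 1)))
    print(sp.solve(eq_a, a))   # the real root is 0.5**(1/3) ≈ 0.794

    # P(X > b) = 0.05  =>  1 - b^3 = 0.05
    eq_b = sp.Eq(sp.integrate(f, (x, b, 1)), sp.Rational(1, 20))
    print(sp.solve(eq_b, b))   # the real root is 0.95**(1/3) ≈ 0.983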

Example. Let X be a continuous random variable. Show that the function

f(x) = { x/2, 0 ≤ x ≤ 2
       { 0,   elsewhere

is a pdf, and hence find P(1/2 ≤ X ≤ 1) and P(−1 ≤ X ≤ 1).


Example. Let X be a continuous random variable with pdf

f(x) = { x/5 + k, 0 ≤ x ≤ 3
       { 0,       elsewhere

Find the value of k, and hence compute P(1 ≤ X ≤ 2).

Solution

∫_0^3 f(x) dx = 1
∫_0^3 (x/5 + k) dx = 1 =⇒ (x²/10 + kx) |_0^3 = 1

Substituting the limits and evaluating the integral, we have

9/10 + 3k = 1 =⇒ 3k = 1/10 =⇒ k = 1/30

Next we obtain

P(1 ≤ X ≤ 2) = ∫_1^2 (x/5 + 1/30) dx = [x²/10 + x/30]_1^2 = (4/10 + 2/30) − (1/10 + 1/30) = 1/3

Example. Determine whether the following function f(x) can represent a pdf.

f(x) = { x²,          for 0 ≤ x ≤ 1
       { (2 − x)/3,   for 1 < x ≤ 2
       { x − 2,       for 2 < x ≤ 3
       { 0,           elsewhere

Clearly f(x) ≥ 0 everywhere, and

∫_0^1 x² dx + ∫_1^2 (2 − x)/3 dx + ∫_2^3 (x − 2) dx = 1/3 + 1/6 + 1/2 = 1

Hence, f(x) can represent a pdf.

2.3. Probability Distribution of a Random Variable


For every random variable X, we associate a function called the cumulative distribution function (cdf) of X.

Definition


The cdf of a random variable X, denoted by F(x), is defined by

F(x) = P(X ≤ x) for all values of x.

(a) If X is a discrete random variable with pmf f(x), then

F(x) = ∑_{t≤x} f(t)

Note here that t is introduced to facilitate summation.

(b) If X is a continuous random variable with pdf f(x), then

F(x) = ∫_{−∞}^{x} f(t) dt

Again, t is introduced as a variable of integration.

Example. Let X be a discrete random variable with pmf f(x)

f(x) = { (x + 1)/20, x = 1, 2, 3, 4, 5
       { 0,          elsewhere

• Sketch the graph of f(x).

• Determine the cdf of X and sketch its graph.

Solution
(a) First, we need to take note that this function is for a discrete random variable, so f(x) is computed only at x = 1, 2, 3, 4, 5, giving the values 2/20, 3/20, 4/20, 5/20 and 6/20. The graph is a set of spikes at these points; as the values lie on a line, only two points are needed to position the sketch.

(b) F(x) = P(X ≤ x) = ∑_{t≤x} f(t) for any real number x.
Since x is a real number, it can lie in any of the following mutually exclusive intervals:

−∞ < x < 1; 1 ≤ x < 2; 2 ≤ x < 3; 3 ≤ x < 4; 4 ≤ x < 5 and 5 ≤ x < ∞.

Using the definition of f(x), we obtain

F(x) = { 0,     x < 1
       { 1/10,  1 ≤ x < 2
       { 1/4,   2 ≤ x < 3
       { 9/20,  3 ≤ x < 4
       { 7/10,  4 ≤ x < 5
       { 1,     x ≥ 5

From the above cdf, the resultant sketch is a step graph, and we note the following:

• F(x) is a step function which assumes a constant value on every interval between the points 1, 2, 3, 4, 5 (see the bold lines in the sketch).

• F(x) is everywhere continuous from the right of any point; take note that the line showing the jump, say at x = 2, is dotted.

Example. Let X be a continuous random variable with pdf

f(x) = { x/2, 0 ≤ x ≤ 2
       { 0,   elsewhere

Obtain the cdf of X.
Solution

F(x) = ∫_{−∞}^{x} f(t) dt
     = (1/2) ∫_0^x t dt
     = (1/4) t² |_0^x
     = x²/4

=⇒ F(x) = x²/4, 0 ≤ x < 2

Thus

F(x) = { 0,     x < 0
       { x²/4,  0 ≤ x < 2
       { 1,     x ≥ 2

Once again remember, t has been introduced here to facilitate integration.

The resulting distribution function is a quadratic function on [0, 2], hence its sketch is a rising parabolic arc from 0 to 1.


Example. A cumulative distribution function is given by

F(x) = { 1 − e^{−2x}, x > 0
       { 0,           elsewhere

(a) Derive the pdf from the cdf.

(b) Show that the derived function is actually a pdf.
Solution
By definition

F(µ) = ∫_{−∞}^{µ} f(x) dx

i.e. the cdf is obtained by integrating the pdf.

Then, given a cdf, we derive the pdf by differentiating F(x) with respect to x:

f(x) = (d/dx) F(x)
     = (d/dx) [1 − e^{−2x}]
     = 2e^{−2x}

=⇒ f(x) = { 2e^{−2x}, x > 0
          { 0,        elsewhere

Now, f(x) ≥ 0 for all x (the exponential is positive, with e ≈ 2.71), and

∫_0^∞ 2e^{−2x} dx = −e^{−2x} |_0^∞
                  = −[e^{−∞} − 1] = −[0 − 1]
                  = 1

so both conditions for a pdf are satisfied.
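The same derivation can be checked symbolically; a brief sketch of our own with sympy:

    import sympy as sp

    x = sp.symbols("x", positive=True)
    F = 1 - sp.exp(-2 * x)          # the given cdf for x > 0

    f = sp.diff(F, x)               # pdf = derivative of the cdf
    print(f)                        # 2*exp(-2*x)

    # Check it integrates to 1 over (0, oo)
    print(sp.integrate(f, (x, 0, sp.oo)))  # 1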


Example. Given the following pdf of the random variable X, determine the cdf, F(x).

f(x) = { x²,        for 0 ≤ x ≤ 1
       { (2 − x)/3, for 1 < x ≤ 2
       { x − 2,     for 2 < x ≤ 3
       { 0,         elsewhere

Solution
For 0 ≤ x ≤ 1:

F(x) = ∫_0^x t² dt = x³/3, so F(1) = 1/3.

For 1 < x ≤ 2:

F(x) = F(1) + ∫_1^x (2 − t)/3 dt = 1/3 + (1/3)[2t − t²/2]_1^x = (2/3)x − x²/6 − 1/6

so F(2) = 1/2.

For 2 < x ≤ 3:

F(x) = F(2) + ∫_2^x (t − 2) dt = x²/2 − 2x + 5/2

Hence

F(x) = { 0,                    x < 0
       { x³/3,                 0 ≤ x ≤ 1
       { (2/3)x − x²/6 − 1/6,  1 < x ≤ 2
       { x²/2 − 2x + 5/2,      2 < x ≤ 3
       { 1,                    x > 3


Example. Consider the following problem.

A pdf is given by

f(x) = { (1/40) e^{−x/40}, x > 0
       { 0,                elsewhere

• Show that f(x) is a pdf.

• Evaluate P(40 ≤ X ≤ 50).

Solution
For f(x) to be a pdf we need
a) f(x) ≥ 0 for all x, and
b) ∫ f(x) dx = 1.
From the information provided, clearly f(x) ≥ 0, and

∫_0^∞ (1/40) e^{−x/40} dx = [−e^{−x/40}]_0^∞ = −[e^{−∞} − e^0] = −[0 − 1] = 1

Next,

P(40 ≤ X ≤ 50) = ∫_40^50 (1/40) e^{−x/40} dx = [−e^{−x/40}]_40^50 = e^{−1} − e^{−1.25} = 0.3679 − 0.2865 = 0.0814

In general,

F(x) = ∫_0^x f(u) du = ∫_0^x (1/40) e^{−u/40} du = 1 − e^{−x/40}

so, for instance,

F(50) = 1 − e^{−1.25} = 0.7135
F(40) = 1 − e^{−1} = 0.6321

2.4. Derived Random Variables


Given the pdf of a random variable X, we can obtain the distribution of a second variable Y provided that we are given some kind of relationship between X and Y; for example Y = u(X), say Y = 2X + 3, Y = X², or Y = X + c.

Consider a one-to-one relationship between X and Y.

For instance,

f(x) = { x/6, x = 1, 2, 3
       { 0,   elsewhere

and Y = X².
The only values of Y with non-zero probabilities are 1, 4 and 9.
Now

P(Y = 4) = P(X² = 4) = P(X = −2) + P(X = 2) = 0 + 1/3 = 1/3

Similarly,

P(Y = 1) = 1/6 and P(Y = 9) = 1/2
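Pushing a pmf through a transformation like this is mechanical; a minimal Python sketch of our own (the names pmf_x, pmf_y are arbitrary):

    from fractions import Fraction
    from collections import defaultdict

    # pmf of X: P(X = x) = x/6 for x = 1, 2, 3
    pmf_x = {x: Fraction(x, 6) for x in (1, 2, 3)}

    # Push the pmf through Y = X**2, accumulating probability
    # whenever several x values map to the same y
    pmf_y = defaultdict(Fraction)
    for x, p in pmf_x.items():
        pmf_y[x**2] += p

    for y, p in sorted(pmf_y.items()):
        print(y, p)   # 1 1/6, 4 1/3, 9 1/2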

Note

When there is a one-to-one relationship between X and Y, f(x) and g(y) yield exactly the same probabilities; only the random variable, and the set of values it can assume with non-zero probabilities, changes.
Example. Consider the following problem and solution.

Let the pmf of X be given by P(X = x) = (x + 1)/15 for x = 0, 1, 2, 3, 4. Find the pmf of Y = (X − 2)².
Solution
The values X = 0, 1, 2, 3, 4 map to Y = 4, 1, 0, 1, 4 respectively, so

P(Y = 0) = P(X = 2) = 3/15 = 1/5
P(Y = 1) = P(X = 1) + P(X = 3) = 2/15 + 4/15 = 2/5
P(Y = 4) = P(X = 0) + P(X = 4) = 1/15 + 5/15 = 2/5

Therefore, the pmf of Y can then be written as

Y         0    1    4
P(Y=y)   1/5  2/5  2/5

Note
In general, if x1, x2, x3, . . . , xk all yield the same value y, then

g(y) = P(Y = y) = P(X = x1) + P(X = x2) + · · · + P(X = xk)

that is, in some cases several values of X will give rise to the same value of Y. The procedure for finding the pmf of Y is the same as before, but it is necessary to add the several probabilities that are associated with each value of X that produces the same value of Y.


EXERCISE 3.  Let X be a continuous random variable with pdf

f(x) = { ax,        0 ≤ x ≤ 1
       { a,         1 ≤ x ≤ 2
       { −ax + 3a,  2 ≤ x ≤ 3
       { 0,         elsewhere

(a) Determine the constant a
(b) Compute P(X ≤ 1.5)


Learning Activities

1. A continuous random variable X has pdf f(x) where

f(x) = { kx,       for 0 ≤ x ≤ 2
       { k(4 − x), for 2 < x ≤ 4
       { 0,        elsewhere

Find the value of k and sketch y = f(x); hence compute P(1 ≤ X ≤ 3) and P(X ≥ 3).

2. Which of the following functions can represent a pdf?

(i) f(x) = { 2 − x, 1 ≤ x ≤ 2
           { 0,     elsewhere

(ii) f(x) = { 2 − x, 0 ≤ x ≤ 3
            { 0,     elsewhere

3. If either of the two functions (or both) is a pdf, find the cdf.

4. Consider the following additional set of problems.


Exercise
1. Let the pmf of X be given as follows.

X         1    2    3    4    5    6
P(X=x)   1/6  1/6  1/6  1/6  1/6  1/6

Obtain the pmf of the indicated functions of X.

2. Let the pmf of X be given; find the pmf of the indicated function of X.

3. Suppose X has the given pmf; find the pmf of the indicated function of X.

4. Suppose the random variable X has the pmf given; find the pmf of Y.

5. Suppose X has the given pmf; find the pmf of the indicated function of X.

Summary
For a continuous random variable it is not plausible to think of the variable assuming one particular value, because there are infinitely many values (even in a small interval) that the random variable can assume. This implies that the probability of occurrence of any one value is

P(X = x_i) = M_i / N = 0

where M_i is the number of occurrences of the value x_i among the N possibilities.
But the fact that this probability is zero does not mean that the occurrence is impossible; rather, it suggests that we need another way of expressing probability. This is done by the use of the cdf as follows:

F(µ) = ∫_{−∞}^{µ} f(x) dx


LESSON 3
Measures of Central Tendency

Learning outcomes
Upon completing this topic, you should be able to:

• Show that a function is a pmf or a pdf

• Compute the different measures of central tendency and position for both discrete and continuous random variables

• Use the properties of expectation to compute expectations of derived random variables or functions of other random variables


3.1. Introduction
In this lesson, we are going to learn further how to show that a function is either a pmf or a pdf. Further, we will learn how to compute the median, mode and mean (expectation) of both discrete and continuous random variables. Finally, we will look at how we can use the properties of the mean (expectation) to simplify the methods of computing these measures.

3.2. Median
The pth quantile of a random variable X (or of its corresponding distribution), denoted by ε_p, is defined as the smallest number ε_p such that F_X(ε_p) ≥ p, 0 < p < 1.
The median of a random variable X, denoted by med(X) or ε_0.5, is the 0.5 quantile.
In other words, the median of a random variable X is a value x such that P(X ≤ x) ≥ 1/2 and P(X ≥ x) ≥ 1/2. If X is a continuous random variable then the median m of X satisfies

∫_{−∞}^{m} f(x) dx = ∫_{m}^{∞} f(x) dx = 0.5

Let 0 < p < 1. A (100p)th percentile (quantile of order p) of the distribution of a random variable X is a value ε_p such that

P(X ≤ ε_p) ≥ p and P(X ≥ ε_p) ≥ 1 − p
Example. Find the median and the 25th percentile of the following pdf.

f(x) = { 3(1 − x)², 0 < x < 1
       { 0,         elsewhere

Solution

1. Let the median be m.

By definition

∫_0^m f(x) dx = ∫_m^1 f(x) dx = 1/2

3 ∫_0^m (1 − x)² dx = 0.5

Let z = 1 − x, so dz = −dx. Thus when x = 0, z = 1, and when x = m, z = 1 − m.

−3 ∫_1^{1−m} z² dz = 0.5
−z³ |_1^{1−m} = 0.5
−[(1 − m)³ − 1] = 0.5
(1 − m)³ = 0.5
1 − m = (0.5)^{1/3}
=⇒ m = 1 − (0.5)^{1/3} ≈ 0.21

2. Let p be the 25th percentile.

∫_0^p f(x) dx = 0.25
3 ∫_0^p (1 − x)² dx = 0.25

With the same substitution z = 1 − x:

−3 ∫_1^{1−p} z² dz = 0.25
−[(1 − p)³ − 1] = 0.25
(1 − p)³ = 0.75
1 − p = (0.75)^{1/3}
=⇒ p = 1 − (0.75)^{1/3} ≈ 0.09
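Both roots can be cross-checked symbolically; a short sketch of our own with sympy:

    import sympy as sp

    x, m = sp.symbols("x m")
    f = 3 * (1 - x)**2                   # pdf on (0, 1)
    F = sp.integrate(f, (x, 0, m))       # cdf evaluated at m

    # Median: F(m) = 1/2;  25th percentile: F(m) = 1/4
    print([sp.N(r, 4) for r in sp.real_roots(F - sp.Rational(1, 2), m)])  # [0.2063]
    print([sp.N(r, 4) for r in sp.real_roots(F - sp.Rational(1, 4), m)])  # [0.09144]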

Example. Median from the cdf.

If we have a cdf F(x), then the median m is the point where F(m) = 0.5; this says that half of the population has a value less than m.
E.g. if the cdf is F(t) = t², 0 ≤ t ≤ 1, find the median of the distribution:

F(m) = 1/2 =⇒ m² = 1/2 =⇒ m = (1/2)^{1/2} ≈ 0.707

3.3. Mode
The mode of the distribution of a random variable X is a value x that maximizes the pdf (pmf). If X is a continuous random variable we often differentiate the pdf to get the mode.
Example. Find the mode of the following distribution.

p_X(x) = { (0.5)^x, x = 1, 2, . . .
         { 0,       elsewhere

Solution
The value of x for which p_X(x) is maximum is x = 1, so the mode is 1.
Example. Find the mode of the following distribution.

f(x) = { 0.5 x² e^{−x}, 0 < x < ∞
       { 0,             elsewhere

Solution
1. f(x) = 0.5 x² e^{−x}. Using the product rule, d(uv)/dx = u(dv/dx) + v(du/dx):

f′(x) = 0.5[2x e^{−x} − x² e^{−x}]

At a maximum f′(x) = 0:

0.5[2x e^{−x} − x² e^{−x}] = 0
2x e^{−x} = x² e^{−x}
x = 2

2. Next, if f″(x) < 0 at x = 2 then 2 is a maximum:

f″(x) = 0.5[2(e^{−x} − x e^{−x}) − 2x e^{−x} + x² e^{−x}]

At x = 2,

f″(2) = 0.5[2(e^{−2} − 2e^{−2}) − 4e^{−2} + 4e^{−2}]
      = 0.5(−2e^{−2})
      ≈ −0.135 < 0

implying that 2 is the maximizing value, which is the mode of the above distribution.
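The same calculus can be delegated to a computer algebra system; a short sketch of our own with sympy:

    import sympy as sp

    x = sp.symbols("x", positive=True)
    f = sp.Rational(1, 2) * x**2 * sp.exp(-x)

    crit = sp.solve(sp.diff(f, x), x)
    print(crit)                          # [2]; x = 0 is excluded by positivity
    print(sp.diff(f, x, 2).subs(x, 2))   # -exp(-2) < 0, so x = 2 is the mode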

Example. The random variable X has the pdf

f(x) = { cx,        0 ≤ x ≤ 1
       { c(2 − x),  1 ≤ x ≤ 2
       { 0,         elsewhere

Determine the value of the constant c; hence find the cdf, the median and the mode of f(x).
Solution

1. Since f(x) is a pdf,

∫_{−∞}^{∞} f(x) dx = 1
∫_0^1 cx dx + ∫_1^2 c(2 − x) dx = 1
(cx²/2) |_0^1 + c[2x − x²/2]_1^2 = 1
c/2 + c[(4 − 2) − (2 − 1/2)] = 1
c/2 + c/2 = 1 =⇒ c = 1

Therefore

f(x) = { x,      0 ≤ x ≤ 1
       { 2 − x,  1 ≤ x ≤ 2
       { 0,      elsewhere

2. The cdf:
F(x) = 0 for x < 0.
For 0 ≤ x ≤ 1:

F(x) = ∫ x dx = x²/2 + c₁

Since F(0) = 0 =⇒ c₁ = 0, so F(x) = x²/2 on [0, 1] and F(1) = 1/2.

Next, for 1 ≤ x ≤ 2:

F(x) = ∫ (2 − x) dx = 2x − x²/2 + c₂

But F(1) = 1/2 =⇒ 2(1) − 1²/2 + c₂ = 1/2 =⇒ c₂ = −1.

F(x) = { x²/2,           0 ≤ x ≤ 1
       { 2x − x²/2 − 1,  1 ≤ x ≤ 2

3. Median: we need the smallest x with F(x) ≥ 1/2. Now F(1) = 1/2, so the median is x = 1.

4. Mode: f(x) increases on [0, 1] and decreases on [1, 2], so the mode is also x = 1.

Example. The pdf of a random variable is

f(x) = { (3/64) x² (4 − x),  0 ≤ x < 4
       { 0,                  elsewhere

Determine the mode.
Solution
At a maximum f′(x) = 0:

(3/64)[2x(4 − x) − x²] = 0
(3/64)[8x − 2x² − x²] = 0
(3/64)[8x − 3x²] = 0
(3/64) x [8 − 3x] = 0
x = 0 or x = 8/3

Next,

f″(x) = (3/64)(8 − 6x)

At x = 0, f″(0) = 24/64 > 0, so x = 0 gives a minimum.
At x = 8/3,

f″(8/3) = (3/64)(8 − 16) = −3/8 < 0

Hence x = 8/3 gives the mode.


Example. A continuous random variable X has the pdf

f(x) = { x²,         0 ≤ x ≤ 1
       { (2 − x)/3,  1 < x ≤ 2
       { x − 2,      2 < x ≤ 3
       { 0,          elsewhere

Determine the cdf and the median of this distribution.
Solution

1. The cdf:
F(x) = 0 for x < 0.
For 0 ≤ x ≤ 1:

F(x) = ∫ x² dx = x³/3 + c₁

Since F(0) = 0 =⇒ c₁ = 0, so F(x) = x³/3 and F(1) = 1/3.

Next, for 1 < x ≤ 2:

F(x) = ∫ (2 − x)/3 dx = (1/3)(2x − x²/2) + c₂

But F(1) = 1/3 =⇒ (1/3)(2 − 1/2) + c₂ = 1/3 =⇒ c₂ = −1/6.
So F(x) = (2/3)x − x²/6 − 1/6 on (1, 2], and F(2) = 1/2.

Next, for 2 < x ≤ 3:

F(x) = ∫ (x − 2) dx = x²/2 − 2x + c₃

But F(2) = 1/2 =⇒ 4/2 − 4 + c₃ = 1/2 =⇒ c₃ = 5/2.
∴ F(x) = x²/2 − 2x + 5/2 for 2 < x ≤ 3.

F(x) = { 0,                     x < 0
       { x³/3,                  0 ≤ x ≤ 1
       { (2/3)x − x²/6 − 1/6,   1 < x ≤ 2
       { x²/2 − 2x + 5/2,       2 < x ≤ 3
       { 1,                     x > 3

2. Median:
We need F(x) = 1/2. Now F(1) = 1/3, and F(x) = x³/3 only when 0 ≤ x ≤ 1, so the median satisfies x > 1. Try 1 < x ≤ 2 and solve

(2/3)x − x²/6 − 1/6 = 1/2
=⇒ 4x − x² − 1 = 3
x² − 4x + 4 = 0
(x − 2)² = 0
=⇒ x = 2

The median is 2.



cx ,0 ≤ x ≤ 1

Example . The random variable X has the pd f f (x) = c(2 − x) , 1 ≤ x ≤ 2 .



0 , elsewhere
Determine the value of the constant c, hence find the cd f , the median and mode of
f (x)
Solution
Since f (x) is a pd f

ˆ ∞
f (x)dx = 1
−∞
´1 ´2
0 cxdx + 1 c(2 − 1)dx =1
cx2 1 2cx−cx2 2
2 |0 + [2cx − 2 ]1 =1

c
2 + 4c − 4c c
2 − 2c + 2 =1

9
STA 2200 Probability and Statistics II



x ,0 ≤ x ≤ 1

Therefore f (x) = 2 − x ,1 ≤ x ≤ 2



0 , elsewhere

The cd f

F(x) = 0,x < x =⇒ f (0) = 0

for 0 ≤ x ≤ 1
´ x2
F(x) = xdx = 2 +c ]

1
Since F(0) = 0 =⇒ 2 (0) + c =⇒ c = 0

=⇒ F(0) = 12 x2 , 0 ≤ x ≤ 1

1
=⇒ F(1) = 2

Next for 1 ≤ x ≤ 2
´ 2
F(x) = (2 − x)dx = 2x − x2 + c2

1 2
But F(1) = 2 =⇒ 2(1) − 12 + c2 = 1
2

=⇒ c2 = −1

 1 x2 0≤x≤1
2
F(x) =
2x − x2 − 1 , 1 ≤ x ≤ 2
2

Median

1
F(x) ≥ 2

1
Now F(1) = 2


3.4. Expectation of a Random Variable


Expectation gives the average value of a random variable and hence is regarded as
the mean of a random variable, say X.

Definition

Let X be a random variable. The expectation (mean) of X, denoted by E(X), is given by

1. E(X) = ∑_{all x} x_i P(X = x_i) if X is discrete with values x1, x2, . . . , xn and corresponding probabilities P(X = x_i)

2. E(X) = ∫_{−∞}^{∞} x f(x) dx if X is continuous with probability density function f(x)

Example. What is the expected value (mean) of the number of points obtained in a single throw of an ordinary die?
Solution
Let the random variable X = number of points on the die. We need the probability distribution of X, i.e.

x_i           1    2    3    4    5    6
P(X = x_i)   1/6  1/6  1/6  1/6  1/6  1/6

By definition

E(X) = ∑_{all x} x P(X = x)
     = (1)(1/6) + (2)(1/6) + (3)(1/6) + (4)(1/6) + (5)(1/6) + (6)(1/6)
     = 7/2

OR, using the sum of the first n natural numbers, ∑_{x=1}^{n} x = n(n + 1)/2:

E(X) = (1/6) · (6 · 7)/2 = 7/2

Example. Let X be a discrete random variable with pmf

p(X = x) = { x/21, x = 1, 2, 3, 4, 5, 6
           { 0,    elsewhere

Compute E(X).
Solution
Here we need to use the formula or the definition of E(X):

E(X) = ∑_{all x} x P(X = x)

The function can also be represented in a table, given that it is a discrete random variable. That is, when x = 1, P(X = x) = 1/21; we do the same for the other realized values of the random variable X to come up with the table below.

x_i           1     2     3     4     5     6
P(X = x_i)   1/21  2/21  3/21  4/21  5/21  6/21

Therefore, E(X) can then be computed as follows:

E(X) = 1(1/21) + 2(2/21) + 3(3/21) + · · · + 6(6/21) = 91/21 = 13/3

3.4.1. Properties of Expectation


Let X be a continuous random variable with pdf f(x). Let g(x) = ax + b (a function of the random variable X), where a and b are constants. Then

1. E[g(X)] = aE(X) + b.
Proof
By definition

E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx

but g(x) = ax + b

=⇒ E[g(X)] = ∫_{−∞}^{∞} (ax + b) f(x) dx
           = a ∫_{−∞}^{∞} x f(x) dx + b ∫_{−∞}^{∞} f(x) dx
           = aE(X) + b


The constant is not affected by expectation since the expected value of a constant is
the constant itself.

2. Let g(x) and h(x) be two functions of X. Then for any constants a and b,

E[ag(X) ± bh(X)] = aE[g(X)] ± bE[h(X)]

Proof
By definition

E[ag(X) ± bh(X)] = ∫_{−∞}^{∞} (ag(x) ± bh(x)) f(x) dx
                 = ∫_{−∞}^{∞} ag(x) f(x) dx ± ∫_{−∞}^{∞} bh(x) f(x) dx
                 = a ∫_{−∞}^{∞} g(x) f(x) dx ± b ∫_{−∞}^{∞} h(x) f(x) dx
                 = aE[g(X)] ± bE[h(X)]

Example. Consider a random variable X with the following pmf.

P(X = x) = { x/10, x = 1, 2, 3, 4
           { 0,    elsewhere

Compute E(5X³ − 2X²).

Solution
E(5X³ − 2X²) = 5E(X³) − 2E(X²)
But E(X^r) = ∑_{all x} x^r P(X = x), so

5E(X³) = 5[1(1/10) + 8(2/10) + 27(3/10) + 64(4/10)]
       = 5[0.1 + 1.6 + 8.1 + 25.6]
       = 5(35.4)
       = 177

Next,

2E(X²) = 2[1(1/10) + 4(2/10) + 9(3/10) + 16(4/10)]
       = 2[0.1 + 0.8 + 2.7 + 6.4]
       = 20

∴ 5E(X³) − 2E(X²) = 177 − 20 = 157
EXERCISE 4.  In a gambling game, an IT expert is paid Kshs. 500 if he gets all heads or all tails in 4 tosses of a fair coin. He pays out Kshs. 150 if he gets either 1, 2 or 3 heads. Would you advise him to play the game?


Learning Activities
1. Find the value of c, the median and the mode of the following distribution:

p_X(x) = { cx²(1 − x), 0 < x < 1
         { 0,          elsewhere

(Ans: c = 12, mode = 2/3)

2. Let X have the pdf

f(x) = { (x + 2)/18, −2 < x < 4
       { 0,          elsewhere

Find E(X).

(Ans: E(X) = 2)

3. Prove the properties of expectation assuming the random variable X is discrete.

Assignments
Identify at least four functions representing both discrete and continuous random variables, and show that they are either pmfs or pdfs. For each function, find the three measures of central tendency where possible.


LESSON 4
Variance of a Random variable, Moments and Moment
Generating Functions (MGF)

Learning outcomes
Upon completing this topic, you should be able to:

• Compute the variance of a random variable either directly or using the prop-
erties of the variance

• Define and compute the central moment, moments about specified point and
finally the factorial moments.

• Derive the Moment generating function of a random variable and use it to


find the measures of central tendency and spread


4.1. Variance of a Random Variable


The variance is a measure of the spread/dispersion of the distribution of a random variable, say X. Suppose we let X be a random variable, and E(X) be the expected value of X. If we write E(X) = µ_x, the variance of X, denoted by var(X), is defined as

var(X) = ∑_{all x} [x − µ_x]² P(X = x)

if X is a discrete random variable, and

var(X) = ∫_{−∞}^{∞} [x − µ_x]² f(x) dx

if X is a continuous random variable.
That is, the variance is the expectation of the square of the deviations of the values of X from its mean.

Note:

var(X) = E[X − E(X)]²
       = E[X² − 2X E(X) + (E(X))²]
       = E(X²) − 2[E(X)]² + [E(X)]²
       = E(X²) − [E(X)]²

Or, using our definition above with µ = E(X),

var(X) = E[X − µ]²
       = E[X² − 2µX + µ²]
       = E(X²) − 2µ² + µ²
       = E(X²) − µ²

Remark 1. Normally E(X) = µ and var(X) = σ².

Remark 2. The standard deviation is given by σ = √var(X).


Remark 3. The variance measures average dispersion from the mean. If it is small, it means that most of the values of X are concentrated near the mean. If it is large, it means that most values are spread far away from the mean.
Variance is normally calculated as follows:

var(X) = σ² = E(X²) − [E(X)]² = E(X²) − µ²

where
E(X²) = ∑_{all x} x² P(X = x) if X is a discrete random variable, and
E(X²) = ∫_{−∞}^{∞} x² f(x) dx if X is a continuous random variable.

Proof. By definition

var(X) = E[(X − µ)²]
but (X − µ)² = X² − 2µX + µ²
σ² = E(X² − 2µX + µ²)
   = E(X²) − 2µE(X) + µ²
   = E(X²) − 2µ² + µ²
   = E(X²) − µ²
   = E(X²) − [E(X)]²

Remark 4. Suppose X is a random variable and g(X) is a function of the random variable X; then the variance of this new function will be

var(g(X)) = E[g(X) − E(g(X))]²
          = E[g²(X)] − [E(g(X))]²

Remark 5. In particular, if g(X) = aX where a is a constant, then

var(aX) = a² var(X)


Proof. Based on this definition,

var(aX) = E[(aX)²] − [E(aX)]²
        = a²E(X²) − [aE(X)]²
        = a²[E(X²) − [E(X)]²]
        = a² var(X)

Remark 6. Similarly, if g(X) = aX + b where a and b are constants, then var(aX + b) = a² var(X).

Proof. Once again considering the definitions above:

var(aX + b) = E[(aX + b)²] − [E(aX + b)]²
            = E[a²X² + 2abX + b²] − [aE(X) + b]²
            = a²E(X²) + 2abE(X) + b² − a²[E(X)]² − 2abE(X) − b²
            = a²E(X²) − a²[E(X)]²
            = a² var(X)

Example. Let f(x) be the pdf of a random variable X defined as

f(x) = { (x + 3)/18, −3 ≤ x ≤ 3
       { 0,          elsewhere

Find the variance of X.
Solution

E(X) = ∫_{−∞}^{∞} x f(x) dx
     = ∫_{−3}^{3} x (x + 3)/18 dx
     = (1/18) ∫_{−3}^{3} (x² + 3x) dx
     = (1/18) [ x³/3 |_{−3}^{3} + (3x²/2) |_{−3}^{3} ]
     = (1/18) [ (9 + 9) + (27/2 − 27/2) ]
     = 1

Next,

E(X²) = ∫_{−∞}^{∞} x² f(x) dx
      = ∫_{−3}^{3} x² (x + 3)/18 dx
      = (1/18) ∫_{−3}^{3} (x³ + 3x²) dx
      = (1/18) [ x⁴/4 |_{−3}^{3} + x³ |_{−3}^{3} ]
      = (1/18) [ (81/4 − 81/4) + (27 + 27) ]
      = 54/18 = 3

∴ var(X) = E(X²) − [E(X)]² = 3 − 1 = 2

Example. Consider the experiment of tossing 2 dice. Let X1 and X2 denote the outcomes of the first die and the second die respectively. Let Y be the absolute difference of the outcomes, that is Y = |X1 − X2|. Find the variance of Y.

Solution

                  X1
        1   2   3   4   5   6
    1   0   1   2   3   4   5
    2   1   0   1   2   3   4
X2  3   2   1   0   1   2   3
    4   3   2   1   0   1   2
    5   4   3   2   1   0   1
    6   5   4   3   2   1   0

The possible values of Y are 0, 1, 2, 3, 4, 5, as indicated in bold in the table of outcomes. Therefore, from this experiment we obtain the following probability distribution:

y           0     1     2     3     4     5
P(Y = y)   6/36 10/36  8/36  6/36  4/36  2/36

Now var(Y) = E(Y²) − [E(Y)]². But

E(Y) = ∑_{all y} y P(Y = y)
     = 0(6/36) + 1(10/36) + 2(8/36) + 3(6/36) + 4(4/36) + 5(2/36)
     = 70/36
     ≈ 1.94

Next,

E(Y²) = ∑_{all y} y² P(Y = y)
      = 0(6/36) + 1(10/36) + 4(8/36) + 9(6/36) + 16(4/36) + 25(2/36)
      = 210/36 = 35/6

So

var(Y) = E(Y²) − [E(Y)]²
       = 35/6 − (1.94)²
       ≈ 2.05
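The same numbers fall out of a direct enumeration; a minimal Python sketch of our own:

    from fractions import Fraction
    from itertools import product

    # Enumerate the 36 equally likely outcomes of two dice
    ys = [abs(a - b) for a, b in product(range(1, 7), repeat=2)]

    n = Fraction(len(ys))
    mean = sum(ys) / n                    # E(Y)  = 70/36 = 35/18
    mean_sq = sum(y * y for y in ys) / n  # E(Y²) = 210/36 = 35/6
    print(mean, mean_sq - mean**2)        # 35/18, 665/324 ≈ 2.05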

Example. A die is tossed once. Let X denote the score, and further let Y = 2X + 3. Find E(X), E(Y), var(X) and var(Y).
Solution: This problem can be solved either by calculating the expectation and variance of X and Y directly, or by using the properties shown in the previous sections to find the expectation and variance of Y.
From the experiment,

f(x) = { 1/6, x = 1, 2, 3, 4, 5, 6
       { 0,   elsewhere

E(X) = ∑ x f(x) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6)
=⇒ E(X) = 3.5

E(Y) = E(2X + 3) = [2(1) + 3](1/6) + [2(2) + 3](1/6) + [2(3) + 3](1/6) + [2(4) + 3](1/6) + [2(5) + 3](1/6) + [2(6) + 3](1/6)
=⇒ E(Y) = 10

Alternatively, we can use the properties of expectation to find E(Y) as follows:
E(Y) = E(2X + 3) = 2E(X) + 3 = 2(3.5) + 3 = 10
Furthermore, var(X) = E(X²) − [E(X)]². Solving the expression,

var(X) = 1²(1/6) + 2²(1/6) + 3²(1/6) + 4²(1/6) + 5²(1/6) + 6²(1/6) − 3.5²
var(X) = 15.17 − 12.25 = 2.92

Finally, var(Y) = var(2X + 3) = 2² var(X) = 4(2.92) = 11.68

4.2. Moments
In addition to the expectation and variance of a random variable, we can compute expectations of higher powers of a random variable with respect to a given distribution. These expectations are useful in determining various characteristics of the corresponding distribution.

Definition

If X is a random variable, then the rth moment of X denoted by µr0 is defined as


µr0 = E(X r ) if the expectation exists where r = 1, 2, ..
These higher order moments are computed about zero origin
That is E[(X r )] = E[(X − 0)r ]
Therefore they are referred to as raw moments about the origin or uncorrected mo-
ments
This means that moments can also be computed about another value e.g about the
mean of the probability distribution.
In this case they are referred to as central moments.

Definition

If X is a random variable the rth central moment about the value µ is defined as
µr = E[(X − µ)r ] (corrected moment =mean)
Remark 7. The first raw moment is the expectation of random variable X
That is, E(X) = 1st moment ≈ the origin
0 0
So, if r = 1 µ1 = E(X ) = mean of X
Remark 8. If a = µ then the 1st central moment is zero.
That is;

µ1 = E[(X − µ)]

22
STA 2200 Probability and Statistics II

= E(X) − µ
= µ −µ
=0

Remark 9. If E(X) = µ, the second central moment is always the variance. Thus,

µ_2 = E[(X − µ)²]
    = E(X²) − 2µE(X) + µ²
    = E(X²) − 2µ² + µ²
    = E(X²) − µ²
    = E(X²) − [E(X)]²
    = var(X)

Remark 10. Let X be a discrete random variable with pmf P(X = x). Then

• The rth raw moment (moment about the origin) is defined as

µ′_r = E(X^r) = ∑_{all x} x^r P(X = x)

• The rth central moment (moment about the mean) is defined as

µ_r = E[(X − µ)^r] = ∑_{all x} (x − µ)^r P(X = x)

Remark 11. Let X be a continuous random variable with pdf f(x). Then

• The rth raw moment (moment about the origin) is defined as

µ′_r = E(X^r) = ∫_{−∞}^{∞} x^r f(x) dx

• The rth central moment (moment about the mean) is defined as

µ_r = E[(X − µ)^r] = ∫_{−∞}^{∞} (x − µ)^r f(x) dx

4.3. Relationship between Raw and Central moments


By definition, writing x̄ = E(X) = µ′_1,

µ_2 = E[(X − x̄)²]
    = E[(X − µ′_1)²]
but (X − µ′_1)² = X² − 2µ′_1 X + (µ′_1)²
µ_2 = E[X² − 2µ′_1 X + (µ′_1)²]
    = E(X²) − 2µ′_1 E(X) + (µ′_1)²
    = µ′_2 − 2(µ′_1)(µ′_1) + (µ′_1)²
    = µ′_2 − (µ′_1)²

Again, the third central moment:

µ_3 = E[(X − µ′_1)³]
but (X − µ′_1)³ = X³ − 3µ′_1 X² + 3X(µ′_1)² − (µ′_1)³
µ_3 = E[X³ − 3µ′_1 X² + 3X(µ′_1)² − (µ′_1)³]
    = E(X³) − 3µ′_1 E(X²) + 3(µ′_1)² E(X) − (µ′_1)³
    = µ′_3 − 3µ′_1 µ′_2 + 3(µ′_1)³ − (µ′_1)³
    = µ′_3 − 3µ′_1 µ′_2 + 2(µ′_1)³

4.4. Factorial Moments


Definition

If X is a random variable, the rth factorial moment of X is defined as

µ_[r] = E[X(X − 1)(X − 2) . . . (X − r + 1)]

For discrete random variables, the factorial moments are easy to compute from the raw moments. Note that the 1st factorial moment is always the mean, E(X).
EXERCISE 5.  A die is thrown once. The possible outcomes are 1, 2, 3, 4, 5, 6, which are at unit intervals. Find the 1st, 2nd and 3rd factorial moments.

4.5. Moment Generating Functions (MGF)


Computing all moments for a probability distribution function is normally a tedious task. It is therefore desirable to have a function from which all the moments can be derived at will. Such a function is called a moment generating function (mgf). When this function exists for a particular distribution it is unique, a property which enables one to determine the distribution from its mgf.

Definition

Let X be a random variable. The mgf, denoted by m_x(t), is defined as m_x(t) = E(e^{tX}), where t is a small number such that −h < t < h for some h > 0.

If X is a discrete random variable with pmf P(X = x), then

m_x(t) = E(e^{tX}) = ∑_{all x} e^{tx} P(X = x) .........(1)

Similarly, if X is a continuous random variable with pdf f(x), then

m_x(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f(x) dx .........(2)

Note that the mgf in (1) and (2) exists provided the sum, respectively the integral, is finite.

4.6. Derivation of Moments from MGF


We normally make use of the mgf to derive higher moments of a given random variable. Suppose we let X be a discrete random variable. Then

m_x(t) = E(e^{tX}) = ∑_{all x} e^{tx} P(X = x)

By the Taylor series expansion, e^{tx} can be expressed as

e^{tx} = 1 + tx + t²x²/2! + t³x³/3! + . . .

=⇒ m_x(t) = ∑_{all x} [1 + tx + t²x²/2! + t³x³/3! + . . .] P(X = x)
          = ∑ P(X = x) + t ∑ x P(X = x) + (t²/2!) ∑ x² P(X = x) + (t³/3!) ∑ x³ P(X = x) + . . .

Differentiating both sides with respect to t, we then have

m′_x(t) = ∑ x P(X = x) + t ∑ x² P(X = x) + (t²/2!) ∑ x³ P(X = x) + . . . .........(3)

Evaluating the derivative at t = 0 in (3), every term containing t vanishes and

m′_x(0) = ∑_{all x} x P(X = x) = E(X) = µ′_1

Differentiating equation (3) once more with respect to t,

m″_x(t) = ∑ x² P(X = x) + t ∑ x³ P(X = x) + (t²/2!) ∑ x⁴ P(X = x) + . . .
m″_x(0) = ∑_{all x} x² P(X = x) = E(X²) = µ′_2

In general,

µ′_r = E(X^r) = (d^r/dt^r) m_x(t) |_{t=0}

The rth raw moment is derived by differentiating the mgf r times with respect to t and evaluating at t = 0.
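To make this concrete, here is a small symbolic sketch (our own choice of illustration, using sympy) that builds the mgf of a fair die score and reads off the first two moments:

    import sympy as sp

    t = sp.symbols("t")
    # mgf of a fair die score: m(t) = sum over x of e^{tx} * 1/6
    m = sum(sp.exp(t * x) for x in range(1, 7)) / 6

    mu1 = sp.diff(m, t).subs(t, 0)      # E(X)   = 7/2
    mu2 = sp.diff(m, t, 2).subs(t, 0)   # E(X^2) = 91/6
    print(mu1, mu2, mu2 - mu1**2)       # variance = 35/12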

Note:

The results can also be derived for continuous random variables:

e^{tx} = 1 + tx + t²x²/2! + t³x³/3! + . . .

=⇒ m_x(t) = ∫_{−∞}^{∞} e^{tx} f(x) dx
          = ∫_{−∞}^{∞} [1 + tx + t²x²/2! + t³x³/3! + . . .] f(x) dx
          = ∫ f(x) dx + t ∫ x f(x) dx + (t²/2!) ∫ x² f(x) dx + (t³/3!) ∫ x³ f(x) dx + . . .
          = 1 + tµ′_1 + (t²/2!) µ′_2 + (t³/3!) µ′_3 + (t⁴/4!) µ′_4 + . . .

m′_x(t) = ∫_{−∞}^{∞} x f(x) dx + t ∫_{−∞}^{∞} x² f(x) dx + . . . .........(1)

m′_x(0) = ∫_{−∞}^{∞} x f(x) dx = E(X) = µ′_1

Differentiating equation (1) once more with respect to t,

m″_x(t) = ∫_{−∞}^{∞} x² f(x) dx + t ∫_{−∞}^{∞} x³ f(x) dx + . . .

m″_x(0) = ∫_{−∞}^{∞} x² f(x) dx = E(X²) = µ′_2

In general, the rth raw moment is given by (d^r/dt^r)(m_x(t)) |_{t=0}.
EXERCISE 6.  Let X be a continuous random variable with pdf f(x) given by

f(x) = { λe^{−λx}, x > 0
       { 0,        elsewhere

Find its mgf, if it exists. Derive the expected value of X and the variance of X from the mgf. Verify the results by computing the above quantities directly from the definition.

4.7. Summary
In this section, we have discussed the variance of a random variable and its properties. We have also introduced the concept of moments and gone further to show how we can compute the central moments, moments about a specified point of reference and finally the factorial moments. We have also shown how we can derive the mgf of a given function and use it to obtain measures of central tendency, among other things, for the different distributions.
Learning Activities

• A fair four-sided die is thrown once. Find the 1st, 2nd and 3rd factorial moments.


LESSON 5
Theoretical Probability Distributions

Learning outcomes
Upon completing this topic, you should be able to:

• Calculate and interpret the measures of central tendency for grouped data.

• Identify and calculate the measures of variability for grouped data

• Identify and calculate the measures of location for grouped data.


5.1. Introduction
When we talk about theoretical distributions, we are referring to distributions which
are not obtained by actual observations/experiments. They are derived mathemati-
cally on the basis of certain assumptions. These distributions are broadly classified
into two categories
• Discrete probability distributions

• Continuous probability distributions.

5.1.1. Discrete probability distributions

Among the special discrete probability distributions that we shall consider in this course are the Bernoulli, Binomial, Poisson, Geometric and Hypergeometric distributions.

• Bernoulli Distribution
A Bernoulli trial is a random experiment with only two mutually exclusive outcomes: Success (occurrence of an event) or Failure (non-occurrence of an event). For instance, suppose we toss a coin; we can then get a Head or a Tail. Testing a manufactured item, the item can be either defective or non-defective, etc.
Therefore, let X be a random variable such that

X = { 1, if the outcome is a success
    { 0, if the outcome is a failure

This simple distribution is completely defined, and is characterized by a single parameter p, where p = Pr(success):

=⇒ P(X = 1) = p
   P(X = 0) = 1 − p = q

Therefore a Bernoulli distribution can be expressed as

P(X = x) = { p^x (1 − p)^{1−x}, x = 0, 1
           { 0,                 elsewhere

Note: A Bernoulli random variable is therefore defined using only one parameter, namely p. The random variable X for a Bernoulli distribution takes on only two possible values at a time, that is 0 or 1.

Moments

The moments of this distribution are very easy to compute; that is,

µ′_r = E[X^r]
     = ∑_{x=0}^{1} x^r P(X = x)
     = 0 + 1^r p¹ (1 − p)^{1−1}
     = p

The variance of X is given by

var(X) = E(X²) − [E(X)]²
       = p − p²
       = p(1 − p)
       = pq

Moment Generating Function

m_x(t) = E[e^{tX}]
       = ∑_{x=0}^{1} e^{tx} P(X = x)
       = ∑_{x=0}^{1} e^{tx} p^x (1 − p)^{1−x}
       = e⁰(1 − p) + e^t p
       = (1 − p) + pe^t

which exists for all t.


Example. A random variable whose value represents the outcome of a coin toss (1 for heads, 0 for tails, or vice versa) is a Bernoulli variable with parameter p, where p is the probability that the outcome corresponding to the value 1 occurs. For an unbiased coin, where heads and tails are equally likely to occur, p = 0.5.

• Binomial Probability Distribution


This distribution is an extension of a Bernoulli distribution. In this case we have
more than one trial as opposed to just a single trial with a Bernoulli distribution.

Characteristics of A Binomial Experiment


1. It consists of n identical and independent trials

2. There are only two possible outcomes on each trial i.e Success or Failure.

3. The probability of an outcome is the same from trial to trial.

4. The random (say X) variable is the number of favorable outcomes.

Definition

Let a trial result in success with constant probability p and in failure with probability (1 − p) = q. Then the probability of x successes in n independent trials is given by

P(X = x) = { C(n, x) p^x q^{n−x}, x = 0, 1, 2, . . . , n
           { 0,                   elsewhere

where C(n, x) = n!/(x!(n − x)!). This is the Binomial distribution with parameters n and p, denoted X ∼ Bin(n, p).

Note: The Binomial distribution is characterized by two parameters, (n, p), as can be seen from the notation above.

Example. It is expected that 10% of production from a continuous process will be defective. Find the probability that in a sample of 10 units chosen at random:
(a) exactly 2 will be defective;
(b) at least 2 will be defective.
Solution:
Before solving this problem, we need to acknowledge that this is a Binomial problem based on the properties indicated above. For instance, we have p = 0.1, n = 10, and the trials are independent based on the information given.

Thus, let X represent the number of defectives. What is required is P(X = 2) and P(X ≥ 2).
Now,

P(X = x) = C(n, x) p^x q^{n−x}

P(X = 2) = C(10, 2)(0.1)²(0.9)^{10−2}
         = (10!/(8!2!))(0.1)²(0.9)⁸
         = ((10 × 9)/2)(0.1)²(0.9)⁸
         = 0.1937

For P(X ≥ 2), we can solve the problem easily by using the complement of the defined event. That is,

P(X ≥ 2) = P(X = 2) + P(X = 3) + · · · + P(X = 10)
         = 1 − P(X < 2)
         = 1 − [C(10, 0) p⁰ q^{10} + C(10, 1) p¹ q⁹]
         = 1 − [(0.1)⁰(0.9)^{10} + 10(0.1)¹(0.9)⁹]
         = 1 − [0.3487 + 0.3874]
         = 0.2639
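A numerical check of both answers; a small sketch of our own (assuming scipy is available):

    from scipy.stats import binom

    n, p = 10, 0.1
    print(binom.pmf(2, n, p))        # P(X = 2)  ≈ 0.1937
    print(1 - binom.cdf(1, n, p))    # P(X >= 2) ≈ 0.2639
    # binom.sf(1, n, p) gives the same tail probability directly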

EXERCISE 7.  Suppose we toss a fair coin ten times. What is the probability of obtaining: (a) 5 heads; (b) 3 tails; (c) at least 3 heads?

MGF of a Binomial Distribution


The mg f of a Binomial random variable X with parameters n and p is given by

34
STA 2200 Probability and Statistics II

mx (t) = E(etx )
n  
tx n
= ∑e px qn−x
x=0 x
n  
t x n−x n

= ∑ pe q
x=0 x

But (a + b)n = ∑nx=0 nx ax bn−x , from the binomial expression




Therefore, from this expression if we let pet = a and q = b.


By convention we have
n
 
n x
= ∑ pet qn−x
x=0 x
n
= pet + q

Which is the mg f of a Binomial distribution with parameter n and p.


Using the mg f to obtain the expectation of a binomial random variable. From the mg f :

m′_X(t) = n(pe^t + q)^{n−1} pe^t = npe^t (pe^t + q)^{n−1}
E(X) = m′_X(0) = np(p + q)^{n−1} = np · 1^{n−1} = np, since p + q = 1

which is the mean of a Binomial distribution with parameters n and p.


Next we obtain Var(X)


Var(X) = E(X^2) − [E(X)]^2 = E(X^2) − n^2 p^2, where E(X^2) = m″_X(0).

m″_X(t) = npe^t (pe^t + q)^{n−1} + n(n − 1) p^2 e^{2t} (pe^t + q)^{n−2}
m″_X(0) = np(p + q)^{n−1} + n(n − 1) p^2 (p + q)^{n−2}
= np + n(n − 1) p^2
= np + n^2 p^2 − np^2

∴ Var(X) = E(X^2) − n^2 p^2
= np + n^2 p^2 − np^2 − n^2 p^2
= np − np^2
= np(1 − p)
= npq
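Both differentiations can be verified symbolically. The following sketch (using sympy, an assumed tool rather than part of the course materials) differentiates the mg f and evaluates at t = 0:

import sympy as sp

t, n, p = sp.symbols('t n p', positive=True)
M = (p*sp.exp(t) + (1 - p))**n           # mgf of Bin(n, p)

mean = sp.diff(M, t).subs(t, 0)           # m'(0)  -> n*p
second = sp.diff(M, t, 2).subs(t, 0)      # m''(0) -> n*p + n*(n-1)*p**2
var = sp.expand(second - mean**2)         # -> n*p - n*p**2, i.e. npq
print(mean, var)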

• Negative Binomial Distribution:


Consider a sequence of independent repetitions of a random experiment with constant probability of success p. Let k be the number of successes and n − k be the number of failures in a sample of size n. In this case sampling stops after exactly k successes, so the last trial must be a success. The number of ways this can happen is C(n − 1, k − 1). One such sequence consists of k − 1 successes (each with probability p) and n − k failures (each with probability 1 − p) in some order, followed by a final success, so each sequence has probability p^{k−1}(1 − p)^{n−k} · p. Therefore the probability that the kth success occurs on the nth trial is

P(n) = C(n − 1, k − 1) p^{k−1} (1 − p)^{n−k} p
= { C(n − 1, k − 1) p^k (1 − p)^{n−k}, n = k, k + 1, k + 2, . . .; 0, elsewhere }

This distribution is what we call a Negative Binomial distribution.


Alternatively:
Consider a repetition of Bernoulli trials until a fixed number r of successes has occurred, and then stop. E.g. we may be interested in the probability that the 10th computer examined is the 3rd to be infected with a virus.
If the rth success is to occur on the xth trial, there must be r − 1 successes in the first x − 1 trials. The probability for this is given by:

P(r − 1 successes in x − 1 trials) = C(x − 1, r − 1) p^{r−1} (1 − p)^{x−r}

The probability of a success on the xth trial is p, so the probability that the rth success occurs on the xth trial is:

P(X = x) = C(x − 1, r − 1) p^r (1 − p)^{x−r}

The random variable X representing the number of trials required is called a negative binomial random variable. It may be interpreted as the waiting time until the occurrence of the rth success.

The MGF of Negative Binomial Distribution


Let X have a negative binomial distribution with parameters k and p, and write Y = X − k for the number of failures before the kth success. Then, with q = 1 − p,

m_Y(t) = E[e^{tY}] = Σ_{n=k}^{∞} C(n − 1, k − 1) p^k q^{n−k} e^{t(n−k)}
= p^k Σ_{j=0}^{∞} C(k + j − 1, j) (qe^t)^j
= p^k [1 + k(qe^t) + (k(k + 1)/2)(qe^t)^2 + . . .]
= p^k (1 − qe^t)^{−k}
= (p/(1 − qe^t))^k, valid for qe^t < 1

using the negative binomial series Σ_{j≥0} C(k + j − 1, j) z^j = (1 − z)^{−k}.

Finding E(X) using the MGF


m′_Y(t) = p^k (−k)(1 − qe^t)^{−k−1}(−qe^t)
= k p^k q e^t (1 − qe^t)^{−k−1}

⟹ E[Y] = m′_Y(0) = k p^k q (1 − q)^{−k−1} = k p^k q p^{−k−1} = kq/p

and hence, since X = Y + k, E[X] = k + kq/p = k/p.

Finding Var(X) using the MGF

Var(X) = E[X^2] − [E(X)]^2. Since X = Y + k differs from Y only by a constant, Var(X) = Var(Y), where Var(Y) = E[Y^2] − [E(Y)]^2 and E[Y^2] = m″_Y(0).

m″_Y(t) = k p^k q e^t (1 − qe^t)^{−k−1} + k(k + 1) p^k q^2 e^{2t} (1 − qe^t)^{−k−2}
m″_Y(0) = k p^k q p^{−k−1} + k(k + 1) p^k q^2 p^{−k−2}
= kq/p + k(k + 1) q^2/p^2

E[Y^2] = kq/p + k^2 q^2/p^2 + kq^2/p^2

Var(Y) = E[Y^2] − k^2 q^2/p^2
= kq/p + kq^2/p^2
= (kqp + kq^2)/p^2
= kq(p + q)/p^2
= kq/p^2

∴ Var(X) = kq/p^2

If k = 1 then the resulting distribution (in terms of the number of trials n) is

P(n) = { p(1 − p)^{n−1}, n = 1, 2, 3, . . .; 0, elsewhere }

which is a Geometric distribution.
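For numerical work, note that scipy's nbinom is parameterized in terms of the number of failures Y = X − k rather than the number of trials X, matching the correspondence used above; a small sketch with illustrative values k = 3, p = 0.4:

from scipy.stats import nbinom

k, p = 3, 0.4
Y = nbinom(k, p)             # Y = number of failures before the kth success

print(Y.mean(), Y.var())     # kq/p = 4.5 and kq/p^2 = 11.25
# The waiting time is X = Y + k, so E[X] = k/p = 7.5 and Var(X) = kq/p^2
print(Y.pmf(10 - k))         # P(X = 10): the 3rd success occurs on the 10th trial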


• Geometric Distribution
Consider an experiment in which Bernoulli trials are performed until the 1st success occurs. The sample space is S = {s, f s, f f s, f f f s, . . .}.
Suppose that a sequence of independent Bernoulli trials, each with probability of success p, is performed; let X be the number of trials until the 1st success occurs. Then X is a discrete random variable with pm f given by

P(X = n) = { p(1 − p)^{n−1}, n = 1, 2, 3, . . .; 0, elsewhere }

Example . An expert shot hits a target 95% of the time. What is the probability that he misses the target for the 1st time on the 15th shot?
Solution:
Treating a miss as "success", p = 0.05 and 1 − p = 0.95.
Therefore, P(X = 15) = 0.05(0.95)^{15−1} = 0.0244
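A quick check of this answer (scipy's geom counts the number of trials, matching the pm f above; an assumed tool):

from scipy.stats import geom

p = 0.05                 # probability of a miss, treated as 'success'
print(geom.pmf(15, p))   # 0.05 * 0.95**14 = 0.0244...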


Moment Generating Function of a Geometric distribution


The mg f of this distribution follows from the negative binomial case with k = 1. As there, it is convenient to work with Y = X − 1, the number of failures before the first success, whose mg f is

m_Y(t) = p(1 − qe^t)^{−1}, qe^t < 1

Then

m′_Y(t) = (−1) p (1 − qe^t)^{−2} (−qe^t) = pqe^t (1 − qe^t)^{−2}
E[Y] = m′_Y(0) = p(1 − q)^{−2} q = p(p)^{−2} q = pq/p^2 = q/p

so E[Y] = q/p, and hence E[X] = 1 + q/p = 1/p.

For the second moment, E[Y^2] = m″_Y(0):

m″_Y(t) = pqe^t (1 − qe^t)^{−2} + 2pq^2 e^{2t} (1 − qe^t)^{−3}
m″_Y(0) = pq(p)^{−2} + 2pq^2 (p)^{−3}
= q/p + 2q^2/p^2

E[Y^2] = q/p + 2q^2/p^2


Now Var(Y):

Var(Y) = E[Y^2] − [E(Y)]^2
= q/p + 2q^2/p^2 − q^2/p^2
= (qp + 2q^2 − q^2)/p^2
= (qp + q^2)/p^2
= q(p + q)/p^2
= q/p^2

and since X = Y + 1 differs from Y only by a constant, Var(X) = Var(Y) = q/p^2.

Again, note that p + q = 1


The distribution arises from the following conditions:

1. There is a sequence of trials.

2. There are only two possible outcomes at each trial.

3. Trials are independent.

4. The probability of success is constant from trial to trial.

5. The random variable X is the number of trials needed for the first success to
appear, including the successful trial.

E XERCISE 8.  Given that the mg f of a random variable X is (0.2 + 0.8e^t)^4, find P(X = 2).

5.2. Summary
This lesson generally introduces some of the basic and well known discrete prob-
ability distributions. We have also shown some of the properties that one should
look for in order to decide whether a random variable has a binomial distribution.
Other discrete probability distributions are discussed in the subsequent lessons.


Learning Activities
1. The probability that a computer drawn at random from a batch of computers
is defective is 0.1. If a sample of 6 computers is taken, find the probability
that it will contain

(a) No defective computer


(b) 5 or 6 defective computers
(c) More than 2 defective computers
(d) Less than 3 defective computers

2. According to a study carried out by a computer company in Uganda, the


probability that a randomly selected laptop fan will last longer than 1.2 years
is 0.15. What is the probability that out of six randomly selected fans: (a)
Exactly two last longer than 1.2 years (b)None lasts longer than 1.2 years?

3. Explain the meaning of the probability distribution of a discrete random vari-


able. Give an example of such a probability distribution. What are the ways
to present the probability distribution of a discrete random variable?

4. If the probabilities of having a male or a female child are both 0.5, calculate the
probability that: (a) A family's 5th child is their 1st son. (b) A family's 8th
child is their 3rd son. (c) A family's 6th child is their 2nd or 3rd son.


LESSON 6
Discrete Probability Distributions cont...

Learning outcomes
Upon completing this topic, you should be able to:

• Define the Poisson and Hypergeometric distributions

• Derive the mean and variance of the Poisson and Hypergeometric distribu-
tions, and show the unique property of the Poisson distribution

• Derive the mg f of Poisson distributions and use the mg f to find the expecta-
tion and variance of the distribution and other related aspects

• Use the Poisson distributions as an approximation to the Binomial distribu-


tion


6.1. Poisson Distribution


This is the classical distribution for count data, for instance the number of insurance claims or the number of children per family. If a random variable X takes the values x = 0, 1, 2, . . . with no natural upper limit, the typical model is the Poisson distribution.
Consider the following random variables

• The number of emergency calls received to an ambulance control in an hour.

• The number of vehicles approaching a motorway toll bridge in a five-minute interval.

• The number of flaws in a meter length of material

• Particles emitted by a radioactive device in a given time.

• Telephone calls made to a switch box in a given minute.

• Insurance claims made to a company in a given time.

Assuming that each of these occurs randomly, they are all examples of variables that can be modeled using a Poisson distribution. Therefore a Poisson distribution has two potential uses:

1. Considering the distribution of random events

2. As an approximation to a binomial distribution

Considering a Poisson model

1. Events occur singly and at random in a given interval of time or space.

2. The Poisson distribution is characterized by the parameter λ, the mean number of occurrences in the given interval, which should be known and finite.
The variable X is the number of occurrences in the given interval.

If these conditions are satisfied, X is then said to follow a Poisson distribution, denoted X ∼ Poi(λ), with P(X = x) = λ^x e^{−λ}/x!, for x = 0, 1, 2, 3, . . .
A Poisson distribution is derived from a Binomial distribution. It is the limiting
form of a Binomial distribution under the following conditions.

1. Number of trials n is large, that is n → ∞


2. The probability of success p in each trial is small, i.e. p → 0

3. np = λ is finite and a positive real number.

Let n = number of trials and m = average number of successes in the given time span, so that p = m/n. Then, for a random variable X,

p(X = x) = [n!/((n − x)! x!)] (m/n)^x (1 − m/n)^{n−x}

lim_{n→∞} p(X = x) = lim_{n→∞} [n!/((n − x)! x!)] (m/n)^x (1 − m/n)^{n−x}
= (m^x/x!) lim_{n→∞} [n!/((n − x)! n^x)] × lim_{n→∞} (1 − m/n)^n (1 − m/n)^{−x}

But

lim_{n→∞} n!/((n − x)! n^x) = lim_{n→∞} n(n − 1)(n − 2) . . . (n − x + 1)/n^x
= lim_{n→∞} 1 · (1 − 1/n)(1 − 2/n) . . . (1 − (x − 1)/n)
= 1 × 1 × · · · × 1 = 1

lim_{n→∞} (1 − m/n)^n = e^{−m}

since lim_{n→∞} (1 + 1/n)^n = e and lim_{n→∞} (1 + a/n)^n = e^a,

and lim_{n→∞} (1 − m/n)^{−x} = 1^{−x} = 1

⟹ lim_{n→∞} p(X = x) = (m^x/x!) (1) e^{−m}
= { λ^x e^{−λ}/x!, x = 0, 1, 2, . . .; 0, elsewhere }

writing λ = m.

Which is the Poisson distribution with parameter λ
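This limiting behaviour can be observed numerically. The sketch below (assuming scipy is available; the values m = 2 and x = 3 are illustrative) holds m = np fixed while n grows, and the binomial probabilities approach the Poisson probability:

from scipy.stats import binom, poisson

m, x = 2.0, 3                           # fixed mean m = np; evaluate P(X = 3)
for n in (10, 100, 1000, 10000):
    print(n, binom.pmf(x, n, m/n))      # approaches the Poisson value below
print('limit:', poisson.pmf(x, m))      # 2^3 e^{-2} / 3! = 0.1804...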


6.1.1. Mean and variance of the Poisson distribution


The mean number of occurrences in the interval, λ, is all that is needed to define the distribution completely, and it is the only parameter of the distribution. The unique property of the Poisson distribution is that its mean and variance are equal, both given by λ. That is, E(X) = λ and Var(X) = λ.
Example . In a region, the number of persons to become seriously ill each year from eating a given kind of poisonous plant is a random variable having the Poisson distribution with λ = 1.6. Find the probability of getting: (a) Two such illnesses in a given year. (b) At least 7 such illnesses in 5 years.
Solution:
P(X = x) = λ^x e^{−λ}/x!, for x = 0, 1, 2, . . .
Therefore,
(a) P(X = 2) = (1.6)^2 e^{−1.6}/2! = 0.2584
(b) For a period of 5 years we multiply the rate by 5, because λ is the average per one year. Therefore the new value is λ* = 5λ = 5 × 1.6 = 8.0, and
P(X ≥ 7) = 1 − P(X ≤ 6)
= 1 − Σ_{x=0}^{6} 8^x e^{−8}/x!
= 1 − 0.3134 = 0.6866
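Checking both parts of this example with scipy (again, an assumed tool):

from scipy.stats import poisson

lam = 1.6
print(poisson.pmf(2, lam))        # (a) P(X = 2)  = 0.2584...
lam5 = 5 * lam                    # rate for a 5-year period
print(poisson.sf(6, lam5))        # (b) P(X >= 7) = 1 - P(X <= 6) = 0.6866...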

Diagrams representing a Poisson Distribution

The mode of the Poisson distribution


The mode is the value of X that is most likely to occur, i.e. the one with the greatest probability.
The diagrams show that when λ = 1 there are two modes, 0 and 1, and when λ = 2 there are two modes, 1 and 2.
In general, if λ is an integer there are two modes: λ − 1 and λ.
If λ is not an integer, then the mode is the integer below λ. E.g. if X ∼ Poi(4.9), then the mode is 4, and if X ∼ Poi(9) then the modes are 8 and 9.

6.1.2. The MGF of a Poisson Distribution


By definition,

m_X(t) = E[e^{tX}] = Σ_{x=0}^{∞} e^{tx} λ^x e^{−λ}/x!
= e^{−λ} Σ_{x=0}^{∞} (λe^t)^x/x!
= e^{−λ} [1 + λe^t + (λe^t)^2/2! + (λe^t)^3/3! + . . .]
= e^{−λ} e^{λe^t}
= e^{λ(e^t − 1)}, or equivalently e^{−λ(1 − e^t)}

Mean (X):

E[X] = m′_X(0)
m′_X(t) = λe^t e^{λ(e^t − 1)}
E[X] = m′_X(0) = λ


Var(X):

E[X^2] = m″_X(0)
m″_X(t) = λe^t e^{λ(e^t − 1)} + (λe^t)^2 e^{λ(e^t − 1)}
E[X^2] = m″_X(0) = λ + λ^2

Var(X) = E[X^2] − [E(X)]^2
= λ + λ^2 − λ^2
Var(X) = λ

Example . The number of calls per ten minutes received at a telephone switchboard follows a Poisson distribution with mean 0.6. Find the probability that:
(a) No call will be received in the first 10 minutes.
(b) More than 2 calls will be received in a period of 40 minutes.
Solution:
From the information given, λ = 0.6.
(a) Let X be the random variable representing the number of calls received per 10 minutes:

X ∼ Poi(0.6)
P(X = x) = e^{−0.6}(0.6)^x/x!, x = 0, 1, 2, . . .
P(X = 0) = e^{−0.6}(0.6)^0/0! = e^{−0.6} = 0.5488

(b) Let X₁ be the number of calls received in 40 minutes. Since 40 minutes is four 10-minute intervals,

X₁ ∼ Poi(4 × 0.6) = Poi(2.4)
P(X₁ = x) = e^{−2.4}(2.4)^x/x!, x = 0, 1, 2, . . .
P(X₁ > 2) = 1 − [P(X₁ = 0) + P(X₁ = 1) + P(X₁ = 2)]
= 1 − [e^{−2.4}(2.4)^0/0! + e^{−2.4}(2.4)^1/1! + e^{−2.4}(2.4)^2/2!]
= 1 − e^{−2.4}(1 + 2.4 + 2.88)
= 1 − e^{−2.4}(6.28)
= 1 − 0.5697
= 0.4303

E XERCISE 9.  A certain hospital usually admits 50 patients per day. On average


3 patients in 100 require rooms provided with special facilities on a certain day. It
is found that there are 3 such rooms available. Assuming that 50 patients will be
admitted. Find the probability that more than 3 patients will require such special
rooms.:

6.1.3. Poisson approximation to the Binomial distribution:


An important property of the Poisson distribution is that it may be used to approximate a binomial random variable when the binomial parameter n is large and p is small, with np remaining moderate; i.e. when n is large (n > 50) and p is small (p < 0.1), the binomial distribution X ∼ B(n, p) can be approximated using the Poisson distribution with the same mean, i.e. X ∼ Poi(np). This approximation gets better as n gets larger and p gets smaller.


Example . A factory packs bolts in boxes of 500. The probability that a bolt is defective is 0.002. Find the probability that a box contains 2 defective bolts.
Solution:
Since n is large and p is small we can use a Poisson approximation.

λ = np = 500(0.002) = 1
P(X = 2) = e^{−1}(1)^2/2! = 0.184

Example . The mean number of bacteria per millilitre of liquid is known to be 4. Assuming that the number of bacteria follows a Poisson distribution, find the probability that:
(a) In 1 ml of liquid there will be no bacteria.
(b) In 3 ml of liquid there will be less than 2 bacteria.
(c) In 1/2 ml of liquid there will be more than 2 bacteria.
Solution:
(a) X ∼ Poi(4)
P(X = x) = e^{−4}(4)^x/x!, x = 0, 1, 2, . . .
P(X = 0) = e^{−4}(4)^0/0! = e^{−4} = 0.0183

(b) Let X₁ be the number of bacteria in 3 ml, so X₁ ∼ Poi(3 × 4) = Poi(12)
P(X₁ = x) = e^{−12}(12)^x/x!, x = 0, 1, 2, . . .
P(X₁ < 2) = P(X₁ = 0) + P(X₁ = 1)
= e^{−12}(12)^0/0! + e^{−12}(12)^1/1!
= e^{−12}(1 + 12)
= 0.00008

(c) Let X₂ be the number of bacteria in 1/2 ml, so X₂ ∼ Poi((1/2) × 4) = Poi(2)
P(X₂ = x) = e^{−2}(2)^x/x!, x = 0, 1, 2, . . .
P(X₂ > 2) = 1 − [P(X₂ = 0) + P(X₂ = 1) + P(X₂ = 2)]
= 1 − [e^{−2}(2)^0/0! + e^{−2}(2)^1/1! + e^{−2}(2)^2/2!]
= 1 − e^{−2}(1 + 2 + 2)
= 0.3233

Example . If 2% of the fuses delivered to a company are defective, what is the probability, using the Poisson approximation, that a random sample of 400 of these fuses will contain exactly 6 defective fuses?

Solution
p = 0.02, n = 400
λ = np = 400 × 0.02 = 8
P(X = 6) = e^{−8}(8)^6/6! = 0.122
Using the exact binomial approach:
P(X = 6) = C(400, 6)(0.02)^6 (0.98)^{394} = 0.121
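The closeness of the two answers can be confirmed directly; a minimal scipy-based sketch:

from scipy.stats import binom, poisson

n, p = 400, 0.02
print(binom.pmf(6, n, p))      # exact binomial value: 0.1213...
print(poisson.pmf(6, n * p))   # Poisson approximation with lambda = 8: 0.1221...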

E XERCISE 10.  Eggs are packed into boxes of 500. On average, 0.7% of the eggs are found to be broken when the eggs are unpacked. Find, correct to 2 significant figures, the probability that in a box of 500: (a) Exactly three are broken. (b) At least two are broken.

6.2. Hypergeometric Distribution


Strictly speaking, the binomial distribution should only be used for sampling ‘with replacement’, and so in the previous examples each trial was assumed to have a constant probability of success and failure. However, we often have sampling ‘without replacement’. In cases where a small sample is drawn from a large batch, the binomial distribution can still be used, since removal of a few items from the large batch would not appreciably alter the probabilities.
Suppose that we have a total of N items, of which M are defective. If we pick a sample of size n without replacement from this batch, then the number of defectives selected in the sample follows a hypergeometric distribution, which has pm f given by:

P(X = x) = C(M, x) C(N − M, n − x)/C(N, n)

Alternatively,
We can derive the hypergeometric distribution as follows:
Let N be the population size from which a sample of size n is to be drawn, and let the proportion of individuals in this finite population who possess a certain property of interest be denoted by p. If X is a random variable corresponding to the number of individuals in the sample possessing the property of interest, then the problem is to find the distribution function of X.


Since x individuals must come from the N p individuals in the population with the property of interest, and the remaining n − x individuals come from the N − N p who do not possess the property of interest, then by using the idea of combinations the distribution of X is given by

P(X = x) = C(Np, x) C(N − Np, n − x)/C(N, n)

which is the hypergeometric distribution. Suppose then that N p = k; then

f(x) = { C(k, x) C(N − k, n − x)/C(N, n), x = 0, 1, . . . , n; 0, elsewhere }

6.2.1. Mean and Variance of Hypergeometric distribution


Mean (X)

E[X] = Σ_{all x} x P(X = x) = Σ_{x=0}^{n} x C(k, x) C(N − k, n − x)/C(N, n)

Using x C(k, x) = k C(k − 1, x − 1) and C(N, n) = (N/n) C(N − 1, n − 1),

E[X] = (nk/N) Σ_{x=1}^{n} C(k − 1, x − 1) C(N − k, n − x)/C(N − 1, n − 1)

Let x − 1 = y, so that n − x = n − 1 − y and x = y + 1:

E[X] = (nk/N) Σ_{y=0}^{n−1} C(k − 1, y) C((N − 1) − (k − 1), (n − 1) − y)/C(N − 1, n − 1) = nk/N

since the sum is the total probability of a hypergeometric distribution with parameters N − 1, k − 1 and n − 1, i.e. 1.

Var(X)

Var(X) = E[X^2] − [E(X)]^2. But

E[X(X − 1)] = Σ_{x=0}^{n} x(x − 1) C(k, x) C(N − k, n − x)/C(N, n)
= (n(n − 1)k(k − 1)/(N(N − 1))) Σ_{y=0}^{n−2} C(k − 2, y) C((N − 2) − (k − 2), (n − 2) − y)/C(N − 2, n − 2)

(letting x − 2 = y, so that n − x = n − 2 − y and x = y + 2)

= n(n − 1)k(k − 1)/(N(N − 1))

since the remaining sum is again a total probability, equal to 1. Therefore

Var(X) = E[X(X − 1)] + E(X) − [E(X)]^2
= n(n − 1)k(k − 1)/(N(N − 1)) + nk/N − (nk/N)^2
= (nk/N) [(n − 1)(k − 1)/(N − 1) + 1 − nk/N]
= (nk/N) · (N − k)(N − n)/(N(N − 1))

NB: The mg f of the hypergeometric distribution is not useful.


Example . A box of 20 spare parts for a certain type of a machine contains 15
good items and 5 defective items. If 4 parts selected by chance from the box, what
is the probability that exactly 3 of them will be good?

Solution
Let the random variable X = number of good items. Using the hypergeometric distribution,

p(X = x) = C(k, x) C(N − k, n − x)/C(N, n)
N = 20, k = 15, N − k = 5, n = 4, x = 3, n − x = 1

p(X = 3) = C(15, 3) C(5, 1)/C(20, 4)
= [15!/(12! 3!)] × [5!/(4! 1!)] ÷ [20!/(16! 4!)]
= (15 × 14 × 13)/(3 × 2 × 1) × 5 × (4 × 3 × 2 × 1)/(20 × 19 × 18 × 17)
= 455/969 = 0.4696

Example . Suppose a population consists of 100 individuals, of whom 10% have high blood pressure. What is the probability of getting at most two individuals with high blood pressure when choosing 8 individuals at random?

Solution
Let the random variable X be the number of individuals with high blood pressure.

p(X = x) = C(k, x) C(N − k, n − x)/C(N, n)
N = 100, k = 10, N − k = 90, n = 8

p(X ≤ 2) = Σ_{x=0}^{2} C(10, x) C(90, 8 − x)/C(100, 8)
= [C(10, 0)C(90, 8) + C(10, 1)C(90, 7) + C(10, 2)C(90, 6)]/C(100, 8)
= 0.97

Example . A JKUAT messenger is asked to deliver 10 out of 16 letters to the Institute of Computer Science department and the remainder to the Engineering department. He gets the letters mixed up and, on arrival at the Computer Science department, delivers 10 letters at random to the department. What is the probability that only six of the letters intended for the Computer Science department actually arrive there?

Solution
Let the random variable X = the number of letters intended for the Computer Science department that are actually delivered there.

p(X = x) = C(k, x) C(N − k, n − x)/C(N, n)
N = 16, k = 10, N − k = 6, n = 10, x = 6, n − x = 4

p(X = 6) = C(10, 6) C(6, 4)/C(16, 10)
= (210 × 15)/8008
= 0.393

Example . A batch of 10 rocker cover gaskets contains 4 defective gaskets. If


we draw samples of size 3 without replacement, from the batch of 10, find the
probability that a sample contains 2 defective gaskets

Example . In the manufacture of car tyres, a particular production process is


know to yield 10 tyres with defective walls in every batch of 100 tyres produced.
From a production batch of 100 tyres, a sample of 4 is selected for testing to de-
struction. Find:
(a) the probability that the sample contains 1 defective tyre
(b) the expectation of the number of defectives in samples of size 4


(c) the variance of the number of defectives in samples of size 4.

Solution (to the tyre problem, with N = 100, k = 10, n = 4):
(a) P(X = 1) = C(10, 1) C(90, 3)/C(100, 4) = 0.300
(b) E(X) = nk/N = 4 × 10/100 = 0.4
(c) Var(X) = (nk/N) · (N − k)(N − n)/(N(N − 1)) = 0.4 × (90 × 96)/(100 × 99) = 0.349
(For the gasket example above, the same pm f gives P(X = 2) = C(4, 2) C(6, 1)/C(10, 3) = 36/120 = 0.3.)

E XERCISE 11.  In a school there are 20 students, 6 of whom are compulsive smokers and keep cigarettes in their lockers all the time. One day the prefects decide to check 10 lockers at random. What is the probability that they will find cigarettes in at least 3 of the lockers?

6.3. Summary
In this lesson, we have mainly looked at two very important discrete distributions, i.e. the Poisson and Hypergeometric distributions. We have indicated that the Poisson distribution has one unique property, namely that its mean and variance are equal, both denoted by λ. We have also derived the mg f of the Poisson distribution and shown that it is given as e^{λ(e^t − 1)} or e^{−λ(1 − e^t)}. Given the mg f of a random variable as e^{−4(1 − e^t)}, we are able to use this information (recognizing it as the mg f of a Poisson distribution) to find any required probability of the random variable, e.g. P(X = 2), because from this information λ = 4, while e is a constant. Finally, we have also indicated that the mg f of the hypergeometric distribution is not useful!


Learning Activities

1. Show that E(X) = Var(X) = λ for a Poisson distribution.

2. An insurance company attends 50 clients per day. On average 3 in 100 require


special services. On a certain day it is found that there are 3 special service
providers available. Assuming that 50 clients will be attended to,find the
probability that more than 3 clients require special services.

3. A large shipment of books contains 2% with imperfect bindings. Using the
Poisson approximation to the binomial, determine the probability that among
600 books:

(a) At most 15 will have imperfect bindings.
(b) Exactly 15 will have imperfect bindings.
(c) At least 10 will have imperfect bindings.

4. A box contains 6 blue marbles and 4 red marbles. An experiment is per-


formed in which a marble is chosen at random and its colour quoted but the
marble is not replaced. Find the probability that after 5 trials of the experi-
ment, 3 blue marbles will have been chosen.

5. In 500 independent calculations, a student has made 25 errors. His instruc-


tor randomly checks 7 calculations out of 500. Find the probability that the
instructor detects:

(a) Exactly 2 errors.


(b) At most 2 errors.

6. Out of 60 applications to a university, 40 are from the East. If 20 applications
are to be selected at random, find the probability that:

(a) 10 will be from the East.
(b) Not more than 2 will be from the East.


LESSON 7
Continuous Probability Distributions

Learning outcomes
Upon completing this topic, you should be able to:

• Distinguish between a discrete and continuous distribution

• Identify the various continuous distributions discussed in this lesson, their


pd f s, mg f s

• Find the expected values of the gamma, beta, chi-square and exponential dis-
tributions from the pd f s and mg f s


7.1. Uniform (Rectangular) Distribution


The uniform distribution is a very simple distribution and is particularly useful in theoretical statistics because it is convenient to deal with mathematically.
The uniform distribution has a random variable X restricted to a finite interval [a, b], with constant density f (x) over that interval; its graph is a horizontal line segment above [a, b].
The mid-point of the interval [a, b] is the mean of the two end points, (a + b)/2. Similarly, we shall see that the mean of the uniform distribution is obtained as (a + b)/2.

Definition

A continuous random variable X has a uniform distribution over the interval [a, b] with pd f given by

f(x) = { 1/(b − a), −∞ < a ≤ x ≤ b < ∞; 0, elsewhere }

This is also denoted as X ∼ Unif(a, b), read as "X has a uniform distribution over the interval [a, b]".
Example . Show that this is a pd f .
Solution:
We have previously shown that for a continuous function to be a pd f , ∫_{−∞}^{∞} f(x) dx = 1.
Therefore, ∫_a^b f(x) dx = ∫_a^b 1/(b − a) dx = [x/(b − a)]_a^b
⟹ b/(b − a) − a/(b − a) = (b − a)/(b − a) = 1
If the random variable X is uniformly distributed over [a, b] then


E[X] = ∫_{−∞}^{∞} x f(x) dx = ∫_a^b x · 1/(b − a) dx
= (1/(b − a)) ∫_a^b x dx
= (1/(b − a)) [x^2/2]_a^b
= (1/(b − a)) (b^2 − a^2)/2
= (1/(b − a)) (b − a)(b + a)/2
= (a + b)/2

Var(X) = E[X^2] − (E[X])^2 = E[X^2] − ((a + b)/2)^2

E[X^2] = ∫_a^b x^2 · 1/(b − a) dx = (1/(b − a)) [x^3/3]_a^b = (b^3 − a^3)/(3(b − a))

∴ Var(X) = (b^3 − a^3)/(3(b − a)) − ((a + b)/2)^2
= [4(b^3 − a^3) − 3(a + b)^2 (b − a)]/(12(b − a))
= [4(b − a)(b^2 + ab + a^2) − 3(a + b)^2 (b − a)]/(12(b − a))
= [4(b^2 + ab + a^2) − 3(a^2 + 2ab + b^2)]/12
= (b^2 − 2ab + a^2)/12
= (b − a)^2/12
Note: By symmetry, the mean and median of the uniform distribution are both equal to (a + b)/2.
Example . The change in depth of a river from one day to the next, measured at a specific location, is a random variable Y with pd f

f(y) = { k, −2 ≤ y ≤ 2; 0, elsewhere }

(a) Find k.
(b) What is the distribution function of Y?
Solution:
(a) Y ∼ Unif(−2, 2) ⟹ k = 1/(b − a) = 1/(2 − (−2)) = 1/4
(b) Here, the question asks for the cumulative distribution function (cdf) of the random variable Y, i.e. F(y).

F(y) = { 0, y < −2; ∫_{−2}^{y} (1/4) dt, −2 ≤ y ≤ 2; 1, y ≥ 2 }

Evaluating the integral above, we have

F(y) = { 0, y < −2; (y + 2)/4, −2 ≤ y ≤ 2; 1, y ≥ 2 }

E XERCISE 12.  Beginning at 12.00 am, a computer center is up for 1 hour and
then down for 2 hours on a regular cycle. Wayua who was unaware of this sched-
ule dials the center at a random time between 12.00 am and 5.00 am. Find the
probability that the center will be up when her call comes in.

7.1.1. The mg f of a uniformly distributed random variable


By definition


m_X(t) = E[e^{tX}] = ∫_a^b e^{tx} · 1/(b − a) dx
= (1/(b − a)) [e^{tx}/t]_a^b
= (e^{bt} − e^{at})/(t(b − a)), t ≠ 0

7.2. The Gamma, Chi-square, Exponential and Beta distributions


7.2.1. The Gamma and Beta Distributions
Before looking at two important types of distributions known as the Gamma and
Beta distribution, it is necessary to define the Gamma and Beta functions.

Gamma Function
The Gamma function is denoted by Γα and is defined by
Γα = ∫_0^∞ x^{α−1} e^{−x} dx

for α > 0. If α = 1 then

Γ1 = ∫_0^∞ e^{−x} dx = 1

If α > 1, an integration by parts shows that

Γα = (α − 1) ∫_0^∞ x^{α−2} e^{−x} dx = (α − 1) Γ(α − 1)

Therefore if α is a positive integer then the gamma function has the following properties:
(a) Γ1 = 1, since ∫_0^∞ e^{−x} dx = 1
(b) Γα = (α − 1) Γ(α − 1)
(c) Γα = (α − 1)(α − 2) . . . (3)(2) Γ1 = (α − 1)!
⟹ Γ(α + 1) = α! and Γ(α + 2) = (α + 1)!, etc.

Beta Function
The Beta function B(α, β) with two parameters α, β (appearing as the powers within the integral) is defined as

B(α, β) = ∫_0^1 x^{α−1} (1 − x)^{β−1} dx, α > 0, β > 0

Properties

B(α, β) = B(β, α). To see this, in

B(α, β) = ∫_0^1 x^{α−1} (1 − x)^{β−1} dx

let 1 − x = u, so that du = −dx and x = 1 − u; the limits x = 0 and x = 1 become u = 1 and u = 0, so

B(α, β) = ∫_0^1 (1 − u)^{α−1} u^{β−1} du = ∫_0^1 u^{β−1} (1 − u)^{α−1} du = B(β, α)

Relationship between the beta and gamma functions


The beta function is related to the gamma function according to the following for-
mula making use of the properties of the two functions

B(α, β) = ΓαΓβ/Γ(α + β)

Gamma Distribution
There are two types of gamma distributions. The first (having only one parameter,
α) and the second type (having two parameters α and β ). The two types of gamma
distributions are defined as follows:


Type I
A random variable X has a gamma distribution with parameter α > 0 if the pd f is given by

f(x) = { x^{α−1} e^{−x}/Γα, x ≥ 0, α > 0; 0, elsewhere }

Proof
We can derive this gamma distribution from the gamma function as follows:

Γα = ∫_0^∞ x^{α−1} e^{−x} dx

Dividing this function by Γα, we have

Γα/Γα = ∫_0^∞ x^{α−1} e^{−x} dx / Γα ⟹ ∫_0^∞ (x^{α−1} e^{−x}/Γα) dx = 1

so the integrand is a valid density. It can be shown, using the properties of the gamma function, that the mean and variance of this distribution are both equal to α.
Type II
A random variable X has a gamma distribution with parameters α > 0 and β > 0 if its pd f is given by

f(x) = { (1/(Γα β^α)) x^{α−1} e^{−x/β}, 0 < x < ∞; 0, elsewhere }
Proof
We derive this gamma distribution from the gamma function as follows:

Γα = ∫_0^∞ x^{α−1} e^{−x} dx

Introduce a new variable y by letting x = y/β, where β > 0; then dx = dy/β and

Γα = ∫_0^∞ (y/β)^{α−1} e^{−y/β} (1/β) dy

Dividing both sides by Γα,

∫_0^∞ (y^{α−1}/(Γα β^α)) e^{−y/β} dy = 1

Now since α > 0, β > 0 and Γα > 0, the integrand is a density:

f(y) = { (y^{α−1}/(Γα β^α)) e^{−y/β}, 0 < y < ∞; 0, elsewhere }

For this type of the gamma distribution, the mean, E(X) = αβ and var(X) = αβ 2
Example . Show that, for a gamma distribution with parameters α and β, given by the pd f

f(x) = { (x^{α−1}/(Γα β^α)) e^{−x/β}, 0 < x < ∞; 0, elsewhere }

the mean E(X) = αβ and var(X) = αβ^2.


Solution:
E(X) = ∫_0^∞ x f(x) dx = ∫_0^∞ (x^α/(Γα β^α)) e^{−x/β} dx

Substituting u = x/β, so that du = dx/β (the limits for u remain the same as for the variable x), we have dx = β du and x^α = β^α u^α. Thus

E(X) = ∫_0^∞ u^α e^{−u} (β/Γα) du = (β/Γα) ∫_0^∞ u^α e^{−u} du

We recall that Γα = ∫_0^∞ x^{α−1} e^{−x} dx, and comparing with the expression above we can simplify:

E(X) = (β/Γα) Γ(α + 1) = (β/Γα) αΓα = αβ, since Γ(α + 1) = αΓα

Thus, E(X) = αβ.
For the variance,

var(X) = ∫_0^∞ x^2 f(x) dx − (αβ)^2

Making the same substitution u = x/β as for the mean, the variance can be expressed in the form

var(X) = (β^2/Γα) ∫_0^∞ u^{α+1} e^{−u} du − (αβ)^2
= (β^2/Γα) Γ(α + 2) − (αβ)^2
= (β^2/Γα)(α + 1)Γ(α + 1) − (αβ)^2
= (β^2/Γα)(α + 1)αΓα − (αβ)^2
= α^2 β^2 + αβ^2 − α^2 β^2
= αβ^2

Hence, var(X) = αβ^2.
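These two results can be checked numerically; in scipy's parameterization the first argument is the shape α and scale is β (the numeric values below are illustrative, and scipy is an assumed tool):

from scipy.stats import gamma

alpha, beta = 3.0, 2.0
rv = gamma(alpha, scale=beta)   # the Type II gamma distribution above
print(rv.mean(), rv.var())      # alpha*beta = 6.0 and alpha*beta**2 = 12.0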

7.2.2. mg f of a Gamma Distribution

m_X(t) = E(e^{tX}) = ∫_0^∞ e^{tx} (1/(Γα β^α)) x^{α−1} e^{−x/β} dx
= ∫_0^∞ (1/(Γα β^α)) x^{α−1} e^{−x(1 − βt)/β} dx

Let y = x(1 − βt)/β, for t < 1/β; then dy = ((1 − βt)/β) dx, so dx = β dy/(1 − βt) and x = βy/(1 − βt). Therefore

m_X(t) = ∫_0^∞ ((βy/(1 − βt))^{α−1}/(Γα β^α)) e^{−y} (β/(1 − βt)) dy
= (1/(1 − βt))^α ∫_0^∞ (y^{α−1}/Γα) e^{−y} dy

but ∫_0^∞ (y^{α−1}/Γα) e^{−y} dy = 1 (a pd f ), so

m_X(t) = (1/(1 − βt))^α = (1 − βt)^{−α}, t < 1/β

E(X) = m′_X(0), where

m′_X(t) = −α(−β)(1 − βt)^{−α−1} = αβ(1 − βt)^{−α−1}

so E(X) = m′_X(0) = αβ.

var(X) = E[X^2] − (E[X])^2 = E[X^2] − α^2 β^2
m″_X(t) = αβ(−α − 1)(−β)(1 − βt)^{−α−2} = α(α + 1)β^2 (1 − βt)^{−α−2}
E[X^2] = m″_X(0) = α(α + 1)β^2 = α^2 β^2 + αβ^2

∴ var(X) = α^2 β^2 + αβ^2 − α^2 β^2 = αβ^2

7.2.3. Chi-Square distribution


The chi-square distribution is a special case of the gamma distribution with α = r/2, r > 0, and β = 2:

f(x) = { x^{r/2 − 1} e^{−x/2}/(Γ(r/2) 2^{r/2}), 0 < x < ∞; 0, elsewhere }

Likewise the mg f of a chi-square distribution is

m_X(t) = (1 − 2t)^{−r/2}, t < 1/2

The mean and variance of a chi-square distribution are αβ = (r/2)(2) = r and αβ^2 = (r/2)(4) = 2r respectively.

7.2.4. Exponential distribution


The exponential distribution is also a special case of the gamma distribution, with α = 1 and β = θ = 1/λ:

f(x) = { (1/θ) e^{−x/θ}, 0 < x < ∞; 0, elsewhere }

or

f(x) = { λ e^{−λx}, 0 < x < ∞; 0, elsewhere }

The mg f of an exponential distribution is

m_X(t) = 1/(1 − θt), or equivalently λ/(λ − t)

with E(X) = 1/λ or θ, and var(X) = 1/λ^2 or θ^2.
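Because the exponential is just the α = 1 gamma, its moments follow immediately; a small numerical check (scipy's expon uses scale = θ, and θ = 2 is an illustrative value):

from scipy.stats import expon, gamma

theta = 2.0
print(expon(scale=theta).mean(), expon(scale=theta).var())            # theta, theta^2
print(gamma(1.0, scale=theta).mean(), gamma(1.0, scale=theta).var())  # identical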

7.2.5. Beta Distribution


A continuous random variable X is said to have a beta distribution with parameters α and β if its pd f is given by

f(x) = { x^{α−1}(1 − x)^{β−1}/B(α, β), 0 ≤ x ≤ 1, α > 0, β > 0; 0, elsewhere }

or

f(x) = { (Γ(α + β)/(ΓαΓβ)) x^{α−1}(1 − x)^{β−1}, 0 ≤ x ≤ 1, α > 0, β > 0; 0, elsewhere }
The mg f of a beta distribution is not useful but we can as well get the mean and
variance for the distribution


E(X) = ∫_0^1 x f(x) dx = ∫_0^1 x · x^{α−1}(1 − x)^{β−1}/B(α, β) dx = ∫_0^1 x^α (1 − x)^{β−1}/B(α, β) dx

but B(α, β) = ∫_0^1 x^{α−1}(1 − x)^{β−1} dx, and by convention

∫_0^1 x^α (1 − x)^{β−1} dx = B(α + 1, β)

⟹ E(X) = B(α + 1, β)/B(α, β)
= [Γ(α + 1)Γβ/Γ(α + β + 1)] × [Γ(α + β)/(ΓαΓβ)]
= [αΓαΓβ/((α + β)Γ(α + β))] × [Γ(α + β)/(ΓαΓβ)]
= α/(α + β)

Next var(X)


E(X^2) = ∫_0^1 x^2 x^{α−1}(1 − x)^{β−1}/B(α, β) dx
= ∫_0^1 x^{α+2−1}(1 − x)^{β−1}/B(α, β) dx
= B(α + 2, β)/B(α, β)
= [Γ(α + 2)Γβ/Γ(α + β + 2)] × [Γ(α + β)/(ΓαΓβ)]
= [(α + 1)αΓαΓβ Γ(α + β)]/[(α + β + 1)(α + β)Γ(α + β) ΓαΓβ]
= (α + 1)α/((α + β + 1)(α + β))

Var(X) = E(X^2) − (E[X])^2
= (α + 1)α/((α + β + 1)(α + β)) − (α/(α + β))^2
= [(α^2 + α)(α + β) − α^2 (α + β + 1)]/((α + β + 1)(α + β)^2)
= (α^3 + α^2 + α^2 β + αβ − α^3 − α^2 β − α^2)/((α + β + 1)(α + β)^2)
= αβ/((α + β + 1)(α + β)^2)


7.3. Summary
We have mainly looked at some of the continuous distributions and derived the various probability density functions. Where possible, we have derived the mg f s of the different continuous distributions and also used the mg f s to obtain the mean and variance of the random variables. We have also seen that the chi-square and the exponential distributions are special cases of the gamma distribution.


7.4. Revision questions


1. Your friend travels regularly by plane. From past records he feels that take-off time is uniformly distributed between 80 and 120 minutes after his check-in at the airport from which he is departing. Determine the probability that:

(a) He waits more than 500 minutes for take-off after checking in.
(b) His waiting time for take-off after checking in will be within 1.5 standard deviations of the mean waiting time.


LESSON 8
Change of Variable and Distribution function Techniques

Learning outcomes
Upon completing this topic, you should be able to:

• Know how to find the probability distribution of functions of random vari-


ables using the following two approaches, in addition to the mg f approach
previously considered

– Distribution function technique


– Change of variable technique


8.1. Distribution Functions of Random Variables


Here we learn how to find the pd f of functions of random variables. For example, we might know the pd f of X but instead want to know the pd f of u(X) = X^2, a function of X. There are three techniques for finding the distribution functions of random variables, including:

• The distribution functions technique

• The change of variable technique and

• The mg f technique.

We also explore functions of random variables that are independent and identically distributed. For instance, if X₁ is the weight of a randomly selected individual from the population of males, X₂ is the weight of another randomly selected individual from that population, and so on up to Xₙ, then we might be interested in learning how the random function X̄ = (X₁ + X₂ + . . . + Xₙ)/n is distributed.

8.1.1. Functions of one random variable.


We begin the exploration of the distribution functions of random variables by focusing on simple functions of one random variable. For instance, if X is a continuous random variable and we take a function of X, say Y = u(X), then Y is also a continuous random variable that has its own probability distribution. We learn how to find the pd f of Y using the distribution function technique and the change of variable technique.
Simply put: let X denote a random variable with density function f (x), and define Y = g(X) for some function g. We want to find the pd f of g(X).

8.2. Distribution Function Technique


Given the (joint) probability density of X₁, X₂, . . . , Xₙ, the probability density function of Y = u(X₁, X₂, X₃, . . . , Xₙ) can be obtained by first finding the cumulative probability or distribution function. Thus, the distribution function technique is used to find the pd f of the random variable Y = u(X) by:

1. First, finding the cumulative distribution function F_Y(y) = P(Y ≤ y)

2. Then, differentiating the cumulative distribution function F_Y(y) to get the probability density function f_Y(y); that is, f_Y(y) = F′_Y(y)


Example . Let X be a continuous random variable with the following probability density function: f (x) = 3x^2 for 0 < x < 1. What is the pd f of Y = X^2?
Solution:
Looking at the graph of the function y = x^2 (sketched in the original notes), we note that the function is increasing in x and that 0 < y < 1.
That noted, let's now use the distribution function technique to find the pd f of Y. First, we find the cumulative distribution function of Y:

F_Y(y) = P(Y ≤ y) = P(X^2 ≤ y) = P(X ≤ y^{1/2}) = F_X(y^{1/2})
= ∫_0^{y^{1/2}} 3t^2 dt = [t^3]_0^{y^{1/2}} = y^{3/2}, 0 < y < 1

Having shown that the cumulative distribution function of Y is F_Y(y) = y^{3/2} for 0 < y < 1, we now just need to differentiate F_Y(y) to get the probability density function f_Y(y). Doing so, we get:

f_Y(y) = F′_Y(y) = (3/2) y^{1/2}, for 0 < y < 1

Our calculation is complete! We have successfully used the distribution function technique to find the pd f of Y, when Y was an increasing function of X. (By the way, you might find it reassuring to verify that f (y) does indeed integrate to 1 over the support of y. In general, that's not a bad thing to check.)
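A Monte Carlo check of this pd f (a sketch assuming numpy): draws from f (x) = 3x^2 can be generated by inverse transform, since F_X(x) = x^3 gives X = U^{1/3} for uniform U; squaring them lets us compare the empirical cd f of Y with y^{3/2}:

import numpy as np

rng = np.random.default_rng(seed=0)
u = rng.uniform(size=100_000)
x = u ** (1/3)      # inverse-transform draws from f(x) = 3x^2 on (0, 1)
y = x ** 2          # the transformed variable Y = X^2

for yy in (0.2, 0.5, 0.8):
    print((y <= yy).mean(), yy ** 1.5)   # empirical cdf vs F_Y(y) = y^(3/2)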

E XERCISE 13.  Let X be a continuous random variable with the following prob-
ability density function: f (x) = 3(1−x)2 for 0 < x < 1. What is the probability
density function of Y = (1−X)3 ?
E XERCISE 14.  Let X have the Poisson pd f f (x) = µ^x e^{−µ}/x! for x = 0, 1, 2, . . . . Find the pd f of Y = 4X.



8.3. Change Of Variable Technique


On the previous pages, we used the distribution function technique in examples 1
and 2. In the first example, the transformation of X involved an increasing func-
tion, while in the second example, the transformation of X involved a decreasing
function. Here, we’ll generalize what we did there first for an increasing function
and then for a decreasing function. The generalizations lead to what is called the
change-of-variable technique.


8.3.1. Generalization for an Increasing Function


Let X be a continuous random variable with a generic pd f f (x) defined over the
support c1 < x < c2 . And, let Y = u(X) be a continuous, increasing function of X
with inverse function X = v(Y ). Here’s a picture of what the continuous, increasing
function might look like:

The blue curve, of course, represents the continuous and increasing function Y =
u(X). If you put an x-value, such as c1 and c2 , into the function Y = u(X), you get
a y-value, such as u(c1 ) and u(c2 ).
But, because the function is continuous and increasing, an inverse function X =
v(Y ) exists. In that case, if you put a y-value into the function X = v(Y ), you get an
x-value, such as v(y).
Okay, now that we have described the scenario, let’s derive the distribution function
of Y. It is:
F_Y(y) = P(Y ≤ y) = P(u(X) ≤ y) = P(X ≤ v(y)) = ∫_{c1}^{v(y)} f(x) dx

for d1 = u(c1) < y < u(c2) = d2. The first equality holds from the definition of the cumulative distribution function of Y. The second equality holds because Y = u(X). The third equality holds because, as shown in red on the graph in the original notes, for the portion of the function for which u(X) ≤ y, it is also true that X ≤ v(Y).

And, the last equality holds from the definition of probability for a continuous ran-
dom variable X. Now, we just have to take the derivative of FY (y), the cumulative
distribution function of Y, to get fY (y), the probability density function of Y. The


Fundamental Theorem of Calculus, in conjunction with the Chain Rule, tells us that
the derivative is:
f_Y(y) = F′_Y(y) = f_X(v(y)) · v′(y), for d1 = u(c1) < y < u(c2) = d2

8.3.2. Generalization for a Decreasing Function


Let X be a continuous random variable with a generic pd f f (x) defined over the
support c1 < x < c2 . And, let Y = u(X) be a continuous, decreasing function of X
with inverse function X = v(Y ). Here’s a picture of what the continuous, decreasing
function might look like:

The blue curve, of course, represents the continuous and decreasing function Y =
u(X). Again, if you put an x-value, such as c1 and c2 , into the function Y = u(X),
you get a y-value, such as u(c1 ) and u(c2 ). But, because the function is continuous
and decreasing, an inverse function X = v(Y ) exists. In that case, if you put a
y-value into the function X = v(Y ), you get an x-value, such as v(y).
That said, the distribution function of Y is then:

F_Y(y) = P(Y ≤ y) = P(u(X) ≤ y) = P(X ≥ v(y)) = 1 − P(X ≤ v(y)) = 1 − ∫_{c1}^{v(y)} f(x) dx

for d2 = u(c2) < y < u(c1) = d1. The first equality holds from the definition of the cumulative distribution function of Y. The second equality holds because Y = u(X). The third equality holds because, as shown in red on the graph in the original notes, for the portion of the function for which u(X) ≤ y, it is also true that X ≥ v(Y).

The fourth equality holds from the rule of complementary events. And, the last
equality holds from the definition of probability for a continuous random variable


X. Now, we just have to take the derivative of FY (y), the cumulative distribution
function of Y, to get fY (y), the probability density function of Y. Again, the Funda-
mental Theorem of Calculus, in conjunction with the Chain Rule, tells us that the
derivative is:
f_Y(y) = F′_Y(y) = −f_X(v(y)) · v′(y)

for d2 = u(c2) < y < u(c1) = d1. You might be alarmed that the pd f f (y) seems to be negative, but note that the derivative of v(y) is negative, because X = v(Y) is a decreasing function of Y. Therefore, the two negatives cancel each other out, and therefore make f (y) positive.
Finally....We have now derived what is called the change-of-variable technique
first for an increasing function and then for a decreasing function. But, continu-
ous, increasing functions and continuous, decreasing functions, by their one-to-one
nature, are both invertible functions. Let’s, once and for all, then write the change-
of-variable technique for any generic invertible function

Definition

Let X be a continuous random variable with generic probability density function


f (x) defined over the support c1 < x < c2 . And, let Y = u(X) be an invertible
function of X with inverse function X = v(Y ). Then, using the change-of-variable
technique, the probability density function of Y is:

f_Y(y) = f_X(v(y)) × |v′(y)|

defined over the support between u(c1) and u(c2).


Example . Let’s return to our example in which X is a continuous random vari-
able with the following probability density function:
f (x) = 3x2
for 0 < x < 1. Use the change-of-variable technique to find the probability density
function of Y = X 2 .
Solution:
Note that the function Y = X^2 defined over the interval 0 < x < 1 is an invertible function.
The inverse function is x = v(y) = √y = y^{1/2} for 0 < y < 1. (That range is because when x = 0, y = 0; and when x = 1, y = 1.)
Now, taking the derivative of v(y), we get v′(y) = (1/2) y^{−1/2}.
Therefore, the change-of-variable technique

f_Y(y) = f_X(v(y)) × |v′(y)|

tells us that the probability density function of Y is

f_Y(y) = 3[y^{1/2}]^2 · (1/2) y^{−1/2}

And, simplifying, we get that the probability density function of Y is

f_Y(y) = (3/2) y^{1/2}

We shouldn't be surprised by this result, as it is the same result that we obtained using the distribution function technique.

E XERCISE 15.  Let’s return to our example in which X is a continuous random


variable with the following probability density function: f (x) = 3(1 − x)2 for 0 <
x < 1. Use the change-of-variable technique to find the probability density function
of Y = (1−X)3


8.4. Learning Activities


Let the probability density function of X be given by

f(x) = { 6x(1 − x), 0 < x < 1; 0, otherwise }

Using the distribution function technique and the change of variable technique, find the probability density of Y = X^3 and comment on the results obtained using the two approaches.


LESSON 9
Normal Distribution

Learning outcomes
Upon completing this topic, you should be able to:

• Be familiar with the normal distribution and the standard normal distribution

• Be able to calculate probabilities using the standard normal distribution

• Recognize when it is appropriate to use the normal distribution to solve par-


ticular problems

• Recognize when it is appropriate to use the normal approximation to the bi-


nomial distribution

• Solve problems using the normal approximation to the binomial distribution.

• Interpret the answer (after the approximation) in terms of the original prob-
lem.


9.1. Introduction
This is the most popular and commonly used distribution. A continuous random variable X is said to have a normal distribution if its pd f is given by

f(x) = { (1/(σ√(2π))) e^{−(x−µ)^2/(2σ^2)}, −∞ < x < ∞, −∞ < µ < ∞, σ > 0; 0, elsewhere }

Sometimes we use the notation X ∼ N(µ, σ^2).
From the above pd f and notation, we observe that the normal distribution depends on two unknown parameters µ and σ. Later we will show that these parameters µ and σ are the mean and standard deviation respectively.
Once the parameters µ and σ are specified, the distribution is completely determined; the values of f (x) can then be evaluated from the values of X and the normal curve plotted, which is bell-shaped and symmetric about x = µ.
Note that the total area under the curve is always 1:

∫_{−∞}^{∞} (1/(σ√(2π))) e^{−(x−µ)^2/(2σ^2)} dx = 1

The probability that X lies between a and b is given by

p(a < x < b) = ∫_a^b (1/(σ√(2π))) e^{−(x−µ)^2/(2σ^2)} dx

However, it is not easy to evaluate the above integral and so we make use of the standard normal variable; i.e. if µ = 0 and σ = 1 the density function for X becomes

f(x) = { (1/√(2π)) e^{−x^2/2}, −∞ < x < ∞; 0, elsewhere }

In this situation the random variable is referred to as a standard normal variable and its distribution is referred to as the standard normal distribution, i.e. X ∼ N(0, 1).
To simplify the computation of probabilities involving a normal variable we first of all standardize the random variable, i.e. we make it possess the standard normal density. This is done by the following transformation:

z = (X − µ)/σ
In this case z is a standardized variable. Note that if X ∼ N(µ, σ^2) then the standardized variable

z = (X − µ)/σ ∼ N(0, 1)

since

E(Z) = E[(X − µ)/σ] = (E(X) − µ)/σ = µ/σ − µ/σ = 0
var(Z) = var[(X − µ)/σ] = var(X)/σ^2 = σ^2/σ^2 = 1

Note that var(µ/σ), the variance of a constant, is 0.


If Z ∼ N(0, 1) then

P(Z < z) = ∫_{−∞}^{z} (1/√(2π)) e^{−t^2/2} dt

These probabilities have been evaluated and are given in the form of a table.
Now a probability such as P(X ≤ b) can be evaluated by

P(X ≤ b) = P((X − µ)/σ ≤ (b − µ)/σ) = P(Z < a*) −−−− (1)

where a* = (b − µ)/σ. The probability on the RHS of (1) can be read from the tabulated values of the CDF of the standard normal distribution Z.


9.2. Use of Standard Normal Tables

Example . If Z is a standard normal variable, find the following probabilities:
(a) P(Z < 0)
(b) P(−1 < Z < 1)
(c) P(Z > 2.54)
Solution:
(a) If we sketch the curve and shade the region specified by the probability, then by symmetry P(Z < 0) = P(Z > 0) = 0.5.
(b) P(−1 < Z < 1) = P(Z < 1) − [1 − P(Z < 1)]
= 0.8413 − (1 − 0.8413)
= 0.6826
(c) P(Z > 2.54) = 1 − P(Z < 2.54) = 1 − 0.9945 = 0.0055

Example . A random variable X has a normal distribution with mean 9 and standard deviation 3. Find P(5 < X < 11).
Solution:
We need to standardize X, i.e. z = (X − µ)/σ:
for X = 5, z = (5 − 9)/3 = −1.333
for X = 11, z = (11 − 9)/3 = 0.667
⟹ P(−1.333 < Z < 0.667) = P(Z < 0.667) − [1 − P(Z < 1.333)]
= 0.7486 − 0.0918 = 0.6568
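Probabilities like this no longer require printed tables; the same standardization can be checked with scipy's norm (a sketch, with scipy assumed available):

from scipy.stats import norm

mu, sigma = 9.0, 3.0
print(norm.cdf(11, mu, sigma) - norm.cdf(5, mu, sigma))   # 0.6563...
print(norm.cdf(2/3) - norm.cdf(-4/3))                     # same, after standardizing

The small difference from the table answer 0.6568 comes from rounding z to two decimal places when using the tables.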

Example . If X ∼ N(10, σ^2) and P(X > 12) = 0.1587, find P(9 < X < 11).
Solution
P(X > 12) ≡ P(Z > (12 − 10)/σ) = 0.1587
From the tables, the value of z which corresponds to an upper-tail probability of 0.1587 is 1 (i.e. 1 − 0.1587 = 0.8413, and Φ(1) = 0.8413), so (12 − 10)/σ = 1, giving σ = 2.
Next, P(9 < X < 11) = P(−0.5 < Z < 0.5)
= P(Z < 0.5) − [1 − P(Z < 0.5)]
= 0.6915 − 0.3085 = 0.3830
E XERCISE 16.  The time required to perform a certain job is a random variable
having a normal distribution with mean 55 and s.d of 10 minutes. Find the proba-
bility that (a) The job will take more than 75 min (b) The job will take less than 60
min (c) The job will take between 45-60min
Example . Given that Z ∼ N(0, 1), find the constant b so that P(−b < Z < b) = 0.95.
Solution:

P(−b < Z < b) = P(Z < b) − [1 − P(Z < b)] = 0.95

Let P(Z < b) = y:
⟹ y − [1 − y] = 0.95
2y − 1 = 0.95
y = 0.9750

The value of z that corresponds to the probability 0.9750 is 1.96.

∴ if P(Z < b) = 0.975 then b = 1.96

9.3. Properties of the Normal distribution


1. The normal density curve is bell-shaped and symmetric about the value x = µ
since f (x) satisfies f (µ + a) = f (µ − a)
=⇒ P(X < µ) = P(x > µ) = 0.5
µ is the median of the density.

From the curve, f (x) is a maximum when x = µ.
As x → ∞, f (x) → 0, and as x → −∞, f (x) → 0.
All three measures of central tendency (mean, median and mode) coincide at x = µ.


2. Since f (x) is a pd f , the area under the normal curve is 1

That is,

∫_{−∞}^{∞} (1/√(2π)) e^{−z^2/2} dz = 1 if z ∼ N(0, 1), or equivalently ∫_{−∞}^{∞} (1/(σ√(2π))) e^{−(x−µ)^2/(2σ^2)} dx = 1

Note

The normal distribution is widely used in statistics despite the fact that populations
hardly follow the exact normal distribution. This is because;

1. If a variable does not follow a normal distribution it can be made to follow it


after making suitable transformations

2. Whatever the original distribution, the distribution of the sample mean can in most cases be approximated by a normal distribution so long as the sample size is sufficiently large (the central limit theorem, CLT).

9.4. The mg f of a Normal distribution


By definition,

m_X(t) = ∫_{−∞}^{∞} e^{tx} (1/(σ√(2π))) e^{−(x−µ)^2/(2σ^2)} dx
= ∫_{−∞}^{∞} (1/(σ√(2π))) e^{−(x−µ)^2/(2σ^2) + tx} dx ......(1)

but

−(1/(2σ^2))(x − µ)^2 + tx = −(1/(2σ^2))[x^2 + µ^2 − 2µx − 2σ^2 tx]
= −(1/(2σ^2))[x^2 − 2x(µ + σ^2 t)] − µ^2/(2σ^2) ......(2)

We complete the square of x^2 − 2x(µ + σ^2 t):

x^2 − 2x(µ + σ^2 t) = [x − (µ + σ^2 t)]^2 − (µ + σ^2 t)^2

so from (2),

−(1/(2σ^2))[x − (µ + σ^2 t)]^2 + (µ + σ^2 t)^2/(2σ^2) − µ^2/(2σ^2)
= −(1/(2σ^2))[x − (µ + σ^2 t)]^2 + (µ^2 + 2µσ^2 t + σ^4 t^2 − µ^2)/(2σ^2)
= −(1/(2σ^2))[x − (µ + σ^2 t)]^2 + µt + σ^2 t^2/2

From (1),

m_X(t) = ∫_{−∞}^{∞} (1/(σ√(2π))) e^{−(1/(2σ^2))[x − (µ + σ^2 t)]^2 + µt + σ^2 t^2/2} dx
= e^{µt + σ^2 t^2/2} ∫_{−∞}^{∞} (1/(σ√(2π))) e^{−(1/(2σ^2))[x − (µ + σ^2 t)]^2} dx
= e^{µt + σ^2 t^2/2}

since (1/(σ√(2π))) e^{−(1/(2σ^2))[x − (µ + σ^2 t)]^2} is a normal density with mean µ + σ^2 t and variance σ^2, so it integrates to 1.
We note that if we replace µ with 0 and σ^2 with 1 we get the mg f of the standard normal distribution, i.e.

m_Z(t) = e^{0 + t^2/2} = e^{t^2/2}

Mean (X) where X ∼ N(µ, σ 2 )


E(X) = m′_X(0)
m′_X(t) = (µ + σ^2 t) e^{µt + σ^2 t^2/2}
⟹ E(X) = µ

var(X) = E(X^2) − (E[X])^2, where E(X^2) = m″_X(0)
m″_X(t) = σ^2 e^{µt + σ^2 t^2/2} + (µ + σ^2 t)^2 e^{µt + σ^2 t^2/2}
m″_X(0) = σ^2 + µ^2
var(X) = σ^2 + µ^2 − µ^2 = σ^2

For Z ∼ N(0, 1) we can also get the mg f of Z directly, i.e.

m_Z(t) = E(e^{tZ}) = ∫_{−∞}^{∞} e^{tz} (1/√(2π)) e^{−z^2/2} dz
= ∫_{−∞}^{∞} (1/√(2π)) e^{−(z^2 − 2tz)/2} dz

but z^2 − 2tz = (z − t)^2 − t^2, so

m_Z(t) = ∫_{−∞}^{∞} (1/√(2π)) e^{−(z − t)^2/2 + t^2/2} dz
= e^{t^2/2} ∫_{−∞}^{∞} (1/√(2π)) e^{−(z − t)^2/2} dz
= e^{t^2/2}

which is the mg f of a standard normal variable.

Alternatively, we can get the mg f of a standard normal distribution from the mg f of a normal density, i.e.

m_X(t) = e^{µt + σ^2 t^2/2} ....(3)

Since Z ∼ N(0, 1), we replace µ and σ^2 with 0 and 1 respectively in (3):

⟹ m_Z(t) = e^{0 + t^2/2} = e^{t^2/2}

Mean(Z):
E(Z) = m′_Z(0); m′_Z(t) = t e^{t^2/2}, so m′_Z(0) = 0.

Var(Z):
Var(Z) = E(Z^2) − (E[Z])^2, where E(Z^2) = m″_Z(0).
m″_Z(t) = e^{t^2/2} + t^2 e^{t^2/2}, so m″_Z(0) = 1.
∴ Var(Z) = 1 − 0^2 = 1

9.5. Normal Approximation to the Binomial


We have already seen that the Poisson distribution can be used to approximate the
binomial distribution for large values of n and small values of p provided that the
correct conditions exist. The approximation is only of practical use if just a few
terms of the Poisson distribution need be calculated. In cases where many terms (sometimes several hundred, in compound events) need to be calculated, the arithmetic involved becomes very tedious indeed and we turn to the normal distribution for help. It is possible, of course, to use high-speed computers to do the arithmetic but the normal approximation to the binomial distribution avoids this in a fairly elegant
way. In the problem situations following this introduction the normal distribution
is used to avoid very tedious arithmetic while at the same time giving a very good
approximate solution.
In order for a continuous distribution (like the normal) to be used to approximate a
discrete one (like the binomial), a continuity correction should be used. There are
two major reasons to employ such a correction.

• First, recall that a discrete random variable can take on only specified values, whereas a continuous random variable used to approximate it can take on any value within an interval around those specified values. Hence, when using the normal distribution to approximate the binomial, more accurate approximations are likely to be obtained if a continuity correction is used.

• Second, recall that with a continuous distribution (such as the normal), the
probability of obtaining a particular value of a random variable is zero. On
the other hand, when the normal approximation is used to approximate a dis-
crete distribution, a continuity correction can be employed so that we can
approximate the probability of a specific value of the discrete distribution.
Consider an experiment where we toss a fair coin 12 times and observe the
number of heads. Suppose we want to compute the probability of obtaining
exactly 4 heads. Whereas a discrete random variable can have only a spec-
ified value (such as 4), a continuous random variable used to approximate it
could take on any values within an interval around that specified value, as
demonstrated in this figure:

• The continuity correction requires adding or subtracting 0.5 from the value or
values of the discrete random variable X as needed. Hence to use the normal
distribution to approximate the probability of obtaining exactly 4 heads (i.e.,
X = 4), we would find the area under the normal curve from X = 3.5 to
X = 4.5, the lower and upper boundaries of 4. Moreover, to determine the
approximate probability of observing at least 4 heads, we would find the area
under the normal curve from X = 3.5 and above since, on a continuum, 3.5 is
the lower boundary of X. Similarly, to determine the approximate probability


of observing at most 4 heads, we would find the area under the normal curve
from X = 4.5 and below since, on a continuum, 4.5 is the upper boundary of
X.

• When using the normal distribution to approximate discrete probability dis-


tribution functions, we see that semantics become important. To determine
the approximate probability of observing fewer than 4 heads, we would find
the area under the normal curve from 3.5 and below; to determine the approx-
imate probability of observing at most 4 heads, we would find the area under
the normal curve from 4.5 and below, since the latter event includes the value
X = 4.

Example . Now consider the binomial distribution in particular. Let H be the
number of heads in 12 flips of a fair coin. We know that the probability of observing
exactly k heads in 12 flips is
P(H = k) = C(12, k) (0.5)^k (0.5)^(12−k) for k = 0, 1, 2, . . . , 12
We also know that the mean of this binomial distribution is given by
µ = np = (12)(0.5) = 6
while the standard deviation is given by
σ = √(np(1 − p)) = √((12)(0.5)(0.5)) = 1.732
Say we are interested in the probability of observing between 3 and 5 heads, inclusive;
that is, P(3 ≤ H ≤ 5). We can calculate this exactly, of course:
P(3 ≤ H ≤ 5) = P(H = 3) + P(H = 4) + P(H = 5) = 0.05371 + 0.12085 + 0.19336 =
0.36792
What about the normal approximation to this value? First, we should check whether
the normal approximation is likely to work well in this case. A useful rule of
thumb is that the normal approximation should work well enough if both np and
n(1 − p) are greater than 5. For this example, both equal 6, so we are about at the
limit of usefulness of the approximation.
Back to the question at hand. Since H is a binomial random variable, the following
statement (based on the continuity correction) is exactly correct:

P(3 ≤ H ≤ 5) = P(2.5 < H < 5.5).

Note that this statement is not an approximation; it is exactly correct! The reason
for this is that we are adding the events 2.5 < H < 3 and 5 < H < 5.5 to get from
the left side of the equation to the right side of the equation, but for the binomial
random variable, these events have probability zero. The continuity correction
is not where the approximation comes in; that comes when we approximate H using
a normal distribution with mean µ = 6 and standard deviation σ = 1.732: thus,
P(2.5 < H < 5.5) ≈ P((2.5 − 6)/1.732 < Z < (5.5 − 6)/1.732)
= P(−2.02 < Z < −0.29) = Φ(−0.29) − Φ(−2.02) ≈ 0.3859 − 0.0217 = 0.3642
Note that the approximation is only off by about 1%, which is pretty good for such
a small sample size!
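As a quick numerical check of this calculation, here is a minimal Python sketch (my addition, not part of the original notes; it assumes the SciPy library is available) that compares the exact binomial probability with its continuity-corrected normal approximation:

```python
from scipy.stats import binom, norm

n, p = 12, 0.5
mu = n * p                        # mean = 6
sigma = (n * p * (1 - p)) ** 0.5  # sd = 1.732

# Exact binomial probability P(3 <= H <= 5)
exact = sum(binom.pmf(k, n, p) for k in (3, 4, 5))

# Continuity-corrected normal approximation: P(2.5 < Y < 5.5)
approx = norm.cdf(5.5, mu, sigma) - norm.cdf(2.5, mu, sigma)

print(f"exact = {exact:.5f}, approx = {approx:.5f}")
# exact = 0.36792, approx ≈ 0.365 (off by roughly 1%)
```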
EXERCISE 17. Suppose that a sample of n = 1,600 tires of the same type is
obtained at random from an ongoing production process in which 8% of all such
tires produced are defective. What is the probability that in such a sample not more
than 150 tires will be defective?
EXERCISE 18. Based on past experience, 7% of all luncheon vouchers are in
error. If a random sample of 400 vouchers is selected, what is the approximate
probability that (a) exactly 25 are in error? (b) fewer than 25 are in error? (c)
between 20 and 25 (inclusive) are in error?
Example . A particular production process used to manufacture ferrite magnets
used to operate reed switches in electronic meters is known to give 10% defective
magnets on average. If 200 magnets are randomly selected, what is the probability
that the number of defective magnets is between 24 and 30?
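The following Python sketch (my addition, again assuming SciPy) outlines the standard solution: approximate X ∼ B(200, 0.1) by N(20, 18) and apply the continuity correction.

```python
from scipy.stats import binom, norm

n, p = 200, 0.1
mu = n * p                        # 20
sigma = (n * p * (1 - p)) ** 0.5  # sqrt(18) ≈ 4.243

# P(24 <= X <= 30) with continuity correction: P(23.5 < Y < 30.5)
approx = norm.cdf(30.5, mu, sigma) - norm.cdf(23.5, mu, sigma)
exact = binom.cdf(30, n, p) - binom.cdf(23, n, p)  # exact value, for comparison

print(f"approx = {approx:.4f}, exact = {exact:.4f}")  # approx ≈ 0.198
```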
EXERCISE 19. Overbooking of passengers on intercontinental flights is a common
practice among airlines. Aircraft which are capable of carrying 300 passengers
are booked to carry 320 passengers. If 10% of passengers who have a booking fail
to turn up for their flights, what is the probability that at least one passenger who
has a booking will end up without a seat on a particular flight?
9.6. Learning Activities

1. Let X be a normally distributed variable with mean µ and variance σ². Suppose
that P(X < 69) = 0.9 and P(X < 74) = 0.95. Determine µ and σ.
[Ans: σ = 13.88 and µ = 51.2]

2. In an examination the average mark was 76.5 and the s.d. was 9.5. If 15% of the
class scored grade A, what is the lowest possible grade A mark? Assume the
marks are normally distributed. [Ans: The lowest possible score for grade A
was 87; the lowest possible score for grade B is 86]

3. Lengths of metal strip produced by a machine are normally distributed with a
mean length of 150 cm and a standard deviation of 10 cm. Find the probability
that the length of a randomly selected strip is

(a) shorter than 165 cm;

(b) between 145 cm and 155 cm.

4. The diameter of an electric cable is normally distributed with mean 0.8 cm
and variance 0.0004 cm².

(a) What is the probability that the diameter will exceed 0.81 cm?
(b) The cable is considered defective if the diameter differs from the mean
by more than 0.025 cm. What is the probability of obtaining a defective
cable?

5. A machine packs sugar in what are nominally 2 kg bags. However, there is
variation in the actual weight, which is described by the normal distribution.

(a) Previous records indicate that the standard deviation of the distribution
is 0.02 kg and the probability that a bag is underweight is 0.01. Find
the mean value of the distribution.
(b) It is hoped that an improvement to the machine will reduce the standard
deviation while allowing it to operate with the same mean value. What
value of the standard deviation is needed to ensure that the probability that
a bag is underweight is 0.001?
LESSON 10
Statistical Inference

Learning outcomes
Upon completing this topic, you should be able to:

• Define the terms used in statistical inference

• Compute the different confidence intervals of the parameter µ

• Carry out hypothesis tests based on either the normal distribution or the
t-distribution

• Distinguish when to use the z-test and when to use the t-test
10.1. Introduction
We begin by defining some of the terms used in statistical inference.

• Statistical Decision: Involves making decisions about the population on the
basis of sample information.

• Statistical Hypothesis:

– An assumption made in decision making about the population involved,
which may or may not be true; OR
– a statement about the parameters of a probability distribution; OR
– a statement or assertion about the distribution of one or more random
variables.

* Null Hypothesis: A statistical hypothesis formulated for the sole
purpose of either rejecting (nullifying) it or failing to reject (accepting)
it. It proposes that no statistical significance exists in a set of given
observations. It is usually denoted by Ho. For example, if we want to test
whether a coin is biased, we formulate the null hypothesis that the
coin is fair, i.e. p = 0.5.
* Alternative Hypothesis: Any hypothesis that differs from the null
hypothesis, or that is accepted whenever we reject the null hypothesis.
It is usually denoted by Ha or H1.

• Two tailed test: Let θ be a population parameter to be tested. If we test
Ho : θ = θo against H1 : θ ≠ θo, then we say that the test is a two tailed test.

• One tailed test: If we test Ho : θ = θo against H1 : θ > θo or H1 : θ < θo,
then we say that the test is a one tailed test.

• Significance: If there is a marked difference between the observed results
from the sample and the expected results, then we say the differences are
significant.

• Hypothesis Testing: A procedure which enables us to decide whether (or not)
to reject Ho.
• Type I error: The error of rejecting Ho when it is in fact true. Its probability
is usually denoted by α.

• Type II error: The error of accepting (failing to reject) Ho when it is in fact
false. Its probability is usually denoted by β.

• Test Statistic: A sample statistic, or a value based on the sample, on which
the decision is based.

• Critical region: The set of all values of the test statistic that would cause us to
reject Ho.

• Critical value: The value(s) that separate the critical region from the
acceptance region.

• Significance level: The maximum probability with which we would be willing
to risk a Type I error in hypothesis testing. It is usually denoted by α.

• 100(1 − α)% confidence interval: An interval, say [a, b], for which the
probability that X ∈ [a, b] is 1 − α; that is, P(a < X < b) = 1 − α.

To make a test of significance you need to:

1. set up the null hypothesis;
2. pick a test statistic;
3. compute the observed significance level p.

The choice of test statistic depends on the model and the hypothesis being
considered. As indicated above, the level of significance is the quantity of risk of
the Type I error which we are ready to tolerate in making a decision about H0.
It is conventionally chosen as 5% or 1%, where 5% gives moderate precision and
1% gives high precision.

10.2. Hypothesis Testing

Testing a single sample value
Example . Test at the 5% level whether a single sample value 54 has been drawn
from N(65, 30), that is, whether the mean is less than 65.
Solution:
Let µ and σ² be the mean and variance of the population. We need to test
Ho : µ = 65
versus
H1 : µ < 65 (this is a one tailed test)
Under Ho, µ = 65 and X ∼ N(65, 30).
This is a lower-tail test, so from the standard normal table the critical value is
Zt = −1.65; the critical region consists of all values below it.
The test statistic is obtained as
Z = (x − µ)/σ = (54 − 65)/√30 = −2.01
The value −2.01 falls in the rejection region.
Conclusion: Reject Ho and conclude that the sample value was not drawn from the
population with mean 65.
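To make the mechanics concrete, here is a small Python sketch of this one-sample z-test (my addition; the variable names are mine, and SciPy is assumed for the normal quantile):

```python
import math
from scipy.stats import norm

x, mu0, var = 54, 65, 30        # sample value, hypothesised mean, known variance
alpha = 0.05

z = (x - mu0) / math.sqrt(var)  # (54 - 65)/sqrt(30) ≈ -2.01
z_crit = norm.ppf(alpha)        # lower-tail critical value ≈ -1.645

print(f"z = {z:.2f}, critical value = {z_crit:.3f}")
if z < z_crit:
    print("Reject Ho: evidence that the mean is less than 65")
else:
    print("Fail to reject Ho")
```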

10.3. Steps for classical hypothesis test

The procedure consists of five steps, the first four of which are completed before
the data to be used for the test are gathered; they relate to the probabilistic
calculations that set up the statistical inference process.

Step 1
The first step in the hypothesis testing procedure is to declare the null hypothesis
Ho and the relevant alternative hypothesis H1 (before the data are seen). Our aim
is eventually to "accept" or "reject" the null hypothesis as a result of an objective
statistical procedure.

Step 2
Since the decision to "accept" or "reject" Ho will be made on the basis of data
derived from some random process, it is possible that an incorrect decision will be
made: to reject Ho when it is indeed true (a Type I error), or to accept Ho when it
is false (a Type II error). We cannot ensure that the probabilities of both errors are
arbitrarily small unless we are able to make the number of observations as large as
we please. A frequently adopted procedure is to focus on the Type I error and fix
the numerical value α of this error at some acceptably low level, usually 1% or 5%,
and not to attempt to control the numerical value of the Type II error. (Simply put,
in Step 2 we are choosing the numerical value of α.)

Step 3
This step consists of determining a test statistic. This is the quantity calculated
from the data whose numerical value leads to acceptance or rejection of Ho.

Step 4
This step consists of determining those observed values of the test statistic that lead
to rejection of Ho. The choice is made so as to ensure that the test has the numerical
value of the Type I error chosen in Step 2.

Step 5
The final step in the hypothesis test procedure is to obtain the data, determine
whether the observed value of the test statistic is equal to or more extreme than the
significance point calculated in Step 4, and reject Ho if it is; otherwise, fail to
reject Ho.
Example . Jam is produced in tins labeled 1 kg. The machine filling the tins
is set to give a mean of 1030 gms and an s.d. of 16 gms. What is the probability
that a customer buys a tin which weighs (a) less than 982 gms, (b) less than
1000 gms? (c) Test at the 1% level whether a tin of 1000 gms could have been
produced by this machine, assuming the weights of the jam tins are normally
distributed.
Solution:

X ∼ N(1030, 16²)
=⇒ Z ∼ N(0, 1) where Z = (x − 1030)/16
Therefore,
(a) P(X < 982) = P(Z < (982 − 1030)/16) = P(Z < −3)
P(Z < −3) = 1 − Φ(3) = 1 − 0.999 = 0.001
(b) P(X < 1000) = P(Z < (1000 − 1030)/16) = P(Z < −1.88)
P(Z < −1.88) = 1 − Φ(1.88) = 1 − 0.970 = 0.030
(c) For the hypothesis test, we need
Ho : µ = 1030
versus
H1 : µ < 1030 (this is a one tailed test)
Since α = 0.01, Zt = −2.33.
Test statistic:
Zc = (1000 − 1030)/16 = −1.88
=⇒ Since Zc = −1.88 > −2.33, the statistic does not fall in the rejection region.
Decision and conclusion:
We do not reject Ho and conclude that the production of tins has been accurate.
EXERCISE 20. From past records it is known that 20% of a certain seedling will
survive and grow into strong trees. In a batch of 400 seedlings planted on a planting
day, only 60 survived. Is this a poor survival rate? Use the 1% level of significance.
EXERCISE 21. A six sided die is rolled 120 times, and only nine 4's appeared when
the score of the uppermost face of the die was recorded. Is there evidence to suggest
bias in the die? (Use α = 0.05.)

10.4. Central Limit Theorem

Let X1, X2, X3, . . . , Xn be independent random variables which are identically
distributed (i.e. they all have the same probability densities) and have finite mean
µ and variance σ². Let
Sn = X1 + X2 + · · · + Xn (for n = 1, 2, 3, . . .)
Then
lim (n→∞) P(a ≤ (Sn − nµ)/(σ√n) ≤ b) = (1/√(2π)) ∫ from a to b of e^(−z²/2) dz
That is, Sn is asymptotically normal.

In simple terms:
1. For large samples (n ≥ 30) from any population with mean µ and variance
σ², the sample mean X̄ ∼ N(µ, σ²/n) approximately.

2. For a sample of any size n taken from a N(µ, σ²) population, the sample
mean X̄ ∼ N(µ, σ²/n) exactly.
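The theorem is easy to see empirically. The following Python sketch (my addition, using NumPy) draws repeated samples from a decidedly non-normal population, an exponential distribution, and shows that the sample means cluster around µ with spread close to σ/√n:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, reps = 2.0, 30, 10_000

# Exponential(lam) has mean 1/lam = 0.5 and sd 1/lam = 0.5 (clearly non-normal)
samples = rng.exponential(scale=1/lam, size=(reps, n))
means = samples.mean(axis=1)          # one sample mean per replication

print("mean of sample means:", means.mean())       # ≈ mu = 0.5
print("sd of sample means:", means.std(ddof=1))    # ≈ sigma/sqrt(n) ≈ 0.091
```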

10.5. Estimation of µ and σ based on a sample of size n

• The sample mean X̄ is the unbiased estimate of the population mean µ.
Therefore we use it whenever µ is unknown, i.e. µ̂ = X̄.

• The unbiased estimate of the population variance is σ̂² = nS²/(n − 1), where S²
is the sample variance. Therefore we use it whenever σ² is unknown. However,
if n ≥ 30, n/(n − 1) ≈ 1, so σ̂² ≈ S². These estimates are called point
estimates.
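In code this is a one-liner. For instance, with NumPy (my illustration; the sample values are arbitrary), the ddof=1 argument divides by n − 1, which is exactly the n/(n − 1) correction applied to S²:

```python
import numpy as np

x = np.array([54.2, 50.4, 44.2, 49.7, 55.4, 57.0])  # any sample

mu_hat = x.mean()          # point estimate of mu
var_hat = x.var(ddof=1)    # unbiased estimate: sum((x - x.mean())**2)/(n - 1)

print(mu_hat, var_hat)
```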

Definitions

• Point Estimates - A point estimate is a single value estimate of a population
parameter calculated from sample data.

• Interval Estimates - When we obtain a point estimate for a parameter we
only have a single value, and the value in and of itself says nothing about how
accurate or precise the estimate is. Interval estimates provide an alternative
method for estimating a parameter by providing a probable range of values the
parameter is likely to take.

• Confidence level: The probability that an interval estimate encloses the
population parameter, expressed as a percentage. For large samples, the
100(1 − α)% confidence interval for µ is given by X̄ ± Z(α/2) (σ/√n). For
n ≥ 30, this interval is approximately X̄ ± Z(α/2) (S/√n).

Example . If the population mean µ is unknown, but the variance σ² is known,
determine the 95% confidence interval (CI) for µ based on a large sample with
mean X̄.
Solution
With α = 0.05 we have Z(α/2) = Z(0.025) = 1.96, so the 95% confidence interval is
X̄ ± 1.96 (σ/√n).
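A minimal sketch of this interval in Python (my addition; the summary figures are borrowed from Revision Question 3 below, which asks for a 99% interval):

```python
import math
from scipy.stats import norm

x_bar, sigma, n = 600, 5, 80   # sample mean, s.d., sample size
alpha = 0.05                   # 95% confidence; use alpha = 0.01 for 99%

z = norm.ppf(1 - alpha / 2)    # ≈ 1.96
half_width = z * sigma / math.sqrt(n)

print(f"CI: ({x_bar - half_width:.2f}, {x_bar + half_width:.2f})")
```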

10.6. Students t-Distribution

We have discussed the normal distribution and observed that we need µ and σ to
define it. The quantity
z = (X̄ − µ)/(σ/√n) is a normal variate with mean 0 and variance 1, i.e. z ∼ N(0, 1).

In practice σ is not known, and in such a case the only option is to use the sample
estimate S of the standard deviation. The quantity √n(X̄ − µ)/S is approximately
normal if n is large.

If n is not large, then √n(X̄ − µ)/S is distributed as t:
t = (X̄ − µ)/(S/√n), where S² = (1/(n − 1)) Σ(Xi − X̄)²
t is widely used, and its distribution is called the t-distribution.
The density function of the variable t with k = n − 1 degrees of freedom is

f(t) = 1/(√k B(1/2, k/2)) × 1/(1 + t²/k)^((k+1)/2), −∞ < t < ∞

where B(1/2, k/2) = Γ(1/2)Γ(k/2)/Γ((k+1)/2) = √π Γ(k/2)/Γ((k+1)/2).

Degrees of freedom are the number of independent observations in a set of
observations.

10.7. Properties of the t-distribution

• It is bell shaped just like the normal, with its tails a bit higher than the normal.

• It is uni-modal.

• The probability distribution is symmetrical about t = 0.

• It tends to the normal as k increases.
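The last property is easy to check numerically. This short Python sketch (my addition) compares upper 2.5% quantiles of the t-distribution with the normal value 1.96 as the degrees of freedom grow:

```python
from scipy.stats import t, norm

for k in (5, 10, 30, 100):
    print(k, round(t.ppf(0.975, df=k), 3))   # 2.571, 2.228, 2.042, 1.984
print("normal:", round(norm.ppf(0.975), 3))  # 1.96
```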

10.8. Hypothesis test (t-Test)

Student's t is the deviation of the estimated mean from its population mean,
expressed in units of the standard deviation.
For example, if we want to test H0 : µ = µ0 vs H1 : µ ≠ µ0, where µ0 is an assumed
value considered to fit µ, the test statistic is

t(n−1) = (X̄ − µ0)/(s/√n) = √n (X̄ − µ0)/s = (X̄ − µ0) √(n(n − 1)/Σ(X − X̄)²)

Whatever value we get, we compare it with a tabulated value (from the t-table).
If the calculated value is greater (in absolute value) than the tabulated one, we
reject the null hypothesis.
Example . The life expectancy of people in the year 1970 in Brazil was expected
to be 50 years. A survey was conducted in 11 regions of Brazil and the following
data obtained. Do the data confirm the expected view? Life expectancy (yrs): 54.2,
50.4, 44.2, 49.7, 55.4, 57.0, 58.2, 56.6, 61.9, 57.5, 53.4
Solution
We wish to test
H0 : µ = 50
vs
HA : µ ≠ 50
The test statistic is
t(n−1) = √n (X̄ − µ0)/s
X̄ = (54.2 + 50.4 + · · · + 53.4)/11 = 598.5/11 = 54.41
s² = (1/(n − 1)) Σ(X − X̄)² = (1/10)(32799.91 − 598.5²/11) = 23.607
s = √23.607 = 4.859
t = √11 (54.41 − 50)/4.859 = 3.01

From tables, the value of the t-distribution at α = 5% and 10 d.f. is 2.228.

Since t calculated = 3.01 > t tabulated = 2.228, we reject H0.
This means that the life expectancy differs from 50 years; the sample in fact
suggests it is more than 50 years.
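For comparison, a hedged Python sketch of the same test (my addition): scipy.stats.ttest_1samp reproduces the statistic t ≈ 3.01 and also reports a two-sided p-value.

```python
from scipy.stats import ttest_1samp

life = [54.2, 50.4, 44.2, 49.7, 55.4, 57.0, 58.2, 56.6, 61.9, 57.5, 53.4]

result = ttest_1samp(life, popmean=50)   # H0: mu = 50, two-sided alternative
print(result.statistic, result.pvalue)   # t ≈ 3.01, p ≈ 0.013 < 0.05, so reject H0
```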

Example . A breeder claims that his variety of cotton contains at most 40% lint
in seed cotton. Eighteen (18) samples of 100 grams each were taken and, after
ginning, the following quantities of lint were found (twelve of the sample values
are shown here; the remaining six are included in the totals below):
36.3 37.0 36.6 37.5 37.5 37.9
38.5 37.9 38.8 37.5 37.1 37.0
Check the breeder's claim. Use the 1% level of significance.
Solution: We wish to test
H0 : µ = 40 vs HA : µ < 40
The test statistic is
t(n−1) = √n (X̄ − µ0)/s
X̄ = (36.3 + 37.0 + 36.6 + · · · + 36.7 + 35.7)/18 = 669.7/18 = 37.206
s² = (1/(n − 1)) Σ(X − X̄)² = (1/17)(10.76) = 0.633
s = √0.633 = 0.796
t = √18 (37.206 − 40)/0.796 = −14.9

From tables, the value of t at α = 1% and 17 d.f. is 2.567; for this lower-tail test
the critical value is −2.567.

Since t calculated = −14.9 < −2.567, we reject H0 and conclude that the average
percentage of lint in this cotton is less than 40%, which is consistent with the
breeder's claim.
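Since only summary statistics are needed, a small Python sketch (my addition; it assumes the totals above) can reproduce this one-sided test:

```python
import math
from scipy.stats import t

n, x_bar, s, mu0 = 18, 37.206, 0.796, 40.0

t_calc = math.sqrt(n) * (x_bar - mu0) / s   # ≈ -14.9
t_crit = t.ppf(0.01, df=n - 1)              # lower-tail critical value ≈ -2.567

print(t_calc, t_crit)
if t_calc < t_crit:
    print("Reject H0: mean lint content is below 40%")
```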
10.9. Revision Questions

1. Research in an area shows that a Friesian bull (which served a large number
of Hereford cows) possesses a dominant gene with respect to the color of the
skin of the resulting calves. Research shows that 75% of the calves are
Friesian. During the last 2 years, the Friesian bull served 30 Hereford cows
on the JKUAT farm, resulting in 40 calves of which 35 are Friesian. Is there
any reason to suspect that the proportion of Friesian calves has changed?
(Use α = 2.5%)

2. Kazeze is a student at the Zimbabwe national university. He is one of the 2
candidates for ZUSO chairman. He claims that more than half of the students
will vote for him. Some friends took a random sample of 120 students, and
only 53 students stated that they will vote for him. Does Kazeze's claim still
hold in the light of the findings of the sample? What advice would you give
him? Use α = 5%.

3. A sample of size 80 has a mean of 600 gms and an s.d. of 5 gms. Find the 99%
confidence interval for the mean of the population.

4. In a factory, the distribution of the diameters of bolts produced by a machine
has a known s.d. of 0.2 cm. How large should a sample be in order that the
manager is 98% confident that the absolute error in estimating the population
mean diameter will be less than 0.05 cm?
Solutions to Exercises
Exercise 1. (a) Σ over all w of P(W = w) = 1 =⇒ 0.1 + 0.25 + 0.3 + 0.15 + d = 1.
Thus, solving for d, we have d = 0.2.
(b) P(−3 ≤ W ≤ 0) = P(W = −3) + P(W = −2) + P(W = −1) + P(W = 0)
=⇒ P(−3 ≤ W ≤ 0) = 0.1 + 0.25 + 0.3 + 0.15 = 0.8
or simply
P(−3 ≤ W ≤ 0) = 1 − P(W = 1) = 1 − d = 0.8
(c) P(W > −1) = P(W = 0) + P(W = 1) =⇒ P(W > −1) = 0.15 + 0.2 = 0.35
(d) P(−1 < W < 1) = P(W = 0) = 0.15. Note: Check which values satisfy the
range given in the question; it is only the value W = 0.
(e) The mode is always the value with the highest frequency. In probability, the
mode is the value with the highest probability. Thus, for this case, the mode is the
value corresponding to the largest probability, i.e. 0.3 from the table. Hence our
mode is W = −1.
Exercise 1
Exercise 2. Let us consider a table showing the probabilities associated with each
of the X values:
x         1    2    3    4    5    6
P(X = x)  C    2C   3C   4C   5C   6C
Solving Σ P(X = x) = 1
=⇒ C + 2C + 3C + 4C + 5C + 6C = 1
21C = 1 =⇒ C = 1/21
Next,
P(X < 4) = P(X = 1) + P(X = 2) + P(X = 3) = 1/21 + 2/21 + 3/21 = 6/21
Similarly, we can obtain P(3 ≤ X < 6) as follows:
P(3 ≤ X < 6) = P(X = 3) + P(X = 4) + P(X = 5) = 3/21 + 4/21 + 5/21 = 12/21
Exercise 2
´3 ´1 ´2 ´3
Exercise 3. (a) Since f (x) is a pd f then 0 f (x)dx = 0 axdx + 1 adx + 2 (−ax +
3a)dx = 1

111
STA 2200 Probability and Statistics II

2 2
=⇒ ax2 |10 + ax|21 + − ax2 | + 3ax|32 = 1
a 5
2 + a − 2 a + 3a = 1
2a = 1
=⇒ a = 12
(b) p(X ≤ 1.5)
ˆ 1.5 ˆ 1 ˆ 1.5
f (x)dx = axdx + adx
0 0 1
ˆ 1 ˆ 1.5
1 1
= xdx + dx
2 0 2 1

1 x2 1 1 1.5
= | + x|
22 0 2 1
1 1 3
= + ( − 1)
4 2 2

1 1
= + = 0.5
4 4

Exercise 3
Exercise 4. See Lesson 1, Example 2 for the experiment.
We have the following possible outcomes:
P0 = 1/16, P1 = 4/16, P2 = 6/16, P3 = 4/16, P4 = 1/16
Now, assuming that X represents the winnings, when he gets 4 heads we have P4,
and for four tails (zero heads) we have P0; both have a probability of 1/16. The
outcomes where he gets 1, 2 or 3 heads have their respective probabilities. We can
then represent the information in a table:
No. Heads  0      1      2      3      4
X          500    −150   −150   −150   500
P(X = x)   1/16   4/16   6/16   4/16   1/16
E(X) = Σ x P(X = x) = 500(1/16) + (−150)(4/16) + · · · + 500(1/16)
= −1100/16 = −275/4
Therefore, our conclusion would be: we wouldn't advise him to play the game,
since on average he expects to lose 275/4 (i.e. 68.75) per game.
Note that, while solving the problem, we took 4 tails to be represented as zero (0)
heads.
Exercise 4
Exercise 5.

1st factorial moment = E(X)
2nd factorial moment = E[X(X − 1)]
3rd factorial moment = E[X(X − 1)(X − 2)]
Therefore:
x                1    2    3    4    5    6
P(X = x)         1/6  1/6  1/6  1/6  1/6  1/6
X − 1            0    1    2    3    4    5
X(X − 1)         0    2    6    12   20   30
X − 2            −1   0    1    2    3    4
X(X − 1)(X − 2)  0    0    6    24   60   120
Thus:
1st factorial moment:
E(X) = Σ x P(X = x) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 7/2
2nd factorial moment:
E[X(X − 1)] = Σ X(X − 1) P(X = x)
= 0(1/6) + 2(1/6) + 6(1/6) + 12(1/6) + 20(1/6) + 30(1/6) = 70/6
3rd factorial moment:
E[X(X − 1)(X − 2)] = Σ X(X − 1)(X − 2) P(X = x)
= 0(1/6) + 0(1/6) + 6(1/6) + 24(1/6) + 60(1/6) + 120(1/6) = 210/6
Exercise 5
Exercise 6.

By definition,
m_X(t) = ∫₀^∞ e^(tx) f(x) dx = ∫₀^∞ e^(tx) λe^(−λx) dx
= λ ∫₀^∞ e^(−(λ−t)x) dx
= [λ/(−(λ − t))] [e^(−(λ−t)x)]₀^∞
= [λ/(−(λ − t))] (0 − 1)    (for t < λ)
= λ/(λ − t) = λ(λ − t)^(−1)
Then
E(X) = m′_X(t)|₀ = λ/(λ − t)² evaluated at t = 0 = 1/λ
Var(X) = E(X²) − (E(X))² = m″_X(t)|₀ − (m′_X(t)|₀)²
m″_X(t) = (d/dt)[λ/(λ − t)²] = 2λ/(λ − t)³, so m″_X(t)|₀ = 2λ/λ³ = 2/λ²
Var(X) = 2/λ² − (1/λ)² = 1/λ²
We need to verify directly that E(X) = ∫₀^∞ xλe^(−λx) dx = 1/λ and that
Var(X) = ∫₀^∞ x²λe^(−λx) dx − (∫₀^∞ xλe^(−λx) dx)² = 1/λ².

E(X) = λ ∫₀^∞ x e^(−λx) dx
Using integration by parts, ∫ u dv = uv − ∫ v du, let u = x and dv = e^(−λx) dx
=⇒ du = dx and v = −e^(−λx)/λ
Then
E(X) = λ ( [−x e^(−λx)/λ]₀^∞ + (1/λ) ∫₀^∞ e^(−λx) dx )
= [−x e^(−λx)]₀^∞ − (1/λ)[e^(−λx)]₀^∞
= 0 − (1/λ)(0 − 1)
= 1/λ

Next,
Var(X) = E(X²) − (1/λ)², but
E(X²) = ∫₀^∞ x² f(x) dx = λ ∫₀^∞ x² e^(−λx) dx
Let u = x² and dv = e^(−λx) dx
=⇒ du = 2x dx and v = −e^(−λx)/λ
Then
E(X²) = λ ( [−x² e^(−λx)/λ]₀^∞ + (2/λ) ∫₀^∞ x e^(−λx) dx )
= [−x² e^(−λx)]₀^∞ + 2 ∫₀^∞ x e^(−λx) dx
= 0 + 2(1/λ²)
= 2/λ²
∴ Var(X) = E(X²) − (E(X))² = 2/λ² − (1/λ)² = 1/λ²
Exercise 6
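As a sanity check on these moments, a short NumPy simulation (my addition) estimates the mean and variance of an Exponential(λ) sample and compares them with 1/λ and 1/λ²:

```python
import numpy as np

lam = 2.0
rng = np.random.default_rng(1)
x = rng.exponential(scale=1/lam, size=100_000)   # Exponential with rate lam

print(x.mean(), 1/lam)           # both ≈ 0.5
print(x.var(ddof=1), 1/lam**2)   # both ≈ 0.25
```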
Exercise 7.
Let X be the number of heads in n = 10 tosses, with p = 0.5.
P(X = k) = C(n, k) p^k (1 − p)^(n−k)
(a) P(X = 5) = C(10, 5)(0.5)^5(0.5)^5 = 0.2461
(b) P(3 tails) = P(X = 7) = C(10, 7)(0.5)^7(0.5)^3 = 0.117
(c) P(X ≥ 3) = 1 − [P(X = 0) + P(X = 1) + P(X = 2)]
= 1 − [C(10, 0)(0.5)^0(0.5)^10 + C(10, 1)(0.5)^1(0.5)^9 + C(10, 2)(0.5)^2(0.5)^8]
= 0.9453
Exercise 7
Exercise 8.
The given mgf is for a binomial distribution. Therefore we have n = 4 and p = 0.8.
Thus, solving the required probability, we have
P(X = 2) = C(4, 2)(0.8)²(0.2)² = 0.1536
Exercise 8
Exercise 9.
p = 3/100 = 0.03 is the probability of the event of interest, and n = 50. We then
need to find the mean, which is obtained using the relationship from the binomial
distribution as λ = np. Therefore,
λ = np = 0.03 × 50 = 1.5
Let the random variable X be the number of patients that require rooms with specific
facilities:

X ∼ Poi(1.5)
P(X = x) = e^(−1.5)(1.5)^x / x!, x = 0, 1, 2, . . .
P(X > 3) = 1 − [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)]
= 1 − e^(−1.5)[1 + 1.5 + (1.5)²/2! + (1.5)³/3!]
= 1 − (0.2231 × 4.1875)
= 1 − 0.934
= 0.066

Note here that we have used the complement to obtain the probability.
Exercise 9
Exercise 10. Let X be the number of broken eggs in a box of 500.
P(an egg is broken) = 0.007, so X ∼ B(500, 0.007).
E(X) = np = 500 × 0.007 = 3.5
Since n > 50 and p < 0.1, we can use the Poisson approximation, i.e. X ∼ Poi(3.5).
(a) P(X = 3) = e^(−3.5)(3.5)³/3! = 0.22
(b) P(X ≥ 2) = 1 − [P(X = 0) + P(X = 1)]
= 1 − e^(−3.5)(1 + 3.5) = 0.864
Exercise 10
Exercise 11. Let the random variable X be the number of lockers with cigarettes.
X follows a hypergeometric distribution:
p(X = x) = C(k, x) C(N − k, n − x) / C(N, n)
with N = 20, k = 6, N − k = 14, n = 10. Then
p(X ≥ 3) = 1 − Σ from x = 0 to 2 of C(6, x) C(14, 10 − x) / C(20, 10)
= 1 − [C(6, 0)C(14, 10) + C(6, 1)C(14, 9) + C(6, 2)C(14, 8)] / C(20, 10)
= 0.686
Exercise 11
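This value can be checked with SciPy's hypergeometric distribution (a verification sketch I have added; hypergeom.sf(2, ...) gives P(X ≥ 3)):

```python
from scipy.stats import hypergeom

N, k, n = 20, 6, 10                      # population size, tagged lockers, draws
p_at_least_3 = hypergeom.sf(2, N, k, n)  # sf(2) = P(X > 2) = P(X >= 3)
print(round(p_at_least_3, 3))            # 0.686
```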
Exercise 12.
Let Y be the time when the call comes in. Y ∼ Unif(0, 5), since the call falls in the
interval from 12.00 am to 5.00 am. Thus,
f(y) = 1/5 for 0 ≤ y ≤ 5, and 0 elsewhere.
In these 5 hours, the center is open as follows: from 12.00 to 1.00 am, that is the
interval 0 → 1 hour, then closed for 2 hours; the center is again open from 3.00 to
4.00 am, the interval 3 → 4.
Thus we have
P(0 < Y < 1) + P(3 < Y < 4) = 1/5 + 1/5 = 2/5
Hence, the probability that she will find the center open is 2/5 or 0.4. Exercise 12
Exercise 13.
If we sketch the graph of the function Y = (1 − X)³ over 0 < x < 1, we note that
(1) the function is a decreasing function of X, and (2) 0 < y < 1.
That noted, let's now use the distribution function technique to find the pdf of Y.
First, we find the cumulative distribution function of Y:
F_Y(y) = P(Y ≤ y) = P((1 − X)³ ≤ y) = P(1 − X ≤ y^(1/3))
= P(−X ≤ −1 + y^(1/3)) = P(X ≥ 1 − y^(1/3)) = 1 − F_X(1 − y^(1/3))
= 1 − ∫₀^(1−y^(1/3)) 3(1 − t)² dt = 1 + [(1 − t)³]₀^(1−y^(1/3))
= 1 + [(1 − (1 − y^(1/3)))³ − (1 − 0)³]
= 1 + y − 1 = y
Having shown that the cumulative distribution function of Y is F_Y(y) = y for
0 < y < 1, we now just need to differentiate F_Y(y) to get the probability density
function f_Y(y). Doing so, we get:
f_Y(y) = F′_Y(y) = 1 for 0 < y < 1.
That is, Y is a Unif(0, 1) random variable. (Again, you might find it reassuring to
verify that f_Y(y) does indeed integrate to 1 over the support of y.) Exercise 13
Exercise 14.
[The worked solution for this exercise was given as an image in the original notes
and is not reproduced here.]
Exercise 14
Exercise 15.
Note that the function Y = (1 − X)³, defined over the interval 0 < x < 1, is an
invertible function. The inverse function is
x = v(y) = 1 − y^(1/3)
for 0 < y < 1. (That range is because when x = 0, y = 1, and when x = 1, y = 0.)
Now, taking the derivative of v(y), we get
v′(y) = −(1/3) y^(−2/3)
Therefore, the change-of-variable technique
f_Y(y) = f_X(v(y)) × |v′(y)|
tells us that the probability density function of Y is
f_Y(y) = 3[1 − (1 − y^(1/3))]² × |−(1/3) y^(−2/3)| = 3y^(2/3) × (1/3) y^(−2/3)
Simplifying, we get that the probability density function of Y is
f_Y(y) = 1
for 0 < y < 1. Again, we shouldn't be surprised by this result, as it is the same result
that we obtained using the distribution function technique. Exercise 15
Exercise 16.
Let the random variable X be the time required to do the job, with X ∼ N(55, 10²).
(a) What is required is
P(X > 75) = P(Z > (75 − 55)/10) = P(Z > 2) = 1 − 0.9772 = 0.0228
(b) We need to compute
P(X < 60) = P(Z < (60 − 55)/10) = P(Z < 0.5) = 0.6915
(c) P(45 ≤ X ≤ 60)
= P((45 − 55)/10 ≤ Z ≤ (60 − 55)/10)
= P(−1 ≤ Z ≤ 0.5)
= P(Z ≤ 0.5) − P(Z < −1)
= P(Z ≤ 0.5) − [1 − P(Z < 1)]
= 0.6915 − 0.1587

The solution is therefore 0.5328. Exercise 16
Exercise 17.
We approximate the B(1600, 0.08) random variable T with a normal, with mean
(1600)(0.08) = 128 and standard deviation √((1600)(0.08)(0.92)) = 10.85. The
probability calculation is thus
P(T ≤ 150) = P(T < 150.5) ≈ P(Z < (150.5 − 128)/10.85) = P(Z < 2.07) = 0.9808
Exercise 17
Exercise 18.
We approximate the B(400, 0.07) random variable V with a normal, with mean
(400)(0.07) = 28 and standard deviation √((400)(0.07)(0.93)) = 5.103. The
probability calculations are thus
(a) P(V = 25) = P(24.5 < V < 25.5)
≈ P((24.5 − 28)/5.103 < Z < (25.5 − 28)/5.103)
= P(−0.69 < Z < −0.49)
= 0.3121 − 0.2451 = 0.0670
(b) P(V < 25) = P(V < 24.5)
≈ P(Z < (24.5 − 28)/5.103)
= P(Z < −0.69)
= 0.2451
(c) P(20 ≤ V ≤ 25) = P(19.5 < V < 25.5)
≈ P((19.5 − 28)/5.103 < Z < (25.5 − 28)/5.103)
= P(−1.67 < Z < −0.49)
= 0.3121 − 0.0475 = 0.2646
Exercise 18
Exercise 19. Let X be the number of booked passengers who turn up, so
X ∼ B(320, 0.9), with mean np = 288 and standard deviation
√((320)(0.9)(0.1)) = 5.37. At least one passenger lacks a seat when X > 300, so
using the normal approximation Y ∼ N(288, 5.37²),
P(X > 300) ≈ P(Y > 300.5) = P(Z > (300.5 − 288)/5.37) = P(Z > 2.33) = 0.01
Note further how we have been changing the variables (e.g. from X to Y) when
approximating the probability, as in P(X > 300) ≈ P(Y > 300.5). The change is
useful for our understanding of the concept illustrated. Exercise 19
Exercise 20.
The number of surviving seedlings is distributed as X ∼ B(400, 0.2), implying the
normal approximation would be X ∼ N(80, 64); and X = 60 =⇒ 59.5 ≤ x ≤ 60.5.
Hypothesis:
Ho : p = 0.2
vs
H1 : p < 0.2
Rejection region:
Zt = −2.33
=⇒ Reject Ho if Zc < −2.33
Test statistic:
Zc = (x − 80)/8 = (60.5 − 80)/8 = −2.44
Since Zc < Zt, we therefore reject Ho and conclude that there is a significantly poor
survival rate.
Note: In hypothesis testing, the question asked should help us decide whether to use
a one-tailed or a two-tailed test.
Exercise 20
Exercise 21.

Hypothesis:
Ho : p = 1/6
vs
H1 : p < 1/6
Under Ho, X ∼ B(120, 1/6). We can now apply the normal approximation to the
binomial:
=⇒ X ∼ N(20, 100/6) =⇒ Z = (x − 20)/(10/√6) ∼ N(0, 1)
X = 9 =⇒ 8.5 ≤ x ≤ 9.5
Since our interest is in "less than", we take x ≤ 9.5.
Therefore, rejection region:
Zt = −1.65 =⇒ reject Ho if Zc < −1.65
Test statistic:
Zc = (9.5 − 20)/(10/√6) = −2.57 =⇒ Zc < Zt = −1.65
Hence, reject Ho.
There is significant evidence that the die is biased to give too few fours.
Exercise 21