Sta 2200 Probability & Statistics II (Course Outline With Notes)
Course Purpose
At the end of the course, students will be able to handle probability distributions
of both discrete and continuous random variables and, in addition, perform some
simple hypothesis tests.
Course description
1. Random variables: discrete and continuous, probability mass, density and distribution functions, expectation, variance, percentiles and mode. Moments and moment generating function. Moment generating function and transformation of variable technique for univariate distribution.
Learning outcomes
At the end of the course the student should be able to:
4. Define basic discrete and continuous distributions, be able to apply them and
simulate them in simple cases.
Instruction methodology
• Online Tutorials
• Case studies
• Self Reading
• Discussions
Core References:
1. Uppal, S. M. , Odhiambo, R. O. & Humphreys, H. M. Introduction to Proba-
bility and Statistics. JKUAT Press, 2005, ISBN 9966923950
Additional Reference:
• R.V. Hogg, J.W. McKean & A.T. Craig, Introduction to Mathematical Statistics, 6th ed., Prentice Hall, 2003, ISBN 0-13-177698-3
Assessment information
The module will be assessed as follows:
• 20% of marks from one written CAT to be administered at JKUAT main cam-
pus or one of the approved centres
Contents
1 Gentle Introduction
1.1 Random Variables
1.2 Introduction
1.3 Definitions and Motivational examples
9 Normal Distribution
9.1 Introduction
9.2 Use of Standard Normal Tables
9.3 Properties of the Normal distribution
9.4 The mgf of a Normal distribution
9.5 Normal Approximation to the Binomial
9.6 Learning Activities
10 Statistical Inference
10.1 Introduction
10.2 Hypothesis Testing
10.3 Steps for classical hypothesis test
10.4 Central Limit Theorem
10.5 Estimation of µ and σ based on a sample of size n
10.6 Student's t-Distribution
10.7 Properties of the t-distribution
10.8 Hypothesis test (t-Test)
10.9 Revision Questions
Solutions to Exercises
LESSON 1
Gentle Introduction
Learning outcomes
Upon completion of this section, you should be able to:
• Be familiar with some of the terminology that will be used in this course
We recall from Probability and Statistics I that a variable X is said to be discrete whenever it takes either:
• a finite set of distinct values, for instance the set of single-digit whole numbers S = {0, 1, . . . , 9},
OR
• a countably infinite set of values, for instance the set of natural numbers N = {1, 2, . . .}.
Therefore, a discrete random variable is a quantity which may take up any value
within a discrete set of numbers having a given probability which may vary accord-
ing to the value of the variable. To specify the probability distribution of a discrete
random variable, we need to know both the set of values of the random variable and
the probability associated with each of these values. The probability distribution of
a discrete random variable, say Y, may be specified using a table or a formula.
NOTE: Random variables are denoted by capital letters while the particular (spec-
ified) values are denoted by lowercase letters.
Probability Distribution
A function P(X = x) is the probability distribution of a discrete random variable X if:
1. 0 ≤ P(X = x) ≤ 1
2. ∑_{all x} P(X = x) = 1
Such a distribution may be specified using either:
1. A table
2. A formula
Example . The table below illustrates how we can specify a probability distribution:
y        1    2    3    4
P(Y=y)  0.2  0.5  0.2  0.1
From this table we can read off probabilities directly, e.g. P(Y = 2) = P(2) = 0.5
On the other hand, we can define a probability distribution using a formula as follows:
P(X = x) = p(x) = C(10, x)(0.6)^x (0.4)^{10−x}, for x = 0, 1, 2, . . . , 10
where C(10, x) denotes the binomial coefficient "10 choose x". We recall that for a random variable X with particular values x that the variable may take (the variate), and with associated probabilities P(X = x) = p(x), we must have ∑_{all x} P(X = x) = 1.
Example . A balanced coin is tossed 4 times. Show that the number of tails
realized is a random variable.
Solution
Let X be the number of tails realized. This implies that X takes the values 0, 1, 2, 3, 4.
The number of possible outcomes is 2^n = 2^4 = 16. This is because there are two possible outcomes in any single trial and the experiment is carried out four times, so the sample space has 2^4 elements. Consider the following table illustrating the possible outcomes of four tosses of a coin. A single toss gives H or T, and two tosses give HH, HT, TH and TT; combining the first two tosses with the next two tosses lists all sixteen outcomes.
                Next two tosses
                HH    HT    TH    TT
First    HH   HHHH  HHHT  HHTH  HHTT
two      HT   HTHH  HTHT  HTTH  HTTT
tosses   TH   THHH  THHT  THTH  THTT
         TT   TTHH  TTHT  TTTH  TTTT
Therefore, if we denote the probability of zero tails with the subscript 0, one tail with subscript 1, and so on, then the probabilities are
P0 = 1/16, P1 = 4/16, P2 = 6/16, P3 = 4/16, P4 = 1/16
Hence, looking at these probabilities, ∑_{all i} P_i = 1, implying that X is indeed a random variable.
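The same probabilities can be obtained by brute-force enumeration. A short Python sketch (names are illustrative):

```python
from itertools import product
from fractions import Fraction

# Enumerate all 2**4 = 16 outcomes of four tosses and count tails in each.
counts = {}
for outcome in product("HT", repeat=4):
    x = outcome.count("T")
    counts[x] = counts.get(x, 0) + 1

pmf = {x: Fraction(c, 16) for x, c in counts.items()}
for x in sorted(pmf):
    print(x, pmf[x])        # 0:1/16, 1:1/4, 2:3/8, 3:1/4, 4:1/16
print(sum(pmf.values()))    # 1, so X is indeed a random variable
```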
Example . Two dice are rolled together and X denotes the number of fours that show. Find the probability distribution of X.
Solution
Consider the following table illustrating the outcomes when two dice are rolled together.
In each cell, we have for instance 3,2 (indicated in bold) meaning we get a 3 in Die
1 and a 2 in Die 2
Die 2
1 2 3 4 5 6
1 1,1 1,2 1,3 1,4 1,5 1,6
2 2,1 2,2 2,3 2,4 2,5 2,6
Die 1 3 3,1 3,2 3,3 3,4 3,5 3,6
4 4,1 4,2 4,3 4,4 4,5 4,6
5 5,1 5,2 5,3 5,4 5,5 5,6
6 6,1 6,2 6,3 6,4 6,5 6,6
Since the random variable X represents the number of fours, it implies that from this table we can get:
x        0      1      2
P(X=x)  25/36  10/36  1/36
Equivalently, P(X = x) = C(2, x)(1/6)^x (5/6)^{2−x}, for x = 0, 1, 2
Example . Two tetrahedral dice are rolled and the sum of the scores facing up is noted. Find the Probability Mass Function (pmf) of the random variable, the sum facing up.
Solution
Before solving this problem, let us first define a pm f .
Definition
+   1  2  3  4
1   2  3  4  5
2   3  4  5  6
3   4  5  6  7
4   5  6  7  8
From the table, we see that the possible sums are {2, 3, 4, 5, 6, 7, 8}; therefore the random variable X takes values in {2, 3, 4, 5, 6, 7, 8}.
The pmf is therefore given by
x        2     3     4     5     6     7     8
P(X=x)  1/16  2/16  3/16  4/16  3/16  2/16  1/16
P(X = x) = (x − 1)/16, for x = 2, 3, 4, 5
P(X = x) = (9 − x)/16, for x = 6, 7, 8
or, written more simply as a single piecewise expression,
P(X = x) = { (x − 1)/16, for x = 2, 3, 4, 5
           { (9 − x)/16, for x = 6, 7, 8
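As a check on this pmf, a small Python enumeration of the 16 equally likely pairs (assuming fair dice):

```python
from itertools import product
from fractions import Fraction

# Sum on two fair tetrahedral (four-sided) dice; 16 equally likely pairs.
pmf = {}
for a, b in product(range(1, 5), repeat=2):
    s = a + b
    pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 16)

for s in sorted(pmf):
    print(s, pmf[s])   # 2:1/16, 3:1/8, 4:3/16, 5:1/4, 6:3/16, 7:1/8, 8:1/16
# Matches P(X = x) = (x-1)/16 for x = 2..5 and (9-x)/16 for x = 6,7,8.
```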
Example . A random variable X takes the values 0, 1, 2, 3, 4 with probabilities P(X = x) = Cx². Find the value of the constant C.
Solution
Since ∑_{all x} P(X = x) = 1, this implies that 0 + C + 4C + 9C + 16C = 1.
So, how do we get these values? Since P(X = x) = Cx², when X = 0, P(X = 0) = C × 0² = 0. Similarly, when X = 1, P(X = 1) = C × 1² = C, and so on until we reach X = 4, P(X = 4) = C × 4² = 16C.
Therefore, 30C = 1 ⟹ C = 1/30
Learning Activities
1. Given a fair coin, perform an experiment and then define a variable from the results of your experiment. Show that the variable defined is a random variable. If it is, find the pmf of the random variable.
2. Carry out a similar exercise to that described in part (1) above with an unfair coin.
3. Carry out the exercise in part (1) using a six-sided die and a four-sided die.
Summary
In this lesson, we have focused mainly on the definitions of random variables, both discrete and continuous. We have also tried to show what a probability distribution is and the conditions necessary for a random variable to have a probability distribution. The lesson has mainly focused on discrete random variables.
Revision Questions
List the following sets; N denotes the set of natural numbers while Z denotes the
set of integers.
EXERCISE 1. A discrete random variable has a PMF shown in the following table:
w -3 -2 -1 0 1
P(W = w) 0.1 0.25 0.3 0.15 d
Find; (a) The value of d (b) P(−3 ≤ W ≤ 0) (c) P(W > −1) (d) P(−1 < W < 1)
(e) The mode
EXERCISE 2. The pmf of a discrete random variable is given by P(X = x) = Cx for x = 1, 2, 3, 4, 5, 6. Find the value of C and hence find P(X < 4) and P(3 ≤ X < 6)
Assignment:
1. Verify that P(X = x) = 2x/(k(k + 1)), for x = 1, 2, 3, . . . , k, can serve as the PMF of a random variable X
LESSON 2
Random Variables Continued
Learning outcomes
Upon completing this topic, you should be able to:
For instance, recall that a function f(x) is a probability density function (pdf) of a continuous random variable X if:
1. f(x) ≥ 0 for all x
2. P(a ≤ X ≤ b) = ∫_a^b f(x) dx
3. ∫_{−∞}^{∞} f(x) dx = 1
Example . Let X have pdf f(x) = 3x², 0 < x < 1.
(a) Find a such that P(X ≤ a) = P(X ≥ a).
Solution
(a) P(X ≤ a) = P(X ≥ a)
⟹ ∫_0^a f(x) dx = ∫_a^1 f(x) dx
∫_0^a 3x² dx = ∫_a^1 3x² dx
x³ |_0^a = x³ |_a^1
a³ − 0 = 1 − a³
2a³ = 1
a = 0.5^{1/3} ≈ 0.794
(b) Find b such that P(X ≥ b) = 0.05:
∫_b^1 f(x) dx = 0.05
∫_b^1 3x² dx = 0.05
x³ |_b^1 = 0.05
1 − b³ = 0.05
b³ = 0.95
b = 0.95^{1/3} = 0.983
Next we obtain
P(1 ≤ X ≤ 2) = ∫_1^2 (x/5 + 1/30) dx = [x²/10 + x/30]_1^2 = 1/3
Definition
The cumulative distribution function (cdf) of a discrete random variable X is
F(x) = P(X ≤ x) = ∑_{t ≤ x} f(t)
Example . A discrete random variable X has pmf f(x) = (x + 1)/20, for x = 1, 2, 3, 4, 5. (a) Represent f(x) in a table and sketch it. (b) Find the cdf F(x).
Solution
(a) First, note that this function is for a discrete random variable, so the probabilities can be computed value by value: f(1) = 2/20, f(2) = 3/20, f(3) = 4/20, f(4) = 5/20, f(5) = 6/20. The function can also be represented with a sketch; since f(x) is linear in x, you only need two points to sketch it.
(b) F(x) = P(X ≤ x) = ∑_{t=1}^{x} f(t) for any real number x.
Since x is a real number, it can lie in any of the following mutually exclusive intervals:
−∞ < x < 1; 1 ≤ x < 2; 2 ≤ x < 3; 3 ≤ x < 4; 4 ≤ x < 5; and 5 ≤ x < ∞.
Using the definition of f(x), we obtain
F(x) = { 0,     x < 1
       { 1/10,  1 ≤ x < 2
       { 1/4,   2 ≤ x < 3
       { 9/20,  3 ≤ x < 4
       { 7/10,  4 ≤ x < 5
       { 1,     x ≥ 5
From the above cdf, the resultant sketch is as shown below.
• F(x) is a step function which assumes a constant value on every interval between the points 1, 2, 3, 4, 5 (see the bold lines)
• F(x) is everywhere continuous from the right; take note that the line showing the jump, say at x = 2, is dotted.
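A short Python sketch (standard library only) that accumulates this pmf into the step cdf:

```python
from fractions import Fraction

# pmf f(x) = (x+1)/20, x = 1..5; accumulate it into the step cdf F(x).
f = {x: Fraction(x + 1, 20) for x in range(1, 6)}

total = Fraction(0)
for x in sorted(f):
    total += f[x]
    print(x, total)   # 1:1/10, 2:1/4, 3:9/20, 4:7/10, 5:1
```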
Similarly, for a continuous random variable with pdf f(x) = x/2, 0 < x < 2, the cdf for 0 < x < 2 is
F(x) = ∫_0^x (t/2) dt = (1/4)t² |_0^x = x²/4
Thus
F(x) = { 0,     x < 0
       { x²/4,  0 ≤ x < 2
       { 1,     x ≥ 2
Conversely, given the cdf F(x) = 1 − e^{−2x}, x > 0, the pdf is obtained by differentiation:
f(x) = (d/dx)F(x) = (d/dx)[1 − e^{−2x}] = 2e^{−2x}
⟹ f(x) = { 2e^{−2x}, x > 0
          { 0,        elsewhere
Now
f(x) ≥ 0 for all x (since e ≈ 2.71 and e^{−2x} > 0), and
∫_0^∞ 2e^{−2x} dx = −e^{−2x} |_0^∞ = −[e^{−∞} − 1] = −[0 − 1] = 1
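A quick numerical sanity check in Python that f(x) = 2e^{−2x} really is the derivative of F(x) = 1 − e^{−2x} (central differences; the step size and test points are arbitrary choices):

```python
import math

F = lambda x: 1 - math.exp(-2 * x)    # the cdf
f = lambda x: 2 * math.exp(-2 * x)    # the pdf obtained by differentiation

h = 1e-6
for x in (0.1, 0.5, 1.0, 2.0):
    numeric = (F(x + h) - F(x - h)) / (2 * h)   # numerical derivative of F
    print(x, numeric, f(x))                      # the two columns agree
```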
A pdf is given by f(x) = (1/40)e^{−x/40}, x > 0.
(a) Verify that f is a pdf:
∫_0^∞ (1/40)e^{−x/40} dx = [−e^{−x/40}]_0^∞ = −[e^{−∞} − e^0] = −[0 − 1] = 1
(b) P(40 < X < 50) = e^{−40/40} − e^{−50/40} = 0.3679 − 0.2865 = 0.0814
In general,
F(x) = ∫_0^x f(u) du = ∫_0^x (1/40)e^{−u/40} du = 1 − e^{−x/40}
so that F(50) = 1 − e^{−50/40} = 0.7135 and F(40) = 1 − e^{−40/40} = 0.6321.
Consider transformations of a random variable, such as Y = √X + c or Y = X². Suppose Y = X², where X is a discrete random variable with a known pmf; here the only values of Y with non-zero probabilities are 1, 4, and 9.
Now
P(Y = 4) = P(X² = 4) = P(X = −2) + P(X = 2) = 0 + 1/3 = 1/3
Similarly,
P(Y = 1) = 1/6 and P(Y = 9) = 1/2
Note
When there is a one-to-one relationship between X and Y, f(x) and g(y) yield exactly the same probabilities; only the random variable, and the set of values it can assume with non-zero probabilities, changes.
Example . Let X have pmf P(X = x) = (x + 1)/15, x = 0, 1, 2, 3, 4, and let Y = (X − 2)². Find the pmf of Y.
Solution
X    0  1  2  3  4
Y    4  1  0  1  4
P(Y = 0) = P(X = 2) = 3/15 = 1/5
P(Y = 1) = P(X = 1) + P(X = 3) = 2/15 + 4/15 = 6/15 = 2/5
P(Y = 4) = P(X = 0) + P(X = 4) = 1/15 + 5/15 = 6/15 = 2/5
Therefore, the pmf of Y can then be written as
Y        0    1    4
P(Y=y)  1/5  2/5  2/5
Note
In general, if x₁, x₂, x₃, . . . , x_k all yield the same value y, then
g(y) = P(Y = y) = P(X = x₁) + P(X = x₂) + · · · + P(X = x_k).
That is, in some cases several values of X will give rise to the same value of Y. The procedure for finding the pmf of Y is the same as before, but it is necessary to add the several probabilities that are associated with each value of X producing the same value of Y.
Learning Activities
Find the value of k and sketch y = f(x); hence compute P(1 ≤ X ≤ 3) and P(X ≥ 3)
Exercise
1. Let the pmf of X be given as follows:
X   1  2  3  4  5  6
Summary
For a continuous random variable it is not plausible to think of the variable assuming one particular value, because there are infinitely many values (even in a small interval) that the random variable can assume. This implies that the probability of any single value is
P(X = x_i) = M_i / ∞ = 0
where M_i is the number of occurrences of the value x_i.
But the fact that this probability is zero does not mean that the occurrence is impossible; rather, it suggests that we need another way of expressing probability.
This is done by the use of the CDF as follows:
F(µ) = ∫_{−∞}^{µ} f(x) dx
LESSON 3
Measures of Central Tendency
Learning outcomes
Upon completing this topic, you should be able to:
• Compute the different measures of central tendency and position for both
discrete and continuous random variables.
3.1. Introduction
In this lesson, we are going to learn further how to show that a function is either a pmf or a pdf. Further, we will learn how to compute the median, mode and mean (expectation) of both discrete and continuous random variables. Finally, we will look at how the properties of the mean (expectation) can be used to simplify the computation of these measures.
3.2. Median
The pth quantile of a random variable X (or of its corresponding distribution), denoted by ε_p, is defined as the smallest number ε_p such that F_X(ε_p) ≥ p, 0 < p < 1.
The median of a random variable X, denoted med(X) or ε_{0.5}, is the 0.5 quantile. Equivalently, the median of X is a value x such that P(X ≤ x) ≥ 1/2 and P(X ≥ x) ≥ 1/2. If X is a continuous random variable, then the median m of X satisfies
∫_{−∞}^{m} f(x) dx = ∫_{m}^{∞} f(x) dx = 0.5
Let 0 < p < 1. A (100p)th percentile (quantile of order p) of the distribution of a random variable X is a value ε_p such that
P(X ≤ ε_p) ≥ p and P(X ≥ ε_p) ≥ 1 − p
Example . Find the median and the 25th percentile of the following pdf:
f(x) = { 3(1 − x)², 0 < x < 1
       { 0,          elsewhere
Solution
For the median m,
∫_0^m f(x) dx = 1/2
3 ∫_0^m (1 − x)² dx = 0.5
Let z = 1 − x, so that dz = −dx; then
3 ∫_0^m (1 − x)² dx = [−(1 − x)³]_0^m = 1 − (1 − m)³
so 1 − (1 − m)³ = 0.5 ⟹ (1 − m)³ = 0.5 ⟹ 1 − m = 0.5^{1/3}
⟹ m = 1 − 0.5^{1/3} ≈ 0.21
For the 25th percentile ε_{0.25},
∫_0^{ε_{0.25}} 3(1 − x)² dx = 0.25
1 − (1 − ε_{0.25})³ = 0.25 ⟹ (1 − ε_{0.25})³ = 0.75 ⟹ 1 − ε_{0.25} = 0.75^{1/3}
⟹ ε_{0.25} = 1 − 0.75^{1/3} ≈ 0.09
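Since F(x) = 1 − (1 − x)³ here, any quantile can be read off in closed form; a two-line Python check (the helper name quantile is ours):

```python
# For f(x) = 3(1-x)^2 on (0,1), F(x) = 1 - (1-x)^3, so the p-quantile
# solves 1 - (1-x)^3 = p, i.e. x = 1 - (1-p)**(1/3).
quantile = lambda p: 1 - (1 - p) ** (1 / 3)

print(quantile(0.5))    # median ~ 0.2063
print(quantile(0.25))   # 25th percentile ~ 0.0914
```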
3.3. Mode
The mode of the distribution of a random variable X is a value x that maximizes the pdf (or pmf). If X is a continuous random variable, we often differentiate the pdf to locate the mode.
Example . Find the mode of the following distribution:
p_X(x) = { 0.5^x, x = 1, 2, . . .
         { 0,     elsewhere
Solution
The value of x for which p_X(x) is maximum is 1, so the mode is 1.
Example . Find the mode of the distribution
f(x) = { 0.5x²e^{−x}, 0 < x < ∞
       { 0,            elsewhere
Solution
f(x) = 0.5x²e^{−x}
By the product rule, f′(x) = 0.5[2xe^{−x} − x²e^{−x}].
At the maximum, f′(x) = 0:
0.5[2xe^{−x} − x²e^{−x}] = 0
2xe^{−x} = x²e^{−x}
x = 2
Checking the second derivative, f″(2) = 0.5(−2e^{−2}) < 0, implying that x = 2 is a maximum point; hence the mode of the above distribution is 2.
Example . The random variable X has the pdf
f(x) = { cx,       0 ≤ x ≤ 1
       { c(2 − x), 1 ≤ x ≤ 2
       { 0,        elsewhere
Determine the value of the constant c; hence find the cdf, the median and the mode of f(x).
Solution
1. Since f(x) is a pdf,
∫_{−∞}^{∞} f(x) dx = 1
∫_0^1 cx dx + ∫_1^2 c(2 − x) dx = 1
[cx²/2]_0^1 + [2cx − cx²/2]_1^2 = 1
c/2 + (4c − 2c) − (2c − c/2) = 1
c = 1
Therefore f(x) = { x,     0 ≤ x ≤ 1
                 { 2 − x, 1 ≤ x ≤ 2
                 { 0,     elsewhere
The cdf:
F(x) = 0 for x < 0, and for 0 ≤ x ≤ 1,
F(x) = ∫ x dx = x²/2 + c₁
Since F(0) = 0 ⟹ (1/2)(0)² + c₁ = 0 ⟹ c₁ = 0
⟹ F(x) = (1/2)x², 0 ≤ x ≤ 1
⟹ F(1) = 1/2
(b) Median: the median m satisfies F(m) = 1/2. Now F(1) = 1/2, so the median is 1. The pdf rises on [0, 1] and falls on [1, 2], so the mode is also x = 1.
Example . The pdf of a random variable is
f(x) = { (3/64)x²(4 − x), 0 ≤ x < 4
       { 0,               elsewhere
Determine the mode.
Solution
At the maximum, f′(x) = 0
⟹ (3/64)[2x(4 − x) − x²] = 0
⟹ (3/64)[8x − 2x² − x²] = 0
⟹ (3/64)[8x − 3x²] = 0
⟹ (3/64)x(8 − 3x) = 0
x = 0 or x = 8/3
Next,
f″(x) = (3/64)(−3)x + (3/64)(8 − 3x) = −9x/64 + (3/64)(8 − 3x)
At x = 0, f″(0) = 24/64 > 0
⟹ x = 0 gives a minimum.
At x = 8/3,
f″(8/3) = −(9/64)(8/3) + (3/64)(8 − 8) = −3/8 < 0
Hence x = 8/3 gives the mode.
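The same stationary-point calculation can be reproduced symbolically; a sketch using sympy (assuming it is available):

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Rational(3, 64) * x**2 * (4 - x)       # the pdf on [0, 4)

crit = sp.solve(sp.diff(f, x), x)             # stationary points: [0, 8/3]
print(crit)
print(sp.diff(f, x, 2).subs(x, sp.Rational(8, 3)))  # -3/8 < 0, so 8/3 is the mode
```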
Example . A random variable X has the pdf
f(x) = { x²,            0 ≤ x ≤ 1
       { (1/3)(2 − x),  1 < x ≤ 2
       { x − 2,         2 < x ≤ 3
       { 0,             elsewhere
Determine the cdf and the median of this distribution.
Solution
1. The cdf:
F(x) = 0 for x < 0, and for 0 ≤ x ≤ 1,
F(x) = ∫ x² dx = x³/3 + c₁
Since F(0) = 0 ⟹ (1/3)(0³) + c₁ = 0 ⟹ c₁ = 0
⟹ F(1) = 1/3
2. The median: solve F(x) = 1/2. Now F(1) = 1/3 < 1/2, so the median lies in (1, 2], where F(x) = (2/3)x − (1/6)x² − 1/6. Then
(2/3)x − (1/6)x² − 1/6 = 1/2
⟹ 4x − x² − 1 = 3
x² − 4x + 4 = 0
x² − 2x − 2x + 4 = 0
x(x − 2) − 2(x − 2) = 0
(x − 2)² = 0
⟹ x = 2
The median is 2.
Example . The random variable X has the pdf
f(x) = { cx,       0 ≤ x ≤ 1
       { c(2 − x), 1 ≤ x ≤ 2
       { 0,        elsewhere
Determine the value of the constant c; hence find the cdf, the median and mode of f(x).
Solution
Since f(x) is a pdf,
∫_{−∞}^{∞} f(x) dx = 1
∫_0^1 cx dx + ∫_1^2 c(2 − x) dx = 1
[cx²/2]_0^1 + [2cx − cx²/2]_1^2 = 1
c/2 + (4c − 2c) − (2c − c/2) = 1
c = 1
Therefore f(x) = { x,     0 ≤ x ≤ 1
                 { 2 − x, 1 ≤ x ≤ 2
                 { 0,     elsewhere
The cdf:
For 0 ≤ x ≤ 1,
F(x) = ∫ x dx = x²/2 + c₁
Since F(0) = 0 ⟹ (1/2)(0) + c₁ = 0 ⟹ c₁ = 0
⟹ F(x) = (1/2)x², 0 ≤ x ≤ 1, and F(1) = 1/2
Next, for 1 ≤ x ≤ 2,
F(x) = ∫ (2 − x) dx = 2x − x²/2 + c₂
But F(1) = 1/2 ⟹ 2(1) − 1²/2 + c₂ = 1/2
⟹ c₂ = −1
F(x) = { (1/2)x²,        0 ≤ x ≤ 1
       { 2x − x²/2 − 1,  1 ≤ x ≤ 2
       { 1,              x ≥ 2
Median: the median m satisfies F(m) = 1/2; since F(1) = 1/2, the median is 1. The pdf increases on [0, 1] and decreases on [1, 2], so the mode is also 1.
Definition
The expectation (mean) of a discrete random variable X is
E(X) = ∑_{all x} x P(X = x)
Example . A fair die is tossed once and X is the score showing. Then
E(X) = (1)(1/6) + (2)(1/6) + (3)(1/6) + (4)(1/6) + (5)(1/6) + (6)(1/6) = 7/2
OR, using the sum of an arithmetic series, s_n = n(a₁ + a_n)/2:
E(X) = (1/6) × s₆ = (1/6) × 6(1 + 6)/2 = 21/6 = 7/2
Example . Let X have pmf P(X = x) = x/21 for x = 1, 2, . . . , 6. Compute E(X).
Solution
Here we need to use the formula, i.e. the definition of E(X). Since X is a discrete random variable, the function can also be represented in a table: when x = 1, P(X = 1) = 1/21, and we do the same for the other realized values of X to come up with the table below.
x_i        1     2     3     4     5     6
P(X=x_i)  1/21  2/21  3/21  4/21  5/21  6/21
E(X) = (1 + 4 + 9 + 16 + 25 + 36)/21 = 91/21 = 13/3
A constant is not affected by expectation, since the expected value of a constant is the constant itself.
1. Let g(X) and h(X) be two functions of X. Then for any constants a and b,
E[ag(X) ± bh(X)] = aE[g(X)] ± bE[h(X)]
Proof
By definition (for the continuous case),
E[ag(X) ± bh(X)] = ∫_{−∞}^{∞} (ag(x) ± bh(x)) f(x) dx
= ∫_{−∞}^{∞} ag(x) f(x) dx ± ∫_{−∞}^{∞} bh(x) f(x) dx
= a ∫_{−∞}^{∞} g(x) f(x) dx ± b ∫_{−∞}^{∞} h(x) f(x) dx
= aE[g(X)] ± bE[h(X)]
Learning Activities
1. Find the value of c, the median and the mode of the following distribution, and hence find E(X):
p_X(x) = { cx²(1 − x), 0 < x < 1
         { 0,           elsewhere
Assignments
Identify at least four functions representing both discrete and continuous random
variables, and show that they are either PMF or PDF. For each function, find the
three measures of central tendency where possible.
LESSON 4
Variance of a Random variable, Moments and Moment
Generating Functions (MGF)
Learning outcomes
Upon completing this topic, you should be able to:
• Compute the variance of a random variable either directly or using the prop-
erties of the variance
• Define and compute the central moment, moments about specified point and
finally the factorial moments.
var(X) = ∑_{all x} [x − µ_x]² P(X = x)
if X is a discrete random variable, and
var(X) = ∫_{−∞}^{∞} [x − µ_x]² f(x) dx
if X is a continuous random variable.
That is, the variance is the expectation of the square of the deviations of the values of X from its mean.
Note:
Remark 3. The variance measures average dispersion from the mean. If it is small
it means that most of the values of X are concentrated near the mean. If it is large it
means that most values are spread far away from the mean.
Variance is normally calculated as
var(X) = E(X²) − [E(X)]²
where
E(X²) = ∑_{all x} x² P(X = x), if X is a discrete random variable, and
E(X²) = ∫_{−∞}^{∞} x² f(x) dx, if X is a continuous random variable.
Proof. By definition
Example . Let
f(x) = { (x + 3)/18, −3 ≤ x ≤ 3
       { 0,           elsewhere
Find the variance of X.
Solution
E[X] = ∫_{−∞}^{∞} x f(x) dx
⟹ E[X] = ∫_{−3}^{3} x (x + 3)/18 dx
= (1/18) ∫_{−3}^{3} (x² + 3x) dx
= (1/18) [∫_{−3}^{3} x² dx + 3 ∫_{−3}^{3} x dx]
= (1/18) [x³/3 |_{−3}^{3} + 3x²/2 |_{−3}^{3}]
= (1/18) [9 + 9 + 27/2 − 27/2]
= 1
Next
E(X²) = ∫_{−∞}^{∞} x² f(x) dx
= ∫_{−3}^{3} x² (x + 3)/18 dx
= (1/18) ∫_{−3}^{3} (x³ + 3x²) dx
= (1/18) [∫_{−3}^{3} x³ dx + 3 ∫_{−3}^{3} x² dx]
= (1/18) [x⁴/4 |_{−3}^{3} + 3 · x³/3 |_{−3}^{3}]
= (1/18) [81/4 − 81/4 + 27 + 27]
= (1/18)(27 + 27)
= 54/18 = 3
∴ var(X) = E(X²) − [E(X)]² = 3 − 1 = 2
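A symbolic check of this mean and variance with sympy (an assumption on our part that it is installed):

```python
import sympy as sp

x = sp.symbols('x')
f = (x + 3) / 18                                  # pdf on [-3, 3]

EX  = sp.integrate(x * f, (x, -3, 3))             # 1
EX2 = sp.integrate(x**2 * f, (x, -3, 3))          # 3
print(EX, EX2, EX2 - EX**2)                       # 1 3 2
```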
1. Consider the experiment of tossing 2 dice. Let X₁ and X₂ denote the outcomes of the first die and second die respectively. If Y is the absolute difference of the two outcomes, Y = |X₁ − X₂|, find E(Y) and var(Y).
SOLUTION
X1
1 2 3 4 5 6
1 0 1 2 3 4 5
2 1 0 1 2 3 4
3 2 1 0 1 2 3
X2
4 3 2 1 0 1 2
5 4 3 2 1 0 1
6 5 4 3 2 1 0
The possible values of Y are 0,1,2,3,4,5 as indicated in bold in the table of out-
comes.
Therefore, from this experiment, we obtain the following probability distribution
y        0     1      2     3     4     5
P(Y=y)  6/36  10/36  8/36  6/36  4/36  2/36
Now
var(Y) = E(Y²) − [E(Y)]²
But
E(Y) = ∑_{all y} y P(Y = y)
= 0(6/36) + 1(10/36) + 2(8/36) + 3(6/36) + 4(4/36) + 5(2/36)
= (1/36)(70)
= 1.94
Next
E(Y²) = ∑_{all y} y² P(Y = y)
= 0(6/36) + 1(10/36) + 4(8/36) + 9(6/36) + 16(4/36) + 25(2/36)
= 210/36
= 35/6
So
var(Y) = E(Y²) − [E(Y)]² = 35/6 − (1.94)² ≈ 2.05
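The distribution of Y = |X₁ − X₂| and its moments can be confirmed by enumerating all 36 outcomes in Python:

```python
from itertools import product
from fractions import Fraction

# Y = |X1 - X2| for two fair dice; enumerate the 36 equally likely pairs.
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    y = abs(a - b)
    pmf[y] = pmf.get(y, Fraction(0)) + Fraction(1, 36)

EY  = sum(y * p for y, p in pmf.items())       # 35/18 ~ 1.94
EY2 = sum(y * y * p for y, p in pmf.items())   # 35/6
print(EY, EY2, EY2 - EY**2)                    # variance = 665/324 ~ 2.05
```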
Example . A die is tossed once. Let X denote the score, and further let Y =
2X + 3. Find E(X), E(Y ), Var(X), Var(Y )
Solution: This problem can be solved by either calculating the expectation and
variance for X and Y or using the properties shown in the previous sections to find
the expectation and variance of Y.
Therefore, from the experiment
f(x) = { 1/6, x = 1, 2, 3, 4, 5, 6
       { 0,   elsewhere
E(X) = ∑ x f(x) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6)
⟹ E(X) = 3.5
E(Y) = E(2X + 3) = [2(1) + 3](1/6) + [2(2) + 3](1/6) + [2(3) + 3](1/6) + [2(4) + 3](1/6) + [2(5) + 3](1/6) + [2(6) + 3](1/6)
⟹ E(Y) = 10
=⇒ E(Y ) = 10
Alternatively, we can use the properties of expectation to find E(Y ) as follows;
E(Y ) = E(2X + 3) = 2E(X) + 3 = 2(3.5) + 3 = 10
Furthermore, var(X) = E(X 2 ) − [E(X)]2
Solving the expression
var(X) = 1²(1/6) + 2²(1/6) + 3²(1/6) + 4²(1/6) + 5²(1/6) + 6²(1/6) − 3.5²
var(X) = 15.17 − 12.25 = 2.92
Finally, var(Y) = var(2X + 3) = 2² var(X) = 4(2.92) = 11.68
4.2. Moments
In addition to expectation and variance of a random variable, we can compute ex-
pectations of higher power of a random variable with respect to a given distribution
.
These expectations are useful in determining various characteristics of the corre-
sponding distributions
Definition
If X is a random variable, the rth raw moment (the moment about the origin) is defined as µ′_r = E(X^r).
Definition
If X is a random variable, the rth central moment (the moment about the mean µ) is defined as µ_r = E[(X − µ)^r].
Remark 7. The first raw moment is the expectation of the random variable X.
That is, E(X) is the 1st moment about the origin: if r = 1, µ′₁ = E(X) = mean of X.
Remark 8. If the moment is taken about a = µ, then the 1st central moment is zero.
That is,
µ₁ = E[(X − µ)]
= E(X) − µ
= µ − µ
= 0
µ₂ = E[(x − µ)²]
= E(x²) − 2µE(x) + µ²
= E(x²) − 2µ² + µ²
= E(x²) − µ²
= E(x²) − [E(x)]²
= var(x)
µ′_r = E(x^r) = ∑_{all x} x^r P(X = x)
for a discrete random variable, and
µ′_r = E(x^r) = ∫_{−∞}^{∞} x^r f(x) dx
for a continuous random variable.
µ₂ = E[(x − x̄)²], where x̄ = E(x) = µ′₁
= E[(x − µ′₁)²]
but (x − µ′₁)² = x² − 2µ′₁x + (µ′₁)²
µ₂ = E[x² − 2µ′₁x + (µ′₁)²] = µ′₂ − 2(µ′₁)² + (µ′₁)² = µ′₂ − (µ′₁)²
µ₃ = E[(x − x̄)³], where x̄ = E(x) = µ′₁
= E[(x − µ′₁)³]
but (x − µ′₁)³ = x³ − 3µ′₁x² + 3x(µ′₁)² − (µ′₁)³
µ₃ = E[x³ − 3µ′₁x² + 3x(µ′₁)² − (µ′₁)³] = µ′₃ − 3µ′₁µ′₂ + 2(µ′₁)³
Definition
The moment generating function (mgf) of a random variable X is defined as m_x(t) = E(e^{tX}), provided this expectation exists for t in a neighbourhood of 0.
m_x(t) = E(e^{tX}) = ∑_{all x} e^{tx} P(X = x) . . . (1)
for a discrete random variable, and
m_x(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f(x) dx . . . (2)
for a continuous random variable.
Note that the mgf in (1) and (2) exists only if the sum and the integral, respectively, are finite.
m_x(t) = E(e^{tX}) = ∑_{all x} e^{tx} P(X = x)
Using the series expansion
e^{tx} = 1 + tx + t²x²/2! + t³x³/3! + . . .
⟹ m_x(t) = ∑_{all x} [1 + tx + t²x²/2! + t³x³/3! + . . .] P(X = x)
= ∑_{all x} P(X = x) + t ∑_{all x} x P(X = x) + (t²/2!) ∑_{all x} x² P(X = x) + (t³/3!) ∑_{all x} x³ P(X = x) + . . .
Differentiating with respect to t,
m′_x(t) = ∑_{all x} x P(X = x) + t ∑_{all x} x² P(X = x) + (t²/2!) ∑_{all x} x³ P(X = x) + . . .
Setting t = 0, every term after the first vanishes:
m′_x(0) = ∑_{all x} x P(X = x) = E(X) = µ′₁
Differentiating again,
m″_x(t) = ∑_{all x} x² P(X = x) + t ∑_{all x} x³ P(X = x) + . . .
m″_x(0) = ∑_{all x} x² P(X = x) = E(X²) = µ′₂
In general,
µ′_r = E(X^r) = m_x^{(r)}(t) |_{t=0}
The rth raw moment is derived by differentiating the mgf r times with respect to t and evaluating at t = 0.
Note:
For a continuous random variable, using
e^{tx} = 1 + tx + t²x²/2! + t³x³/3! + . . .
⟹ m_x(t) = ∫_{−∞}^{∞} e^{tx} f(x) dx
= ∫_{−∞}^{∞} [1 + tx + t²x²/2! + t³x³/3! + . . .] f(x) dx
= ∫ f(x) dx + t ∫ x f(x) dx + (t²/2!) ∫ x² f(x) dx + (t³/3!) ∫ x³ f(x) dx + . . .
= 1 + tµ′₁ + (t²/2!)µ′₂ + (t³/3!)µ′₃ + (t⁴/4!)µ′₄ + . . .
Differentiating,
m′_x(t) = ∫ x f(x) dx + t ∫ x² f(x) dx + . . . (1)
m′_x(0) = ∫ x f(x) dx = E(X) = µ′₁
m″_x(t) = ∫ x² f(x) dx + t ∫ x³ f(x) dx + . . .
m″_x(0) = ∫ x² f(x) dx = E(X²) = µ′₂
(all integrals taken over (−∞, ∞)). In general, the rth raw moment is given by (d^r/dt^r)(m_x(t)) |_{t=0}
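To illustrate the rule µ′_r = m_x^{(r)}(0), here is a sympy sketch using the fair-die pmf from Lesson 3 (library availability assumed):

```python
import sympy as sp

t = sp.symbols('t')
# mgf of a fair die: m(t) = (1/6) * sum of e^{tx}, x = 1..6
m = sp.Rational(1, 6) * sum(sp.exp(t * x) for x in range(1, 7))

mu1 = sp.diff(m, t).subs(t, 0)        # E(X)   = 7/2
mu2 = sp.diff(m, t, 2).subs(t, 0)     # E(X^2) = 91/6
print(mu1, mu2, sp.simplify(mu2 - mu1**2))   # variance 35/12
```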
EXERCISE 6. Let X be a continuous random variable with pdf f(x) given by
f(x) = { λe^{−λx}, x > 0
       { 0,         elsewhere
Find its mgf if it exists. Derive the expected value of X and the variance of X from the mgf.
Verify the results by computing the above quantities directly from the definition
4.7. Summary
In this section, we have discussed the variance of a random variable and its properties. We have also introduced the concept of moments and gone further to show how we can compute the central moments, moments about a specified point of reference, and finally the factorial moments. We have also shown how to derive the mgf of a given function and use it to obtain measures of central tendency, among other things, for the different distributions.
Learning Activities
• A fair four-sided die is thrown once. Find the 1st, 2nd and 3rd factorial moments.
LESSON 5
Theoretical Probability Distributions
Learning outcomes
Upon completing this topic, you should be able to:
• Calculate and interpret the measures of central tendency for grouped data.
5.1. Introduction
When we talk about theoretical distributions, we are referring to distributions which
are not obtained by actual observations/experiments. They are derived mathemati-
cally on the basis of certain assumptions. These distributions are broadly classified
into two categories
• Discrete probability distributions
• Bernoulli Distribution
A Bernoulli trial is a random experiment with only two mutually exclusive outcomes: Success (occurrence of an event) or Failure (non-occurrence of an event). For instance, if we toss a coin we get a Head or a Tail; when testing a manufactured item, the item can be either defective or non-defective, etc.
Therefore, let X be a random variable such that
X = { 1, if the outcome is a success
    { 0, if the outcome is a failure
⟹ P(X = 1) = p
P(X = 0) = (1 − p) = q
Note: A Bernoulli random variable is therefore defined using only one parameter,
that is p. The random variable X for a Bernoulli distribution takes on only
two possible values at a time, that is 0 or 1.
Moments
µ′_r = E[X^r] = ∑_{x=0}^{1} x^r P(X = x) = 0 + 1^r p¹(1 − p)^{1−1} = p
m_x(t) = E[e^{tx}] = ∑_{x=0}^{1} e^{tx} P(X = x)
= ∑_{x=0}^{1} e^{tx} p^x (1 − p)^{1−x}
= e^0 (1 − p) + e^t p
= (1 − p) + pe^t
where p is the probability that the outcome corresponding to the value 1 occurs.
For an unbiased coin, where heads or tails are equally likely to occur, p = 0.5.
2. There are only two possible outcomes on each trial i.e Success or Failure.
Definition
Let a trial result in success with a constant probability p and in failure with prob-
ability (1 − p) = q. Then the probability of X successes in n independent trials is
given by
P(X = x) = { C(n, x) p^x q^{n−x}, x = 0, 1, 2, . . . , n
           { 0,                   elsewhere
which is a Binomial distribution with parameters n and p and is denoted as X ∼
Bin(n, p)
P(X ≥ 2)
To solve this problem easily, we can use the complement of the defined event. That is,
P(X ≥ 2) = 1 − [P(X = 0) + P(X = 1)]
EXERCISE 7. Suppose we toss a fair coin ten times. What is the probability of obtaining: (a) 5 heads (b) 3 tails (c) At least 3 heads
m_x(t) = E(e^{tx})
= ∑_{x=0}^{n} e^{tx} C(n, x) p^x q^{n−x}
= ∑_{x=0}^{n} C(n, x) (pe^t)^x q^{n−x}
= (pe^t + q)^n
m′_x(t) = n(pe^t + q)^{n−1} pe^t
m′_x(0) = E(X) = np(p + q)^{n−1} = np · 1^{n−1} = np, since p + q = 1
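A symbolic confirmation that differentiating (pe^t + q)^n at t = 0 gives np (sympy assumed available):

```python
import sympy as sp

t, p, n = sp.symbols('t p n', positive=True)
q = 1 - p
m = (q + p * sp.exp(t))**n                  # binomial mgf derived above

EX = sp.simplify(sp.diff(m, t).subs(t, 0))  # n*p, since q + p = 1
print(EX)
```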
Alternatively,
consider a repetition of Bernoulli trials until a fixed number r of successes has occurred, and then stop. For example, we may be interested in the probability that the 10th computer to crash with a virus will be the 3rd to be infected.
If the rth success is to occur on the xth trial, there must be (r − 1) successes in the first (x − 1) trials. The probability of this is
C(x − 1, r − 1) p^{r−1} (1 − p)^{x−r}
The probability of a success on the xth trial is p, so the probability that the rth success occurs on the xth trial is
P(X = x) = C(x − 1, r − 1) p^r (1 − p)^{x−r}, x = r, r + 1, . . .
A random variable representing the number of trials required is called a negative binomial random variable. It may be interpreted as the waiting time until the occurrence of the r successes.
For the version of the negative binomial counting the number of failures X before the kth success (with q = 1 − p), the mgf is
m_x(t) = E(e^{tX}) = ∑_{x=0}^{∞} C(x + k − 1, x) p^k q^x e^{tx}
= p^k ∑_{x=0}^{∞} C(x + k − 1, x) (qe^t)^x
= p^k (1 − qe^t)^{−k}      (by the negative binomial series)
= [p / (1 − qe^t)]^k, valid for qe^t < 1
m′_x(t) = p^k (−k)(1 − qe^t)^{−k−1}(−qe^t) = kp^k qe^t (1 − qe^t)^{−k−1}
⟹ E[X] = m′_x(0) = kp^k q p^{−k−1} = kq/p
var(X) = E[X²] − [E(X)]² = E[X²] − k²q²/p²
where E[X²] = m″_x(0), and
m″_x(t) = kp^k qe^t (1 − qe^t)^{−k−1} + kp^k qe^t (k + 1) qe^t (1 − qe^t)^{−k−2}
so E[X²] = kq/p + k(k + 1)q²/p², and hence
var(X) = kq/p + k(k + 1)q²/p² − k²q²/p² = kq/p + kq²/p² = kq/p²
STA 2200 Probability and Statistics II
• Geometric Distribution
Consider an experiment in which Bernoulli trials are performed until the 1st success occurs. The sample space is S = {s, fs, ffs, fffs, . . .}.
Suppose that a sequence of independent Bernoulli trials, each with probability of success p, is performed; let X be the number of trials until the 1st success occurs. Then X is a discrete random variable with pmf given by
P(X = n) = { p(1 − p)^{n−1}, n = 1, 2, 3, . . .
           { 0,               elsewhere
Example . An expert shot hits a target 95% of the time. What is the probability that he misses the target for the 1st time on the 15th shot?
Solution:
Treating a miss as the "success" here, p = 0.05 and 1 − p = 0.95.
Therefore, P(X = 15) = 0.05(0.95)^{15−1} = 0.0244
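In Python, this is a one-liner:

```python
# First miss on the 15th shot: 14 hits (prob 0.95 each), then a miss (0.05).
print(0.05 * 0.95**14)   # ~ 0.0244, as above
```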
For the version of the geometric distribution counting the number of failures X before the first success, m_x(t) = p/(1 − qe^t) (the k = 1 case of the negative binomial above), so
m′_x(0) = p(1 − q)^{−2} q = p(p)^{−2}(q)
E[X] = pq/p²
E[X] = q/p
E[X²] = m″_x(0), where
m″_x(t) = pqe^t (1 − qe^t)^{−2} + 2pqe^t qe^t (1 − qe^t)^{−3}
m″_x(0) = (pq)(1 − q)^{−2} + 2(pq)(q)(1 − q)^{−3}
= (pq)(p)^{−2} + 2(pq)(q)(p)^{−3}
= q/p + 2q²/p²
E[X²] = q/p + 2q²/p²
Now
var(X) = E[X²] − [E(X)]²
= q/p + 2q²/p² − (q/p)²
= q/p + 2q²/p² − q²/p²
= (qp + 2q² − q²)/p²
= (qp + q²)/p²
= q(p + q)/p²
= q/p²
(If instead X counts the number of trials up to and including the first success, as in the pmf above, then E(X) = 1/p and var(X) = q/p².)
2. There are only two possible outcomes at each trial.
5. The random variable X is the number of trials needed for the first success to appear, including the successful trial.
5.2. Summary
This lesson generally introduces some of the basic and well known discrete prob-
ability distributions. We have also shown some of the properties that one should
look for in order to decide whether a random variable has a binomial distribution.
Other discrete probability distributions are discussed in the subsequent lessons.
Learning Activities
1. The probability that a computer drawn at random from a batch of computers is defective is 0.1. If a sample of 6 computers is taken, find the probability that it will contain
4. If the probability of having a male or female child are both 0.5. Calculate the
probability that: (a) The family’s 5th child is their 1st son. (b) A family’s 8th
child is their 3rd son. (c) A family’s 6th child is their 2nd or 3rd son.
LESSON 6
Discrete Probability Distributions cont...
Learning outcomes
Upon completing this topic, you should be able to:
• Derive the mean and variance of the Poisson and Hypergeometric distributions, and show the unique property of the Poisson distribution
• Derive the mgf of the Poisson distribution and use the mgf to find the expectation and variance of the distribution and other related aspects
Assuming that each of these occurs randomly, they are all examples of variables that can be modeled using a Poisson distribution. A Poisson distribution has two potential uses:
lim_{n→∞} P(X = x) = lim_{n→∞} [n!/((n − x)! x!)] (m/n)^x (1 − m/n)^{n−x}
= lim_{n→∞} [n!/((n − x)! x!)] (m^x/n^x) (1 − m/n)^n (1 − m/n)^{−x}
= (m^x/x!) lim_{n→∞} [n!/((n − x)! n^x)] × lim_{n→∞} (1 − m/n)^n (1 − m/n)^{−x}
But
lim_{n→∞} n!/((n − x)! n^x) = lim_{n→∞} [n(n − 1)(n − 2) · · · (n − x + 1)]/(n · n · · · n) = 1 × 1 × · · · × 1 = 1
lim_{n→∞} (1 − m/n)^n = e^{−m}
since lim_{n→∞} (1 + 1/n)^n = e and lim_{n→∞} (1 + a/n)^n = e^a
and lim_{n→∞} (1 − m/n)^{−x} = 1^{−x} = 1
⟹ lim_{n→∞} P(X = x) = (m^x/x!)(1)e^{−m}
That is, writing λ for the mean m,
P(X = x) = { λ^x e^{−λ}/x!, x = 0, 1, 2, . . .
           { 0,              elsewhere
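The limit can be watched numerically: fix the mean m = np and let n grow. A Python sketch (the values m = 2 and x = 3 are arbitrary choices):

```python
from math import comb, exp, factorial

m, x = 2.0, 3                 # fixed mean m = n*p; evaluate P(X = 3)
for n in (10, 100, 10000):
    p = m / n
    binom = comb(n, x) * p**x * (1 - p)**(n - x)
    print(n, binom)           # approaches the Poisson value below

print(m**x * exp(-m) / factorial(x))   # e^{-m} m^x / x!
```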
If λ is not an integer, then the mode is the integer part of λ; e.g. if X ∼ Poi(4.9), then the mode is 4. If λ is an integer, there are two modes, λ − 1 and λ; e.g. if X ∼ Poi(9), then the modes are 8 and 9.
m_x(t) = E(e^{tX})
= ∑_{x=0}^{∞} e^{tx} λ^x e^{−λ}/x!
= e^{−λ} ∑_{x=0}^{∞} (λe^t)^x / x!
= e^{−λ} [1 + λe^t + (λe^t)²/2! + (λe^t)³/3! + . . .]
= e^{−λ} e^{λe^t}
= e^{λ(e^t − 1)}
or, equivalently,
= e^{−λ(1 − e^t)}
Mean(X): m′_x(t) = λe^t e^{λ(e^t − 1)}, so E(X) = m′_x(0) = λ.
Var(X):
E(X²) = m″_x(0), where
m″_x(t) = λe^t e^{λ(e^t − 1)} + (λe^t)² e^{λ(e^t − 1)}
E(X²) = m″_x(0) = λ + λ²
Var(X) = E(X²) − [E(X)]² = λ + λ² − λ² = λ
X ∼ Poi(0.6)
P(X = x) = e^{−0.6}(0.6)^x / x!, x = 0, 1, 2, . . .
P(X = 0) = e^{−0.6}(0.6)^0 / 0! = 0.55
X ∼ Poi(0.6), so over four times the interval X₁ ∼ Poi(4 × 0.6), i.e. X₁ ∼ Poi(2.4):
P(X₁ = x) = e^{−2.4}(2.4)^x / x!, x = 0, 1, 2, . . .
P(X₁ > 2) = 1 − [P(X₁ = 0) + P(X₁ = 1) + P(X₁ = 2)]
= 1 − [e^{−2.4}(2.4)^0/0! + e^{−2.4}(2.4)^1/1! + e^{−2.4}(2.4)^2/2!]
= 1 − e^{−2.4}(1 + 2.4 + 2.88)
= 1 − 0.5697
= 0.430
Example . A factory packs bolts in boxes of 500. The probability that a bolt is defective is 0.002. Find the probability that a box contains 2 defective bolts.
Solution:
Since n is large and p is small, we can use the Poisson approximation:
λ = np = 500(0.002) = 1
P(X = 2) = e^{−1}(1)² / 2! = 0.184
(a) X ∼ Poi(4):
P(X = x) = e^{−4}(4)^x / x!, x = 0, 1, 2, . . .
P(X = 0) = e^{−4}(4)^0 / 0! = e^{−4} = 0.018
(b) Over three such intervals, X ∼ Poi(3 × 4) = Poi(12):
P(X = x) = e^{−12}(12)^x / x!, x = 0, 1, 2, . . .
P(X < 2) = P(X = 0) + P(X = 1)
= e^{−12}(12)^0/0! + e^{−12}(12)^1/1!
= e^{−12}(1 + 12)
= 0.000079
(c) Over half an interval, X₂ ∼ Poi((1/2) × 4) = Poi(2):
P(X₂ = x) = e^{−2}(2)^x / x!, x = 0, 1, 2, . . .
P(X₂ > 2) = 1 − [P(X₂ = 0) + P(X₂ = 1) + P(X₂ = 2)]
= 1 − [e^{−2}(2)^0/0! + e^{−2}(2)^1/1! + e^{−2}(2)^2/2!]
= 1 − e^{−2}(1 + 2 + 2)
= 0.3233
Solution
p = 0.02, n = 400
λ = np = 400 × 0.02 = 8
P(X = 6) = e^{−8}(8)^6 / 6! = 0.122
Using the binomial approach:
p = 0.02, n = 400
P(X = 6) = C(400, 6)(0.02)^6(0.98)^{394} = 0.121
EXERCISE 10. Eggs are packed into boxes of 500. On average, 0.7% of the eggs are found to be broken when the eggs are unpacked. Find, correct to 2 significant figures, the probability that in a box of 500: (a) Exactly three are broken. (b) At least two are broken.
Hypergeometric Distribution
If a sample of size n is drawn without replacement from a population of N items of which M possess a property of interest, the number X of such items in the sample has pmf
P(X = x) = C(M, x) C(N − M, n − x) / C(N, n)
Alternatively,
We can derive hypergeometric distribution as follows:
Let N be the population size from which a sample of size n is to be drawn. Let the
proportion of individuals in this finite population who possess certain property of
interest be denoted by p. If X is a random variable corresponding to the number of
individuals in the sample possessing the property of interest, then the problem is to
find the distribution function of X.
Since x individuals must come from N p individuals in the population with property
of interest and the remaining n-x individuals come from N − N p who do not possess
the property of interest, then by using the idea of combinations the distribution of
X is given by
P(X = x) = C(Np, x) C(N − Np, n − x) / C(N, n)
which is the hypergeometric distribution.
Suppose then that Np = k; then
P(X = x) = { C(k, x) C(N − k, n − x) / C(N, n), x = 0, 1, . . . , n
           { 0,                                  elsewhere
E[X] = ∑_{all x} x P(X = x)
= ∑_{x=0}^{n} x C(k, x) C(N − k, n − x) / C(N, n)
= (nk/N) ∑_{x=1}^{n} C(k − 1, x − 1) C(N − k, n − x) / C(N − 1, n − 1)
Let x − 1 = y, so that n − x = n − 1 − y and x = y + 1:
E[X] = (nk/N) ∑_{y=0}^{n−1} C(k − 1, y) C(N − 1 − (k − 1), n − 1 − y) / C(N − 1, n − 1)
The remaining sum is a hypergeometric pmf summed over its whole support, so it equals 1, and
E[X] = nk/N
Var(X)
var(X) = E[X²] − [E(X)]²
But
E[X(X − 1)] = ∑_{x=0}^{n} x(x − 1) C(k, x) C(N − k, n − x) / C(N, n)
Let x − 2 = y, so that n − x = n − 2 − y and x = y + 2:
E[X(X − 1)] = [n(n − 1)k(k − 1)/(N(N − 1))] ∑_{y=0}^{n−2} C(k − 2, y) C(N − 2 − (k − 2), n − 2 − y) / C(N − 2, n − 2)
= n(n − 1)k(k − 1) / (N(N − 1))
Then
var(X) = E[X(X − 1)] + E(X) − [E(X)]²
= n(n − 1)k(k − 1)/(N(N − 1)) + nk/N − n²k²/N²
= (nk/N)[(n − 1)(k − 1)/(N − 1) + 1 − nk/N]
= (nk/N) × (N − k)(N − n)/(N(N − 1))
Example . A batch contains 20 items, of which 15 are good. If 4 items are drawn at random without replacement, find the probability that exactly 3 of them are good.
Solution
Let the random variable X = number of good items. Using the hypergeometric distribution:
P(X = x) = C(k, x) C(N − k, n − x) / C(N, n)
N = 20, k = 15, N − k = 5, n = 4, x = 3, n − x = 1
P(X = 3) = C(15, 3) C(5, 1) / C(20, 4)
= [(15)(14)(13)/((3)(2)(1))] × 5 × [(4)(3)(2)(1)/((20)(19)(18)(17))]
= (5)(7)(13)/((19)(3)(17))
= 455/969 = 0.4696
Example . In a group of 100 people, 10 have high blood pressure. If a random sample of 8 people is selected, find the probability that at most 2 of them have high blood pressure.
Solution
Let the random variable X be the number of individuals with high blood pressure.
P(X = x) = C(k, x) C(N − k, n − x) / C(N, n)
N = 100, k = 10, N − k = 90, n = 8
P(X ≤ 2) = ∑_{x=0}^{2} C(10, x) C(90, 8 − x) / C(100, 8)
= [C(10, 0)C(90, 8) + C(10, 1)C(90, 7) + C(10, 2)C(90, 6)] / C(100, 8)
= 0.97
Example . A messenger carries 16 letters, 10 of which are addressed to the computer science department. He gets the letters mixed up, and on arrival at the computer science department delivers 10 letters at random to the department. What is the probability that only six of the letters intended for the computer science department actually arrive there?
Solution
Let the random variable X = number of letters correctly delivered to the computer science department.
P(X = x) = C(k, x) C(N − k, n − x) / C(N, n)
N = 16, k = 10, N − k = 6, n = 10, x = 6, n − x = 4
P(X = 6) = C(10, 6) C(6, 4) / C(16, 10)
= (210 × 15) / 8008
= 0.393
EXERCISE 11. In a school there are 20 students; 6 are compulsive smokers and keep cigarettes in their lockers all the time. One day the prefects decide to check 10 lockers at random. What is the probability that they will find cigarettes in at least 3 of the lockers?
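A hypergeometric helper in Python (using math.comb; the function name hyper is ours) that reproduces the letters example and checks that the pmf sums to 1:

```python
from math import comb

# N = 16 letters, k = 10 for computer science, n = 10 delivered at random.
def hyper(x, N, k, n):
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

print(hyper(6, 16, 10, 10))                              # ~ 0.393
# support is x = 4..10 here, since at most 6 letters are not for the dept
print(sum(hyper(x, 16, 10, 10) for x in range(4, 11)))   # 1.0
```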
6.3. Summary
In this lesson, we have mainly looked at two very important discrete distributions, i.e. the Poisson and Hypergeometric distributions. We have indicated that the Poisson distribution has one unique property, namely that its mean and variance are equal, both being λ. We have also derived the mgf of the Poisson distribution and shown that it is e^{λ(e^t − 1)}, or equivalently e^{−λ(1 − e^t)}. Given the mgf of a random variable as e^{−4(1 − e^t)}, we can use this information (knowing that it is the mgf of a Poisson distribution) to find a defined probability of the random variable, e.g. P(X = 2); this is because, from this information, λ = 4, while e is a constant. Finally, we have also indicated that the mgf of the Hypergeometric distribution is not useful!
Learning Activities
LESSON 7
Continuous Probability Distributions
Learning outcomes
Upon completing this topic, you should be able to:
• Find the expected values of the gamma, beta, chi-square and exponential distributions from the pdfs and mgfs
Definition
A continuous random variable X has a uniform distribution over [a, b] if its pdf is
f(x) = { 1/(b − a), a ≤ x ≤ b
       { 0,          elsewhere
This is also denoted as X ∼ Unif(a, b), read as "X has a uniform distribution over the interval [a, b]".
Example . Show that this is a pdf.
Solution:
We have previously shown that for a continuous function to be a pdf, ∫_{−∞}^{∞} f(x) dx = 1.
Therefore, ∫_a^b f(x) dx = ∫_a^b 1/(b − a) dx
⟹ [b/(b − a) − a/(b − a)] = (b − a)/(b − a) = 1
If the random variable X is uniformly distributed over [a, b], then
E[X] = ∫_{−∞}^{∞} x f(x) dx
= ∫_a^b x · 1/(b − a) dx
= [1/(b − a)] [x²/2]_a^b
= [1/(b − a)] (b² − a²)/2
= [1/(b − a)] (a + b)(b − a)/2
= (a + b)/2
so the mean is equal to (a + b)/2. Similarly, we can find the variance as follows:
Var(X) = E[X²] − (E[X])²
E[X²] = ∫_a^b x² · 1/(b − a) dx = [1/(b − a)] [x³/3]_a^b = (b³ − a³)/(3(b − a))
∴ Var(X) = (b³ − a³)/(3(b − a)) − [(a + b)/2]²
= (b² + a² + ab)/3 − (a + b)²/4
= [4(b² + a² + ab) − 3(a + b)²]/12
= [4b² + 4a² + 4ab − 3a² − 3b² − 6ab]/12
= (a² − 2ab + b²)/12
= (b − a)²/12
Example . The change in depth of a river from one day to the next, measured at a specific location, is a random variable Y with pdf
f(y) = { k, −2 ≤ y ≤ 2
       { 0, elsewhere
(a) Find k.
(b) What is the distribution function of Y?
Solution:
(a) Y ∼ Unif(−2, 2) ⟹ k = 1/(b − a) = 1/(2 − (−2)) = 1/4
(b) Here the question asks for the cumulative distribution function (cdf) of the random variable Y, i.e. F(y):
F(y) = { 0,                    y < −2
       { ∫_{−2}^{y} (1/4) dy,  −2 ≤ y ≤ 2
       { 1,                    y ≥ 2
Evaluating the integral above, we have
F(y) = { 0,           y < −2
       { (y + 2)/4,   −2 ≤ y ≤ 2
       { 1,           y ≥ 2
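A small Python sketch of this uniform cdf (function names are illustrative):

```python
# Y ~ Unif(-2, 2): k = 1/4 and F(y) = (y + 2)/4 on [-2, 2].
k = 1 / (2 - (-2))
F = lambda y: min(max((y + 2) / 4, 0.0), 1.0)

print(k)                    # 0.25
print(F(-2), F(0), F(2))    # 0.0 0.5 1.0
print(F(1) - F(-1))         # P(-1 <= Y <= 1) = 0.5
```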
EXERCISE 12. Beginning at 12.00 am, a computer center is up for 1 hour and
then down for 2 hours on a regular cycle. Wayua who was unaware of this sched-
ule dials the center at a random time between 12.00 am and 5.00 am. Find the
probability that the center will be up when her call comes in.
m_x(t) = E(e^{tX})
= ∫_a^b e^{tx} · 1/(b − a) dx
= [1/(b − a)] ∫_a^b e^{tx} dx
= [1/(b − a)] [e^{tx}/t]_a^b
= [1/(b − a)] (e^{bt} − e^{at})/t
= (e^{bt} − e^{at}) / (t(b − a))
Gamma Function
The Gamma function is denoted by Γα and is defined by
Γα = ∫_0^∞ x^{α−1} e^{−x} dx
If α is a positive integer, then the gamma function has the following properties:
(a) Γ1 = 1, since ∫_0^∞ e^{−x} dx = 1
(b) Γα = (α − 1)Γ(α − 1)
(c) Γα = (α − 1)(α − 2) . . . (3)(2)Γ1 = (α − 1)!
⟹ Γ(α + 1) = α! and Γ(α + 2) = (α + 1)!, etc.
Beta Function
The Beta function B(α, β), with two parameters α, β (appearing as powers within the integral), is defined as
B(α, β) = ∫_0^1 x^{α−1} (1 − x)^{β−1} dx, α > 0, β > 0
Properties
B(α, β) = B(β, α):
B(α, β) = ∫_0^1 x^{α−1} (1 − x)^{β−1} dx
Let 1 − x = u, so du = −dx and x = 1 − u; then
B(α, β) = ∫_0^1 (1 − u)^{α−1} u^{β−1} du = ∫_0^1 u^{β−1} (1 − u)^{α−1} du = B(β, α)
Also,
B(α, β) = ΓαΓβ / Γ(α + β)
Gamma Distribution
There are two types of gamma distributions. The first (having only one parameter,
α) and the second type (having two parameters α and β ). The two types of gamma
distributions are defined as follows:
Type I
A random variable X has a gamma distribution with parameter α > 0 if the pdf is given by
f(x) = { x^{α−1} e^{−x} / Γα, x ≥ 0, α > 0
       { 0,                   elsewhere
Proof
We can derive the gamma distribution from the gamma function as follows:
Γα = ∫_0^∞ x^{α−1} e^{−x} dx
Dividing this identity by Γα, we have
Γα/Γα = [∫_0^∞ x^{α−1} e^{−x} dx] / Γα
⟹ ∫_0^∞ [x^{α−1} e^{−x} / Γα] dx = 1
It can be shown using the properties of the gamma function, that the mean and
variance of this distribution are both equal to α.
Type II
A random variable X has a gamma distribution with parameters α > 0 and β > 0 if its pdf is given by
f(x) = { [1/(Γα β^α)] x^{α−1} e^{−x/β}, 0 < x < ∞
       { 0,                             elsewhere
Proof
We derive this from the gamma function as follows:
Γα = ∫_0^∞ x^{α−1} e^{−x} dx
Introduce a new variable y by letting x = y/β, where β > 0; then dx = dy/β and
Γα = ∫_0^∞ (y/β)^{α−1} e^{−y/β} (1/β) dy
⟹ ∫_0^∞ [y^{α−1} e^{−y/β} / (Γα β^α)] dy = 1
For this type of the gamma distribution, the mean, E(X) = αβ and var(X) = αβ 2
Example . Show that, for a gamma distribution with parameters α and β, given by the pdf
f(x) = { [x^{α−1}/(Γα β^α)] e^{−x/β}, 0 < x < ∞
       { 0,                            elsewhere
the mean is E(X) = αβ and the variance is var(X) = αβ².
Making the same substitution as we did for the mean, u = x/β so that du = dx/β, the variance can be expressed in the form
var(X) = ∫_0^∞ u^{α+1} e^{−u} (β²/Γα) du − (αβ)²
= (β²/Γα) ∫_0^∞ u^{α+1} e^{−u} du − (αβ)²
= (β²/Γα) Γ(α + 2) − (αβ)²
= (β²/Γα)(α + 1)Γ(α + 1) − (αβ)², and simplifying further,
= (β²/Γα)(α + 1)αΓ(α) − (αβ)²
= β²α² + αβ² − α²β²
= αβ²
Hence, var(X) = αβ².
m_x(t) = E(e^{tX})
= ∫_0^∞ e^{tx} [1/(Γα β^α)] x^{α−1} e^{−x/β} dx
= ∫_0^∞ [1/(Γα β^α)] x^{α−1} e^{tx − x/β} dx
= ∫_0^∞ [1/(Γα β^α)] x^{α−1} e^{−x(1 − βt)/β} dx
Let y = x(1 − βt)/β, for t < 1/β
⟹ dy = [(1 − βt)/β] dx
⟹ dx = β dy/(1 − βt) and x = βy/(1 − βt)
∴ m_x(t) = ∫_0^∞ [ (βy/(1 − βt))^{α−1} / (Γα β^α) ] e^{−y} [β/(1 − βt)] dy
= [1/(1 − βt)]^α ∫_0^∞ (1/Γα) y^{α−1} e^{−y} dy
but ∫_0^∞ (1/Γα) y^{α−1} e^{−y} dy = 1 (it is a pdf)
= [1/(1 − βt)]^α, t < 1/β
= (1 − βt)^{−α}
Differentiating, m′_x(t) = αβ(1 − βt)^{−α−1}
E(X) = m′_x(0) = αβ
var(X) = E(X²) − (E[X])² = E(X²) − α²β²
m″_x(t) = αβ(−α − 1)(−β)(1 − βt)^{−α−2} = αβ(α + 1)β(1 − βt)^{−α−2}
E(X²) = m″_x(0) = αβ(αβ + β) = α²β² + αβ²
∴ var(X) = E(X²) − α²β² = α²β² + αβ² − α²β² = αβ²
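Both moments drop out of the mgf (1 − βt)^{−α} by symbolic differentiation; a sympy sketch (library assumed available):

```python
import sympy as sp

t, a, b = sp.symbols('t alpha beta', positive=True)
m = (1 - b * t)**(-a)                    # gamma mgf, valid for t < 1/beta

EX  = sp.diff(m, t).subs(t, 0)           # alpha*beta
EX2 = sp.diff(m, t, 2).subs(t, 0)        # alpha*(alpha+1)*beta**2
print(sp.simplify(EX), sp.simplify(EX2 - EX**2))   # mean alpha*beta, variance alpha*beta**2
```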
Exponential Distribution
This is the special case of the gamma distribution with α = 1. Its pdf is
f(x) = { (1/θ) e^{−x/θ}, 0 < x < ∞
       { 0,               elsewhere
or, writing λ = 1/θ,
f(x) = { λ e^{−λx}, 0 < x < ∞
       { 0,          elsewhere
The mgf of an exponential distribution is
1/(1 − θt), or equivalently λ/(λ − t)
E(X) = 1/λ or θ
var(X) = 1/λ² or θ²
Beta Distribution
A random variable X has a beta distribution with parameters α > 0 and β > 0 if its pdf is
f(x) = { x^{α−1}(1 − x)^{β−1} / B(α, β), 0 ≤ x ≤ 1
       { 0,                               elsewhere
or, equivalently,
f(x) = { [Γ(α + β)/(ΓαΓβ)] x^{α−1}(1 − x)^{β−1}, 0 ≤ x ≤ 1, β > 0, α > 0
       { 0,                                        elsewhere
The mgf of a beta distribution is not useful, but we can still get the mean and variance of the distribution directly.
E(X) = ∫_0^1 x f(x) dx
= ∫_0^1 x · x^{α−1}(1 − x)^{β−1} / B(α, β) dx
= [∫_0^1 x^α (1 − x)^{β−1} dx] / B(α, β)
but B(α, β) = ∫_0^1 x^{α−1}(1 − x)^{β−1} dx, so by convention
∫_0^1 x^α (1 − x)^{β−1} dx = B(α + 1, β)
⟹ E(X) = B(α + 1, β) / B(α, β)
= [Γ(α + 1)Γβ / Γ(α + β + 1)] × [Γ(α + β) / (ΓαΓβ)]
= [αΓαΓβ / ((α + β)Γ(α + β))] × [Γ(α + β) / (ΓαΓβ)]
= α/(α + β)
Next, var(X):
Next var(X)
E(X²) = ∫_0^1 x² x^{α−1}(1 − x)^{β−1} / B(α, β) dx
= [∫_0^1 x^{α+2−1}(1 − x)^{β−1} dx] / B(α, β)
= B(α + 2, β) / B(α, β)
= [Γ(α + 2)Γβ / Γ(α + β + 2)] × [Γ(α + β) / (ΓαΓβ)]
= [(α + 1)Γ(α + 1)Γβ × Γ(α + β)] / [(α + β + 1)Γ(α + β + 1)ΓαΓβ]
= [(α + 1)αΓαΓβ × Γ(α + β)] / [(α + β + 1)(α + β)Γ(α + β)ΓαΓβ]
= (α + 1)α / [(α + β + 1)(α + β)]
var(X) = E(X²) − (E[X])²
= (α + 1)α / [(α + β + 1)(α + β)] − [α/(α + β)]²
= [(α² + α)(α + β) − α²(α + β + 1)] / [(α + β + 1)(α + β)²]
= [α³ + α² + α²β + αβ − α³ − α²β − α²] / [(α + β + 1)(α + β)²]
= αβ / [(α + β + 1)(α + β)²]
7.3. Summary
We have mainly looked at some of the continuous distributions and derived the various probability distributions. Where possible, we have derived the mgfs of the different continuous distributions and also used the mgfs to obtain the mean and variance of the random variables. We have also seen that the chi-square and the exponential distributions are special cases of the gamma distribution.
(a) He waits more than 500 minutes for the take off after checking in.
(b) His waiting time for the take off after checking in will be within 1.5
standard deviations of the mean waiting time
LESSON 8
Change of Variable and Distribution function Techniques
Learning outcomes
Upon completing this topic, you should be able to:
• The mgf technique.
We explore functions of random variables that are independent and identically distributed. For instance, if X₁ is the weight of a randomly selected individual from the population of males, X₂ is the weight of another randomly selected individual from the population of males, . . . , Xₙ, then we might be interested in learning how the random function
X̄ = (X₁ + X₂ + · · · + Xₙ)/n
is distributed.
Example . Let X be a continuous random variable with pdf f(x) = 3x² for 0 < x < 1, and let Y = X². Looking at the graph of the function y = x², we note that the function is increasing in X and 0 < y < 1.
That noted, let's now use the distribution function technique to find the pdf of Y. First, we find the cumulative distribution function of Y:
F_Y(y) = P(Y ≤ y) = P(X² ≤ y) = P(X ≤ y^{1/2}) = F_X(y^{1/2})
= ∫_0^{y^{1/2}} 3t² dt = [t³]_0^{y^{1/2}} = y^{3/2}, 0 < y < 1
Differentiating, f_Y(y) = (3/2)y^{1/2}, 0 < y < 1.
EXERCISE 13. Let X be a continuous random variable with probability density function f(x) = 3(1 − x)² for 0 < x < 1. What is the probability density function of Y = (1 − X)³?
EXERCISE 14. Let X have the Poisson pmf f(x) = µ^x e^{−µ}/x! for x = 0, 1, 2, . . . . Find the pmf of Y = 4X.
The blue curve, of course, represents the continuous and increasing function Y =
u(X). If you put an x-value, such as c1 and c2 , into the function Y = u(X), you get
a y-value, such as u(c1 ) and u(c2 ).
But, because the function is continuous and increasing, an inverse function X =
v(Y ) exists. In that case, if you put a y-value into the function X = v(Y ), you get an
x-value, such as v(y).
Okay, now that we have described the scenario, let’s derive the distribution function
of Y. It is:
´ v(y)
FY (y) = P(Y ≤ y) = P(u(X) ≤ y) = P(X ≤ v(y)) = c1 f (x)dx
for d1 = u(c1 ) < y < u(c2 ) = d2 . The first equality holds from the definition of the
cumulative distribution function of Y. The second equality holds because Y = u(X).
The third equality holds because, as shown in red on the following graph, for the
portion of the function for which u(X) ≤ y, it is also true that X ≤ v(Y):
And, the last equality holds from the definition of probability for a continuous ran-
dom variable X. Now, we just have to take the derivative of FY (y), the cumulative
distribution function of Y, to get fY (y), the probability density function of Y. The
Fundamental Theorem of Calculus, in conjunction with the Chain Rule, tells us that the derivative is:
f_Y(y) = F′_Y(y) = f_X(v(y)) · v′(y), for d₁ = u(c₁) < y < u(c₂) = d₂
The blue curve, of course, represents the continuous and decreasing function Y =
u(X). Again, if you put an x-value, such as c1 and c2 , into the function Y = u(X),
you get a y-value, such as u(c1 ) and u(c2 ). But, because the function is continuous
and decreasing, an inverse function X = v(Y ) exists. In that case, if you put a
y-value into the function X = v(Y ), you get an x-value, such as v(y).
That said, the distribution function of Y is then:
F_Y(y) = P(Y ≤ y) = P(u(X) ≤ y) = P(X ≥ v(y)) = 1 − P(X ≤ v(y)) = 1 − ∫_{c₁}^{v(y)} f(x) dx
for d2 = u(c2 ) < y < u(c1 ) = d1 . The first equality holds from the definition of the
cumulative distribution function of Y . The second equality holds because Y = u(X).
The third equality holds because, as shown in red on the following graph, for the
portion of the function for which u(X) ≤ y, it is also true that X ≥ v(Y ):
The fourth equality holds from the rule of complementary events. And, the last
equality holds from the definition of probability for a continuous random variable
X. Now, we just have to take the derivative of FY (y), the cumulative distribution
function of Y, to get fY (y), the probability density function of Y. Again, the Funda-
mental Theorem of Calculus, in conjunction with the Chain Rule, tells us that the
derivative is:
f_Y(y) = F′_Y(y) = −f_X(v(y)) · v′(y)
for d₂ = u(c₂) < y < u(c₁) = d₁. You might be alarmed that the pdf f_Y(y) seems to be negative, but note that the derivative v′(y) is negative, because X = v(Y) is a decreasing function of Y. Therefore, the two negatives cancel each other out, making f_Y(y) positive.
Finally....We have now derived what is called the change-of-variable technique
first for an increasing function and then for a decreasing function. But, continu-
ous, increasing functions and continuous, decreasing functions, by their one-to-one
nature, are both invertible functions. Let’s, once and for all, then write the change-
of-variable technique for any generic invertible function
Definition
Let X be a continuous random variable with pdf f_X(x), and let Y = u(X), where u is a continuous, one-to-one (invertible) function with inverse X = v(Y). Then the pdf of Y is
f_Y(y) = f_X(v(y)) × |v′(y)|
Example . Returning to the earlier example, X has pdf f_X(x) = 3x², 0 < x < 1, and Y = X², so that the inverse function is X = v(Y) = Y^{1/2}.
Now, taking the derivative of v(y), we get: v′(y) = (1/2)y^{−1/2}
Therefore, the change-of-variable technique
f_Y(y) = f_X(v(y)) × |v′(y)|
tells us that the probability density function of Y is:
f_Y(y) = 3[y^{1/2}]² · (1/2)y^{−1/2}
And, simplifying, we get that the probability density function of Y is:
f_Y(y) = (3/2)y^{1/2}
We shouldn’t be surprised by this result, as it is the same result that we obtained
using the distribution function technique.
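Both techniques can be sanity-checked by simulation: draw X with cdf x³ by inversion and compare the empirical cdf of Y = X² with y^{3/2}. A Python sketch (sample size and test points are arbitrary):

```python
import random

# X has pdf 3x^2 on (0,1), so F_X(x) = x^3 and X = U**(1/3) by inversion.
random.seed(1)
ys = [(random.random() ** (1 / 3)) ** 2 for _ in range(100000)]   # Y = X^2

for y0 in (0.2, 0.5, 0.8):
    empirical = sum(y <= y0 for y in ys) / len(ys)
    print(y0, empirical, y0 ** 1.5)   # empirical cdf vs F_Y(y) = y^(3/2)
```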
Using the distribution function technique and the change of variable technique, find
the probability density of Y = X 3 and comment on the results obtained using the
two approaches.
LESSON 9
Normal Distribution
Learning outcomes
Upon completing this topic, you should be able to:
• Be familiar with the normal distribution and the standard normal distribution
• Interpret the answer (after the approximation) in terms of the original prob-
lem.
9.1. Introduction
This is the most popular and commonly used distribution. A continuous random variable X is said to have a normal distribution if its pdf is given by
f(x) = [1/(σ√(2π))] e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞, −∞ < µ < ∞, σ > 0
From the above pdf and notation, we observe that the normal distribution depends on two unknown parameters µ and σ. Later we will show that these parameters µ and σ are the mean and standard deviation respectively.
Once the parameters µ and σ are specified, the distribution is completely determined; hence the values of f(x) can be evaluated from the values of X and the normal curve plotted, which is bell-shaped and symmetric about x = µ.
Note that the total area under the curve is always 1:
∫_{−∞}^{∞} [1/(σ√(2π))] e^{−(x−µ)²/(2σ²)} dx = 1
Define
z = (X − µ)/σ
In this case z is a standardized variable. Note that if X ∼ N(µ, σ²), then the standardized variable z = (X − µ)/σ ∼ N(0, 1).
E(z) = E[(X − µ)/σ] = [E(X) − µ]/σ = µ/σ − µ/σ = 0
var(z) = var[(X − µ)/σ] = var(X)/σ² − var(µ/σ) = σ²/σ² = 1
Note: var(µ/σ) is the variance of a constant, which is 0.
If z ∼ N(0, 1), then
P(Z < z) = ∫_{−∞}^{z} [1/√(2π)] e^{−z²/2} dz
These probabilities have been evaluated and are given in the form of a table.
Now the probability P(X ≤ b) can be evaluated by
P(X ≤ b) = P[(X − µ)/σ ≤ (b − µ)/σ]
⟹ P(X ≤ b) = P(Z < a*) − − − −(1)
where a* = (b − µ)/σ.
The probability on the RHS of (1) can be read from tabulated values of the CDF of the standard normal distribution Z.
Example. A random variable X has a normal distribution with mean 9 and standard deviation 3. Find P(5 < X < 11).
Solution:
We need to standardize X, i.e. z = (X − µ)/σ.
For X = 5: z = (5 − 9)/3 = −1.333
For X = 11: z = (11 − 9)/3 = 0.667
=⇒ P(5 < X < 11) = P(−1.333 < Z < 0.667)
= P(Z < 0.667) − (1 − P(Z < 1.333))
= 0.7486 − 0.0918 = 0.6568
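As a sketch of how such table lookups can be checked by software (assuming Python with scipy is available), norm.cdf plays the role of Φ:

```python
from scipy.stats import norm

# P(5 < X < 11) for X ~ N(9, 3^2), by standardizing both endpoints
p = norm.cdf((11 - 9) / 3) - norm.cdf((5 - 9) / 3)
print(round(p, 4))  # about 0.6563; the table answer 0.6568 differs only by rounding z values
```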
Example. If X ∼ N(10, σ²) and P(X > 12) = 0.1587, find P(9 < X < 11).
Solution:
P(X > 12) = P(Z > (12 − 10)/σ) = 0.1587, so Φ(2/σ) = 1 − 0.1587 = 0.8413, and from the tables 2/σ = 1, i.e. σ = 2.
Hence
P(9 < X < 11) = P((9 − 10)/2 < Z < (11 − 10)/2) = P(−0.5 < Z < 0.5) = 2Φ(0.5) − 1 = 2(0.6915) − 1 = 0.3830.
In particular, the total area under the standard normal curve is 1. That is,

∫_{−∞}^{∞} (1/√(2π)) e^(−z²/2) dz = 1 if Z ∼ N(0, 1),

or, for general µ and σ,

∫_{−∞}^{∞} (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)) dx = 1.
Note
The normal distribution is widely used in statistics despite the fact that real populations hardly ever follow the exact normal distribution. This is largely because, by the Central Limit Theorem (Section 10.4), means of large samples from almost any population are approximately normally distributed.
9.4. The mgf of a Normal distribution

m_X(t) = ∫_{−∞}^{∞} e^(tx) (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)) dx

= ∫_{−∞}^{∞} (1/(σ√(2π))) e^(−(1/(2σ²))(x−µ)² + tx) dx ......(1)

But

−(1/(2σ²))(x − µ)² + tx = −(1/(2σ²))[x² + µ² − 2µx − 2σ²tx]

= −(1/(2σ²))[x² − 2x(µ + σ²t)] − µ²/(2σ²) ......(2)
Completing the square,

x² − 2x(µ + σ²t) = [x − (µ + σ²t)]² − (µ + σ²t)²

so from (2):

−(1/(2σ²))[x − (µ + σ²t)]² + (µ + σ²t)²/(2σ²) − µ²/(2σ²)

= −(1/(2σ²))[x − (µ + σ²t)]² + (µ² + 2µσ²t + σ⁴t² − µ²)/(2σ²)

= −(1/(2σ²))[x − (µ + σ²t)]² + µt + σ²t²/2

From (1):

m_X(t) = ∫_{−∞}^{∞} (1/(σ√(2π))) e^(−(1/(2σ²))[x − (µ+σ²t)]² + µt + σ²t²/2) dx

= e^(µt + σ²t²/2) ∫_{−∞}^{∞} (1/(σ√(2π))) e^(−(1/(2σ²))[x − (µ+σ²t)]²) dx

But

∫_{−∞}^{∞} (1/(σ√(2π))) e^(−(1/(2σ²))[x − (µ+σ²t)]²) dx = 1

since the integrand is a normal density with mean µ + σ²t and variance σ². Hence

m_X(t) = e^(µt + σ²t²/2)

We note that if we replace µ with 0 and σ² with 1, we get the mgf of the standard normal distribution, i.e.

m_Z(t) = e^(0 + t²/2) = e^(t²/2)
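The whole derivation can be checked symbolically. Here is a sketch (assuming Python with sympy installed); the printed form of the result may vary slightly between sympy versions:

```python
import sympy as sp

x, t, mu = sp.symbols('x t mu', real=True)
sigma = sp.symbols('sigma', positive=True)

# m_X(t) = E[e^(tX)] for the N(mu, sigma^2) density
pdf = sp.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sp.sqrt(2 * sp.pi))
mgf = sp.simplify(sp.integrate(sp.exp(t * x) * pdf, (x, -sp.oo, sp.oo)))

print(mgf)                          # equivalent to exp(mu*t + sigma**2*t**2/2)
print(mgf.subs({mu: 0, sigma: 1}))  # the standard normal case, exp(t**2/2)
```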
Returning to m_X(t) = e^(µt + σ²t²/2): differentiating gives m′_X(t) = (µ + σ²t) e^(µt + σ²t²/2), so E(X) = m′_X(0) = µ. Differentiating again and setting t = 0,

m″_X(t)|_{t=0} = σ² + µ²

var(X) = E(X²) − (E[X])² = σ² + µ² − µ² = σ²
Directly, for the standard normal:

m_Z(t) = E(e^(tZ)) = ∫_{−∞}^{∞} e^(tz) (1/√(2π)) e^(−z²/2) dz

= ∫_{−∞}^{∞} (1/√(2π)) e^(−(z² − 2tz)/2) dz

But z² − 2tz = (z − t)² − t², so

m_Z(t) = ∫_{−∞}^{∞} (1/√(2π)) e^(−(z−t)²/2 + t²/2) dz

= e^(t²/2) ∫_{−∞}^{∞} (1/√(2π)) e^(−(z−t)²/2) dz

= e^(t²/2)
Alternatively, we can get the mgf of the standard normal distribution from the mgf of the normal density, i.e.

m_X(t) = e^(µt + (1/2)σ²t²) ....(3)
Since Z ∼ N(0, 1), we can replace µ and σ with 0 and 1 respectively in (3):

=⇒ m_Z(t) = e^(0 + t²/2) = e^(t²/2)

Mean and variance of Z:
m′_Z(t) = t e^(t²/2), so E(Z) = m′_Z(0) = 0.
m″_Z(t) = (1 + t²) e^(t²/2), so E(Z²) = m″_Z(t)|_{t=0} = 1 + 0 = 1.
∴ var(Z) = E(Z²) − (E[Z])² = 1 − 0² = 1
9.5. Normal Approximation to the Binomial
• First, recall that a discrete random variable can take on only specified values, whereas a continuous random variable can take on any value within an interval.
• Second, recall that with a continuous distribution (such as the normal), the
probability of obtaining a particular value of a random variable is zero. On
the other hand, when the normal approximation is used to approximate a dis-
crete distribution, a continuity correction can be employed so that we can
approximate the probability of a specific value of the discrete distribution.
Consider an experiment where we toss a fair coin 12 times and observe the
number of heads. Suppose we want to compute the probability of obtaining
exactly 4 heads. Whereas a discrete random variable can have only a spec-
ified value (such as 4), a continuous random variable used to approximate it
could take on any value within an interval around that specified value.
• The continuity correction requires adding or subtracting 0.5 from the value or
values of the discrete random variable X as needed. Hence to use the normal
distribution to approximate the probability of obtaining exactly 4 heads (i.e.,
X = 4), we would find the area under the normal curve from X = 3.5 to
X = 4.5, the lower and upper boundaries of 4. Moreover, to determine the
approximate probability of observing at least 4 heads, we would find the area
under the normal curve from X = 3.5 and above since, on a continuum, 3.5 is
the lower boundary of X. Similarly, to determine the approximate probability
of observing at most 4 heads, we would find the area under the normal curve
from X = 4.5 and below since, on a continuum, 4.5 is the upper boundary of
X.
For instance, with H the number of heads in 12 tosses of a fair coin, the statement

P(3 ≤ H ≤ 5) = P(2.5 < H < 5.5)

is not an approximation; it is exactly correct. The reason is that we are adding the events 2.5 < H < 3 and 5 < H < 5.5 to get from the left side of the equation to the right side, but for the binomial random variable H these events have probability zero. The continuity correction is not where the approximation comes in; that comes when we approximate H using a normal distribution with mean µ = 6 and standard deviation σ = 1.732. The resulting approximation is only off by about 1%, which is pretty good for such a small sample size!
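That error figure is easy to verify. A sketch (assuming Python with scipy) comparing the exact binomial probability with the continuity-corrected normal approximation for H ∼ B(12, 0.5):

```python
from scipy.stats import binom, norm

n, p = 12, 0.5
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5  # 6 and about 1.732

exact = binom.pmf([3, 4, 5], n, p).sum()                      # P(3 <= H <= 5)
approx = norm.cdf(5.5, mu, sigma) - norm.cdf(2.5, mu, sigma)  # continuity-corrected
print(exact, approx)  # about 0.3679 vs 0.3647, a difference of roughly 1%
```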
EXERCISE 17. Suppose that a sample of n = 1,600 tires of the same type is obtained at random from an ongoing production process in which 8% of all such tires produced are defective. What is the probability that in such a sample not more than 150 tires will be defective?
E XERCISE 18. Based on past experience, 7% of all luncheon vouchers are in
error. If a random sample of 400 vouchers is selected, what is the approximate
probability that (a) exactly 25 are in error? (b) fewer than 25 are in error? (c)
between 20 and 25 (inclusive) are in error?
Example . A particular production process used to manufacture ferrite magnets
used to operate reed switches in electronic meters is known to give 10% defective
magnets on average. If 200 magnets are randomly selected, what is the probability
that the number of defective magnets is between 24 and 30?
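No solution is worked in the text here, but a sketch of the normal-approximation calculation (assuming Python with scipy; X ∼ B(200, 0.1) so µ = 20 and σ = √18, and reading "between 24 and 30" as inclusive) is:

```python
from scipy.stats import norm

mu = 200 * 0.1                    # 20
sigma = (200 * 0.1 * 0.9) ** 0.5  # sqrt(18), about 4.243

# P(24 <= X <= 30) with continuity correction becomes P(23.5 < Y < 30.5)
p = norm.cdf(30.5, mu, sigma) - norm.cdf(23.5, mu, sigma)
print(round(p, 4))  # roughly 0.198
```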
2. In an examination the average mark was 76.5 and the s.d. was 9.5. If 15% of the class scored grade A, what is the lowest possible grade A mark? Assume the marks are normally distributed. [Ans: The lowest possible score for grade A is 87; the lowest possible score for grade B is 86.]
(a) What is the probability that the diameter will exceed 0.81cm?
(b) The cable is considered defective if the diameter differs from the mean
by more than 0.025cm. What is the probability of obtaining a defective
cable?
5. A machine packs sugar in what are nominally 2kg bags. However there is a
variation in the actual weight which is described by the normal distribution.
(a) Previous records indicate that the standard deviation of the distribution
is 0.02 kg and the probability that the bag is underweight is 0.01. Find
the mean value of the distribution.
(b) It is hoped that an improvement to the machine will reduce the standard deviation while allowing it to operate with the same mean value. What value of the standard deviation is needed to ensure that the probability that a bag is underweight is 0.001?
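For question 5 the key tool is the inverse normal CDF. A sketch of one way to check your answers (assuming Python with scipy, and that "underweight" means below the nominal 2 kg):

```python
from scipy.stats import norm

# (a) sigma = 0.02 and P(X < 2) = 0.01: solve (2 - mu)/sigma = z, where Phi(z) = 0.01
z = norm.ppf(0.01)        # about -2.326
mu = 2 - z * 0.02
print(round(mu, 4))       # about 2.0465 kg

# (b) same mean, but now P(X < 2) = 0.001: solve for sigma instead
z2 = norm.ppf(0.001)      # about -3.090
sigma = (2 - mu) / z2     # numerator and z2 are both negative
print(round(sigma, 4))    # about 0.0151 kg
```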
LESSON 10
Statistical Inference
Learning outcomes
Upon completing this topic, you should be able to:
• Carry out hypothesis test based on either the normal distribution or using the
t-distribution
10.1. Introduction
We begin by defining some of the terms used in statistical inference.
• Statistical hypothesis: a statement or claim about the parameters or the distribution of one or more populations, which is to be assessed using sample data.
• Critical region: the set of all values of the test statistic that would cause us to reject H₀.
• Critical value: the value(s) that separate the critical region from the acceptance region.
• 100(1 − α)% confidence interval: an interval [a, b] for which the probability that X ∈ [a, b] is 1 − α; that is, P(a < X < b) = 1 − α.
10.3. Steps for classical hypothesis test
Step 1
The first step in the hypothesis-testing procedure is to declare the null hypothesis H₀ and the alternative hypothesis H₁ (before the data are seen). Our aim is eventually to "accept" or "reject" the null hypothesis as the result of an objective statistical procedure.
Step 2
Since the decision to "accept" or "reject" H₀ will be made on the basis of data derived from some random process, it is possible that an incorrect decision will be made: to reject H₀ when it is indeed true (a Type I error), or to accept H₀ when it is false (a Type II error). In hypothesis testing we cannot make the probability of both errors arbitrarily small unless we are able to make the number of observations large; in practice, the probability of a Type I error is fixed in this step at a chosen significance level α.
Step 3
This step consists of determining a test statistic. This is the quantity calculated from the data whose numerical value leads to acceptance or rejection of H₀.
Step 4
This step consists of determining those observed values of the test statistic that lead to rejection of H₀. The choice is made so as to ensure that the test has the numerical value for the Type I error chosen in Step 2.
Step 5
The final step in the hypothesis-test procedure is to obtain the data and determine whether the observed value of the test statistic is equal to or more extreme than the significance point calculated in Step 4, and to reject H₀ if it is; otherwise, fail to reject H₀.
Example. Jam is produced in tins labelled 1 kg. The machine filling the tins is set to give a mean of 1030 g and an s.d. of 16 g, and the weights of the jam tins may be assumed normally distributed. What is the probability that a customer buys a tin which contains (a) less than 982 g? (b) less than 1000 g? (c) A tin is found to contain 1000 g; test at the 1% level whether this is consistent with the machine setting.
Solution:
X ∼ N(1030, 16²)
=⇒ Z ∼ N(0, 1) where Z = (X − 1030)/16
Therefore
(a) P(X < 982) = P(Z < (982 − 1030)/16) = P(Z < −3)
P(Z < −3) = 1 − Φ(3) = 1 − 0.999 = 0.001
(b) P(X < 1000) = P(Z < (1000 − 1030)/16) = P(Z < −1.88)
P(Z < −1.88) = 1 − Φ(1.88) = 1 − 0.970 = 0.030
(c) For the hypothesis test we need:
H₀: µ = 1030
versus
H₁: µ < 1030 (this is a one-tailed test)
Since α = 0.01, the critical value is Z_t = −2.33.
Test statistic:
Z_c = (1000 − 1030)/16 = −1.88
Since Z_c = −1.88 > −2.33, the test statistic does not fall in the critical region.
Decision and conclusion:
We do not reject H₀ and conclude that the production of tins has been accurate.
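The five steps translate directly into a few lines of code. A sketch of part (c) (assuming Python with scipy is available):

```python
from scipy.stats import norm

mu0, sigma, x, alpha = 1030, 16, 1000, 0.01

z_crit = norm.ppf(alpha)    # about -2.33: one-tailed (lower) critical value
z_calc = (x - mu0) / sigma  # -1.875, as in the worked example

# Reject H0: mu = 1030 in favour of H1: mu < 1030 only if z_calc < z_crit
print("reject H0" if z_calc < z_crit else "do not reject H0")
```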
EXERCISE 20. From past records it is known that 20% of a certain seedling will survive and grow into strong trees. In a batch of 400 seedlings planted on a planting day, only 60 survived. Is this a poor survival rate? Use the 1% level of significance.
EXERCISE 21. A six-sided die is rolled 120 times, and only nine 4's appear when the score of the uppermost face of the die is recorded. Is there evidence to suggest bias in the die? (Use α = 0.05.)
10.4. Central Limit Theorem
In simple terms:
1. For large samples ( n ≥ 30) from any population with mean µ and variance
σ 2 the sample mean X̄ ∼ N(µ, σ 2 /n)
2. For a sample of any size n taken from a N(µ, σ 2 ) population, the sample
mean X̄ ∼ N(µ, σ 2 /n)
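A small simulation makes statement 1 concrete. Here is a sketch (assuming Python with numpy), using a deliberately non-normal Exp(1) population, which has µ = 1 and σ² = 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 30, 10_000

# Means of 10,000 samples, each of size n = 30, from an Exp(1) population
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# The CLT predicts the sample mean is approximately N(1, 1/30)
print(means.mean())  # close to 1
print(means.var())   # close to 1/30, about 0.0333
```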
10.5. Estimation of µ and σ based on a sample of size n
Definitions
• Confidence level: the probability that an interval estimate encloses the population parameter, expressed as a percentage. For large samples, a 100(1 − α)% confidence interval for µ is given by X̄ ± z_{α/2}(σ/√n). For n ≥ 30, this interval is approximately X̄ ± z_{α/2}(S/√n).
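As a sketch (assuming Python with scipy), here is the large-sample interval for illustrative values n = 80, X̄ = 600 and S = 5 at the 99% level:

```python
from scipy.stats import norm

n, xbar, s, alpha = 80, 600, 5, 0.01

# 100(1 - alpha)% interval: xbar +/- z_{alpha/2} * s / sqrt(n)
half_width = norm.ppf(1 - alpha / 2) * s / n ** 0.5
print(xbar - half_width, xbar + half_width)  # roughly (598.56, 601.44)
```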
10.6. Students t-Distribution
In practice σ is not known, and in such a case the only option is to use the sample estimate S of the standard deviation. The quantity √n(X̄ − µ)/S is approximately normal if n is large.
If n is not large, then √n(X̄ − µ)/S is distributed as t:

t = (X̄ − µ)/(S/√n), where S² = (1/(n − 1)) ∑(Xᵢ − X̄)²

t is widely used, and the distribution of t is called the t-distribution.
The density function of the variable t with k = n − 1 degrees of freedom is

f(t) = (1/(√k B(1/2, k/2))) (1 + t²/k)^(−(k+1)/2),  −∞ < t < ∞
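The density above agrees with standard software. A sketch (assuming Python with scipy) comparing the formula with scipy.stats.t.pdf at a few points for k = 10 degrees of freedom:

```python
import numpy as np
from scipy.special import beta
from scipy.stats import t

k = 10  # degrees of freedom

def f(x, k):
    # f(t) = (1 / (sqrt(k) * B(1/2, k/2))) * (1 + t^2/k)^(-(k+1)/2)
    return (1 + x ** 2 / k) ** (-(k + 1) / 2) / (np.sqrt(k) * beta(0.5, k / 2))

for x in (-2.0, 0.0, 1.5):
    print(f(x, k), t.pdf(x, k))  # the two columns agree
```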
10.7. Properties of the t-distribution
• It is uni-modal.
10.8. Hypothesis test (t-Test)
The test statistic is

t_{n−1} = (X̄ − µ₀)/(s/√n)

= √n (X̄ − µ₀)/s

= √( n(n − 1) / ∑(Xᵢ − X̄)² ) · (X̄ − µ₀)
Whatever value we get, we compare it with the tabulated critical value (from the t-table). If the calculated value is more extreme than the tabulated value, we reject the null hypothesis.
Example. The life expectancy of people in the year 1970 in Brazil was expected to be 50 years. A survey was conducted in 11 regions of Brazil and the following data were obtained. Do the data confirm the expected view? Life expectancy (yrs): 54.2, 50.4, 44.2, 49.7, 55.4, 57.0, 58.2, 56.6, 61.9, 57.5, 53.4
Solution
We wish to test
H0 : µ = 50
vs
H_A: µ ≠ 50
The test statistic
t_{n−1} = √n (X̄ − µ₀)/s

X̄ = (54.2 + 50.4 + ... + 53.4)/11 = 598.5/11 = 54.41

s² = (1/(n − 1)) ∑(Xᵢ − X̄)² = (1/10)(32799.91 − (598.5)²/11) = 23.607

s = √23.607 = 4.859

t = √11 (54.41 − 50)/4.859 = 3.01

Since |t| = 3.01 exceeds the tabulated two-tailed value t₁₀ = 2.228 at the 5% level, we reject H₀ and conclude that the data do not confirm a mean life expectancy of 50 years.
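The same test is a single call in scipy (a sketch; ttest_1samp performs a two-sided one-sample t-test):

```python
from scipy.stats import ttest_1samp

life = [54.2, 50.4, 44.2, 49.7, 55.4, 57.0, 58.2, 56.6, 61.9, 57.5, 53.4]
result = ttest_1samp(life, popmean=50)
print(result.statistic, result.pvalue)  # t is about 3.01, p is about 0.013 < 0.05
```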
Example. A breeder claims that his variety of cotton contains at most 40% lint in seed cotton. Eighteen (18) samples of 100 grams each were taken, and after ginning the following quantities of lint (in grams) were found in the samples:
36.3 37.0 36.6 37.5 37.5 37.9
38.5 37.9 38.8 37.5 37.1 37.0
together with six further values ending 36.7 and 35.7 (the 18 observations sum to 669.7).
Check the breeder's claim. Use the 1% level of significance.
Solution: We wish to test
H₀: µ = 40 vs H_A: µ < 40
The test statistic
t_{n−1} = √n (X̄ − µ₀)/s

X̄ = (36.3 + 37.0 + 36.6 + ... + 36.7 + 35.7)/18 = 669.7/18 = 37.206

S² = (1/(n − 1)) [∑Xᵢ² − (∑Xᵢ)²/n] = (1/17) ∑(Xᵢ − X̄)² = 0.633

s = √0.633 = 0.796

t = √18 (37.206 − 40)/0.796 = −14.9

Since t = −14.9 is far below the one-tailed critical value −t₁₇ = −2.567 at the 1% level, we reject H₀ in favour of H_A: µ < 40; the data strongly support the breeder's claim.
3. A sample of size 80 has a mean of 600 g and an s.d. of 5 g. Find the 99% confidence interval for the mean of the population.
Solutions to Exercises
Exercise 1. (a) ∑_{all w} P(W = w) = 1 =⇒ 0.1 + 0.25 + 0.3 + 0.15 + d = 1. Thus, solving for d, we have d = 0.2.
(b) P(−3 ≤ W ≤ 0) = P(W = −3) + P(W = −2) + P(W = −1) + P(W = 0)
=⇒ P(−3 ≤ W ≤ 0) = 0.1 + 0.25 + 0.3 + 0.15 = 0.8
or simply
P(−3 ≤ W ≤ 0) = 1 − P(W = 1) = 1 − d = 0.8
(c) P(W > −1) = P(W = 0) + P(W = 1)
=⇒ P(W > −1) = 0.15 + 0.2 = 0.35
(d) P(−1 < W < 1) = P(W = 0) = 0.15. Note: check which values satisfy the range given in the question; it is only the value W = 0.
(e) The mode is always the value with the highest frequency; in probability, the mode is the value with the highest probability. Thus, for this case, the mode is the value corresponding to the largest probability, i.e. 0.3 from the table. Hence our mode is W = −1.
Exercise 1
Exercise 2. Let us consider a table to show the probabilities associated with each
of the X values
x 1 2 3 4 5 6
P(X = x) C 2C 3C 4C 5C 6C
Solving
∑ P(X = x) = 1
=⇒ C + 2C + 3C + 4C + 5C + 6C = 1
21C = 1 =⇒ C = 1/21
Next,
P(X < 4) = P(X = 1) + P(X = 2) + P(X = 3)
= 1/21 + 2/21 + 3/21 = 6/21
Similarly, we can obtain P(3 ≤ X < 6) as follows:
P(3 ≤ X < 6) = P(X = 3) + P(X = 4) + P(X = 5)
=⇒ P(3 ≤ X < 6) = 3/21 + 4/21 + 5/21 = 12/21
Exercise 2
Exercise 3. (a) Since f(x) is a pdf, then
∫₀³ f(x) dx = ∫₀¹ ax dx + ∫₁² a dx + ∫₂³ (−ax + 3a) dx = 1
=⇒ (ax²/2)|₀¹ + (ax)|₁² + (−ax²/2 + 3ax)|₂³ = 1
a/2 + a − (5/2)a + 3a = 1
2a = 1
=⇒ a = 1/2
(b) P(X ≤ 1.5):
∫₀^1.5 f(x) dx = ∫₀¹ ax dx + ∫₁^1.5 a dx
= (1/2) ∫₀¹ x dx + (1/2) ∫₁^1.5 dx
= (1/2)(x²/2)|₀¹ + (1/2)(x)|₁^1.5
= 1/4 + (1/2)(3/2 − 1)
= 1/4 + 1/4 = 0.5
Exercise 3
Exercise 4. See Lesson 1, Example 2 for the experiment.
We have the following possible outcomes
P0 = 1/16, P1 = 4/16, P2 = 6/16, P3 = 4/16, P4 = 1/16
Now let X represent his winnings. When he gets 4 heads we have P₄, and for four tails we have P₀; both have probability 1/16. The rest of the outcomes, where he gets 1, 2 or 3 heads, have their respective probabilities. We can then represent the information in a table:
No. Heads:  0     1     2     3     4
X:          500   −150  −150  −150  500
P(X = x):   1/16  4/16  6/16  4/16  1/16

E(X) = ∑ x P(X = x) = 500(1/16) + (−150)(4/16) + ... + 500(1/16)
= −1100/16 = −275/4
Therefore, our conclusion would be: we would not advise him to play the game, since on average he expects to lose 275/4 ≈ 68.75 per game.
Note that, while solving the problem, we took 4 tails to be represented as Zero(0)
heads.
Exercise 4
Exercise 5.
E(X) = ∑_{all x} x P(X = x)
= 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6)
= 7/2
Exercise 5
Exercise 6.
By definition
m_X(t) = ∫ e^(tx) f(x) dx
= ∫₀^∞ e^(tx) λe^(−λx) dx
= λ ∫₀^∞ e^((t−λ)x) dx
= λ ∫₀^∞ e^(−(λ−t)x) dx
= [λ/(−(λ − t))] e^(−(λ−t)x) |₀^∞
= [λ/(−(λ − t))] (0 − 1)   (for t < λ)
= λ/(λ − t)
= λ(λ − t)^(−1)

E(X) = m′_X(t)|₀ = λ/(λ − t)² |₀ = 1/λ

Var(X) = E(X²) − (E(X))² = m″_X(t)|₀ − (m′_X(t)|₀)²

m″_X(t) = (d/dt)[λ/(λ − t)²] = 2λ/(λ − t)³

m″_X(t)|₀ = 2λ/λ³ = 2/λ²

Var(X) = 2/λ² − (1/λ)² = 1/λ²
We can verify directly that E(X) = ∫₀^∞ x λe^(−λx) dx = 1/λ and
Var(X) = ∫₀^∞ x² λe^(−λx) dx − (∫₀^∞ x λe^(−λx) dx)² = 1/λ².

E(X) = λ ∫₀^∞ x e^(−λx) dx
Using integration by parts, ∫ u dv = uv − ∫ v du.
Let u = x and dv = e^(−λx) dx
=⇒ du = dx and v = −e^(−λx)/λ

E(X) = λ [(−x e^(−λx)/λ)|₀^∞ + ∫₀^∞ (e^(−λx)/λ) dx]
= [−x e^(−λx)]₀^∞ − [e^(−λx)/λ]₀^∞
= 0 − (1/λ)(0 − 1)
= 1/λ
Next,
Var(X) = E(X²) − (1/λ)²
but E(X²) = ∫₀^∞ x² f(x) dx = ∫₀^∞ x² λe^(−λx) dx = λ ∫₀^∞ x² e^(−λx) dx
Integrating by parts twice:

E(X²) = λ [(−x² e^(−λx)/λ)|₀^∞ + (2/λ) ∫₀^∞ x e^(−λx) dx]
= [−x² e^(−λx)]₀^∞ + 2 [(−x e^(−λx)/λ) − (e^(−λx)/λ²)]₀^∞
= 0 + 2(0 − (−1/λ²))
= 2/λ²

Hence Var(X) = 2/λ² − 1/λ² = 1/λ².
Exercise 6
Exercise 7.
Let X be the number of heads in n = 10 tosses, with p = 0.5.
P(X = k) = C(n, k) p^k (1 − p)^(n−k)
(a) P(X = 5) = C(10, 5) (0.5)⁵(0.5)⁵ = 0.2461
(b) P(3 tails) = P(X = 7) = C(10, 7) (0.5)⁷(0.5)³ = 0.117
(c) P(X ≥ 3) = 1 − [P(X = 0) + P(X = 1) + P(X = 2)]
= 1 − [C(10, 0)(0.5)⁰(0.5)¹⁰ + C(10, 1)(0.5)¹(0.5)⁹ + C(10, 2)(0.5)²(0.5)⁸] = 0.9453 Exercise 7
Exercise 8.
The given mgf is for a binomial distribution.
Exercise 9. X ∼ Poi(1.5)
P(X = x) = e^(−1.5)(1.5)^x / x!,  x = 0, 1, 2, ...
P(X > 3) = 1 − [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)]
= 1 − [e^(−1.5)(1.5)⁰/0! + e^(−1.5)(1.5)¹/1! + e^(−1.5)(1.5)²/2! + e^(−1.5)(1.5)³/3!]
= 1 − e^(−1.5)[1 + 1.5 + (1.5)²/2! + (1.5)³/3!]
= 1 − (0.2231 × 4.1875)
= 1 − 0.934
= 0.066
Note that we have used the complement to obtain the probability.
Exercise 9
Exercise 10. Let X be the number of broken eggs in a box of 500.
P(an egg is broken) = 0.007, so X ∼ B(500, 0.007)
E(X) = np = 500 × 0.007 = 3.5
Since n > 50 and p < 0.1, we can use the Poisson approximation, i.e. X ≈ Poi(3.5)
(a) P(X = 3) = e^(−3.5)(3.5)³/3! = 0.22
The hypergeometric probability is

p(X = x) = C(k, x) C(N − k, n − x) / C(N, n)

with N = 20, k = 6, N − k = 14, n = 10. Then

p(X ≥ 3) = 1 − ∑_{x=0}^{2} C(6, x) C(14, 10 − x) / C(20, 10)

= 1 − [C(6, 0)C(14, 10) + C(6, 1)C(14, 9) + C(6, 2)C(14, 8)] / C(20, 10)

= 0.686
Exercise 11
Exercise 12.
Let Y be the time when the call comes in.
Y ∼ Unif(0, 5), since the call arrives in the interval 12.00 am to 5.00 am. Thus,

f(y) = 1/5, 0 ≤ y ≤ 5;  0, elsewhere

In the 5 hours, the center is open as follows:
12.00 to 1.00 am, that is 0 → 1 hour, then closed for 2 hours;
the center is again open from 3.00 to 4.00 am, the next 1 hour.
Thus, we have
P(0 < Y < 1) + P(3 < Y < 4) = 1/5 + 1/5 = 2/5
Hence, the probability that she will find the center open is 2/5 or 0.4. Exercise 12
Exercise 13.
If we look at the graph of the function Y = (1 − X)³, we might note that (1) the function is a decreasing function of X, and (2) 0 < y < 1.
That noted, let's now use the distribution function technique to find the pdf of Y.
First, we find the cumulative distribution function of Y:

F_Y(y) = P(Y ≤ y) = P((1 − X)³ ≤ y) = P(1 − X ≤ y^(1/3))
= P(−X ≤ −1 + y^(1/3)) = P(X ≥ 1 − y^(1/3)) = 1 − F_X(1 − y^(1/3))
= 1 − ∫₀^(1−y^(1/3)) 3(1 − t)² dt = 1 + [(1 − t)³]₀^(1−y^(1/3))
= 1 + [(1 − (1 − y^(1/3)))³ − (1 − 0)³]
= 1 + y − 1 = y

Having shown that the cumulative distribution function of Y is

F_Y(y) = y

for 0 < y < 1, we now just need to differentiate F_Y(y) to get the probability density function f_Y(y). Doing so, we get:

f_Y(y) = F′_Y(y) = 1 for 0 < y < 1.

That is, Y is a Unif(0, 1) random variable. (Again, you might find it reassuring to verify that f_Y(y) does indeed integrate to 1 over the support of y.) Exercise 13
Exercise 14.
Solution of the problem is as follows
Exercise 14
Exercise 15.
Note that the function Y = (1 − X)³, defined over the interval 0 < x < 1, is an invertible function.
The inverse function is:

x = v(y) = 1 − y^(1/3)

for 0 < y < 1. (That range is because, when x = 0, y = 1; and when x = 1, y = 0.)
Now, taking the derivative of v(y), we get:

v′(y) = −(1/3) y^(−2/3)

Therefore, the change-of-variable technique:

f_Y(y) = f_X(v(y)) × |v′(y)|

tells us that the probability density function of Y is:

f_Y(y) = 3[1 − (1 − y^(1/3))]² · |−(1/3) y^(−2/3)| = 3y^(2/3) · (1/3) y^(−2/3)

And, simplifying, we get that the probability density function of Y is:

f_Y(y) = 1

for 0 < y < 1. Again, we shouldn't be surprised by this result, as it is the same result that we obtained using the distribution function technique. Exercise 15
Exercise 16.
Let the random variable X = time required to do the job (in minutes), so that X ∼ N(55, 10²).

P(X < 60) = P(Z < (60 − 55)/10) = P(Z < 0.5) = 0.6915

P(45 < X < 60) = P((45 − 55)/10 ≤ Z ≤ (60 − 55)/10)
= P(−1 ≤ Z ≤ 0.5)
= P(Z ≤ 0.5) − P(Z < −1)
= P(Z ≤ 0.5) − [1 − P(Z < 1)]
= 0.6915 − 0.1587 = 0.5328
Exercise 17. We approximate the B(1600, 0.08) random variable T with a normal with mean (1600)(0.08) = 128 and standard deviation √((1600)(0.08)(0.92)) = 10.85. The probability calculation is then

P(T ≤ 150) = P(T < 150.5) ≈ P(Z < (150.5 − 128)/10.85) = P(Z < 2.07) = 0.9808
Exercise 17
Exercise 18.
We approximate the B(400, 0.07) random variable V with a normal with mean (400)(0.07) = 28 and standard deviation √((400)(0.07)(0.93)) = 5.103. The probability calculations are thus:
(a) P(V = 25) = P(24.5 < V < 25.5)
≈ P((24.5 − 28)/5.103 < Z < (25.5 − 28)/5.103)
= P(−0.69 < Z < −0.49)
= 0.3121 − 0.2451 = 0.0670
(b) P(V < 25) = P(V < 24.5)
≈ P(Z < (24.5 − 28)/5.103)
= P(Z < −0.69)
= 0.2451
(c) P(20 ≤ V ≤ 25) = P(19.5 < V < 25.5)
≈ P((19.5 − 28)/5.103 < Z < (25.5 − 28)/5.103)
= P(−1.67 < Z < −0.49)
= 0.3121 − 0.0475 = 0.2646
Exercise 18
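These approximations can be compared with the exact binomial values. A sketch (assuming Python with scipy), for part (a):

```python
from scipy.stats import binom, norm

n, p = 400, 0.07
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5  # 28 and about 5.103

exact = binom.pmf(25, n, p)                                     # exact P(V = 25)
approx = norm.cdf(25.5, mu, sigma) - norm.cdf(24.5, mu, sigma)  # continuity-corrected
print(exact, approx)  # the two values are close
```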
Exercise 19. Solution of the problem is as follows.
Note further how we have been changing the variables (e.g. from X to Y) when approximating the probability; for instance, P(X ≥ 300) ≈ P(Y > 300.5). The change is useful for our understanding of the concept illustrated. Exercise 19
Exercise 20.
Under H₀: p = 0.2 (versus H₁: p < 0.2), X ∼ B(400, 0.2), which we approximate by N(80, 64), i.e. µ = 80 and σ = 8.
The critical value is Z_t = −2.33
=⇒ Reject H₀ if Z_c < −2.33
Test statistic (with continuity correction, X = 60 becomes x ≤ 60.5):
Z_c = (60.5 − 80)/8 = −2.44
Since Z_c < Z_t, we therefore reject H₀ and conclude that there is a significantly poor survival rate.
Note: In hypothesis testing, the question asked should help us know whether to use a one-tailed or a two-tailed test!
Exercise 20
Exercise 21.
Hypotheses:
H₀: p = 1/6
vs
H₁: p < 1/6
Under H₀, X ∼ B(120, 1/6). We can now apply the normal approximation to the binomial:
=⇒ X ≈ N(20, 100/6) =⇒ Z = (X − 20)/(10/√6) ∼ N(0, 1)
X = 9 =⇒ 8.5 ≤ x ≤ 9.5
Since our interest is in "less than", we take x ≤ 9.5.
Therefore, rejection region:
Z_t = −1.65 =⇒ reject H₀ if Z_c < −1.65
Test statistic:
Z_c = (9.5 − 20)/(10/√6) = −2.57 =⇒ Z_c < Z_t = −1.65
Hence, reject H₀.
There is significant evidence that the die is biased to give too few fours.
Exercise 21