Eusebius Doedel
TABLE OF CONTENTS
SAMPLE SPACES 1
Events 5
The Algebra of Events 6
Axioms of Probability 9
Further Properties 10
Counting Outcomes 13
Permutations 14
Combinations 21
CONDITIONAL PROBABILITY 45
Independent Events 63
DEFINITION :
The sample space is the set of all possible outcomes of an experiment.
1
EXAMPLE :
When we roll a die, the sample space is
S = {1, 2, 3, 4, 5, 6} .
The probability the die lands with k up is 1/6 , (k = 1, 2, ···, 6) .
The probability of an even outcome is
1/6 + 1/6 + 1/6 = 1/2 .
2
EXAMPLE :
When we toss a coin 3 times and record the results in the sequence
that they occur, then the sample space is
S = { HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } .
Thus the probability of the sequence HTT is 1/8 .
The probability of a sequence with precisely two Tails is
1/8 + 1/8 + 1/8 = 3/8 .
3
EXAMPLE : When we toss a coin 3 times and record the results
without paying attention to the order in which they occur, e.g., if we
only record the number of Heads, then the sample space is
S = { {H, H, H} , {H, H, T} , {H, T, T} , {T, T, T} } .
The outcomes in S are now sets ; i.e., order is not important.
Recall that the ordered outcomes are
{ HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } .
Note that
{H, H, H} corresponds to one of the ordered outcomes,
{H, H, T} corresponds to three of the ordered outcomes,
{H, T, T} corresponds to three of the ordered outcomes,
{T, T, T} corresponds to one of the ordered outcomes.
4
Events
In Probability Theory subsets of the sample space are called events .
EXAMPLE : For a single roll of a die,
S = {1, 2, 3, 4, 5, 6} ,
and the probability of the event E = {2, 4, 6} ("the roll is even") is
P(E) = 1/6 + 1/6 + 1/6 = 1/2 .
5
The Algebra of Events
Since events are sets, namely, subsets of the sample space S, we can
do the usual set operations :
E^c : the complement of E ,
E ∪ F : the union of E and F ,
E ∩ F : the intersection of E and F .
We write E ⊂ F if E is a subset of F .
NOTATION : We will often write
EF instead of E ∩ F .
6
If the sample space S is finite then we typically allow any subset of
S to be an event.
7
We always assume that the set E of allowable events includes the
complements, unions, and intersections of its events.
S = {a , b , c , d} ,
8
Axioms of Probability
0 ≤ P(E) ≤ 1 ,
P(S) = 1 ,
P(E ∪ F) = P(E) + P(F) , for disjoint events E and F .
9
Further Properties
PROPERTY 1 :
P(E ∪ E^c) = P(E) + P(E^c) = 1 . ( Why ? )
Thus
P(E^c) = 1 − P(E) .
EXAMPLE :
What is the probability of at least one H in four tosses of a coin?
P(at least one H) = 1 − P(no H) = 1 − 1/16 = 15/16 .
10
PROPERTY 2 :
P(E ∪ F) = P(E) + P(F) − P(EF) .
P(E ∪ F) = P(EF) + P(EF^c) + P(E^c F) .
NOTE :
Draw a Venn diagram with E and F to see this !
11
So far our sample spaces S have been finite.
S can also be countably infinite, e.g., the set Z of all integers.
S can also be uncountable, e.g., the set R of all real numbers.
12
Counting Outcomes
13
Permutations
NOTE :
For sets the order is not important. For example, the set {a,c,b} is
the same as the set {b,a,c} .
14
EXAMPLE : Suppose that four-letter words of lower case alphabetic
characters are generated randomly with equally likely outcomes.
(Assume that letters may appear repeatedly.)
(a) How many four-letter words are there in the sample space S ?
SOLUTION : 26^4 = 456,976 .
(b) How many four-letter words are there in S that start
with the letter s ?
SOLUTION : 26^3 .
15
EXAMPLE : How many re-orderings (permutations) are there of
the string abc ? (Here letters may appear only once.)
SOLUTION : Six, namely, abc , acb , bac , bca , cab , cba .
SOLUTION :
(n − 1)! / n! = 1/n . ( Why ? )
16
EXAMPLE : How many
words of length k can be formed from n distinct letters
(where k ≤ n , and letters may be used only once) ?
SOLUTION :
n (n−1) (n−2) ··· (n−(k−1))
= n (n−1) (n−2) ··· (n−k+1)
= n! / (n−k)! . ( Why ? )
17
EXAMPLE : Three-letter words are generated randomly from the
five characters a , b , c , d , e , where letters can be used at most
once.
(a) How many three-letter words are there in the sample space S ?
SOLUTION : 5 · 4 · 3 = 60 .
18
(c) Suppose the 60 words in the sample space are equally likely .
SOLUTION :
18/60 = 0.3 .
19
EXERCISE :
How many special words are in S for which only the second
and the fourth character are vowels, i.e., one of {a, e, i, o, u, y} ?
20
Combinations
Let S be a set of n elements . Then
a combination of k elements from S
is
any selection of k elements from S , where order is not important .
21
EXAMPLE : There are three combinations of two elements from
S = {a , b , c} ,
namely {a, b} , {a, c} , {b, c} , while there are six ordered two-letter words :
ab , ba , ac , ca , bc , cb .
22
In general, given
a set S of n elements ,
the number of combinations of k elements from S equals
C(n, k) = n! / ( k! (n − k)! ) .
REMARK : The notation C(n, k) is referred to as
" n choose k " .
NOTE : C(n, n) = n! / ( n! (n − n)! ) = n! / ( n! 0! ) = 1 ,
since 0! = 1 .
23
PROOF : The number of ordered k-letter words from n distinct letters is
n (n−1) (n−2) ··· (n−k+1) = n! / (n−k)! ,
and each combination of k elements corresponds to k! such words ,
so the number of combinations is n! / ( k! (n−k)! ) .
24
EXAMPLE :
In the previous example, with 2 elements chosen from the set
{a , b , c} , we have n = 3 and k = 2 , so there are
C(3, 2) = 3! / ( 2! 1! ) = 3 combinations .
25
EXAMPLE : If we choose 3 elements from {a , b , c , d} , then
n = 4 and k = 3 ,
so there are
4! / (4 − 3)! = 24 ordered words .
26
EXAMPLE : A committee of four is chosen from ten persons ;
there are C(10, 4) = 210 possible committees .
(b) If each of these 210 outcomes is equally likely then what is the
probability that a particular person is on the committee?
SOLUTION :
C(9, 3) / C(10, 4) = 84 / 210 = 4/10 . ( Why ? )
27
(c) What is the probability that a particular person is not on the
committee?
SOLUTION :
C(9, 4) / C(10, 4) = 126 / 210 = 6/10 . ( Why ? )
28
EXAMPLE : Two balls are selected at random from a bag with
four white balls and three black balls, where order is not important.
29
EXAMPLE : ( continued )
(Two balls are selected at random from a bag with four white balls
and three black balls.)
What is the probability that both balls are white?
SOLUTION :
C(4, 2) / C(7, 2) = 6 / 21 = 2/7 .
30
EXAMPLE : ( continued )
In detail, the sample space S is
{ {w1,w2} , {w1,w3} , {w1,w4} , {w1,b1} , {w1,b2} , {w1,b3} ,
  {w2,w3} , {w2,w4} , {w2,b1} , {w2,b2} , {w2,b3} ,
  {w3,w4} , {w3,b1} , {w3,b2} , {w3,b3} ,
  {w4,b1} , {w4,b2} , {w4,b3} ,
  {b1,b2} , {b1,b3} ,
  {b2,b3} } .
S has 21 outcomes, each of which is a set .
We assumed each outcome of S has probability 1/21 .
The event both balls are white contains 6 outcomes.
The event both balls are black contains 3 outcomes.
The event one is white and one is black contains 12 outcomes.
What would be different had we worked with sequences ?
31
EXERCISE :
What is the probability of one red, one green, and one blue ball ?
32
EXAMPLE : A bag contains 4 black balls and 4 white balls.
Suppose one draws two balls at the time, until the bag is empty.
What is the probability that each drawn pair is of the same color?
Thus the probability each pair is of the same color is 9/105 = 3/35 .
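A minimal Python sketch (not from the slides) that checks this value by simulation; drawing two balls at a time until the bag is empty is equivalent to shuffling the bag and reading consecutive pairs:

import random

# Estimate P(each drawn pair is one color) for 4 white ('w'), 4 black ('b').
def estimate(trials=200_000):
    hits = 0
    for _ in range(trials):
        bag = ['w'] * 4 + ['b'] * 4
        random.shuffle(bag)
        # drawing two at a time until empty = consecutive pairs of a shuffle
        if all(bag[i] == bag[i + 1] for i in range(0, 8, 2)):
            hits += 1
    return hits / trials

print(estimate())   # close to 3/35 = 0.0857...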
33
EXAMPLE : ( continued )
The 9 outcomes of pairwise the same color constitute the event
{ { {w1,w2} , {w3,w4} , {b1,b2} , {b3,b4} } ,
  { {w1,w3} , {w2,w4} , {b1,b2} , {b3,b4} } ,
  { {w1,w4} , {w2,w3} , {b1,b2} , {b3,b4} } ,
  { {w1,w2} , {w3,w4} , {b1,b3} , {b2,b4} } ,
  { {w1,w3} , {w2,w4} , {b1,b3} , {b2,b4} } ,
  { {w1,w4} , {w2,w3} , {b1,b3} , {b2,b4} } ,
  { {w1,w2} , {w3,w4} , {b1,b4} , {b2,b3} } ,
  { {w1,w3} , {w2,w4} , {b1,b4} , {b2,b3} } ,
  { {w1,w4} , {w2,w3} , {b1,b4} , {b2,b3} } } .
34
EXERCISE :
Two balls are selected at random from a bag with three white balls
and two black balls.
35
EXERCISE :
36
EXAMPLE :
How many nonnegative integer solutions are there to
x1 + x2 + x3 = 17 ?
SOLUTION :
Consider seventeen 1's separated by two bars to indicate the
values of x1 , x2 , and x3 , e.g.,
111|111111111|11111 .
Each solution corresponds to a choice of 2 bar positions among
17 + 2 = 19 symbols, so there are C(19, 2) = 171 solutions .
37
EXAMPLE :
How many nonnegative integer solutions are there to the inequality
x1 + x2 + x3 17 ?
SOLUTION :
Introduce an auxiliary variable (or slack variable )
x4 ≡ 17 − (x1 + x2 + x3) ≥ 0 .
Then
x1 + x2 + x3 + x4 = 17 , e.g.,
111|11111111|1111|11 .
38
111|11111111|1111|11 .
The total number of symbols (seventeen 1's and three bars) is
17 + 3 = 20 ,
so the number of solutions is
C(20, 3) = 20! / ( 17! 3! ) = (20 · 19 · 18) / (3 · 2) = 1140 .
39
EXAMPLE :
How many positive integer solutions are there to the equation
x1 + x2 + x3 = 17 ?
SOLUTION : Let
x̄1 = x1 − 1 , x̄2 = x2 − 1 , x̄3 = x3 − 1 ,
so that x̄1 + x̄2 + x̄3 = 14 with x̄k ≥ 0 ,
which has C(16, 2) = 120 solutions .
40
EXAMPLE :
What is the probability the sum is 9 in three rolls of a die ?
41
EXAMPLE : ( continued )
Writing xk for (roll k) − 1 , the equation becomes
x1 + x2 + x3 = 6 , ( 0 ≤ x1 , x2 , x3 ≤ 5 ) ,
e.g.,
1|111|11 .
Ignoring the upper bounds, this equation has
C(8, 2) = 28 solutions ,
of which 3 (namely 6+0+0 , 0+6+0 , 0+0+6) violate xk ≤ 5 , so
(28 − 3) / 6^3 = 25/216 ≅ 0.116 .
42
EXAMPLE : ( continued )
The 25 corresponding roll sequences can be listed explicitly ;
the list ends with ··· , 612 , 621 } .
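As a check (our own illustration, not from the slides), the count 25 , and hence 25/216 , can be verified by brute-force enumeration in Python:

from itertools import product

# Count ordered rolls (d1, d2, d3) with sum 9 and compare with 25/216.
count = sum(1 for roll in product(range(1, 7), repeat=3) if sum(roll) == 9)
print(count, count / 6 ** 3)   # 25  0.1157...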
43
EXERCISE :
How many nonnegative integer solutions are there to
x1 + x2 + x3 ≤ 17 ,
if we require that
x1 ≥ 1 , x2 ≥ 2 , x3 ≥ 3 ?
EXERCISE :
44
CONDITIONAL PROBABILITY
EXAMPLE :
If a coin is tossed two times then what is the probability of two
Heads?
ANSWER : 1/4 .
EXAMPLE :
If a coin is tossed two times then what is the probability of two Heads,
given that the first toss gave Heads ?
ANSWER : 1/2 .
45
NOTE :
Four suits :
Hearts , Diamonds (red ) , and Spades , Clubs (black) .
46
EXERCISE :
47
The two preceding questions are examples of conditional probability .
The conditional probability of E given F is defined as
P(E|F) ≡ P(EF) / P(F) ,
or, equivalently
P(EF) = P(E|F) P(F) .
48
( Venn diagram of S with events E and F , illustrating P(E|F) ≡ P(EF)/P(F) . )
49
( Another Venn diagram illustrating P(E|F) ≡ P(EF)/P(F) . )
50
EXAMPLE : Suppose a coin is tossed two times.
Let E be the event "two Heads" and F the event "the first toss gives Heads" .
Then EF = E , and we have
P(E|F) = P(EF) / P(F) = P(E) / P(F) = (1/4) / (1/2) = 1/2 .
51
EXAMPLE :
Suppose we draw a card from a shuffled set of 52 playing cards.
What is the probability of drawing a Queen, given that the card
drawn is of suit Hearts ?
ANSWER :
P(Q|H) = P(QH) / P(H) = (1/52) / (13/52) = 1/13 .
( Here QH denotes the event Q ∩ H , "the Queen of Hearts" . )
52
The probability of an event E is sometimes computed more easily
if we condition E on another event F ,
namely, from
P(E) = P(E|F) P(F) + P(E|F^c) P(F^c) .
EXAMPLE : Suppose the probability of a claim C is 4 % for clients in
a group U and 2 % for clients in U^c , where P(U) = 3/10 . What is P(C) ?
54
SOLUTION :
Thus
P(C) = P(C|U) P(U) + P(C|U^c) P(U^c)
     = (4/100)(3/10) + (2/100)(7/10)
     = 26/1000 = 2.6 % .
55
EXAMPLE :
Two balls are drawn from a bag with 2 white and 3 black balls.
What is the probability S that the second ball is white ?
SOLUTION :
Let F be the event "the first ball is white" . Then
P(S) = P(S|F) P(F) + P(S|F^c) P(F^c) = (1/4)(2/5) + (2/4)(3/5) = 2/5 .
56
EXAMPLE : ( continued )
Is it surprising that P (S) = P (F ) ?
w1w2 , w1b1 , w1b2 , w1b3 ,
w2w1 , w2b1 , w2b2 , w2b3 ,
b1w1 , b1w2 , b1b2 , b1b3 ,
b2w1 , b2w2 , b2b1 , b2b3 ,
b3w1 , b3w2 , b3b1 , b3b2 } .
57
EXAMPLE :
Two cards are drawn from a shuffled deck. What is the probability
that the second card is a Queen ?
ANSWER :
P(2nd card Q) = P(2nd Q | 1st Q) P(1st Q) + P(2nd Q | 1st not Q) P(1st not Q)
  = (3/51)(4/52) + (4/51)(48/52) = 204 / (51 · 52) = 4/52 = 1/13 .
58
A useful formula that inverts conditioning is derived as follows :
P(EF) = P(E|F) P(F) ,
and
P(EF) = P(F|E) P(E) .
Hence
P(F|E) = P(EF) / P(E) = P(E|F) P(F) / P(E) ,
and, using the earlier useful formula, we get
P(F|E) = P(E|F) P(F) / [ P(E|F) P(F) + P(E|F^c) P(F^c) ] .
59
EXAMPLE : Suppose 1 in 1000 persons has a certain disease.
A test detects the disease in 99 % of diseased persons.
The test also detects the disease in 5 % of healthy persons.
With what probability does a positive test diagnose the disease?
SOLUTION : Let
D diseased , H healthy , + positive.
We are given that
P (D) = 0.001 , P (+|D) = 0.99 , P (+|H) = 0.05 .
By Bayes' formula
P(D|+) = P(+|D) P(D) / [ P(+|D) P(D) + P(+|H) P(H) ]
       = (0.99 · 0.001) / (0.99 · 0.001 + 0.05 · 0.999)
       ≅ 0.0194 (!)
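A small Python sketch of this computation (the variable names are ours):

p_d = 0.001        # P(D)
p_pos_d = 0.99     # P(+|D)
p_pos_h = 0.05     # P(+|H)
p_pos = p_pos_d * p_d + p_pos_h * (1 - p_d)   # total probability of +
print(p_pos_d * p_d / p_pos)                  # P(D|+) = 0.0194...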
60
EXERCISE :
Suppose 1 in 100 products has a certain defect.
EXERCISE :
Suppose 1 in 2000 persons has a certain disease.
61
More generally, if the sample space S is the union of disjoint events
S = F1 ∪ F2 ∪ ··· ∪ Fn ,
then for any event E
P(Fi|E) = P(E|Fi) P(Fi) / [ P(E|F1) P(F1) + P(E|F2) P(F2) + ··· + P(E|Fn) P(Fn) ] .
EXERCISE :
Machines M1 , M2 , M3 produce these proportions of an article :
Production : M1 : 10 % , M2 : 30 % , M3 : 60 % .
Defects : M1 : 4 % , M2 : 3 % , M3 : 2 % .
What is the probability that a defective article was produced by M1 ?
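A Python sketch (ours) of the Bayes computation, assuming the question is the one stated above:

production = {'M1': 0.10, 'M2': 0.30, 'M3': 0.60}   # P(Mi)
defect = {'M1': 0.04, 'M2': 0.03, 'M3': 0.02}       # P(D|Mi)
p_d = sum(production[m] * defect[m] for m in production)   # P(D) = 0.025
for m in production:
    print(m, production[m] * defect[m] / p_d)   # 0.16 , 0.36 , 0.48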
62
Independent Events
Two events E and F are independent if
P(EF) = P(E) P(F) .
In this case
P(E|F) = P(EF) / P(F) = P(E) P(F) / P(F) = P(E) ,
(assuming P(F) ≠ 0).
Thus
knowing that F occurred does not change the probability of E .
63
EXAMPLE : Draw one card from a deck of 52 playing cards.
Counting outcomes we find
P(Face Card) = 12/52 = 3/13 ,
P(Hearts) = 13/52 = 1/4 ,
P(Face Card and Hearts) = 3/52 ,
P(Face Card | Hearts) = 3/13 .
We see that
P(Face Card and Hearts) = P(Face Card) · P(Hearts) (= 3/52) .
Thus the events Face Card and Hearts are independent.
64
EXERCISE :
65
EXERCISE : Two numbers are drawn at random from the set
{1, 2, 3, 4}.
X( {i, j} ) = i + j , Y( {i, j} ) = |i − j| .
Are the following pairs of events independent ?
(1) X = 5 and Y = 2 ,
(2) X = 5 and Y = 1 .
REMARK :
X and Y are examples of random variables . (More soon!)
66
EXAMPLE : If E and F are independent then so are E and F^c .
PROOF : P(EF^c) = P(E) − P(EF) = P(E) − P(E) P(F)
  = P(E) ( 1 − P(F) )
  = P(E) P(F^c) .
EXERCISE :
Prove that if E and F are independent then so are E c and F c .
67
NOTE : Independence and disjointness are different things !
( Venn diagrams : E and F overlapping , and E and F disjoint . )
If E and F are independent and disjoint then one has zero probability !
68
Three events E , F , and G are independent if
P(EF) = P(E) P(F) , P(EG) = P(E) P(G) , P(FG) = P(F) P(G) ,
and P(EFG) = P(E) P(F) P(G) .
69
EXERCISE :
Suppose that
M1 functions properly with probability 9/10 ,
M2 functions properly with probability 9/10 ,
M3 functions properly with probability 8/10 ,
and that
70
DISCRETE RANDOM VARIABLES
71
Value-ranges of a random variable correspond to events in S .
72
Value-ranges of a random variable correspond to events in S ,
and
events in S have a probability .
Thus
Value-ranges of a random variable have a probability .
73
NOTATION : We will also write pX (x) to denote P (X = x) .
S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
with
X(s) = the number of Heads ,
we have
pX(0) ≡ P( {TTT} ) = 1/8 ,
pX(1) ≡ P( {HTT , THT , TTH} ) = 3/8 ,
pX(2) ≡ P( {HHT , HTH , THH} ) = 3/8 ,
pX(3) ≡ P( {HHH} ) = 1/8 ,
where
pX (0) + pX (1) + pX (2) + pX (3) = 1 . ( Why ? )
74
( Figure : the events E0 = {TTT} , E1 = {HTT, THT, TTH} ,
E2 = {HHT, HTH, THH} , E3 = {HHH} in S , mapped to X = 0, 1, 2, 3 . )
Graphical representation of X .
75
The graph of pX .
76
DEFINITION :
pX(x) ≡ P(X = x) ,
is called the probability mass function .
DEFINITION :
FX(x) ≡ P(X ≤ x) ,
is called the (cumulative) probability distribution function .
PROPERTIES :
FX(x) is a non-decreasing function of x . ( Why ? )
FX(−∞) = 0 and FX(∞) = 1 . ( Why ? )
P(a < X ≤ b) = FX(b) − FX(a) . ( Why ? )
77
EXAMPLE : With X(s) = the number of Heads , and
S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
p(0) = 1/8 , p(1) = 3/8 , p(2) = 3/8 , p(3) = 1/8 ,
we have the probability distribution function
F(−1) ≡ P(X ≤ −1) = 0
F( 0) ≡ P(X ≤ 0) = 1/8
F( 1) ≡ P(X ≤ 1) = 4/8
F( 2) ≡ P(X ≤ 2) = 7/8
F( 3) ≡ P(X ≤ 3) = 1
F( 4) ≡ P(X ≤ 4) = 1
We see, for example, that
P(0 < X ≤ 2) = P(X = 1) + P(X = 2)
  = F(2) − F(0) = 7/8 − 1/8 = 6/8 .
78
The graph of the probability distribution function FX .
79
EXAMPLE : Toss a coin until Heads occurs.
Then the sample space is countably infinite , namely,
S = {H , TH , TTH , TTTH , ···} .
80
X(s) is the number of tosses until Heads occurs .
For exactly four tosses the sample space is
{ HHHH , HHHT , HHTH , HHTT ,
  HTHH , HTHT , HTTH , HTTT ,
  THHH , THHT , THTH , THTT ,
  TTHH , TTHT , TTTH , TTTT } .
81
Joint distributions
The probability mass function and the probability distribution function
can also be functions of more than one variable.
S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
we let
X(s) = # Heads , Y (s) = index of the first H (0 for T T T ) .
Then we have the joint probability mass function
pX,Y (x, y) = P (X = x , Y = y) .
For example,
pX,Y(2, 1) = P(X = 2 , Y = 1)
  = P( {HHT , HTH} ) = 2/8 = 1/4 .
82
EXAMPLE : ( continued ) For
S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
X(s) = number of Heads, and Y (s) = index of the first H ,
NOTE :
The marginal probability pX is the probability mass function of X.
The marginal probability pY is the probability mass function of Y .
83
EXAMPLE : ( continued )
X(s) = number of Heads, and Y (s) = index of the first H .
For example,
84
( Figure : graphical representation of the events
E_xy = { s : X(s) = x , Y(s) = y } in S . )
85
DEFINITION :
pX,Y(x, y) ≡ P(X = x , Y = y) ,
is called the joint probability mass function .
DEFINITION :
FX,Y(x, y) ≡ P(X ≤ x , Y ≤ y) ,
is called the joint (cumulative) probability distribution function .
86
EXAMPLE : Three tosses : X(s) = # Heads, Y (s) = index 1st H .
Joint probability mass function pX,Y (x, y)
         y=0    y=1    y=2    y=3    pX(x)
x=0      1/8    0      0      0      1/8
x=1      0      1/8    1/8    1/8    3/8
x=2      0      2/8    1/8    0      3/8
x=3      0      1/8    0      0      1/8
pY(y)    1/8    4/8    2/8    1/8    1
87
In the preceding example :
Joint probability mass function pX,Y (x, y)
         y=0    y=1    y=2    y=3    pX(x)
x=0      1/8    0      0      0      1/8
x=1      0      1/8    1/8    1/8    3/8
x=2      0      2/8    1/8    0      3/8
x=3      0      1/8    0      0      1/8
pY(y)    1/8    4/8    2/8    1/8    1
QUESTION : Why is
P(1 < X ≤ 3 , 1 < Y ≤ 3) = F(3,3) − F(1,3) − F(3,1) + F(1,1) ?
88
EXERCISE :
Roll a four-sided die (tetrahedron) two times.
(The sides are marked 1 , 2 , 3 , 4 .)
Suppose each of the four sides is equally likely to end facing down.
Suppose the outcome of a single roll is the side that faces down ( ! ).
89
EXERCISE :
90
Independent random variables
91
( Figure : graphical representation of the events
E_xy = { s : X(s) = x , Y(s) = y } in S . )
92
RECALL :
EXERCISE :
Let
X be the result of the 1st roll ,
and
Y the result of the 2nd roll .
93
EXERCISE :
94
EXERCISE : Are these random variables X and Y independent ?
95
PROPERTY :
The joint distribution function of independent random variables
X and Y satisfies
FX,Y (x, y) = FX (x) FY (y) , for all x, y .
PROOF :
FX,Y(xk, yl) = P(X ≤ xk , Y ≤ yl)
  = Σ_{i≤k} Σ_{j≤l} pX,Y(xi, yj)
  = Σ_{i≤k} Σ_{j≤l} pX(xi) pY(yj)   (by independence)
  = Σ_{i≤k} { pX(xi) Σ_{j≤l} pY(yj) }
  = { Σ_{i≤k} pX(xi) } { Σ_{j≤l} pY(yj) }
  = FX(xk) FY(yl) .
96
Conditional distributions
Let Ex be the event X = x and Ey the event Y = y . Then
P(Ex|Ey) = P(Ex Ey) / P(Ey) = pX,Y(x, y) / pY(y) .
97
( Figure : graphical representation of the events
E_xy = { s : X(s) = x , Y(s) = y } in S . )
98
EXAMPLE : (3 tosses : X(s) = # Heads, Y (s) = index 1st H.)
Joint probability mass function pX,Y (x, y)
         y=0    y=1    y=2    y=3    pX(x)
x=0      1/8    0      0      0      1/8
x=1      0      1/8    1/8    1/8    3/8
x=2      0      2/8    1/8    0      3/8
x=3      0      1/8    0      0      1/8
pY(y)    1/8    4/8    2/8    1/8    1

Conditional probability mass function pX|Y (x|y) = pX,Y(x,y) / pY(y) :

         y=0    y=1    y=2    y=3
x=0      1      0      0      0
x=1      0      2/8    4/8    1
x=2      0      4/8    4/8    0
x=3      0      2/8    0      0
         1      1      1      1

EXERCISE : Also construct the Table for pY|X (y|x) = pX,Y(x,y) / pX(x) .
99
EXAMPLE :
Joint probability mass function pX,Y (x, y)

         y=1    y=2    y=3    pX(x)
x=1      1/3    1/12   1/12   1/2
x=2      2/9    1/18   1/18   1/3
x=3      1/9    1/36   1/36   1/6
pY(y)    2/3    1/6    1/6    1

Conditional probability mass function pX|Y (x|y) = pX,Y(x,y) / pY(y) :

         y=1    y=2    y=3
x=1      1/2    1/2    1/2
x=2      1/3    1/3    1/3
x=3      1/6    1/6    1/6
         1      1      1
100
Expectation
The expected value of a discrete random variable X is
E[X] ≡ Σ_k xk P(X = xk) = Σ_k xk pX(xk) .
101
EXAMPLE : Toss a coin until Heads occurs. Then
S = {H , T H , T T H , T T T H , } .
102
The expected value of a function of a random variable is
E[g(X)] ≡ Σ_k g(xk) p(xk) .
EXAMPLE :
The pay-off of rolling a die is $ k² , where k is the side facing up.
What should the entry fee be for the betting to break even?
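Presumably the intended computation is : E[pay-off] = (1² + 2² + ··· + 6²)/6 = 91/6 ≅ $15.17 , so an entry fee of $91/6 makes the bet break even .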
103
The expected value of a function of two random variables is
E[g(X, Y)] ≡ Σ_k Σ_l g(xk, yl) p(xk, yl) .
EXAMPLE : For the joint mass function tabulated earlier :
E[X] = 1 · (1/2) + 2 · (1/3) + 3 · (1/6) = 5/3 ,
E[Y] = 1 · (2/3) + 2 · (1/6) + 3 · (1/6) = 3/2 ,
E[XY] = 1 · (1/3) + 2 · (1/12) + 3 · (1/12)
      + 2 · (2/9) + 4 · (1/18) + 6 · (1/18)
      + 3 · (1/9) + 6 · (1/36) + 9 · (1/36) = 5/2 . ( So ? )
104
PROPERTY : If X and Y are independent then E[XY] = E[X] E[Y] .
PROOF :
E[XY] = Σ_k Σ_l xk yl pX,Y(xk, yl)
  = Σ_k Σ_l xk yl pX(xk) pY(yl)   (by independence)
  = Σ_k { xk pX(xk) Σ_l yl pY(yl) }
  = { Σ_k xk pX(xk) } { Σ_l yl pY(yl) }
  = E[X] E[Y] .
105
PROPERTY : E[X + Y] = E[X] + E[Y] . ( Always ! )
PROOF :
E[X + Y] = Σ_k Σ_l (xk + yl) pX,Y(xk, yl)
  = Σ_k Σ_l xk pX,Y(xk, yl) + Σ_k Σ_l yl pX,Y(xk, yl)
  = Σ_k { xk Σ_l pX,Y(xk, yl) } + Σ_l { yl Σ_k pX,Y(xk, yl) }
  = Σ_k xk pX(xk) + Σ_l yl pY(yl)
  = E[X] + E[Y] .
106
EXERCISE :
Probability mass function pX,Y (x, y)

         y=6    y=8    y=10   pX(x)
x=1      1/5    0      1/5    2/5
x=2      0      1/5    0      1/5
x=3      1/5    0      1/5    2/5
pY(y)    2/5    1/5    2/5    1

Show that
E[XY] = E[X] E[Y] , although X and Y are not independent .
Thus if
E[XY] = E[X] E[Y] ,
then X and Y are not necessarily independent .
107
Variance and Standard Deviation
Let X have mean
μ = E[X] .
Then the variance of X is
Var(X) ≡ E[ (X − μ)² ] .
We have
Var(X) = E[X² − 2μX + μ²]
  = E[X²] − 2μE[X] + μ²
  = E[X²] − 2μ² + μ²
  = E[X²] − μ² .
108
The standard deviation of X is
σ(X) ≡ √Var(X) = √E[(X − μ)²] = √( E[X²] − μ² ) .
EXAMPLE : For one roll of a die,
Var(X) = E[X²] − μ² = (1/6) · ( 6(6 + 1)(2 · 6 + 1)/6 ) − (7/2)² = 35/12 .
109
Covariance
Let X and Y be random variables with mean
E[X] = μX , E[Y] = μY .
Then the covariance of X and Y is
Cov(X, Y) ≡ E[ (X − μX)(Y − μY) ] .
We have
Cov(X, Y) = E[ XY − μX Y − μY X + μX μY ]
  = E[XY] − μX μY − μY μX + μX μY
  = E[XY] − E[X] E[Y] .
110
We defined
Cov(X, Y) ≡ E[ (X − μX)(Y − μY) ]
  = Σ_{k,l} (xk − μX)(yl − μY) p(xk, yl) .
If X and Y are mostly simultaneously above or below their means then
Cov(X, Y) > 0 .
If one is mostly above while the other is below its mean then
Cov(X, Y) < 0 .
111
EXERCISE : Prove the following :
Var(aX + b) = a² Var(X) ,
Cov(X, Y) = Cov(Y, X) ,
Cov(cX, Y) = c Cov(X, Y) ,
Cov(X, cY) = c Cov(X, Y) ,
112
PROPERTY :
PROOF :
113
EXERCISE : ( already used earlier )
Probability mass function pX,Y (x, y)

         y=6    y=8    y=10   pX(x)
x=1      1/5    0      1/5    2/5
x=2      0      1/5    0      1/5
x=3      1/5    0      1/5    2/5
pY(y)    2/5    1/5    2/5    1

Show that
E[X] = 2 , E[Y] = 8 , E[XY] = 16 ,
Cov(X, Y) = E[XY] − E[X] E[Y] = 0 ,
while X and Y are not independent .
Thus if
Cov(X, Y) = 0 ,
then X and Y are not necessarily independent .
114
PROPERTY : If X and Y are independent then Cov(X, Y) = 0 .
PROOF : By independence E[XY] = E[X] E[Y] , so
Cov(X, Y) = E[XY] − E[X] E[Y] = 0 .
115
EXERCISE :
Compute
E[X] , E[Y ] , E[X 2 ] , E[Y 2 ]
Cov(X, Y )
for
116
EXERCISE :
Compute
E[X] , E[Y ] , E[X 2 ] , E[Y 2 ]
Cov(X, Y )
for
117
SPECIAL DISCRETE RANDOM VARIABLES
The Bernoulli random variable takes the values 1 ("success") and 0 :
P(X = 1) = p , P(X = 0) = 1 − p ,
e.g., tossing a coin, winning or losing a game, ··· .
We have
E[X] = 1 · p + 0 · (1 − p) = p ,
E[X²] = 1² · p + 0² · (1 − p) = p ,
Var(X) = E[X²] − (E[X])² = p − p² = p(1 − p) .
118
EXAMPLES :
When p = 1/2 (e.g., for tossing a coin), we have
E[X] = p = 1/2 , Var(X) = p(1 − p) = 1/4 .
119
The Binomial Random Variable
An outcome could be
100011001010 (n = 12) ,
with probability
P(100011001010) = p⁵ (1 − p)⁷ . ( Why ? )
For example,
X(100011001010) = 5 .
We have
P(X = 5) = C(12, 5) p⁵ (1 − p)⁷ . ( Why ? )
120
In general, for k successes in a sequence of n trials, we have
P(X = k) = C(n, k) p^k (1 − p)^(n−k) , (0 ≤ k ≤ n) .
121
The Binomial mass and distribution functions for n = 12 , p = 1/2 .
122
For k successes in a sequence of n trials :
P(X = k) = C(n, k) p^k (1 − p)^(n−k) , (0 ≤ k ≤ n) .
123
The Binomial mass and distribution functions for n = 12 , p = 1/6 .
124
EXAMPLE :
In 12 rolls of a die write the outcome as, for example,
100011001010
where
1 denotes the roll resulted in a six ,
and
0 denotes the roll did not result in a six .
P(X = 5) ≅ 2.8 % , P(X ≤ 5) ≅ 99.2 % .
125
EXERCISE : Show that from
P(X = k) = C(n, k) p^k (1 − p)^(n−k) ,
and
P(X = k + 1) = C(n, k+1) p^(k+1) (1 − p)^(n−k−1) ,
it follows that
P(X = k + 1) = ck P(X = k) ,
where
ck = ( (n − k) / (k + 1) ) · ( p / (1 − p) ) ,
for k = 0, 1, ···, n − 1 .
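A minimal Python sketch of this recurrence (our own illustration); it reproduces the values quoted below for n = 12 , p = 1/6 :

def binomial_pmf(n, p):
    # P(X = k) for k = 0..n via the recurrence above
    probs = [(1 - p) ** n]                        # P(X = 0)
    for k in range(n):
        ck = (n - k) / (k + 1) * p / (1 - p)
        probs.append(probs[-1] * ck)              # P(X = k+1) = ck P(X = k)
    return probs

probs = binomial_pmf(12, 1 / 6)
print(probs[5], sum(probs[:6]))   # ~0.0284 and ~0.9921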
126
Mean and variance of the Binomial random variable :
If X1 , X2 , ···, Xn are independent Bernoulli random variables,
then
X ≡ X1 + X2 + ··· + Xn
is the Binomial random variable that counts the successes .
127
X ≡ X1 + X2 + ··· + Xn .
We know that
E[Xk] = p ,
so
E[X] = E[X1] + E[X2] + ··· + E[Xn] = np .
Similarly, by independence, Var(X) = Var(X1) + ··· + Var(Xn) = np(1 − p) .
128
EXAMPLES :
129
The Poisson Random Variable
130
A stable and efficient way to compute the Poisson probability
P(X = k) = e^(−λ) λ^k / k! , k = 0, 1, 2, ··· ,
is to use the fact that
P(X = k + 1) = e^(−λ) λ^(k+1) / (k + 1)! ,
so that
P(X = 0) = e^(−λ) ,
P(X = k + 1) = ( λ / (k + 1) ) P(X = k) , k = 0, 1, 2, ··· .
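A minimal Python sketch of this recurrence (our own illustration, not from the slides):

from math import exp

def poisson_pmf(lam, kmax):
    # P(X = k) for k = 0..kmax via P(k+1) = lam/(k+1) * P(k)
    probs = [exp(-lam)]
    for k in range(kmax):
        probs.append(probs[-1] * lam / (k + 1))
    return probs

print(poisson_pmf(6.0, 5)[5])   # ~0.1606 , matching the Table below for k = 5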
131
The Poisson random variable
P(X = k) = e^(−λ) λ^k / k! , k = 0, 1, 2, ··· ,
has (as shown later) : E[X] = λ and Var(X) = λ .
132
The Poisson random variable
P(X = k) = e^(−λ) λ^k / k! , k = 0, 1, 2, ··· ,
models the probability of k successes in a given time interval,
when the average number of successes is λ .
133
p_Binomial(k) = C(n, k) p^k (1 − p)^(n−k) ≅ p_Poisson(k) = e^(−λ) λ^k / k! ,
for large n and small p , with λ = np .
EXAMPLE : = 6 customers/hour.
For the Binomial take n = 12 , p = 0.5 (0.5 customers/5 minutes) ,
so that indeed np = .
k pBinomial pPoisson FBinomial FPoisson
0 0.0002 0.0024 0.0002 0.0024
1 0.0029 0.0148 0.0031 0.0173
2 0.0161 0.0446 0.0192 0.0619
3 0.0537 0.0892 0.0729 0.1512
4 0.1208 0.1338 0.1938 0.2850
5 0.1933 0.1606 0.3872 0.4456
6 0.2255 0.1606 0.6127 0.6063
7 0.1933 0.1376 0.8061 0.7439
8 0.1208 0.1032 0.9270 0.8472
9 0.0537 0.0688 0.9807 0.9160
10 0.0161 0.0413 0.9968 0.9573
11 0.0029 0.0225 0.9997 0.9799
12 0.0002 0.0112 1.0000 0.9911 Why not 1.0000 ?
134
p_Binomial(k) = C(n, k) p^k (1 − p)^(n−k) ≅ p_Poisson(k) = e^(−λ) λ^k / k! .
EXAMPLE : = 6 customers/hour.
For the Binomial take n = 60 , p = 0.1 (0.1 customers/minute) ,
so that indeed np = .
k pBinomial pPoisson FBinomial FPoisson
0 0.0017 0.0024 0.0017 0.0024
1 0.0119 0.0148 0.0137 0.0173
2 0.0392 0.0446 0.0530 0.0619
3 0.0843 0.0892 0.1373 0.1512
4 0.1335 0.1338 0.2709 0.2850
5 0.1662 0.1606 0.4371 0.4456
6 0.1692 0.1606 0.6064 0.6063
7 0.1451 0.1376 0.7515 0.7439
8 0.1068 0.1032 0.8583 0.8472
9 0.0685 0.0688 0.9269 0.9160
10 0.0388 0.0413 0.9657 0.9573
11 0.0196 0.0225 0.9854 0.9799
12 0.0089 0.0112 0.9943 0.9911
135
n = 12 , p = 1/2 , λ = 6 (left) ; n = 200 , p = 0.01 , λ = 2 (right) .
136
For the Binomial random variable we found
E[X] = np and Var(X) = np(1 − p) ,
137
FACT : (The Method of Moments)
Let ψ(t) ≡ E[e^(tX)] be the moment generating function of X .
It follows that
ψ′(0) = E[X] , ψ″(0) = E[X²] . ( Why ? )
138
APPLICATION : The Poisson mean and variance :
ψ(t) ≡ E[e^(tX)] = Σ_{k=0}^∞ e^(tk) P(X = k) = Σ_{k=0}^∞ e^(tk) e^(−λ) λ^k / k!
  = e^(−λ) Σ_{k=0}^∞ (λ e^t)^k / k! = e^(−λ) e^(λ e^t) = e^(λ(e^t − 1)) .
Here
ψ′(t) = λ e^t e^(λ(e^t − 1)) , so E[X] = ψ′(0) = λ ,
ψ″(t) = (λ e^t)² e^(λ(e^t − 1)) + λ e^t e^(λ(e^t − 1)) , ( Check ! )
so
E[X²] = ψ″(0) = λ(λ + 1) = λ² + λ ,
and Var(X) = E[X²] − (E[X])² = λ .
139
EXAMPLE : Defects in a wire occur at the rate of one per 10 meters,
with a Poisson distribution :
P(X = k) = e^(−λ) λ^k / k! , k = 0, 1, 2, ··· .
What is the probability that :
Of five 12-meter rolls two have one defect and three have none?
( For a 12-meter roll λ = 1.2 , so P(0) = e^(−1.2) ≅ 0.3012 and
P(1) = 1.2 e^(−1.2) ≅ 0.3614 . )
ANSWER : C(5, 3) · 0.3012³ · 0.3614² ≅ 0.0357 . ( Why ? )
140
EXERCISE :
Defects in a certain wire occur at the rate of one per 10 meters.
Assume the defects have a Poisson distribution.
What is the probability that :
a 20-meter wire has no defects?
EXERCISE :
Customers arrive at a counter at the rate of 8 per hour.
Assume the arrivals have a Poisson distribution.
What is the probability that :
no customer arrives in 15 minutes?
141
CONTINUOUS RANDOM VARIABLES
X(·) : S → R .
EXAMPLE :
Rotate a pointer about a pivot in a plane (like a hand of a clock).
The outcome is the angle at which it stops : 2πθ , where θ ∈ (0, 1] .
A good sample space is all values of θ , i.e., S = (0, 1] .
A very simple example of a continuous random variable is X(θ) = θ .
142
The (cumulative) probability distribution function is defined as
FX(x) ≡ P(X ≤ x) .
Thus
FX(b) − FX(a) ≡ P(a < X ≤ b) .
We must have
FX(−∞) = 0 and FX(∞) = 1 ,
i.e.,
lim_{x→−∞} FX(x) = 0 ,
and
lim_{x→∞} FX(x) = 1 .
NOTE : All the above is the same as for discrete random variables !
143
EXAMPLE : In the pointer example , where X(θ) = θ , we have
the probability distribution function shown below.
( Figure : F(θ) = θ on (0, 1] . )
Note that
F(1/3) ≡ P(X ≤ 1/3) = 1/3 , F(1/2) ≡ P(X ≤ 1/2) = 1/2 ,
P(1/3 < X ≤ 1/2) = F(1/2) − F(1/3) = 1/2 − 1/3 = 1/6 .
QUESTION : What is P(1/3 ≤ X ≤ 1/2) ?
144
The probability density function is the derivative of the probability
distribution function :
fX(x) ≡ FX′(x) ≡ (d/dx) FX(x) .
Thus, for the pointer example,
fX(x) = FX′(x) = { 0 , x ≤ 0 ; 1 , 0 < x ≤ 1 ; 0 , 1 < x } .
145
EXAMPLE : ( continued )
F(x) = { 0 , x ≤ 0 ; x , 0 < x ≤ 1 ; 1 , 1 < x } ,
f(x) = { 0 , x ≤ 0 ; 1 , 0 < x ≤ 1 ; 0 , 1 < x } .
( Figures : the graphs of F(θ) and f(θ) . )
NOTE :
P(1/3 < X ≤ 1/2) = ∫ from 1/3 to 1/2 of f(x) dx = 1/6 = the shaded area .
146
In general, from
f(x) ≡ F′(x) ,
with
F(−∞) = 0 and F(∞) = 1 ,
we have
∫ from −∞ to x of f(x̄) dx̄ = F(x) − F(−∞) = F(x) = P(X ≤ x) ,
∫ from a to b of f(x) dx = F(b) − F(a) = P(a < X ≤ b) ,
∫ from a to a of f(x) dx = F(a) − F(a) = 0 = P(X = a) .
147
EXERCISE : Draw graphs of the distribution and density functions
F(x) = { 0 , x ≤ 0 ; 1 − e^(−x) , x > 0 } ,
f(x) = { 0 , x ≤ 0 ; e^(−x) , x > 0 } ,
and verify that
P(X > 1) = 1 − F(1) = e^(−1) ≅ 0.37 .
148
EXERCISE : For positive integer n, consider the density functions
fn(x) = { c x^n (1 − x^n) , 0 ≤ x ≤ 1 ; 0 , otherwise } .
Determine P(0 ≤ X ≤ 1/2) in terms of n .
149
Joint distributions
By Calculus we have
∂²FX,Y(x, y) / ∂x∂y = fX,Y(x, y) .
Also,
P(a < X ≤ b , c < Y ≤ d) = ∫ from c to d ∫ from a to b of fX,Y(x, y) dx dy .
150
EXAMPLE :
If
fX,Y(x, y) = { 1 , for x ∈ (0, 1] and y ∈ (0, 1] ; 0 , otherwise } ,
then
FX,Y(x, y) = xy , for x ∈ (0, 1] and y ∈ (0, 1] .
For example
P(X ≤ 1/3 , Y ≤ 1/2) = FX,Y(1/3, 1/2) = 1/6 .
151
( Figures : the joint density f and distribution F of the uniform example . )
Also,
P(1/3 ≤ X ≤ 1/2 , 1/4 ≤ Y ≤ 3/4)
  = ∫ from 1/4 to 3/4 ∫ from 1/3 to 1/2 of f(x, y) dx dy = 1/12 .
152
Marginal density functions
FY(y) ≡ P(Y ≤ y) = ∫ from −∞ to y of fY(ȳ) dȳ
  = ∫ from −∞ to y [ ∫ from −∞ to ∞ of fX,Y(x, ȳ) dx ] dȳ .
By Calculus we have
dFX(x)/dx = fX(x) , dFY(y)/dy = fY(y) .
153
EXAMPLE : If
fX,Y(x, y) = { 1 , for x ∈ (0, 1] and y ∈ (0, 1] ; 0 , otherwise } ,
154
EXERCISE :
Let FX,Y(x, y) = { (1 − e^(−x))(1 − e^(−y)) , for x ≥ 0 and y ≥ 0 ; 0 , otherwise } .
Verify that
fX,Y(x, y) = ∂²F/∂x∂y = { e^(−x−y) , for x ≥ 0 and y ≥ 0 ; 0 , otherwise } .
( Figures : the density f and distribution F . )
155
EXERCISE : ( continued )
FX,Y(x, y) = (1 − e^(−x))(1 − e^(−y)) , fX,Y(x, y) = e^(−x−y) , for x, y ≥ 0 .
∫ from 0 to ∞ ∫ from 0 to ∞ of fX,Y(x, y) dx dy = 1 , ( Why zero lower limits ? )
fX(x) = ∫ from 0 to ∞ of e^(−x−y) dy = e^(−x) ,
fY(y) = ∫ from 0 to ∞ of e^(−x−y) dx = e^(−y) .
156
EXERCISE : ( continued )
= (e^(−1) − e^(−2))(1 − e^(−1)) ≅ 0.15 .
157
Independent continuous random variables
Equivalently, X(s) and Y (s) are independent if for all such sets
IX and IY the events
X⁻¹(IX) and Y⁻¹(IY) ,
are independent in the sample space S.
158
FACT : X(s) and Y (s) are independent if for all x and y
fX,Y (x, y) = fX (x) fY (y) .
NOTE :
FX,Y(x, y) = { (1 − e^(−x))(1 − e^(−y)) , for x ≥ 0 and y ≥ 0 ; 0 , otherwise }
also satisfies (by the preceding exercise)
FX,Y (x, y) = FX (x) FY (y) .
159
PROPERTY :
For independent continuous random variables X and Y we have
FX,Y (x, y) = FX (x) FY (y) , for all x, y .
PROOF :
FX,Y(x, y) = P(X ≤ x , Y ≤ y)
  = ∫ from −∞ to x ∫ from −∞ to y of fX,Y(x̄, ȳ) dȳ dx̄
  = ∫ from −∞ to x ∫ from −∞ to y of fX(x̄) fY(ȳ) dȳ dx̄   (by independence)
  = ∫ from −∞ to x [ fX(x̄) ∫ from −∞ to y of fY(ȳ) dȳ ] dx̄
  = [ ∫ from −∞ to x of fX(x̄) dx̄ ] [ ∫ from −∞ to y of fY(ȳ) dȳ ]
  = FX(x) FY(y) .
REMARK : Note how the proof parallels that for the discrete case !
160
Conditional distributions
Let X and Y be continuous random variables.
For given allowable sets IX and IY (typically intervals), let
Ex = X⁻¹(IX) and Ey = Y⁻¹(IY) ,
be their corresponding events in the sample space S .
We have P(Ex|Ey) ≡ P(Ex Ey) / P(Ey) .
161
EXAMPLE : The random variables with density function
fX,Y(x, y) = { e^(−x−y) , for x ≥ 0 and y ≥ 0 ; 0 , otherwise } ,
are independent , since fX,Y(x, y) = fX(x) fY(y) .
162
Expectation
The expected value of a continuous random variable X is
E[X] ≡ ∫ from −∞ to ∞ of x fX(x) dx .
163
EXAMPLE :
For the pointer experiment , where fX(x) = 1 on (0, 1] ,
we have
E[X] = ∫ x fX(x) dx = ∫ from 0 to 1 of x dx = x²/2 |₀¹ = 1/2 ,
and
E[X²] = ∫ x² fX(x) dx = ∫ from 0 to 1 of x² dx = x³/3 |₀¹ = 1/3 .
164
EXAMPLE : For the joint density function
fX,Y(x, y) = { e^(−x−y) , for x > 0 and y > 0 ; 0 , otherwise } ,
we have fX(x) = e^(−x) and fY(y) = e^(−y) .
Thus E[X] = ∫ from 0 to ∞ of x e^(−x) dx = [ −(x+1)e^(−x) ]₀^∞ = 1 . ( Check ! )
Similarly E[Y] = ∫ from 0 to ∞ of y e^(−y) dy = 1 ,
and
E[XY] = ∫ from 0 to ∞ ∫ from 0 to ∞ of xy e^(−x−y) dy dx = 1 . ( Check ! )
165
EXERCISE :
Prove the following for continuous random variables :
E[aX] = a E[X] ,
E[aX + b] = a E[X] + b ,
EXERCISE :
A stick of length 1 is split at a randomly selected point X.
( Thus X is uniformly distributed in the interval [0, 1]. )
Determine the expected length of the piece containing the point 1/3.
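A Monte Carlo sketch in Python for this exercise (our own illustration, not from the slides):

import random

# Estimate the expected length of the piece containing the point 1/3.
trials, total = 1_000_000, 0.0
for _ in range(trials):
    x = random.random()                     # split point, uniform on [0, 1]
    total += x if x > 1 / 3 else 1.0 - x    # length of the piece holding 1/3
print(total / trials)   # compare with the analytical answer (13/18 = 0.722...)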
166
PROPERTY : If X and Y are independent then E[XY] = E[X] E[Y] .
PROOF :
E[XY] = ∫∫ x y fX,Y(x, y) dy dx
  = ∫∫ x y fX(x) fY(y) dy dx   (by independence)
  = ∫ [ x fX(x) ∫ y fY(y) dy ] dx
  = [ ∫ x fX(x) dx ] [ ∫ y fY(y) dy ]
  = E[X] E[Y] .
REMARK : Note how the proof parallels that for the discrete case !
REMARK : Note how the proof parallels that for the discrete case !
167
EXAMPLE : For
fX,Y(x, y) = { e^(−x−y) , for x > 0 and y > 0 ; 0 , otherwise } ,
we already found
fX(x) = e^(−x) , fY(y) = e^(−y) ,
so that
fX,Y(x, y) = fX(x) fY(y) ,
i.e., X and Y are independent , and indeed E[XY] = 1 = E[X] E[Y] .
Variance
Let μ = E[X] = ∫ x fX(x) dx . Then, as in the discrete case,
Var(X) ≡ E[ (X − μ)² ] = E[X²] − μ² .
169
EXAMPLE : For f(x) = { e^(−x) , x > 0 ; 0 , x ≤ 0 } ,
we have
E[X] = μ = ∫ from 0 to ∞ of x e^(−x) dx = 1 ( already done ! ) ,
E[X²] = ∫ from 0 to ∞ of x² e^(−x) dx = [ −(x² + 2x + 2)e^(−x) ]₀^∞ = 2 ,
Var(X) = E[X²] − μ² = 2 − 1² = 1 ,
σ(X) = √Var(X) = 1 .
EXERCISE :
Also use the Method of Moments to compute E[X] and E[X 2 ] .
170
EXERCISE : For the random variable X with density function
f(x) = { 0 , x ≤ −1 ; c , −1 < x ≤ 1 ; 0 , x > 1 } .
171
EXERCISE : For the random variable X with density function
f(x) = { x + 1 , −1 < x ≤ 0 ; 1 − x , 0 < x ≤ 1 ; 0 , otherwise } .
172
EXERCISE : For the random variable X with density function
f(x) = { (3/4)(1 − x²) , −1 < x ≤ 1 ; 0 , otherwise } .
Draw the graph of f(x) .
Verify that ∫ f(x) dx = 1 .
Determine the distribution function F(x) .
Draw the graph of F(x) .
Determine E[X] .
Compute Var(X) and σ(X) .
Determine P(X ≤ 0) .
Compute P(X ≤ 2/3) .
Compute P(|X| ≤ 2/3) .
173
EXERCISE : Recall the density function
fn(x) = { c x^n (1 − x^n) , 0 ≤ x ≤ 1 ; 0 , otherwise } .
Determine E[X²] .
174
Covariance
Let X and Y be continuous random variables with mean
E[X] = μX , E[Y] = μY . Then
Cov(X, Y) ≡ E[ (X − μX)(Y − μY) ]
  = ∫∫ (x − μX)(y − μY) fX,Y(x, y) dy dx
  = E[ XY − μX Y − μY X + μX μY ]
  = E[XY] − E[X] E[Y] .
175
As in the discrete case, we also have
PROPERTY 1 :
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) ,
and, for independent X and Y ,
Var(X + Y) = Var(X) + Var(Y) .
NOTE :
The proofs are identical to those for the discrete case !
176
EXAMPLE : For
fX,Y(x, y) = { e^(−x−y) , for x > 0 and y > 0 ; 0 , otherwise } ,
we already found
fX(x) = e^(−x) , fY(y) = e^(−y) ,
so that
fX,Y(x, y) = fX(x) fY(y) , i.e., X and Y are independent ,
and hence Cov(X, Y) = 0 .
177
EXERCISE :
Var(cX + d) = c² Var(X) ,
Cov(X, Y) = Cov(Y, X) ,
Cov(cX, Y) = c Cov(X, Y) ,
Cov(X, cY) = c Cov(X, Y) ,
178
EXERCISE :
Verify that ∫ from 0 to 1 ∫ from 0 to 1 of f(x, y) dy dx = 1 .
179
The joint probability density function fXY (x, y) .
180
Markov's inequality :
For a continuous nonnegative random variable X , and c > 0 ,
we have
P(X ≥ c) ≤ E[X] / c .
PROOF :
E[X] = ∫ from 0 to ∞ of x f(x) dx
  = ∫ from 0 to c of x f(x) dx + ∫ from c to ∞ of x f(x) dx
  ≥ ∫ from c to ∞ of x f(x) dx
  ≥ ∫ from c to ∞ of c f(x) dx   ( Why ? )
  = c P(X ≥ c) .
EXERCISE :
Show Markov's inequality also holds for discrete random variables.
181
Markov's inequality : For continuous nonnegative X , c > 0 :
P(X ≥ c) ≤ E[X] / c .
EXAMPLE : For f(x) = { e^(−x) , for x > 0 ; 0 , otherwise } ,
we have
E[X] = ∫ from 0 to ∞ of x e^(−x) dx = 1 ( already done ! ) ,
so, e.g.,
c = 10 : P(X ≥ 10) ≤ E[X]/10 = 1/10 = 0.1 .
182
QUESTION : Are these estimates sharp ?
Markov's inequality gives
c = 1 : P(X ≥ 1) ≤ E[X]/1 = 1 (!)
c = 10 : P(X ≥ 10) ≤ E[X]/10 = 1/10 = 0.1
The actual values are
P(X ≥ 1) = ∫ from 1 to ∞ of e^(−x) dx = e^(−1) ≅ 0.37 ,
P(X ≥ 10) = ∫ from 10 to ∞ of e^(−x) dx = e^(−10) ≅ 0.000045 .
183
Chebyshev's inequality : For (practically) any random variable X :
P( |X − μ| ≥ kσ ) ≤ 1/k² ,
where μ = E[X] is the mean, σ = √Var(X) the standard deviation.
PROOF : Let Y ≡ (X − μ)² , and apply Markov's inequality :
P( |X − μ| ≥ kσ ) = P( (X − μ)² ≥ k²σ² ) = P( Y ≥ k²σ² )
  ≤ E[Y] / (k²σ²) = Var(X) / (k²σ²) = σ² / (k²σ²) = 1/k² . QED !
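An empirical Python check (ours, not from the slides) of Chebyshev's bound for an exponential variable with λ = 1 (so μ = σ = 1), taking k = 2 :

import random

trials, k = 1_000_000, 2.0
count = sum(1 for _ in range(trials)
            if abs(random.expovariate(1.0) - 1.0) >= k)
print(count / trials, "<=", 1 / k ** 2)   # ~0.05 <= 0.25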
184
EXAMPLE : Suppose the value of the Canadian dollar in terms of
the US dollar over a certain period is a random variable X with
mean μ = 0.98 and standard deviation σ = 0.05 .
What can be said of the probability that the Canadian dollar is valued
between $0.88US and $1.08US ,
that is,
between μ − 2σ and μ + 2σ ?
By Chebyshev's inequality this probability is at least 1 − 1/2² = 3/4 .
185
EXERCISE :
The score of students taking an examination is a random variable
with mean μ = 65 and standard deviation σ = 5 .
186
SPECIAL CONTINUOUS RANDOM VARIABLES
( Figures : the uniform density and distribution on [a, b] ,
indicating F(x1) and F(x2) . )
187
EXERCISE :
Show that the uniform random variable on [a, b]
has mean
μ = (a + b)/2 .
188
A joint uniform random variable :
f(x, y) = 1 / ( (b − a)(d − c) ) ,
F(x, y) = ( (x − a)(y − c) ) / ( (b − a)(d − c) ) ,
for x ∈ [a, b] and y ∈ [c, d] .
( Figures : the joint uniform density f and distribution F . )
189
EXERCISE :
What is P (X < 0) ?
What is f ( x | y = 1 ) ?
190
The Exponential Random Variable
f(x) = { λ e^(−λx) , x > 0 ; 0 , x ≤ 0 } ,
F(x) = { 1 − e^(−λx) , x > 0 ; 0 , x ≤ 0 } ,
with
E[X] = μ = ∫ from 0 to ∞ of x λ e^(−λx) dx = 1/λ ( Check ! ) ,
E[X²] = ∫ from 0 to ∞ of x² λ e^(−λx) dx = 2/λ² ( Check ! ) ,
Var(X) = E[X²] − μ² = 1/λ² ,
σ(X) = √Var(X) = 1/λ .
191
( Figures : the exponential density f(x) and distribution F(x)
for λ = 0.25, 0.50, 0.75, 1.00 (blue), 1.25, 1.50, 1.75, 2.00 (red) . )
192
PROPERTY : From F(x) = 1 − e^(−λx)
we have
P(X > x) = 1 − (1 − e^(−λx)) = e^(−λx) .
193
EXAMPLE :
f(t) = e^(−t) , (taking λ = 1) ,
with F(t) = 1 − e^(−t) .
194
EXAMPLE : ( continued ) F(t) = 1 − e^(−t) .
Let Et be the event that the waiting time exceeds t . Then
P(Et) = 1 − F(t) = e^(−t) ,
P(Et+1) = 1 − F(t + 1) = e^(−(t+1)) ,
so
P(Et+1 | Et) = P(Et+1 Et) / P(Et) = e^(−(t+1)) / e^(−t) = e^(−1) ,
which is independent of t !
195
The Standard Normal Random Variable
Since
E[X²] = ∫ from −∞ to ∞ of x² f(x) dx = 1 , ( more difficult )
we have
Var(X) = E[X²] − μ² = 1 .
196
f(x) = ( 1/√(2π) ) e^(−x²/2)
( Figure . )
The standard normal density function f (x) .
197
Φ(x) = F(x) = ( 1/√(2π) ) ∫ from −∞ to x of e^(−x̄²/2) dx̄
( Figure . )
The standard normal distribution function F (x)
( often denoted by (x) ) .
198
The Standard Normal Distribution Φ(z)
z      Φ(z)       z      Φ(z)
0.0 .5000 -1.2 .1151
-0.1 .4602 -1.4 .0808
-0.2 .4207 -1.6 .0548
-0.3 .3821 -1.8 .0359
-0.4 .3446 -2.0 .0228
-0.5 .3085 -2.2 .0139
-0.6 .2743 -2.4 .0082
-0.7 .2420 -2.6 .0047
-0.8 .2119 -2.8 .0026
-0.9 .1841 -3.0 .0013
-1.0 .1587 -3.2 .0007
199
EXERCISE :
P( X ≤ −0.5 )
P( X ≥ −0.5 )
P( |X| ≤ 0.5 )
P( |X| ≥ 0.5 )
P( −1 ≤ X ≤ 1 )
P( −1 ≤ X ≤ 0.5 )
200
The General Normal Random Variable
E[X] = μ , ( Why ? )
Var(X) ≡ E[(X − μ)²] = σ² .
201
The standard normal (black) and the normal density functions
with μ = 1 , σ = 0.5 (red) and μ = 1.5 , σ = 2.5 (blue).
202
To compute the mean of the general normal density function
f(x) = ( 1/(σ√(2π)) ) e^(−(x−μ)²/(2σ²)) ,
consider
E[X − μ] = ∫ (x − μ) f(x) dx
  = ( 1/(σ√(2π)) ) ∫ (x − μ) e^(−(x−μ)²/(2σ²)) dx
  = [ −( σ/√(2π) ) e^(−(x−μ)²/(2σ²)) ] from −∞ to ∞ = 0 .
Thus
E[X] = μ .
203
NOTE : If X is general normal we have the very useful formula :
P( (X − μ)/σ ≤ c ) = Φ(c) ,
i.e., we can use the Table of the standard normal distribution !
Let y ≡ (x − μ)/σ , so that x = μ + σy .
204
EXERCISE : Suppose X is normally distributed with
P( X ≤ 0.5 )
P( X ≥ 0.5 )
P( |X| ≤ 0.5 )
P( |X| ≥ 0.5 )
205
The Chi-Square Random Variable
Suppose X1 , X2 , ···, Xn
are independent standard normal random variables. Then
χ²n ≡ X1² + X2² + ··· + Xn²
is called the chi-square random variable with n degrees of freedom.
NOTE :
The ² in χ²n is part of its name , while the ² in X1² , etc. denotes a power !
206
( Figures : the χ² density f(x) and distribution F(x) . )
207
If n = 1 then
χ²1 ≡ X1² , where X ≡ X1 is standard normal .
We can compute the moment generating function of χ²1 :
E[e^(t χ²1)] = E[e^(t X²)] = ( 1/√(2π) ) ∫ e^(tx²) e^(−x²/2) dx
  = ( 1/√(2π) ) ∫ e^(−x²(1 − 2t)/2) dx .
Let
1 − 2t = 1/σ̄² , or equivalently , σ̄ ≡ 1/√(1 − 2t) .
Then
E[e^(t χ²1)] = ( 1/√(2π) ) ∫ e^(−x²/(2σ̄²)) dx = σ̄ = 1/√(1 − 2t) .
(integral of a normal density function)
208
Thus we have found that :
ψ(t) ≡ E[e^(t χ²1)] = 1/√(1 − 2t) .
209
We found that
E[χ²1] = 1 , Var(χ²1) = 2 ,
and hence, summing n independent copies,
E[χ²n] = n , Var(χ²n) = 2n , σ(χ²n) = √(2n) .
210
( Figure . )
The Chi-Square density functions for n = 5, 6, , 15 .
( For large n they look like normal density functions ! )
211
The 2n - Table
n      α = 0.975   α = 0.95   α = 0.05   α = 0.025
5 0.83 1.15 11.07 12.83
6 1.24 1.64 12.59 14.45
7 1.69 2.17 14.07 16.01
8 2.18 2.73 15.51 17.54
9 2.70 3.33 16.92 19.02
10 3.25 3.94 18.31 20.48
11 3.82 4.58 19.68 21.92
12 4.40 5.23 21.03 23.34
13 5.01 5.89 22.36 24.74
14 5.63 6.57 23.69 26.12
15 6.26 7.26 25.00 27.49
χ²n ≡ X1 + X2 + ··· + Xn ,
where the Xk are independent χ²1 random variables .
212
RECALL :
If X1 , X2 , ···, Xn are independent, identically distributed,
each having mean μ ,
then
S ≡ X1 + X2 + ··· + Xn
has
mean : μS ≡ E[S] = nμ . ( Why ? )
213
THEOREM (The Central Limit Theorem) (CLT) :
If X1 , X2 , ···, Xn are independent, identically distributed,
each having mean μ and standard deviation σ , then for large n
S ≡ X1 + X2 + ··· + Xn
is approximately normal .
NOTE : Thus (S − nμ) / (σ√n) is approximately standard normal .
214
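A Python sketch of the CLT (our own illustration), anticipating the uniform [−1, 1] example used below:

import random

# Standardized means of uniform [-1,1] samples (mu = 0, sigma = 1/sqrt(3))
# should be approximately standard normal.
n, reps = 25, 100_000
sigma = (1.0 / 3.0) ** 0.5
count = 0
for _ in range(reps):
    xbar = sum(random.uniform(-1.0, 1.0) for _ in range(n)) / n
    if xbar / (sigma / n ** 0.5) <= 1.0:
        count += 1
print(count / reps)   # should be near Phi(1) = 0.8413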
EXAMPLE : Recall that
χ²n ≡ X1² + X2² + ··· + Xn² ,
215
EXERCISE :
Using the CLT approximation
P( χ²n ≤ x ) ≅ Φ( (x − n) / √(2n) ) ,
estimate
P( χ²32 ≤ 24 ) ,
P( χ²32 ≤ 40 ) ,
P( |χ²32 − 32| ≤ 8 ) .
216
RECALL :
If X1 , X2 , ···, Xn are independent, identically distributed,
each having mean μ and standard deviation σ ,
then
X̄ ≡ (1/n)(X1 + X2 + ··· + Xn)
has
mean : μX̄ = E[X̄] = μ , ( Why ? )
variance : σ²X̄ = (1/n²) · n σ² = σ²/n , ( Why ? )
standard deviation : σX̄ = σ/√n .
217
COROLLARY (to the Central Limit Theorem) :
If X1 , X2 , ···, Xn are independent, identically distributed,
each having mean μ and standard deviation σ , then for large n
X̄ is approximately normal .
NOTE : Thus (X̄ − μ) / (σ/√n) is approximately standard normal .
218
EXAMPLE : Suppose X1 , X2 , ···, Xn are
identical , independent , uniform random variables ,
each having density function
f(x) = 1/2 , for x ∈ [−1, 1] , ( 0 otherwise ) ,
with
mean μ = 0 , standard deviation σ = 1/√3 . ( Check ! )
219
EXERCISE : In the preceding example
P(X̄ ≤ x) ≅ Φ( (x − 0) / (1/√(3n)) ) = Φ( √(3n) x ) .
220
EXPERIMENT : ( a lengthy one ! )
221
EXPERIMENT : ( continued )
222
EXPERIMENT : ( continued )
223
EXPERIMENT : ( continued )
Divide [−1, 1] into M subintervals of equal size Δx = 2/M .
Let f(xk) = mk / (N Δx) , where mk is the number of random numbers
in interval k (N is the total # of random numbers) .
Then
∫ from −1 to 1 of f(x) dx ≅ Σ_{k=1}^M f(xk) Δx = Σ_{k=1}^M ( mk/(N Δx) ) Δx = 1 ,
and f(xk) approximates the value of the density function .
224
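A Python sketch (ours, not from the slides) of this histogram-based density estimate, with the same N and M as the Table below:

import random

# Approximate density f(x_k) = m_k / (N * dx) on M subintervals of [-1, 1].
N, M = 750_000, 15
dx = 2.0 / M
counts = [0] * M
for _ in range(N):
    u = random.uniform(-1.0, 1.0)
    counts[min(int((u + 1.0) / dx), M - 1)] += 1
f = [m / (N * dx) for m in counts]
print(f)   # each entry should be close to 0.5, as in the Table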
EXPERIMENT : ( continued )
Interval Frequency Sum f (x) F (x)
1 50013 50013 0.500 0.067
2 50033 100046 0.500 0.133
3 50104 150150 0.501 0.200
4 49894 200044 0.499 0.267
5 50242 250286 0.502 0.334
6 49483 299769 0.495 0.400
7 50016 349785 0.500 0.466
8 50241 400026 0.502 0.533
9 50261 450287 0.503 0.600
10 49818 500105 0.498 0.667
11 49814 549919 0.498 0.733
12 50224 600143 0.502 0.800
13 49971 650114 0.500 0.867
14 49873 699987 0.499 0.933
15 50013 750000 0.500 1.000
Frequency Table, showing the count per interval .
(N = 750, 000 random numbers, M = 15 intervals)
225
EXPERIMENT : ( continued )
( Figure . )
The approximate density function , f(xk) = mk/(N Δx) , for
N = 5,000,000 random numbers, and M = 200 intervals.
226
EXPERIMENT : ( continued )
( Figures : the approximate density f(x) and distribution F(x) . )
227
EXPERIMENT : ( continued )
228
EXPERIMENT : ( continued )
229
EXPERIMENT : ( continued )
As before, let fn(xk) = mk / (N Δx) .
230
EXPERIMENT : ( continued )
Interval Frequency Sum f (x) F (x)
1 0 0 0.00000 0.00000
2 0 0 0.00000 0.00000
3 0 0 0.00000 0.00000
4 11 11 0.00011 0.00001
5 1283 1294 0.01283 0.00173
6 29982 31276 0.29982 0.04170
7 181209 212485 1.81209 0.28331
8 325314 537799 3.25314 0.71707
9 181273 719072 1.81273 0.95876
10 29620 748692 0.29620 0.99826
11 1294 749986 0.01294 0.99998
12 14 750000 0.00014 1.00000
13 0 750000 0.00000 1.00000
14 0 750000 0.00000 1.00000
15 0 750000 0.00000 1.00000
Frequency Table for X, showing the count per interval .
(N = 750, 000 values of X, M = 15 intervals, sample size n = 25)
231
EXPERIMENT : ( continued )
( Figures : the approximate density f(x) and distribution F(x) of X̄ . )
232
EXPERIMENT : ( continued )
Recall that for uniform random variables Xi on [−1, 1] ,
X̄ ≡ (1/n)(X1 + X2 + ··· + Xn)
is approximately normal , with
mean μ = 0 , standard deviation σ = 1/√(3n) .
Thus for each n we can normalize x and fn(x) :
x̃ = (x − μ)/σ = (x − 0)/(1/√(3n)) = √(3n) x , f̃n(x̃) = fn(x)/√(3n) .
233
( Figure . )
The normalized density functions f̃n(x̃) , for n = 1, 2, 5, 10, 25 .
( N = 5, 000, 000 values of X , M = 200 intervals )
234
EXERCISE : Suppose
X1 , X2 , ···, X12 , (n = 12) ,
are identical, independent, uniform random variables on [0, 1] .
Estimate
P(X̄ ≤ 1/3) ,
P(X̄ ≥ 2/3) ,
P(|X̄ − 1/2| ≤ 1/3) .
235
EXERCISE : Suppose
X1 , X2 , ···, X9 , (n = 9) ,
are identical, independent, exponential random variables, with
f(x) = e^(−x) , where λ = 1 .
Estimate
P(X̄ ≤ 0.4) ,
P(X̄ ≥ 1.6) ,
P(|X̄ − 1| ≤ 0.6) .
236
EXERCISE : Suppose
X1 , X2 , ···, Xn
are independent, identically distributed, and let
X̄ ≡ (1/n)(X1 + X2 + ··· + Xn) .
How large should n be so that
P(|X̄ − μ| ≤ 1) ≥ 90 % ?
237
EXAMPLE : The CLT also applies to discrete random variables .
The Binomial random variable , with
P(X = k) = C(n, k) p^k (1 − p)^(n−k) , (0 ≤ k ≤ n) ,
is already a sum (namely, of Bernoulli random variables).
Thus its binomial probability mass function already looks normal :
( Figures . )
Binomial : n = 10 , p = 0.3 Binomial : n = 100 , p = 0.3
238
EXAMPLE : ( continued )
239
EXAMPLE : ( continued )
240
EXERCISE :
Consider the Binomial distribution with n = 676 and p = 1/26 :
( Figure . )
The Binomial (n = 676 , p = 1/26) , shown in [0, 50] .
241
EXERCISE : ( continued ) ( Binomial : n = 676 , p = 1/26 )
242
EXPERIMENT :
Any conclusions ?
243
EXPERIMENT : ( continued )
Any conclusions ?
244
EXPERIMENT : ( continued )
Compare the accuracy of the Poisson and the adjusted Normal
approximations to the Binomial, for different values of n .
k n Binomial Poisson Normal
0 4 0.9606 0.9608 0.9896
0 8 0.9227 0.9231 0.9322
0 16 0.8515 0.8521 0.8035
0 32 0.7250 0.7261 0.6254
0 64 0.5256 0.5273 0.4302
1 128 0.6334 0.6339 0.5775
2 256 0.5278 0.5285 0.4850
5 512 0.5948 0.5949 0.5670
10 1024 0.5529 0.5530 0.5325
20 2048 0.5163 0.5165 0.5018
40 4096 0.4814 0.4817 0.4712
P(X ≤ k) , where k = np , with p = 0.01 .
Any conclusions ?
245
SAMPLE STATISTICS
246
DEFINITIONS :
A random sample consists of independent, identically distributed
random variables
X1 , X2 , ···, Xn .
A statistic is a function of X1 , X2 , ···, Xn .
247
EXAMPLES :
( to be discussed in detail )
The sample standard deviation S = √S² .
248
For a random sample
X1 , X2 , , Xn ,
The sample range : the difference between the largest and the
smallest observation.
249
EXAMPLE : For the 8 observations
−0.737 , 0.511 , −0.083 , 0.066 , −0.562 , −0.906 , 0.358 , 0.359 :
Sample mean :
X̄ = (1/8)( −0.737 + 0.511 − 0.083 + 0.066
  − 0.562 − 0.906 + 0.358 + 0.359 ) = −0.124 .
Sample variance :
S² = (1/8){ (−0.737 − X̄)² + (0.511 − X̄)² + (−0.083 − X̄)²
  + (0.066 − X̄)² + (−0.562 − X̄)² + (−0.906 − X̄)²
  + (0.358 − X̄)² + (0.359 − X̄)² } = 0.26 .
Sample standard deviation : √0.26 ≅ 0.51 .
250
EXAMPLE : ( continued )
we also have
251
The Sample Mean
Suppose the population mean and standard deviation are μ and σ .
252
How well does the sample mean approximate the population mean ?
By the CLT
(X̄ − μ) / (σ/√n)
is approximately standard normal , so
P( |(X̄ − μ)/(σ/√n)| ≤ z ) ≅ 1 − 2Φ(−z) .
253
It follows that
P( |(X̄ − μ)/(σ/√n)| ≤ z ) = P( |X̄ − μ| ≤ zσ/√n )
  = P( μ ∈ [ X̄ − zσ/√n , X̄ + zσ/√n ] )
  ≅ 1 − 2Φ(−z) ,
254
We found : P( μ ∈ [ X̄ − zσ/√n , X̄ + zσ/√n ] ) ≅ 1 − 2Φ(−z) .
255
EXERCISE :
As in the preceding example, μ is unknown, σ = 3 , X̄ = 4.5 .
Use the formula
P( μ ∈ [ X̄ − zσ/√n , X̄ + zσ/√n ] ) ≅ 1 − 2Φ(−z)
to determine
The 50 % confidence interval estimate of μ when n = 25 .
The 50 % confidence interval estimate of μ when n = 100 .
The 95 % confidence interval estimate of μ when n = 100 .
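A Python sketch (ours) of these interval computations; z = 0.6745 and z = 1.96 satisfy 1 − 2Φ(−z) = 50 % and 95 % respectively:

# Confidence intervals [Xbar - z*sigma/sqrt(n), Xbar + z*sigma/sqrt(n)].
sigma, xbar = 3.0, 4.5
for level, z, n in [("50%", 0.6745, 25), ("50%", 0.6745, 100), ("95%", 1.96, 100)]:
    half = z * sigma / n ** 0.5
    print(level, "n =", n, ":", (xbar - half, xbar + half))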
256
The Sample Variance
We defined the sample variance as
S² ≡ (1/n) Σ_{k=1}^n (Xk − X̄)² .
n
257
We have just argued that the sample variance
S² ≡ (1/n) Σ_{k=1}^n (Xk − X̄)²
is a biased estimator of the population variance σ² .
Nevertheless, we will show that for large n their values are close !
258
FACT 1 : We (obviously) have that
X̄ = (1/n) Σ_{k=1}^n Xk implies Σ_{k=1}^n Xk = nX̄ .
FACT 2 : From
σ² ≡ Var(X) ≡ E[(X − μ)²] = E[X²] − μ² ,
we (obviously) have
E[X²] = σ² + μ² .
259
FACT 4 : ( Useful for computing S² efficiently ) :
S² ≡ (1/n) Σ_{k=1}^n (Xk − X̄)² = [ (1/n) Σ_{k=1}^n Xk² ] − X̄² .
PROOF :
S² = (1/n) Σ_{k=1}^n (Xk − X̄)²
  = (1/n) Σ_{k=1}^n (Xk² − 2Xk X̄ + X̄²)
  = (1/n) [ Σ_{k=1}^n Xk² − 2X̄ Σ_{k=1}^n Xk + nX̄² ]   ( now use Fact 1 )
  = (1/n) [ Σ_{k=1}^n Xk² − 2nX̄² + nX̄² ]
  = [ (1/n) Σ_{k=1}^n Xk² ] − X̄² . QED !
260
THEOREM : The sample variance
S² ≡ (1/n) Σ_{k=1}^n (Xk − X̄)²
has expected value
E[S²] = (1 − 1/n) σ² .
PROOF :
E[S²] = E[ (1/n) Σ_{k=1}^n (Xk − X̄)² ]
  = E[ (1/n) Σ_{k=1}^n Xk² − X̄² ]   ( using Fact 4 )
  = (1/n) Σ_{k=1}^n E[Xk²] − E[X̄²]
  = σ² + μ² − ( σ²X̄ + μ²X̄ )   ( using Fact 2 n + 1 times ! )
  = σ² + μ² − ( σ²/n + μ² ) = (1 − 1/n) σ² .   ( Fact 3 ) QED !
REMARK : Thus lim_{n→∞} E[S²] = σ² .
261
Most authors instead define the sample variance as
S̄² ≡ (1/(n − 1)) Σ_{k=1}^n (Xk − X̄)² .
262
EXAMPLE : The random sample of 120 values of a uniform
random variable on [−1, 1] in an earlier Table has
X̄ = (1/120) Σ_{k=1}^120 Xk = 0.030 ,
S² = (1/120) Σ_{k=1}^120 (Xk − X̄)² = 0.335 ,
S = √S² = 0.579 ,
while
μ = 0 ,
σ² = ∫ from −1 to 1 of (x − μ)² (1/2) dx = 1/3 ,
σ = √σ² = 1/√3 ≅ 0.577 .
What do you say ?
263
EXAMPLE :
264
EXAMPLE : ( continued )
Results :
X̄ = (1/500) Σ_{k=1}^500 Xk = 0.00136 ,
S² = (1/500) Σ_{k=1}^500 (Xk − X̄)² = 0.00664 ,
S = √S² = 0.08152 .
EXERCISE :
What is the value of E[X̄] ?
Compare X̄ to E[X̄] .
What is the value of Var(X) ?
Compare S² to Var(X) .
265
Estimating the variance of a normal distribution
We have shown that
S² ≡ (1/n) Σ_{k=1}^n (Xk − X̄)² ≅ σ² .
How good is this approximation for normal random variables Xk ?
To answer this we need :
FACT 5 :
Σ_{k=1}^n (Xk − μ)² − Σ_{k=1}^n (Xk − X̄)² = n(X̄ − μ)² .
PROOF :
LHS = Σ_{k=1}^n { Xk² − 2μXk + μ² − Xk² + 2Xk X̄ − X̄² }
  = −2μnX̄ + nμ² + 2nX̄² − nX̄²
  = n( X̄² − 2μX̄ + μ² ) = n(X̄ − μ)² . QED !
266
Rewrite Fact 5
Σ_{k=1}^n (Xk − μ)² − Σ_{k=1}^n (Xk − X̄)² = n(X̄ − μ)² ,
as
Σ_{k=1}^n ( (Xk − μ)/σ )² − (n/σ²) · (1/n) Σ_{k=1}^n (Xk − X̄)² = ( (X̄ − μ)/(σ/√n) )² ,
and then as
Σ_{k=1}^n Zk² − (n/σ²) S² = Z² ,
where
S² is the sample variance ,
and
Z and Zk are standard normal because the Xk are normal .
267
We have found that
(n/σ²) S² = Σ_{k=1}^n Zk² − Z² ,
which can be shown to have the χ²_{n−1} distribution .
268
For normal random variables : ((n − 1)/σ²) S̄² has the χ²_{n−1} distribution .
EXAMPLE : A shipment of light bulbs is claimed to have σ = 100 ;
a sample of n = 16 bulbs has sample standard deviation S̄ = 129 .
SOLUTION :
P(S̄ ≥ 129) = P(S̄² ≥ 129²) = P( ((n − 1)/σ²) S̄² ≥ (15/100²) · 129² )
  = P( χ²15 ≥ 24.96 )
  ≅ 5 % ( from the χ² Table ) .
269
( Figure . )
The Chi-Square density functions for n = 5, 6, , 15 .
(For large n they look like normal density functions .)
270
EXERCISE :
In the preceding example, also compute
P( χ²15 ≥ 24.96 )
using the standard normal approximation .
EXERCISE :
Consider the same shipment of light bulbs :
271
EXAMPLE : For the data below from a normal population :
272
SOLUTION : We have n = 16 , X̄ = 0.00575 , S̄² = 0.02278 .
(n − 1) S̄² / σ² = 6.26  ⟹  σ² = (n − 1) S̄² / 6.26 = 15 · 0.02278 / 6.26 = 0.05458 ,
(n − 1) S̄² / σ² = 27.49  ⟹  σ² = (n − 1) S̄² / 27.49 = 15 · 0.02278 / 27.49 = 0.01223 .
273
Samples from Finite Populations
274
EXAMPLE :
275
With replacement : The possible samples are
(1, 1) , (1, 2) , (1, 3) , (2, 1) , (2, 2) , (2, 3) , (3, 1) , (3, 2) , (3, 3) ,
each with equal probability 1/9 .
276
Without replacement : The possible samples are
(1, 2) , (1, 3) , (2, 1) , (2, 3) , (3, 1) , (3, 2) ,
each with equal probability 1/6 .
The sample means X̄ are
3/2 , 2 , 3/2 , 5/2 , 2 , 5/2 ,
with expected value
E[X̄] = (1/6)( 3/2 + 2 + 3/2 + 5/2 + 2 + 5/2 ) = 2 .
The sample variances S² are
1/4 , 1 , 1/4 , 1/4 , 1 , 1/4 , ( Check ! )
with expected value
E[S²] = (1/6)( 1/4 + 1 + 1/4 + 1/4 + 1 + 1/4 ) = 1/2 .
277
EXAMPLE : ( continued )
278
EXAMPLE : ( continued )
We have computed :
Population statistics : μ = 2 , σ² = 2/3 ,
and, for sampling with replacement,
E[S²] = (1 − 1/2) σ² = (1/2)(2/3) = 1/3 .
279
QUESTION :
Why is the formula E[S²] = (1 − 1/n) σ² wrong for sampling without replacement ?
280
NOTE :
{ 1 , 2 , 3 , , N } .
P (X2 = 1 | X1 = 1) = 0 ,
281
The Sample Correlation Coefficient
The correlation coefficient of random variables X and Y is
ρX,Y ≡ Cov(X, Y) / (σX σY) .
We have
|Cov(X, Y)| ≤ σX σY , (the Cauchy-Schwarz inequality ) .
Thus |ρX,Y| ≤ 1 . ( Why ? )
If X and Y are independent then ρX,Y = 0 . ( Why ? )
282
Similarly, the sample correlation coefficient of a data set
{ (Xi , Yi) } , i = 1, ···, N ,
is defined as
RX,Y ≡ Σ_{i=1}^N (Xi − X̄)(Yi − Ȳ)
  / ( √( Σ_{i=1}^N (Xi − X̄)² ) √( Σ_{i=1}^N (Yi − Ȳ)² ) ) ;
283
The sample correlation coefficient
RX,Y ≡ Σ_{i=1}^N (Xi − X̄)(Yi − Ȳ)
  / ( √( Σ_{i=1}^N (Xi − X̄)² ) √( Σ_{i=1}^N (Yi − Ȳ)² ) )
also satisfies |RX,Y| ≤ 1 .
In fact,
If |RX,Y| = 1 then X and Y are related linearly .
Specifically,
If RX,Y = 1 then Yi = cXi + d , for constants c, d , with c > 0 .
If RX,Y = −1 then Yi = cXi + d , for constants c, d , with c < 0 .
Also,
If |RX,Y| ≅ 1 then X and Y are almost linearly related .
284
EXAMPLE :
285
A scatter diagram showing the average daily high temperature.
The sample correlation coefficient is RX,Y = 0.98
286
EXERCISE :
The Table below shows class attendance and course grade/100.
11 47 13 43 15 70 17 72 18 96 14 61 5 25 17 74
16 85 13 82 16 67 17 91 16 71 16 50 14 77 12 68
8 62 13 71 12 56 15 81 16 69 18 93 18 77 17 48
14 82 17 66 16 91 17 67 7 43 15 86 18 85 17 84
11 43 17 66 18 57 18 74 13 73 15 74 18 73 17 71
14 69 15 85 17 79 18 84 17 70 15 55 14 75 15 61
16 61 4 46 18 70 0 29 17 82 18 82 16 82 14 68
9 84 15 91 15 77 16 75
Any conclusions ?
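A Python sketch (ours) computing RX,Y as defined above; the data fed in below are just the first few (attendance, grade) pairs of the Table, for illustration:

def sample_correlation(xs, ys):
    # R_{X,Y} as defined above
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx ** 0.5 * syy ** 0.5)

print(sample_correlation([11, 13, 15, 17], [47, 43, 70, 72]))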
287
Maximum Likelihood Estimators
EXAMPLE :
288
EXAMPLE : ( continued )
E[S²] = (1 − 1/n) σ² .
289
The maximum likelihood procedure is the following :
Let
X1 , X2 , , Xn ,
be
independent, identically distributed ,
each having
density function f(x ; θ) ,
with unknown parameter θ .
290
EXAMPLE : For our normal distribution with mean 0 we have
f(x1 , x2 , ···, xn ; σ) = e^( −(1/(2σ²)) Σ_{k=1}^n xk² ) / ( σ√(2π) )^n . ( Why ? )
291
EXAMPLE : ( continued )
We had
(d/dσ) [ −(1/(2σ²)) Σ_{k=1}^n xk² − n log σ ] = 0 ,
i.e., (1/σ³) Σ_{k=1}^n xk² − n/σ = 0 ,
from which
σ² = (1/n) Σ_{k=1}^n xk² .
292
EXERCISE :
Suppose a random variable has the general normal density function
f(x ; μ, σ) = ( 1/(σ√(2π)) ) e^( −(x−μ)²/(2σ²) ) ,
with unknown mean μ and unknown standard deviation σ .
293
EXERCISE : ( continued )
μ = (1/n) Σ_{k=1}^n Xk ,
σ = [ (1/n) Σ_{k=1}^n (Xk − X̄)² ]^(1/2) ,
that is, the maximum likelihood estimates are the sample mean and
the sample standard deviation S .
294
NOTE :
E[S²] = (1 − 1/n) σ² ≠ σ² ,
i.e., the maximum likelihood estimator S² of σ² is biased .
295
EXERCISE :
296
EXAMPLE : Consider the special exponential density function
f(x ; λ) = { λ² x e^(−λx) , x > 0 ; 0 , x ≤ 0 } .
( Figures : the density f(x) and distribution F(x) . )
297
EXAMPLE : ( continued )
For the maximum likelihood estimator of λ , we have
f(x ; λ) = λ² x e^(−λx) , for x > 0 ,
so, assuming independence, the joint density function is
f(x1 , x2 , ···, xn ; λ) = λ^(2n) x1 x2 ··· xn e^( −λ(x1 + x2 + ··· + xn) ) .
298
EXAMPLE : ( continued )
We had
(d/dλ) [ 2n log λ + Σ_{k=1}^n log xk − λ Σ_{k=1}^n xk ] = 0 .
Differentiating gives
2n/λ − Σ_{k=1}^n xk = 0 ,
from which
λ = 2n / Σ_{k=1}^n xk .
299
EXERCISE :
Verify that
∫ from 0 to ∞ of f(x ; λ) dx = 1 .
Also compute
E[X] = ∫ from 0 to ∞ of x f(x ; λ) dx .
300
NOTE :
Maximum likelihood estimates also work in the discrete case .
In such case we maximize the probability mass function .
EXAMPLE :
Find the maximum likelihood estimator of p in the Bernoulli trial
P(X = 1) = p ,
P(X = 0) = 1 − p .
SOLUTION : We can write
P(x ; p) ≡ P(X = x) = p^x (1 − p)^(1−x) , ( x = 0, 1 ) (!)
301
EXAMPLE : ( continued )
We found
P(x1 , x2 , ···, xn ; p) = p^( Σ xk ) (1 − p)^( n − Σ xk ) .
Taking the logarithm and differentiating gives
(1/p) Σ_{k=1}^n xk − n/(1 − p) + (1/(1 − p)) Σ_{k=1}^n xk = 0 .
302
EXAMPLE : ( continued )
We found
(1/p) Σ_{k=1}^n xk − n/(1 − p) + (1/(1 − p)) Σ_{k=1}^n xk = 0 ,
from which
( 1/p + 1/(1 − p) ) Σ_{k=1}^n xk = n/(1 − p) .
Multiplying by 1 − p gives
( (1 − p)/p + 1 ) Σ_{k=1}^n xk = (1/p) Σ_{k=1}^n xk = n ,
so that
p = (1/n) Σ_{k=1}^n xk , i.e., the fraction of successes .
303
EXERCISE :
where x is an integer, (0 x N ) .
304
Hypothesis Testing
305
EXAMPLE :
We assume that :
The lifetime of the bulbs has indeed a normal distribution .
The standard deviation is indeed σ = 100 hours.
We test the lifetime of a sample of 25 bulbs .
306
Density functions of X (left) and of X̄ with n = 25 (right) ,
also indicating μX ± σX and μX̄ ± σX̄ ,
( μX = 1000 , σX = 100 ; μX̄ = 1000 , σX̄ = 20 ) .
307
EXAMPLE : ( continued )
308
EXAMPLE : ( continued )
Would you accept the hypothesis that the mean is 1000 hours ?
309
EXAMPLE : ( continued )
Accept the hypothesis if
960 ≤ X̄ ≤ 1040 .
Then
P( |X̄ − 1000| ≤ 40 ) = 1 − 2Φ( (960 − 1000)/(100/√25) ) = 1 − 2Φ(−2) ≅ 95 % ,
P( |X̄ − 1000| ≥ 40 ) ≅ 100 % − 95 % = 5 % .
310
Density function of X̄ (n = 25) , with μ = μX̄ = 1000 , σX̄ = 20 ,
P(960 ≤ X̄ ≤ 1040) ≅ 95 % .
311
EXAMPLE : ( continued )
(1 − 0.0013) − 0.1587 ≅ 84 % .
312
μ = μX̄ = 980 : P(accept) = 84 % ; μ = μX̄ = 1000 : P(accept) = 95 % ; μ = μX̄ = 1040 : P(accept) = 50 % .
Density functions of X̄ : n = 25 , σX̄ = 20 .
313
EXAMPLE :
314
There are two hypotheses :
315
The density functions of X̄ (n = 25) , also indicating x̃ .
blue : (μ1 , σ1) = (1000, 100) , red : (μ2 , σ2) = (1100, 200) .
316
RECALL :
317
Probability of Type 1 error vs. x̃ (left) and of Type 2 error vs. x̃ (right) ,
(μ1 , σ1) = (1000, 100) , (μ2 , σ2) = (1100, 100) .
318
The probability of Type 1 and Type 2 errors versus x̃ .
Left : (μ1 , σ1) = (1000, 100) , (μ2 , σ2) = (1100, 100) .
Right : (μ1 , σ1) = (1000, 100) , (μ2 , σ2) = (1100, 200) .
Colors indicate sample size : 2 (red), 8 (blue), 32 (black) .
Curves of a given color intersect at the minimax x̃-value.
319
The probability of Type 1 and Type 2 errors versus x .
320
NOTE :
At x̃ the value of
max { P(Type 1 error) , P(Type 2 error) }
is minimized .
321
( Figures . )
322
The density functions of X̄ (n = 25) , with the minimax value x̃ .
323
The minimax value x̃ of x is easily computed : At x̃ we have
P( Type 1 Error ) = P( Type 2 Error ) ,
P(X̄ ≥ x̃ | μ = μ1) = P(X̄ ≤ x̃ | μ = μ2) ,
Φ( (μ1 − x̃)/(σ1/√n) ) = Φ( (x̃ − μ2)/(σ2/√n) ) ,
(μ1 − x̃)/σ1 = (x̃ − μ2)/σ2 , ( by monotonicity of Φ ) ,
from which
x̃ = (μ1 σ2 + μ2 σ1) / (σ1 + σ2) . ( Check ! )
324
Thus we have proved the following :
The value of x̃ that minimizes
max { P(X̄ ≥ x̃ | μ = μ1 , σ = σ1) , P(X̄ ≤ x̃ | μ = μ2 , σ = σ2) }
is given by
x̃ = (μ1 σ2 + μ2 σ1) / (σ1 + σ2) .
325
EXERCISE :
when
n=1 , n = 25 , n = 100 .
326
EXAMPLE ( Known standard deviation ) :
Do we accept H0 ?
327
SOLUTION ( Known standard deviation ) :
Given: n = 9 , σ = 0.2 , X̄ = 4.88 , μ = 5.0 , |X̄ − μ| = 0.12 .
Since
Z ≡ (X̄ − μ) / (σ/√n) is standard normal ,
328
EXAMPLE ( Unknown standard deviation, large sample ) :
P( X̄ ≤ 4.847 ) < 5 % .
329
SOLUTION ( Unknown standard deviation, large sample ) :
CONCLUSION:
We (barely) accept H0 at level of significance 5 % .
( We would reject H0 at level of significance 10 % .)
330
EXAMPLE ( Unknown standard deviation, small sample ) :
NOTE :
If n < 30 then the approximation σ ≅ S is not so accurate.
In this case better use the Student t-distribution T_{n−1} .
331
The T - distribution Table
n      α = 0.1    α = 0.05   α = 0.01   α = 0.005
5 -1.476 -2.015 -3.365 -4.032
6 -1.440 -1.943 -3.143 -3.707
7 -1.415 -1.895 -2.998 -3.499
8 -1.397 -1.860 -2.896 -3.355
9 -1.383 -1.833 -2.821 -3.250
10 -1.372 -1.812 -2.764 -3.169
11 -1.363 -1.796 -2.718 -3.106
12 -1.356 -1.782 -2.681 -3.055
13 -1.350 -1.771 -2.650 -3.012
14 -1.345 -1.761 -2.624 -2.977
15 -1.341 -1.753 -2.602 -2.947
332
EXAMPLE ( Testing a hypothesis on the standard deviation ) :
A sample of 16 items from a normal population has sample
standard deviation S = 2.58 .
Do you believe the population standard deviation satisfies σ ≤ 2.0 ?
333
EXERCISE :
A sample has sample standard deviation S = 0.83 .
Do you believe the population standard deviation satisfies σ ≤ 1.2 ?
( Probably Yes ! )
334
LEAST SQUARES APPROXIMATION
335
Average daily high temperature in Montreal in March
336
Suppose that :
337
Average daily high temperatures, with a linear approximation .
338
There are many ways to determine such a linear approximation.
339
The least squares error versus c1 and c2 .
340
From setting the partial derivatives to zero, we have
Σ_{k=1}^N ( Tk − (c1 + c2 xk) ) = 0 , Σ_{k=1}^N xk ( Tk − (c1 + c2 xk) ) = 0 .
341
EXAMPLE : For our March temperatures example, we find
c1 = 2.47 and c2 = 0.289 .
342
General Least Squares
Given discrete data points
{ (xi , yi) } , i = 1, ···, N ,
we seek coefficients c1 , ···, cn of an approximating function
p(x) = Σ_{j=1}^n cj φj(x) .
EXAMPLES :
p(x) = c1 + c2 x . (Already done !)
343
For any vector x ∈ R^N let
‖x‖² ≡ x^T x ≡ Σ_{i=1}^N xi² . (T denotes transpose).
Then
EL ≡ Σ_{i=1}^N [ p(xi) − yi ]²
  = ‖ ( p(x1) , ···, p(xN) )^T − ( y1 , ···, yN )^T ‖²
  = ‖ ( Σ_j cj φj(x1) , ···, Σ_j cj φj(xN) )^T − y ‖²
  = ‖ A c − y ‖² ,
where A is the N × n matrix with entries Aij = φj(xi) ,
c = (c1 , ···, cn)^T , and y = (y1 , ···, yN)^T .
344
THEOREM :
The least squares error EL is minimized when
A^T A c = A^T y .
PROOF :
EL = ‖ Ac − y ‖²
  = (Ac)^T Ac − (Ac)^T y − y^T Ac + y^T y
  = c^T A^T A c − c^T A^T y − y^T A c + y^T y .
345
PROOF : ( continued )
We had
EL = c^T A^T A c − c^T A^T y − y^T A c + y^T y ,
and setting the gradient with respect to c to zero gives A^T A c = A^T y .
346
EXAMPLE : Given the data points
{ (xi , yi) } = { (0, 1) , (1, 3) , (2, 2) , (4, 3) } ,
find the coefficients c1 and c2 of p(x) = c1 + c2 x
that minimize
EL ≡ Σ_{i=1}^4 [ (c1 + c2 xi) − yi ]² .
SOLUTION : Here N = 4 , n = 2 , φ1(x) = 1 , φ2(x) = x .
Use the Theorem : A^T A c = A^T y , with
A = [ 1 0 ; 1 1 ; 1 2 ; 1 4 ] (rows) , y = (1, 3, 2, 3)^T ,
or
[ 4 7 ; 7 21 ] (c1 , c2)^T = (9 , 19)^T .
347
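A Python sketch (ours) that forms and solves the normal equations for this example using NumPy:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 4.0])
y = np.array([1.0, 3.0, 2.0, 3.0])
A = np.column_stack([np.ones_like(x), x])   # columns phi_1 = 1 , phi_2 = x
c = np.linalg.solve(A.T @ A, A.T @ y)       # solves [[4,7],[7,21]] c = [9,19]
print(c)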
EXAMPLE : Given the same data points, find the coefficients of
p(x) = c1 + c2 x + c3 x²
that minimize
EL ≡ Σ_{i=1}^4 [ (c1 + c2 xi + c3 xi²) − yi ]² .
SOLUTION : Here
N = 4 , n = 3 , φ1(x) = 1 , φ2(x) = x , φ3(x) = x² .
Use the Theorem : A^T A c = A^T y , with
A = [ 1 0 0 ; 1 1 1 ; 1 2 4 ; 1 4 16 ] (rows) , y = (1, 3, 2, 3)^T ,
or
[ 4 7 21 ; 7 21 73 ; 21 73 273 ] (c1 , c2 , c3)^T = (9 , 19 , 59)^T .
348
The least squares approximations from the preceding two examples :
( Figures . )
p(x) = c1 + c2 x (left) , p(x) = c1 + c2 x + c3 x² (right) .
349
EXAMPLE : From actual data :
Month        Average daily high
January      -5
February     -3
March        3
April        11
May          19
June         24
July         26
August       25
September    20
October      13
November     6
December     -2
350
( Figure . )
Average daily high temperature in Montreal (by month).
351
EXAMPLE : ( continued )
The graph suggests using a 3-term least squares approximation
352
( Figure . )
Least squares fit of average daily high temperatures.
353
EXAMPLE :
Consider the following experimental data :
[Figure : scatter plot of the experimental data, with 0 ≤ x ≤ 8 and 0 ≤ y ≤ 0.6 .]
EXAMPLE : ( continued )
The shape of the data suggests a model of the form
$$y = c_1 \, x^{c_2} \, e^{c_3 x} .$$
NOTE :
This function is not linear in its coefficients c_1 , c_2 , c_3 , so the Theorem does not apply directly. What to do ?
EXAMPLE : ( continued )
Taking logarithms, we find
$$\log y = \log c_1 + c_2 \log x + c_3 x ,$$
which is linear in the coefficients log c_1 , c_2 , and c_3 .
Thus we can now use regular least squares.
EXAMPLE : ( continued )
[Figure : the transformed data (x , log y) .]
EXAMPLE : ( continued )
We had
$$y = c_1 \, x^{c_2} \, e^{c_3 x} ,$$
and
$$\log y = \hat{c}_1 \phi_1(x) + c_2 \phi_2(x) + c_3 \phi_3(x) ,$$
with
$$\phi_1(x) = 1 , \quad \phi_2(x) = \log x , \quad \phi_3(x) = x ,$$
and
$$\hat{c}_1 = \log c_1 .$$
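A minimal sketch of this transformed fit in Python ; the data values below are hypothetical stand-ins, since the actual measurements appear only in the figure ( numpy assumed ) :

    import numpy as np

    # Hypothetical data of the plotted shape (rise, peak, slow decay).
    x = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
    y = np.array([0.20, 0.36, 0.51, 0.45, 0.34, 0.23, 0.14, 0.08])

    A = np.column_stack([np.ones_like(x), np.log(x), x])  # phi_1 = 1, phi_2 = log x, phi_3 = x
    c1hat, c2, c3 = np.linalg.lstsq(A, np.log(y), rcond=None)[0]
    c1 = np.exp(c1hat)                                    # undo  c1_hat = log c1
    print(c1, c2, c3)                                     # fitted model: y = c1 * x**c2 * exp(c3*x)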
EXAMPLE : ( continued )
[Figure : the least squares fit in the transformed variables (x , log y) .]
EXAMPLE : ( continued )
[Figure : the fitted curve y = c_1 x^{c_2} e^{c_3 x} together with the original data.]
RANDOM NUMBER GENERATION
The Logistic Equation
A simple model of population growth is
$$x_{k+1} = \lambda \, x_k , \qquad k = 1, 2, \cdots .$$
The solution is
$$x_k = \lambda^k x_0 , \qquad k = 1, 2, \cdots .$$
Thus the population grows without bound when λ > 1 and dies out when λ < 1 .
A somewhat more realistic population growth model is
$$x_{k+1} = \lambda \, x_k (1 - x_k) , \qquad k = 1, 2, \cdots ,$$
where
λ is given , ( 0 ≤ λ ≤ 4 ) ,
x_0 is given , ( 0 ≤ x_0 ≤ 1 ) .
EXERCISE :
For the iteration
$$x_{k+1} = \lambda \, x_k (1 - x_k) , \qquad k = 1, 2, \cdots ,$$
compute x_k for k = 1, 2, ⋯ , 50 , for various choices of λ and x_0 .
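A minimal sketch for experimenting with this iteration in Python :

    # Iterate the logistic map x_{k+1} = lam * x_k * (1 - x_k).
    def logistic_orbit(lam, x0, n=50):
        xs = [x0]
        for _ in range(n):
            xs.append(lam * xs[-1] * (1.0 - xs[-1]))
        return xs

    print(logistic_orbit(3.9, 0.5)[:10])   # e.g. lambda = 3.9 gives erratic-looking iterates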
[Figures : iterates of the logistic map ( labeled y ) and normalized histograms p(x) of the iterates, for several values of λ .]
EXERCISE :
Repeat the computation
$$x_{k+1} = \lambda \, x_k (1 - x_k) , \qquad k = 1, 2, \cdots ,$$
for other values of λ and x_0 .
[Figures : further iterates and histograms p(x) of the logistic map, for additional values of λ .]
CONCLUSIONS :
Nature is complex !
Generating Random Numbers
We may want to generate random numbers having
a uniform distribution,
a normal distribution,
or another known distribution.
QUESTION :
How to generate uniform (and other) random numbers ?
( These are useful in computer simulations .)
Generating Uniformly Distributed Random Numbers
Such numbers can be generated by an iteration of the form
$$x_{k+1} = f(x_k) , \qquad k = 1, 2, 3, \cdots .$$
A classical choice is
$$x_{k+1} = (n \, x_k) \bmod p ,$$
where p is a prime number and p ∤ n .
The following fact is useful :
THEOREM :
Let p be a prime number, and suppose p ∤ n . Then the function
$$f : \{ 0, 1, 2, \cdots , p-1 \} \rightarrow \{ 0, 1, 2, \cdots , p-1 \} ,$$
given by
$$f(x) = (n \, x) \bmod p ,$$
is one-to-one ( and hence invertible ) .
p prime , p ∤ n ⟹ (n x) mod p is 1−1
EXAMPLE :
p = 7 and n = 12 :

    x     12x    12x mod 7
    0      0        0
    1     12        5
    2     24        3
    3     36        1
    4     48        6
    5     60        4
    6     72        2

Invertible !
NOTE : The values of 12x mod 7 look somewhat random !
p prime , p ∤ n ⟹ (n x) mod p is 1−1
EXAMPLE :
p = 6 and n = 2 :

    x     2x     2x mod 6
    0      0        0
    1      2        2
    2      4        4
    3      6        0
    4      8        2
    5     10        4

Not Invertible .
p prime , p ∤ n ⟹ (n x) mod p is 1−1
EXAMPLE :
p = 6 and n = 13 :

    x     13x    13x mod 6
    0      0        0
    1     13        1
    2     26        2
    3     39        3
    4     52        4
    5     65        5

Invertible . ( So ? )
NOTE : The numbers in the right-hand column don't look random !
p prime , p ∤ n ⟹ (n x) mod p is 1−1
PROOF : Suppose f(x_1) = f(x_2) , i.e., (n x_1) mod p = (n x_2) mod p . Then
$$p \mid n (x_1 - x_2) , \qquad ( \text{ Why ? } )$$
and since p is prime and p ∤ n , it follows that
$$p \mid (x_1 - x_2) .$$
Since | x_1 − x_2 | < p , this forces x_1 = x_2 .
For a given x_0 , an iteration of the form
$$x_k = (n \, x_{k-1}) \bmod p , \qquad k = 1, 2, \cdots ,$$
therefore traces out a cycle of period at most p − 1 .
EXAMPLE : As a simple example, take again p = 7 , now with various multipliers n :

    f(x)          sequence ( x_0 = 1 )    cycle period
    5x mod 7      1 5 4 6 2 3 1           6
    6x mod 7      1 6 1 6 1 6 1           2
    8x mod 7      1 1 1 1 1 1 1           1
    9x mod 7      1 2 4 1 2 4 1           3
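A small sketch that finds such cycles ( it assumes p prime and p ∤ n , so the map is invertible and the orbit returns to its starting point ) :

    # Follow x -> (n * x) mod p from x0 until the orbit returns to x0.
    def cycle(n, p, x0=1):
        seq = [x0]
        while True:
            seq.append((n * seq[-1]) % p)
            if seq[-1] == x0:
                return seq

    print(cycle(5, 7))   # [1, 5, 4, 6, 2, 3, 1]  ->  cycle period 6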
EXAMPLE : With x_0 = 2 , compute
$$x_k = (137951 \, x_{k-1}) \bmod 101 , \qquad k = 1, 2, \cdots , 100 .$$
Result :
71 46 17 48 88 94 4 41 92 34
96 75 87 8 82 83 68 91 49 73
16 63 65 35 81 98 45 32 25 29
70 61 95 90 64 50 58 39 21 89
79 27 100 15 78 42 77 57 54 99
30 55 84 53 13 7 97 60 9 67
5 26 14 93 19 18 33 10 52 28
85 38 36 66 20 3 56 69 76 72
31 40 6 11 37 51 43 62 80 12
22 74 1 86 23 59 24 44 47 2
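This table is easy to reproduce ; a minimal Python sketch ( note that 137951 mod 101 = 86 , so the multiplier is effectively 86 ) :

    # Generate the 100 values in the table above.
    x, seq = 2, []
    for _ in range(100):
        x = (137951 * x) % 101
        seq.append(x)
    print(seq[:10])   # [71, 46, 17, 48, 88, 94, 4, 41, 92, 34]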
EXAMPLE : As in the preceding example, use x_0 = 2 , and
compute
$$x_k = (137951 \, x_{k-1}) \bmod 101 , \qquad k = 1, 2, \cdots , 100 ,$$
and set
$$\hat{x}_k = \frac{x_k}{100} .$$
0.710 0.460 0.170 0.480 0.880 0.940 0.040 0.410 0.920 0.340
0.960 0.750 0.870 0.080 0.820 0.830 0.680 0.910 0.490 0.730
0.160 0.630 0.650 0.350 0.810 0.980 0.450 0.320 0.250 0.290
0.700 0.610 0.950 0.900 0.640 0.500 0.580 0.390 0.210 0.890
0.790 0.270 1.000 0.150 0.780 0.420 0.770 0.570 0.540 0.990
0.300 0.550 0.840 0.530 0.130 0.070 0.970 0.600 0.090 0.670
0.050 0.260 0.140 0.930 0.190 0.180 0.330 0.100 0.520 0.280
0.850 0.380 0.360 0.660 0.200 0.030 0.560 0.690 0.760 0.720
0.310 0.400 0.060 0.110 0.370 0.510 0.430 0.620 0.800 0.120
0.220 0.740 0.010 0.860 0.230 0.590 0.240 0.440 0.470 0.020
EXAMPLE : With x_0 = 2 , another multiplicative congruential iteration gives :
Result :
75 35 50 57 67 38 11 59 41 73
61 15 7 10 72 74 48 83 32 89
55 93 3 62 2 75 35 50 57 67
38 11 59 41 73 61 15 7 10 72
74 48 83 32 89 55 93 3 62 2
75 35 50 57 67 38 11 59 41 73
61 15 7 10 72 74 48 83 32 89
55 93 3 62 2 75 35 50 57 67
38 11 59 41 73 61 15 7 10 72
74 48 83 32 89 55 93 3 62 2
Note that the first 25 values ( 75 , 35 , 50 , ⋯ , 62 , 2 ) repeat : the sequence has cycle period 25 .
EXAMPLE : With x_0 = 4 , compute the same iteration.
QUESTIONS :
Are there cycles ?
Is this the same cycle that we already found ?
Generating Random Numbers using the Inverse Method
RECALL :
The distribution function of a random variable X is F (x) = P ( X ≤ x ) .

[Figure : a distribution function F (x) , with y_1 = F (x_1) and y_2 = F (x_2) at the points x_1 and x_2 .]

We want random numbers X with distribution F (x) .
Let the random variable Y be uniform on the interval [0, 1] .
Let X = F^{-1}(Y) .
Then P ( x_1 ≤ X ≤ x_2 ) = y_2 − y_1 = F (x_2) − F (x_1) .
Thus F (x) is indeed the distribution function of X !
In detail : if Y is uniformly distributed on [0, 1] then
$$P ( y_1 \le Y \le y_2 ) = y_2 - y_1 .$$
Let
$$X = F^{-1}(Y) ,$$
with
$$x_1 = F^{-1}(y_1) \qquad \text{and} \qquad x_2 = F^{-1}(y_2) .$$
Then
$$P ( x_1 \le X \le x_2 ) = P ( y_1 \le Y \le y_2 ) = y_2 - y_1 = F (x_2) - F (x_1) .$$
EXAMPLE : For the exponential distribution
$$F (x) = 1 - e^{- \lambda x} , \qquad x > 0 ,$$
the inverse is
$$F^{-1}(y) = - \frac{1}{\lambda} \log( 1 - y ) , \qquad 0 \le y < 1 . \qquad ( \text{ Check ! } )$$
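A minimal sketch of the inverse method for exponential random numbers :

    import math
    import random

    def exponential(lam=1.0):
        y = random.random()               # Y uniform on [0, 1)
        return -math.log(1.0 - y) / lam   # X = F^{-1}(Y) = -(1/lam) log(1 - Y)

    print([round(exponential(), 3) for _ in range(20)])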
[Figure : the inverse method used to generate 20 exponential random numbers, shown on the graph of F (x) .]
EXAMPLE : ( continued )
To compare the samples to the exponential density, divide the x-axis into bins of width Δx , let m_k be the number of samples in the k-th bin, and let N be the total number of samples. Setting
$$\hat{f}(x_k) = \frac{m_k}{N \, \Delta x} ,$$
we have
$$\int_0^{\infty} \hat{f}(x) \, dx \approx \sum_{k=1}^{100} \hat{f}(x_k) \, \Delta x = 1 .$$
[Figures : histograms f̂(x) of exponential samples, compared with the exponential density.]
EXERCISE : Consider the Tent density function
$$f(x) = \begin{cases} x + 1 , & -1 < x \le 0 \\ 1 - x , & 0 < x \le 1 \\ 0 , & \text{otherwise} \end{cases}$$
Verify that the inverse of its distribution function is
$$F^{-1}(y) = \begin{cases} -1 + \sqrt{2y} , & 0 \le y \le \tfrac{1}{2} \\ 1 - \sqrt{2 - 2y} , & \tfrac{1}{2} < y \le 1 \end{cases}$$
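A minimal sketch of the inverse method for the Tent density, using the two branches of F^{-1} :

    import math
    import random

    def tent():
        y = random.random()
        if y <= 0.5:
            return -1.0 + math.sqrt(2.0 * y)      # branch for 0 <= y <= 1/2
        return 1.0 - math.sqrt(2.0 - 2.0 * y)     # branch for 1/2 < y <= 1

    print([round(tent(), 3) for _ in range(10)])  # samples pile up near x = 0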
[Figures : the Tent density f (x) and its distribution function F (x) .]
[Figures : histograms of Tent samples generated by the inverse method, compared with the Tent density f (x) .]
SUMMARY TABLES
and
FORMULAS
    Discrete                                   Continuous

    p(x_i) = P (X = x_i)                       f (x) Δx ≈ P ( x − Δx/2 < X < x + Δx/2 )

    Σ_i p(x_i) = 1                             ∫_{−∞}^{∞} f (x) dx = 1

    F (x_k) = Σ_{i ≤ k} p(x_i)                 F (x) = ∫_{−∞}^{x} f (t) dt

    E[X] = Σ_i x_i p(x_i)                      E[X] = ∫_{−∞}^{∞} x f (x) dx

    E[g(X)] = Σ_i g(x_i) p(x_i)                E[g(X)] = ∫_{−∞}^{∞} g(x) f (x) dx

    E[XY] = Σ_{i,j} x_i y_j p(x_i , y_j)       E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f (x, y) dy dx
    Name        General Formula

    Mean        μ = E[X]
    Markov      P ( X ≥ c ) ≤ E[X] / c
    Chebyshev   P ( | X − μ | ≥ k σ ) ≤ 1 / k²
    Name        Probability mass function                  Domain

    Bernoulli   P (X = 1) = p , P (X = 0) = 1 − p          0 , 1
    Binomial    P (X = k) = (n choose k) p^k (1 − p)^{n−k}   0 ≤ k ≤ n
    Poisson     P (X = k) = e^{−λ} λ^k / k!                k = 0, 1, 2, ⋯
    Name        Mean    Standard deviation

    Bernoulli   p       √( p (1 − p) )
    Binomial    n p     √( n p (1 − p) )
    Poisson     λ       √λ
    Name           Density function                     Distribution          Domain

    Uniform        1 / (b − a)                          (x − a) / (b − a)     x ∈ (a, b]
    Exponential    λ e^{−λx}                            1 − e^{−λx}           x ∈ (0, ∞)
    Std. Normal    (1/√(2π)) e^{−x²/2}                  Φ(x)                  x ∈ (−∞, ∞)
    Normal         (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}       Φ( (x − μ)/σ )        x ∈ (−∞, ∞)
    Name              Mean         Standard Deviation

    Uniform           (a + b)/2    (b − a) / (2√3)
    Exponential       1/λ          1/λ
    Standard Normal   0            1
    General Normal    μ            σ
    Chi-Square        n            √(2n)
The Standard Normal Distribution Φ(z)
z     Φ(z)        z     Φ(z)
0.0 .5000 -1.2 .1151
-0.1 .4602 -1.4 .0808
-0.2 .4207 -1.6 .0548
-0.3 .3821 -1.8 .0359
-0.4 .3446 -2.0 .0228
-0.5 .3085 -2.2 .0139
-0.6 .2743 -2.4 .0082
-0.7 .2420 -2.6 .0047
-0.8 .2119 -2.8 .0026
-0.9 .1841 -3.0 .0013
-1.0 .1587 -3.2 .0007
The χ²_n - Table
n     α = 0.975   α = 0.95   α = 0.05   α = 0.025
5 0.83 1.15 11.07 12.83
6 1.24 1.64 12.59 14.45
7 1.69 2.17 14.07 16.01
8 2.18 2.73 15.51 17.54
9 2.70 3.33 16.92 19.02
10 3.25 3.94 18.31 20.48
11 3.82 4.58 19.68 21.92
12 4.40 5.23 21.03 23.34
13 5.01 5.89 22.36 24.74
14 5.63 6.57 23.69 26.12
15 6.26 7.26 25.00 27.49
The T - distribution Table
n     α = 0.1   α = 0.05   α = 0.01   α = 0.005
5 -1.476 -2.015 -3.365 -4.032
6 -1.440 -1.943 -3.143 -3.707
7 -1.415 -1.895 -2.998 -3.499
8 -1.397 -1.860 -2.896 -3.355
9 -1.383 -1.833 -2.821 -3.250
10 -1.372 -1.812 -2.764 -3.169
11 -1.363 -1.796 -2.718 -3.106
12 -1.356 -1.782 -2.681 -3.055
13 -1.350 -1.771 -2.650 -3.012
14 -1.345 -1.761 -2.624 -2.977
15 -1.341 -1.753 -2.602 -2.947