LECTURE 1
• Probabilistic models
– sample space
– probability law
• Axioms of probability
• Simple examples
Sample space: Discrete example / Sample space: Continuous example
[Figure: for the discrete example, the 16 equally likely outcomes of two rolls of a 4-sided die on a grid, with X = first roll and the outcome (4, 4) marked.]
Discrete uniform law
• Let all outcomes be equally likely
• Then,
P(A) = (number of elements of A) / (total number of sample points)
• Computing probabilities ≡ counting
• Defines fair coins, fair dice, well-shuffled decks

Continuous uniform law
• Two “random” numbers in [0, 1].
[Figure: the unit square in the (x, y) plane.]
• Uniform law: Probability = Area
– P(X + Y ≤ 1/2) = ?
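The uniform-area law above is easy to check numerically. Here is a small sketch (mine, not part of the original slides) that estimates P(X + Y ≤ 1/2) for two independent uniform [0, 1] numbers by Monte Carlo; the exact answer is the area of the triangle with vertices (0, 0), (1/2, 0), (0, 1/2), namely 1/8.

```python
import random

# Monte Carlo check of the continuous uniform law: probability = area,
# so P(X + Y <= 1/2) equals the area of the triangle below x + y = 1/2.
def estimate(num_samples=1_000_000, seed=0):
    rng = random.Random(seed)
    hits = sum(1 for _ in range(num_samples)
               if rng.random() + rng.random() <= 0.5)
    return hits / num_samples

print("Monte Carlo estimate:", estimate())   # close to 0.125
print("Exact (area of triangle):", 1 / 8)
```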
• Turn in recitation/tutorial scheduling form now
P({2, 4, 6, . . .}) = P(2) + P(4) + · · · = 1/2² + 1/2⁴ + 1/2⁶ + · · · = 1/3
LECTURE 2 Review of probability models
• Problem solving:
– Specify sample space
– Define probability law
– Identify event of interest
– Calculate...
[Figure: 4×4 grid of equally likely outcomes of two rolls of a 4-sided die, with X = first roll and Y = second roll; events A and B are marked on the grid.]
• P(A | B) = probability of A, given that B occurred
– B is our new universe
• P(M = 2 | B) =
Models based on conditional probabilities / Multiplication rule

P(A ∩ B ∩ C) = P(A) · P(B | A) · P(C | A ∩ B)

[Figure: tree diagram with first branch A vs. Ac, second branch B vs. Bc, third branch C vs. Cc, and branch probabilities P(A), P(B | A), P(C | A ∩ B), etc.]

• Event A: Airplane is flying above
• Event B: Something registers on radar screen

P(A) = 0.05,  P(Ac) = 0.95
P(B | A) = 0.99,  P(Bc | A) = 0.01
P(B | Ac) = 0.10
P(A ∩ B) =
P(B) =
P(A | B) =
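The three blanks can be filled in directly from the numbers on the tree. A minimal sketch (mine, not from the original slides), using the given values P(A) = 0.05, P(B | A) = 0.99, P(B | Ac) = 0.10:

```python
# Radar example: multiplication rule, total probability, and Bayes' rule
# with the values given on the tree diagram.
P_A = 0.05           # airplane is flying above
P_B_given_A = 0.99   # radar registers, given a plane
P_B_given_Ac = 0.10  # radar registers, given no plane

P_A_and_B = P_A * P_B_given_A                        # multiplication rule
P_B = P_A_and_B + (1 - P_A) * P_B_given_Ac           # total probability
P_A_given_B = P_A_and_B / P_B                        # Bayes' rule

print(P_A_and_B)     # 0.0495
print(P_B)           # 0.1445
print(P_A_given_B)   # ~0.3426
```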
LECTURE 3  Models based on conditional probabilities

• Readings: Section 1.5
• Review
• Independence of two events

P(A | B) = P(A ∩ B) / P(B),  assuming P(B) > 0

• Multiplication rule:
P(A ∩ B) = P(B) · P(A | B) = P(A) · P(B | A)

• 3 tosses of a biased coin:
P(H) = p,  P(T) = 1 − p
[Figure: tree diagram for the three tosses, with branch probabilities p and 1 − p and leaves HHH, HHT, . . . , TTT.]
Conditioning may affect independence / Independence of a collection of events
[Figure: the four equally likely outcomes HH, HT, TH, TT of two coin tosses.]
LECTURE 4 Discrete uniform law
Lecture outline
• Principles of counting
• Many examples
– permutations
– k-permutations
– combinations
– partitions
• Binomial probabilities

• Discrete uniform law. Then,
P(A) = (number of elements of A) / (total number of sample points) = |A| / |Ω|
• Just count. . .
– Number of elements in the sample space:
• . . . if repetition is prohibited =
Combinations
• C(n, k): number of k-element subsets of a given n-element set
• Two ways of constructing an ordered sequence of k distinct items:
– Choose the k items one at a time:
n(n − 1) · · · (n − k + 1) = n! / (n − k)!  choices
– Choose k items, then order them (k! possible orders)
• Hence:
C(n, k) · k! = n! / (n − k)!
C(n, k) = n! / (k! (n − k)!)

∑_{k=0}^{n} C(n, k) =

Binomial probabilities
• n independent coin tosses
– P(H) = p
• P(HTTHHH) =
• P(sequence) = p^(# heads) (1 − p)^(# tails)

P(k heads) = ∑_{k-head seq.} P(seq.)
           = (# of k-head seqs.) · p^k (1 − p)^(n−k)
           = C(n, k) p^k (1 − p)^(n−k)
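As a quick numeric companion (mine, not part of the original slides), the counting identity and the binomial probabilities above can be evaluated directly; `math.comb` computes C(n, k), and the probabilities sum to 1.

```python
import math

n, p = 10, 0.3

# C(n, k) * k! = n! / (n - k)!   (two ways of building an ordered k-sequence)
k = 4
assert math.comb(n, k) * math.factorial(k) == math.factorial(n) // math.factorial(n - k)

# P(k heads) = C(n, k) p^k (1 - p)^(n - k)
pmf = [math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]
print(pmf[4])        # P(4 heads in 10 tosses)
print(sum(pmf))      # sums to 1 (up to rounding)
```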
LECTURE 5  Random variables
• Notation:
– random variable X
– numerical value x
• Example: F = First roll
– geometric PMF
p_X(2) =
Binomial PMF
• X: number of heads in n independent coin tosses
• P(H) = p
• Let n = 4:
p_X(2) = P(HHTT) + P(HTHT) + P(HTTH) + P(THHT) + P(THTH) + P(TTHH)
• In general:
p_X(k) = C(n, k) p^k (1 − p)^(n−k),  k = 0, 1, . . . , n

Expectation
• Definition:  E[X] = ∑_x x p_X(x)
• Interpretations:
– Center of gravity of PMF
– Average in large number of repetitions of the experiment (to be substantiated later in this course)

• Uniform PMF on {0, 1, . . . , n}:  p_X(k) = 1/(n + 1)
E[X] = 0 · 1/(n+1) + 1 · 1/(n+1) + · · · + n · 1/(n+1) =

• Variance:
var(X) = ∑_x (x − E[X])² p_X(x)
• E[α] =
• E[αX] =
Properties:
• var(X) ≥ 0
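A small sketch (mine, not from the slides) that computes E[X] and var(X) directly from the binomial PMF with n = 4, following the definitions E[X] = ∑ x p_X(x) and var(X) = ∑ (x − E[X])² p_X(x):

```python
import math

def binomial_pmf(n, p):
    return {k: math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

def expectation(pmf):
    return sum(x * px for x, px in pmf.items())

def variance(pmf):
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * px for x, px in pmf.items())

pmf = binomial_pmf(4, 0.5)
print(pmf[2])             # p_X(2) = 6 * (1/2)^4 = 0.375
print(expectation(pmf))   # n*p = 2.0
print(variance(pmf))      # n*p*(1-p) = 1.0
```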
LECTURE 6  Review

• Readings: Sections 2.4-2.6
• Random variable X: function from sample space to the real numbers

var(X) = E[(X − E[X])²]
       = ∑_x (x − E[X])² p_X(x)
       = E[X²] − (E[X])²

Standard deviation:  σ_X = √var(X)

• Traverse a 200 mile distance at constant but random speed V
[Figure: PMF p_V(v) with probability 1/2 at v = 1 and 1/2 at v = 200.]
• σ_V =
Conditional PMF and expectation / Geometric PMF

• p_{X|A}(x) = P(X = x | A)
• E[X | A] = ∑_x x p_{X|A}(x)

• X: number of independent coin tosses until first head
p_X(k) = (1 − p)^(k−1) p,  k = 1, 2, . . .
E[X] = ∑_{k=1}^∞ k p_X(k) = ∑_{k=1}^∞ k (1 − p)^(k−1) p

[Figure: the geometric PMF p_X(k), with values p, p(1 − p), p(1 − p)², . . . at k = 1, 2, 3, . . ., and the conditional PMF p_{X−2|X>2}(k), which has the same geometric shape.]

• Let A = {X ≥ 2}
p_{X|A}(x) =
E[X | A] =
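To make the memorylessness point concrete, here is a small check (mine, not in the original slides) that conditioning a geometric PMF on A = {X ≥ 2} simply shifts it, so E[X | X ≥ 2] = 1 + E[X] = 1 + 1/p. The sums are truncated at a large K, which is an approximation.

```python
p = 0.3
K = 10_000   # truncation point; the tail beyond K is negligible for this p

pmf = {k: (1 - p) ** (k - 1) * p for k in range(1, K + 1)}

E_X = sum(k * pk for k, pk in pmf.items())

# Conditional PMF given A = {X >= 2}: restrict and renormalize.
P_A = sum(pk for k, pk in pmf.items() if k >= 2)
E_X_given_A = sum(k * pk / P_A for k, pk in pmf.items() if k >= 2)

print(E_X)            # ~ 1/p = 3.333...
print(E_X_given_A)    # ~ 1 + 1/p = 4.333...  (memorylessness)
```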
LECTURE 7  Review
• A hat problem

E[X] = ∑_x x p_X(x)

p_{X,Y,Z}(x, y, z) = p_X(x) p_{Y|X}(y | x) p_{Z|X,Y}(z | x, y)

• Random variables X, Y, Z are independent if:
p_{X,Y,Z}(x, y, z) = p_X(x) · p_Y(y) · p_Z(z),  for all x, y, z

E[g(X, Y)] = ∑_x ∑_y g(x, y) p_{X,Y}(x, y)
• In general:  E[g(X, Y)] ≠ g(E[X], E[Y])
• E[αX + β] = αE[X] + β
• E[X + Y + Z] = E[X] + E[Y] + E[Z]

[Figure: a joint PMF p_{X,Y}(x, y) given as a table of values (multiples of 1/20) over x = 1, . . . , 4 and y = 1, . . . , 4.]
• What if we condition on X ≤ 2 and Y ≥ 3?
Variances
• Examples:
– If X = Y, Var(X + Y) =
– If X = −Y, Var(X + Y) =
– If X, Y indep., and Z = X − 3Y, Var(Z) =

Binomial mean and variance
• E[Xi] =
• E[X] =
• Var(Xi) =
• Var(X) =

The hat problem
• n people throw their hats in a box and then pick one at random.
– X: number of people who get their own hat
– Find E[X]
Xi = 1, if i selects own hat; 0, otherwise.
• X = X1 + X2 + · · · + Xn
• P(Xi = 1) =
• E[Xi] =
• E[X] =

• Var(X) = E[X²] − (E[X])² = E[X²] − 1
X² = ∑_i Xi² + ∑_{i,j: i≠j} Xi Xj
• E[Xi²] =
P(X1 X2 = 1) = P(X1 = 1) · P(X2 = 1 | X1 = 1) =
• Var(X) =
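The hat-problem answers (E[X] = 1 and Var(X) = 1 for n ≥ 2) can be sanity-checked by simulation. This is a sketch of mine, not part of the slides:

```python
import random

def simulate_hat_problem(n, trials=200_000, seed=1):
    rng = random.Random(seed)
    total = total_sq = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)                                  # random hat assignment
        x = sum(1 for i, h in enumerate(perm) if i == h)   # people with own hat
        total += x
        total_sq += x * x
    mean = total / trials
    var = total_sq / trials - mean ** 2
    return mean, var

print(simulate_hat_problem(10))   # both close to 1
```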
LECTURE 8  Continuous r.v.’s and pdf’s

Lecture outline
• Probability density functions

P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx

∫_{−∞}^{∞} f_X(x) dx = 1

P(x ≤ X ≤ x + δ) = ∫_x^{x+δ} f_X(s) ds ≈ f_X(x) · δ

P(X ∈ B) = ∫_B f_X(x) dx,  for “nice” sets B

[Figure: a pdf f_X(x) over the sample space, with the area between a and b shaded.]

• Uniform on [a, b]:
E[X] = (a + b)/2
σ_X² = ∫_a^b (x − (a + b)/2)² · 1/(b − a) dx = (b − a)²/12

[Figure: a piecewise-constant pdf taking the values 1/6 and 2/6 between the points x = 1, 2, 4.]
Mixed distributions / Gaussian (normal) PDF

• Standard normal N(0, 1):
E[X] = 0,  var(X) = 1
• General normal N(μ, σ²):
f_X(x) = (1/(σ√(2π))) e^(−(x−μ)²/2σ²)
• The corresponding CDF:
F_X(x) = P(X ≤ x)
[Figures: a mixed distribution on [0, 1] and its CDF; the standard and general normal pdfs.]
Outline
• PDF review
• From the joint to the marginal:
f_X(x) · δ ≈ P(x ≤ X ≤ x + δ) =
• X and Y are called independent if
f_{X,Y}(x, y) = f_X(x) f_Y(y),  for all x, y

Buffon’s needle:
• Intersect if X ≤ (ℓ/2) sin Θ

P(X ≤ (ℓ/2) sin Θ) = ∫∫_{x ≤ (ℓ/2) sin θ} f_X(x) f_Θ(θ) dx dθ
  = (4/(πd)) ∫_0^{π/2} ∫_0^{(ℓ/2) sin θ} dx dθ
  = (4/(πd)) ∫_0^{π/2} (ℓ/2) sin θ dθ
  = 2ℓ/(πd)
Conditioning
• Recall
P(x ≤ X ≤ x + δ | Y ≈ y) ≈ f_{X|Y}(x | y) · δ

Stick-breaking example
• Break a stick of length ℓ twice:
break at X: uniform in [0, ℓ];
break again at Y, uniform in [0, X]

f_{X,Y}(x, y) = 1/(ℓx),  0 ≤ y ≤ x ≤ ℓ

[Figures: f_X(x) and f_{Y|X}(y | x), and the support {0 ≤ y ≤ x ≤ ℓ} in the (x, y) plane.]
LECTURE 10  The Bayes variations

Continuous X, Discrete Y

f_{X|Y}(x | y) = f_X(x) p_{Y|X}(y | x) / p_Y(y)

p_Y(y) = ∫ f_X(x) p_{Y|X}(y | x) dx

Example:
• X: a continuous signal; “prior” f_X(x) (e.g., intensity of light beam)
• Y: discrete r.v. affected by X (e.g., photon count)
• p_{Y|X}(y | x): model of the discrete r.v.

• Obtaining the PDF for g(X, Y) = Y/X involves deriving a distribution.
Note: g(X, Y) is a random variable

When not to find them
• Don’t need PDF for g(X, Y) if only want to compute expected value:
E[g(X, Y)] = ∫∫ g(x, y) f_{X,Y}(x, y) dx dy
How to find them / The continuous case

[Figure: a random variable X mapped through g(·) to Y = g(X); the pdf f_V(v).]

Example
• X: uniform on [0, 2]
• Find PDF of Y = X³
• Solution:
F_Y(y) = P(X³ ≤ y) = P(X ≤ y^(1/3)) = y^(1/3)/2,  0 ≤ y ≤ 8
f_Y(y) = dF_Y/dy (y) = 1/(6 y^(2/3))

• The linear case: if Y = aX + b,
f_Y(y) = (1/|a|) f_X((y − b)/a)
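A quick numerical check (mine, not from the slides) of the derived density f_Y(y) = 1/(6 y^(2/3)) for Y = X³ with X uniform on [0, 2]: estimate P(y ≤ Y ≤ y + δ)/δ by simulation and compare with the formula.

```python
import random

def empirical_density(y, delta=0.05, samples=1_000_000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x = 2 * rng.random()      # X uniform on [0, 2]
        v = x ** 3                # Y = X^3 takes values in [0, 8]
        if y <= v <= y + delta:
            hits += 1
    return hits / samples / delta   # ~ f_Y(y)

y = 2.0
print(empirical_density(y))        # simulation estimate
print(1 / (6 * y ** (2 / 3)))      # formula value, ~0.105
```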
LECTURE 11

Find the PDF of Z = g(X, Y) = Y/X
F_Z(z) =      (z ≤ 1)
F_Z(z) =      (z ≥ 1)

A general formula
• Hence,
f_X(x) · δ = f_Y(y) · |dg/dx(x)| · δ,   where y = g(x)

The discrete case: W = X + Y; X, Y independent
p_W(w) = P(X + Y = w) = ∑_x P(X = x) P(Y = w − x) = ∑_x p_X(x) p_Y(w − x)
[Figure: the points (0, 3), (1, 2), (2, 1), (3, 0) on the line x + y = w.]
• Mechanics:
– Put the pmf’s on top of each other
– Flip the pmf of Y
– Shift the flipped pmf by w (to the right if w > 0)
– Cross-multiply and add

The continuous case: W = X + Y; X, Y independent
• f_{W|X}(w | x) = f_Y(w − x)
• f_{W,X}(w, x) = f_X(x) f_{W|X}(w | x) = f_X(x) f_Y(w − x)
• f_W(w) = ∫_{−∞}^{∞} f_X(x) f_Y(w − x) dx
[Figure: the line x + y = w in the (x, y) plane.]
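The “flip, shift, cross-multiply and add” mechanics are exactly a discrete convolution. A minimal sketch (mine, not from the slides), with PMFs stored as dictionaries:

```python
def convolve_pmfs(p_X, p_Y):
    """PMF of W = X + Y for independent X, Y: p_W(w) = sum_x p_X(x) p_Y(w - x)."""
    p_W = {}
    for x, px in p_X.items():
        for y, py in p_Y.items():
            p_W[x + y] = p_W.get(x + y, 0.0) + px * py
    return p_W

# Example: two independent rolls of a fair 4-sided die.
die = {k: 1 / 4 for k in range(1, 5)}
p_W = convolve_pmfs(die, die)
print(p_W[5])              # 4/16 = 0.25
print(sum(p_W.values()))   # 1.0
```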
Two independent normal r.v.s / The sum of independent normal r.v.’s
[Figure: scatter plots of samples from two independent normal r.v.’s with standard deviations σ_X and σ_Y.]

• cov(X, Y) = E[XY] − E[X]E[Y]
• independent ⇒ cov(X, Y) = 0  (converse is not true)

var(∑_{i=1}^n Xi) = ∑_{i=1}^n var(Xi) + 2 ∑_{i<j} cov(Xi, Xj)

• |ρ| = 1  ⇔  (X − E[X]) = c(Y − E[Y])  (linearly related)
• Independent ⇒ ρ = 0  (converse is not true)
LECTURE 12  Conditional expectations

• In stick example:
E[X] = E[E[X | Y]] = E[Y/2] = ℓ/4

Proof:
(d) var(E[X | Y]) = E[(E[X | Y])²] − (E[X])²
Sum of right-hand sides of (c), (d):
E[X²] − (E[X])² = var(X)

E[X | Y = 1] = 90,  E[X | Y = 2] = 60
var(E[X | Y]) = (1/3)(90 − 70)² + (2/3)(60 − 70)² = 600/3 = 200
Section means and variances (ctd.)

(1/10) ∑_{i=1}^{10} (xi − 90)² = 10     (1/20) ∑_{i=11}^{30} (xi − 60)² = 20
var(X | Y = 1) = 10     var(X | Y = 2) = 20
var(X | Y) = 10, w.p. 1/3;  20, w.p. 2/3
E[var(X | Y)] = (1/3) · 10 + (2/3) · 20 = 50/3

var(X) = E[var(X | Y)] + var(E[X | Y])

Example
[Figure: a piecewise-constant pdf f_X(x) taking the values 2/3 and 1/3 on the two sections Y = 1 and Y = 2, with breakpoints at x = 1 and x = 2.]
E[X | Y = 1] =     E[X | Y = 2] =
var(X | Y = 1) =     var(X | Y = 2) =

• Sum Y = X1 + · · · + XN of a random number N of i.i.d. r.v.’s Xi:
E[Y | N = n] = E[X1 + X2 + · · · + Xn | N = n]
             = E[X1 + X2 + · · · + Xn]
             = E[X1] + E[X2] + · · · + E[Xn]
             = n E[X]
• E[Y | N] = N E[X]
E[Y] = E[E[Y | N]] = E[N E[X]] = E[N] E[X]
var(Y) = E[var(Y | N)] + var(E[Y | N]) = E[N] var(X) + (E[X])² var(N)
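A simulation sketch (mine, not from the slides) of the last two identities for Y = X1 + · · · + XN. The specific choices N geometric(q) and Xi exponential(λ) are my own, for illustration; the check is that E[Y] = E[N]E[X] and var(Y) = E[N]var(X) + (E[X])² var(N).

```python
import random

def simulate_random_sum(trials=200_000, q=0.25, lam=2.0, seed=3):
    rng = random.Random(seed)
    ys = []
    for _ in range(trials):
        n = 1
        while rng.random() > q:      # N ~ geometric(q), values 1, 2, ...
            n += 1
        ys.append(sum(rng.expovariate(lam) for _ in range(n)))
    mean = sum(ys) / trials
    var = sum((y - mean) ** 2 for y in ys) / trials
    return mean, var

E_N, var_N = 1 / 0.25, (1 - 0.25) / 0.25**2      # geometric mean and variance
E_X, var_X = 1 / 2.0, 1 / 2.0**2                 # exponential mean and variance
print(simulate_random_sum())                     # simulated (E[Y], var(Y))
print(E_N * E_X, E_N * var_X + E_X**2 * var_N)   # 2.0, 4.0
```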
LECTURE 13 The Bernoulli process
Interarrival times / Time of the kth arrival

• T1: number of trials until first success
– P(T1 = t) =
• If you buy a lottery ticket every day, what is the distribution of the length of the first string of losing days?
• Given that first arrival was at time t, i.e., T1 = t: additional time, T2, until next arrival
• Yk: time of the kth arrival
– E[Yk] =
– Var(Yk) =
– P(Yk = t) =

[Figure 6.4: Merging of independent Bernoulli processes.]

The Poisson PMF with parameter λ:
p_Z(k) = e^(−λ) λ^k / k!,  k = 0, 1, 2, . . .
Its mean and variance are given by
E[Z] = λ,  var(Z) = λ.
LECTURE 14

• Bernoulli review
• Definition of Poisson process
• Distribution of number of arrivals
• Other properties of the Poisson process
• Memorylessness
• Distribution of interarrival times

[Figure: arrival times t1, t2, t3 on a continuous time axis.]

• Time homogeneity:
P(k, τ) = Prob. of k arrivals in interval of duration τ
• Assumptions:
– Numbers of arrivals in disjoint time intervals are independent
– For VERY small δ:
P(k, δ) ≈ 1 − λδ, if k = 0;  λδ, if k = 1;  0, if k > 1
– λ = “arrival rate”

P(k, τ) = (λτ)^k e^(−λτ) / k!,  k = 0, 1, . . .
• E[Nτ] = λτ,  σ²_{Nτ} = λτ
• M_{Nt}(s) = e^(λt(e^s − 1))

• Finely discretize [0, t]: approximately Bernoulli
• Nt (of discrete approximation): binomial
• Taking δ → 0 (or n → ∞) gives:
P(k, τ) = (λτ)^k e^(−λτ) / k!
• E[Nt] = λt,  var(Nt) = λt

Example: You get email according to a Poisson process at a rate of λ = 0.4 messages per hour. You check your email every thirty minutes.
– Prob(no new messages) =
– Prob(one new message) =
Example / Interarrival Times

• You get email according to a Poisson process at a rate of λ = 5 messages per hour. You check your email every thirty minutes.
• Prob(no new messages) =
• Prob(one new message) =

• Yk: time of kth arrival
• Erlang distribution:
f_{Yk}(y) = λ^k y^(k−1) e^(−λy) / (k − 1)!,  y ≥ 0
[Figure: the Erlang pdfs f_{Yk}(y) for k = 1, 2, 3.]
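The email probabilities follow from the Poisson PMF over a τ = 1/2 hour window; with λ = 5 per hour, λτ = 2.5. A small sketch (mine, not from the slides):

```python
import math

lam = 5.0          # messages per hour
tau = 0.5          # half-hour window
mu = lam * tau     # expected number of arrivals, lambda*tau = 2.5

def poisson_P(k, mu):
    """P(k, tau) = (lambda*tau)^k e^{-lambda*tau} / k!"""
    return mu ** k * math.exp(-mu) / math.factorial(k)

print(poisson_P(0, mu))   # Prob(no new messages)  = e^{-2.5} ~ 0.082
print(poisson_P(1, mu))   # Prob(one new message)  = 2.5 e^{-2.5} ~ 0.205
```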
Poisson vs. Bernoulli:
– Arrival rate: λ per unit time vs. p per trial
– PMF of # of arrivals: Poisson vs. Binomial
Interarrival Times
• First-order interarrival times (k = 1): exponential
f_{Y1}(y) = λ e^(−λy),  y ≥ 0
• Time Yk to kth arrival: Erlang(k):
f_{Yk}(y) = λ^k y^(k−1) e^(−λy) / (k − 1)!,  y ≥ 0
[Figure: the first (k = 1) and second (k = 2) interarrival intervals of a Poisson process.]

Review:
• P(k, δ) ≈ 1 − λδ, if k = 0;  λδ, if k = 1;  0, if k > 1
• E[Nτ] = var(Nτ) = λτ
• Interarrival times (k = 1): exponential:
f_{T1}(t) = λ e^(−λt),  t ≥ 0,  E[T1] = 1/λ
– Memoryless property: The time to the next arrival is independent of the past
• Merging and splitting

Fishing example:
a) P(fish for more than two hours) =
b) P(fish for more than two and less than five hours) =
d) E[number of fish] =

[Figure: merging of two Poisson processes with rates λ1 and λ2 — “All flashes” and “Green bulb flashes”.]
Light bulb example / Splitting of Poisson processes

[Figure: splitting of a Poisson process — email traffic leaving MIT (rate λ) is split at the MIT server into a USA stream (rate pλ) and a foreign stream (rate (1 − p)λ).]

Random incidence for Poisson
• Poisson process that has been running forever
• Show up at some “random time” (really means “arbitrary time”)
[Figure: arrivals on a time axis, with the chosen time instant falling inside one interarrival interval.]
• What is the distribution of the length of the chosen interarrival interval?

Random incidence in “renewal processes”
• Series of successive arrivals
– i.i.d. interarrival times (but not necessarily exponential)
• Example: Bus interarrival times are equally likely to be 5 or 10 minutes
• If you arrive at a “random time”:
– what is the probability that you selected a 5 minute interarrival interval?
– what is the expected time to next arrival?
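For the bus example, a simulation sketch (mine, not from the slides): generate a sequence of 5- or 10-minute interarrival intervals, drop in at a uniformly chosen time, and record which interval was hit and the time to the next arrival. Length-biasing makes the 5-minute interval come up with probability roughly 1/3, and the expected wait is roughly 25/6 minutes; the finite horizon makes these only approximate.

```python
import random

def random_incidence(trials=20_000, horizon_intervals=200, seed=7):
    rng = random.Random(seed)
    hit_five = 0
    wait_total = 0.0
    for _ in range(trials):
        gaps = [rng.choice((5, 10)) for _ in range(horizon_intervals)]
        t = rng.random() * sum(gaps)      # "random" (arbitrary) arrival time
        elapsed = 0.0
        for g in gaps:
            if elapsed + g > t:           # t falls inside this interval
                hit_five += (g == 5)
                wait_total += elapsed + g - t   # time to next arrival
                break
            elapsed += g
    return hit_five / trials, wait_total / trials

print(random_incidence())   # ~ (1/3, 25/6 ~ 4.17)
```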
LECTURE 16 Checkout counter model
• Classification of states
[Figure: the checkout counter model as a birth-death chain with states 0, 1, 2, 3, . . . , 9, 10.]

p_ij = P(X_{n+1} = j | X_n = i)
     = P(X_{n+1} = j | X_n = i, X_{n−1}, . . . , X_0)

[Figure: paths from state i to state j in n steps, through intermediate states k, . . . , m, with probabilities r_ik(n − 1) p_kj, . . . , r_im(n − 1) p_mj.]

• Model specification:
Example / Generic convergence questions

[Figure: a Markov chain with transition probabilities 0.5 and 0.8, and a table of n-step probabilities r_11(n), r_12(n), r_21(n), r_22(n) for n = 0, 1, 2, . . . , 100, 101.]
• Does r_ij(n) converge to something?
• Does the limit depend on initial state?
r_11(n) =
r_21(n) =
r_31(n) =
[Figure: Markov chains on states 1–4 and 1–8 used to classify states.]
– i transient:
P(X_n = i) → 0,
i visited finite number of times
• Recurrent class:
collection of recurrent states that “communicate” with each other and with no other state
LECTURE 17  Review
Markov Processes – II

• Readings: Section 7.3

Lecture outline
• Review
• Steady-state behavior
– Steady-state convergence theorem
– Balance equations
• Birth-death processes

Review:
• Discrete state, discrete time, time-homogeneous
– Transition probabilities p_ij
– Markov property
• r_ij(n) = P(X_n = j | X_0 = i)
• Key recursion:
r_ij(n) = ∑_k r_ik(n − 1) p_kj

P(X1 = 2, X2 = 6, X3 = 7 | X0 = 1) =
P(X4 = 7 | X0 = 2) =
[Figure: an 8-state Markov chain used for these questions and to classify states.]

• Recurrent state i: starting from i, and from wherever you can go, there is a way of returning to i
• If not recurrent, called transient
• Recurrent class: collection of recurrent states that “communicate” to each other and to no other state
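The key recursion can be coded directly. A minimal sketch (mine, not from the slides), using a hypothetical 2-state transition matrix chosen only for illustration:

```python
def n_step_probs(P, n):
    """r_ij(n) via the key recursion r_ij(n) = sum_k r_ik(n-1) * p_kj."""
    m = len(P)
    r = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]  # r_ij(0)
    for _ in range(n):
        r = [[sum(r[i][k] * P[k][j] for k in range(m)) for j in range(m)]
             for i in range(m)]
    return r

# Hypothetical 2-state chain: p_11 = 0.5, p_12 = 0.5, p_21 = 0.2, p_22 = 0.8.
P = [[0.5, 0.5],
     [0.2, 0.8]]
print(n_step_probs(P, 100))   # every row approaches (2/7, 5/7) for this matrix
```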
Steady-State Probabilities / Visit frequency interpretation

• Do the r_ij(n) converge to some π_j?
(independent of the initial state i)
• Yes, if:
– recurrent states are all in a single class, and
– single recurrent class is not periodic
• Assuming “yes,” start from key recursion
r_ij(n) = ∑_k r_ik(n − 1) p_kj
– take the limit as n → ∞
π_j = ∑_k π_k p_kj,  for all j
– Additional equation:
∑_j π_j = 1

• (Long run) frequency of being in j: π_j
• Frequency of transitions k → j: π_k p_kj
• Frequency of transitions into j: ∑_k π_k p_kj
[Figure: transitions into state j from states 1, 2, . . . , m, with frequencies π_1 p_1j, π_2 p_2j, . . . , π_m p_mj, and the self-transition frequency π_j p_jj.]

Expected Frequency of a Particular Transition: Consider n transitions of a Markov chain with a single class which is aperiodic, starting from a given initial state. Let q_jk(n) be the expected number of such transitions that take the state from j to k. Then, regardless of the initial state, we have
lim_{n→∞} q_jk(n)/n = π_j p_jk.
Given the frequency interpretation of π_j and π_k p_kj, the balance equation
π_j = ∑_{k=1}^m π_k p_kj
has an intuitive meaning. It expresses the fact that the expected frequency π_j of visits to j is equal to the sum of the expected frequencies π_k p_kj of transitions that lead to j; see Fig. 7.13.

Birth-death process:
π_i = π_0 ρ^i,  i = 0, 1, . . . , m
π_0 = 1 − ρ
E[X_n] = ρ/(1 − ρ)  (in steady-state)
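The balance equations together with ∑_j π_j = 1 form a linear system that can be solved numerically. A sketch (mine, not from the slides); the transition matrix is the same hypothetical 2-state example used above:

```python
import numpy as np

def steady_state(P):
    """Solve pi = pi P together with sum(pi) = 1."""
    m = P.shape[0]
    A = np.vstack([P.T - np.eye(m), np.ones(m)])   # balance eqs + normalization
    b = np.concatenate([np.zeros(m), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)     # least squares handles the redundancy
    return pi

P = np.array([[0.5, 0.5],
              [0.2, 0.8]])
print(steady_state(P))   # [2/7, 5/7] ~ [0.2857, 0.7143]
```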
LECTURE 18  Review

• Readings: Section 7.4

Lecture outline
• Review of steady-state behavior
• Probability of blocked phone calls
• Calculating absorption probabilities
• Calculating expected time to absorption

Review:
lim_{n→∞} r_ij(n) = π_j
where π_j does not depend on the initial conditions:
lim_{n→∞} P(X_n = j | X_0 = i) = π_j
• π_1, . . . , π_m can be found as the unique solution of the balance equations
π_j = ∑_k π_k p_kj,  j = 1, . . . , m,
together with
∑_j π_j = 1

π_1 = 2/7,  π_2 = 5/7
• Assume process starts at state 1.
• P(X_1 = 1, and X_100 = 1) =
• P(X_100 = 1 and X_101 = 2) =

Phone call example:
• Discrete time intervals of (small) length δ
[Figure: birth-death chain with states 0, 1, . . . , i − 1, i, . . . , B − 1, B; upward transition probability λδ and downward transition probability iμδ out of state i.]
• Balance equations:  λ π_{i−1} = iμ π_i
• π_i = π_0 λ^i / (μ^i i!),   π_0 = 1 / ∑_{i=0}^B λ^i / (μ^i i!)
Calculating absorption probabilities / Expected time to absorption

[Figure: Markov chains with absorbing states 4 and 5 and transition probabilities 0.2, 0.3, 0.4, 0.5, 0.6, 0.8 among the remaining states.]

• Absorption probabilities a_i:
For i = 4, a_i =
For i = 5, a_i =
For all other i:  a_i = ∑_j p_ij a_j
– unique solution

• Find expected number of transitions μ_i, until reaching the absorbing state, given that the initial state is i
μ_i = 0 for i =
For all other i:  μ_i = 1 + ∑_j p_ij μ_j
– unique solution

• Mean first passage times (to a state s):
t_s = 0,
t_i = 1 + ∑_j p_ij t_j,  for all i ≠ s
• Mean recurrence time of s:
t*_s = 1 + ∑_j p_sj t_j
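Both systems of equations (the a_i and the μ_i) are linear and can be solved directly. A sketch of mine, not from the slides, with a hypothetical 4-state chain (states 0 and 3 absorbing); the transition matrix is invented for illustration only.

```python
import numpy as np

# Hypothetical chain: states 0 and 3 absorbing, 1 and 2 transient.
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.3, 0.0, 0.7, 0.0],
              [0.0, 0.4, 0.0, 0.6],
              [0.0, 0.0, 0.0, 1.0]])
transient = [1, 2]
target = 3                                   # compute P(absorbed in state 3)

Q = P[np.ix_(transient, transient)]          # transitions among transient states
I = np.eye(len(transient))

# a_i = sum_j p_ij a_j, with a = 1 at the target and 0 at the other absorbing state:
b = P[np.ix_(transient, [target])].ravel()
a = np.linalg.solve(I - Q, b)
print(dict(zip(transient, a)))               # absorption probabilities into state 3

# mu_i = 1 + sum_j p_ij mu_j, with mu = 0 at absorbing states:
mu = np.linalg.solve(I - Q, np.ones(len(transient)))
print(dict(zip(transient, mu)))              # expected number of steps to absorption
```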
LECTURE 19  Chebyshev’s inequality
Limit theorems – I

• Readings: Sections 5.1-5.3; start Section 5.4

• X_1, . . . , X_n i.i.d.
M_n = (X_1 + · · · + X_n)/n
What happens as n → ∞?
• Why bother?
• A tool: Chebyshev’s inequality

• Random variable X (with finite mean μ and variance σ²)
σ² = ∫_{−∞}^{∞} (x − μ)² f_X(x) dx
   ≥ ∫_{−∞}^{μ−c} (x − μ)² f_X(x) dx + ∫_{μ+c}^{∞} (x − μ)² f_X(x) dx
   ≥ c² · P(|X − μ| ≥ c)

P(|X − μ| ≥ c) ≤ σ²/c²

[Figure: a PMF of Y_n with mass 1/n at the value n.]
Does Y_n converge?
Convergence of the sample mean (Weak law of large numbers)
• X_1, X_2, . . . i.i.d.; finite mean μ and variance σ²
M_n = (X_1 + · · · + X_n)/n
• E[M_n] =

The pollster’s problem
• f: fraction of population that “. . . ”
• ith (randomly selected) person polled:
X_i = 1, if yes;  0, if no.
• M_n = (X_1 + · · · + X_n)/n = fraction of “yes” in our sample
• If n = 50,000, then P(|M_n − f| ≥ .01) ≤ .05  (conservative)
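The n = 50,000 claim follows from Chebyshev with the worst-case Bernoulli variance σ² ≤ 1/4, since var(M_n) = σ²/n: P(|M_n − f| ≥ 0.01) ≤ 1/(4 · 50,000 · 0.01²) = 0.05. A two-line check (mine, not from the slides):

```python
def chebyshev_bound(n, eps=0.01, var_bound=0.25):
    # P(|M_n - f| >= eps) <= var(M_n) / eps^2 <= var_bound / (n * eps^2)
    return var_bound / (n * eps ** 2)

print(chebyshev_bound(50_000))   # 0.05
```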
LECTURE 20  THE CENTRAL LIMIT THEOREM

• Readings: Section 5.4

• X_1, . . . , X_n i.i.d., finite variance σ²
• “Standardized” S_n = X_1 + · · · + X_n:
Z_n = (S_n − E[S_n]) / σ_{S_n} = (S_n − nE[X]) / (√n σ)
– E[Z_n] = 0,  var(Z_n) = 1

Usefulness
• universal; only means, variances matter
• accurate computational shortcut
• justification of normal models

What exactly does it say?
• CDF of Z_n converges to normal CDF
– not a statement about convergence of PDFs or PMFs

[Figures: PMFs of S_n for n = 2, 4, 8, 16, 32, illustrating the convergence toward the normal shape.]

The pollster’s problem using the CLT
• f: fraction of population that “. . . ”
• M_n = (X_1 + · · · + X_n)/n
• Suppose we want:
P(|M_n − f| ≥ .01) ≤ .05

P(|X_1 + · · · + X_n − nf| / n ≥ .01) = P(|X_1 + · · · + X_n − nf| / (√n σ) ≥ .01 √n / σ)

P(|M_n − f| ≥ .01) ≈ P(|Z| ≥ .01 √n / σ) ≤ P(|Z| ≥ .02 √n)
Apply to binomial
• Fix p, where 0 < p < 1
• X_i: Bernoulli(p)
• CDF of (S_n − np)/√(np(1 − p))  −→  standard normal

Example
• n = 36, p = 0.5; find P(S_n ≤ 21)
• Exact answer:
∑_{k=0}^{21} C(36, k) (1/2)^36 = 0.8785

The 1/2 correction for binomial approximation
• P(S_n ≤ 21) = P(S_n < 22), because S_n is integer
[Figure: PMF of S_n near the values 18, 19, 20, 21, 22.]
• P(S_n = 19):
– CLT approximation with the 1/2 correction ≈ 0.124
– Exact answer:  C(36, 19) (1/2)^36 = 0.1251
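A numeric sketch (mine, not from the slides) reproducing the example: the exact binomial values and the normal approximation with the 1/2 correction, using mean np = 18 and standard deviation √(np(1 − p)) = 3. The direct computation gives ≈ 0.125 for P(S_n = 19); the slide's 0.124 presumably reflects rounding in the normal tables.

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 36, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))      # 18 and 3

# Exact values
exact_le_21 = sum(math.comb(n, k) for k in range(22)) / 2 ** n
exact_eq_19 = math.comb(n, 19) / 2 ** n

# Normal approximation with the 1/2 correction
approx_le_21 = Phi((21.5 - mu) / sigma)
approx_eq_19 = Phi((19.5 - mu) / sigma) - Phi((18.5 - mu) / sigma)

print(exact_le_21, approx_le_21)   # 0.8785..., ~0.879
print(exact_eq_19, approx_eq_19)   # 0.1251..., ~0.125
```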
LECTURE 21  Types of Inference models/approaches

• Readings: Sections 8.1-8.2

Sample Applications
• Polling
– Design of experiments/sampling methodologies
– Lancet study on Iraq death toll
• Medical/pharmaceutical trials
• Data mining
– Netflix competition
• Finance
• Signal processing
– Tracking, detection, speaker identification, . . .
• Design & interpretation of experiments

“It is the mark of truly educated people to be deeply moved by statistics.” (Oscar Wilde)

• Matrix completion (Netflix competition): partially observed matrix; goal: to predict the unobserved entries
[Figure: a partially observed ratings matrix (rows labeled “objects”) with missing entries marked “?”.]

[Graph of S&P 500 index removed due to copyright restrictions.]

Types of Inference models/approaches
• Model building versus inferring unknown variables. E.g., assume X = aS + W
– Model building: know “signal” S, observe X, infer a
– Estimation in the presence of noise: know a, observe X, estimate S
[Figure: “Reality” (e.g., customer arrivals) is mapped to a “Model” (e.g., Poisson).]
• Hypothesis testing: unknown takes one of few possible values; aim at small probability of incorrect decision
• Estimation: aim at a small estimation error
• Classical statistics:  p_X(x; θ)
– θ: unknown parameter (not a r.v.)
– E.g., θ = mass of electron
• Bayesian: Use priors & Bayes rule
[Figures: block diagrams — Bayesian: Θ ∼ p_Θ(θ) passes through the model p_{X|Θ}(x | θ) to produce X, and an estimator outputs Θ̂; classical: an unknown constant θ feeds the model p_X(x; θ) (or p_Y(y; θ)) to produce the data, which an estimator maps to an estimate. Also the example W ∼ f_W(w), Θ ∈ {0, 1}, X = Θ + W.]
Bayesian inference: Use Bayes rule
• Hypothesis testing
– discrete data:
p_{Θ|X}(θ | x) = p_Θ(θ) p_{X|Θ}(x | θ) / p_X(x)
– continuous data:
p_{Θ|X}(θ | x) = p_Θ(θ) f_{X|Θ}(x | θ) / f_X(x)
• Estimation; continuous data:
f_{Θ|X}(θ | x) = f_Θ(θ) f_{X|Θ}(x | θ) / f_X(x)
• Example:
Z_t = Θ_0 + tΘ_1 + t²Θ_2
X_t = Z_t + W_t,  t = 1, 2, . . . , n

Estimation with discrete data
f_{Θ|X}(θ | x) = f_Θ(θ) p_{X|Θ}(x | θ) / p_X(x)
p_X(x) = ∫ f_Θ(θ) p_{X|Θ}(x | θ) dθ
• Example:
– Coin with unknown parameter θ
– Observe X heads in n tosses
• What is the Bayesian approach?
– Want to find f_{Θ|X}(θ | x)
– Assume a prior on Θ (e.g., uniform)
Bayes rule gives:
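For the coin example (uniform prior on Θ, X = k heads in n tosses), Bayes' rule can be evaluated on a grid. This is a sketch of mine, not part of the slides; with a uniform prior the posterior is the Beta(k+1, n−k+1) density, whose mean (k+1)/(n+2) the grid computation reproduces.

```python
import math

# Posterior f_{Theta|X}(theta | k) on a grid, for a uniform prior on [0, 1]
# and X = k heads observed in n tosses.
n, k = 10, 7
M = 10_000                                   # number of grid points
thetas = [(i + 0.5) / M for i in range(M)]

prior = [1.0] * M                            # f_Theta(theta) = 1 on [0, 1]
likelihood = [math.comb(n, k) * t ** k * (1 - t) ** (n - k) for t in thetas]

unnormalized = [f * L for f, L in zip(prior, likelihood)]
p_x = sum(unnormalized) / M                  # p_X(k) = integral of f_Theta * p_{X|Theta}
posterior = [u / p_x for u in unnormalized]  # f_{Theta|X}(theta | k)

post_mean = sum(t * f for t, f in zip(thetas, posterior)) / M
print(post_mean)            # ~ 0.6667
print((k + 1) / (n + 2))    # Beta(k+1, n-k+1) mean
```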
Output of Bayesian Inference
• Posterior distribution:
– pmf p_{Θ|X}(· | x) or pdf f_{Θ|X}(· | x)
• If interested in a single answer:
– Maximum a posteriori probability (MAP):
◦ p_{Θ|X}(θ* | x) = max_θ p_{Θ|X}(θ | x)
◦ f_{Θ|X}(θ* | x) = max_θ f_{Θ|X}(θ | x)
◦ minimizes probability of error; often used in hypothesis testing
– Conditional expectation:
E[Θ | X = x] = ∫ θ f_{Θ|X}(θ | x) dθ

• Example: object at unknown location Θ; sensors
P(sensor i “senses” the object | Θ = θ) = h(distance of θ from sensor i)
[Figure: an object at an unknown location and several sensors.]

Least Mean Squares Estimation
• Estimation in the absence of information:
– find estimate c, to minimize E[(Θ − c)²]
• Optimal estimate: c = E[Θ]
• Optimal mean squared error:
E[(Θ − E[Θ])²] = Var(Θ)

• Estimation based on X:
◦ E[(Θ − E[Θ | X])² | X] ≤ E[(Θ − g(X))² | X]
◦ E[(Θ − E[Θ | X])²] ≤ E[(Θ − g(X))²]
E[Θ | X] minimizes E[(Θ − g(X))²] over all estimators g(·)
LECTURE 22

• Readings: pp. 225-226; Sections 8.3-8.4

Topics
• (Bayesian) Least mean squares (LMS) estimation
• (Bayesian) Linear LMS estimation

• MAP estimate: θ̂_MAP maximizes f_{Θ|X}(θ | x)
• LMS estimation:
– Θ̂ = E[Θ | X] minimizes E[(Θ − g(X))²] over all estimators g(·)
– for any x, θ̂ = E[Θ | X = x] minimizes E[(Θ − θ̂)² | X = x] over all estimates θ̂

Example:
– prior f_Θ(θ) = 1/6 for 4 ≤ θ ≤ 10
– observation model f_{X|Θ}(x | θ) uniform on [θ − 1, θ + 1]
[Figure: the resulting estimator Θ̂ = g(X) = E[Θ | X], plotted as a function of x for 3 ≤ x ≤ 11.]

Conditional mean squared error
• E[(Θ − E[Θ | X])² | X = x]
– same as Var(Θ | X = x): variance of the conditional distribution of Θ

Some properties of LMS estimation
– Estimator: Θ̂ = E[Θ | X]
– Estimation error: Θ̃ = Θ̂ − Θ
• E[Θ̃] = 0,  E[Θ̃ | X = x] = 0
• E[Θ̃ h(X)] = 0, for any function h
• cov(Θ̃, Θ̂) = 0
• Since Θ = Θ̂ − Θ̃:
var(Θ) = var(Θ̂) + var(Θ̃)

Predicting X based on Y
• Two r.v.’s X, Y
• we observe that Y = y
– new universe: condition on Y = y
• E[(X − c)² | Y = y] is minimized by
c =
Linear LMS / Linear LMS properties

Θ̂ = a_1 X_1 + · · · + a_n X_n + b
• Find best choices of a_1, . . . , a_n, b
• Minimize:

Choosing X_i in linear LMS
• E[Θ | X] is the same as E[Θ | X³]
• Linear LMS is different:
◦ Θ̂ = aX + b versus Θ̂ = aX³ + b
◦ Also consider Θ̂ = a_1 X + a_2 X² + a_3 X³ + b

• Estimation methods:
– MAP
– MSE
– Linear MSE
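For a single observation, the minimization has the standard closed form Θ̂ = E[Θ] + (cov(Θ, X)/var(X)) (X − E[X]). A sketch of mine (not from the slides) that estimates these coefficients from simulated (Θ, X) pairs for the uniform example above, and also builds a linear estimator on X³ to show that the choice of X_i matters:

```python
import random

def linear_lms_coeffs(pairs):
    """a, b minimizing E[(Theta - a*X - b)^2]: a = cov(Theta, X)/var(X), b = E[Theta] - a*E[X]."""
    n = len(pairs)
    mx = sum(x for _, x in pairs) / n
    mt = sum(t for t, _ in pairs) / n
    cov = sum((t - mt) * (x - mx) for t, x in pairs) / n
    var = sum((x - mx) ** 2 for _, x in pairs) / n
    a = cov / var
    return a, mt - a * mx

rng = random.Random(0)
samples = []
for _ in range(100_000):
    theta = rng.uniform(4, 10)           # prior: uniform on [4, 10]
    x = theta + rng.uniform(-1, 1)       # observation: uniform on [theta-1, theta+1]
    samples.append((theta, x))

a, b = linear_lms_coeffs(samples)
print(a, b)        # linear LMS estimator Theta_hat = a*X + b

a3, b3 = linear_lms_coeffs([(t, x ** 3) for t, x in samples])
print(a3, b3)      # a different estimator: Theta_hat = a3*X^3 + b3
```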
LECTURE 23  Classical statistics

• Readings: Section 9.1
(not responsible for t-based confidence intervals, in pp. 471-473)

• Outline
– Classical statistics

• Classical statistics:  p_X(x; θ)
• also for vectors X and θ:
p_{X1,...,Xn}(x_1, . . . , x_n; θ_1, . . . , θ_m)

• Problem types:
– Hypothesis testing:
H0: θ = 1/2 versus H1: θ = 3/4
– Composite hypotheses:
H0: θ = 1/2 versus H1: θ ≠ 1/2
– Estimation: design an estimator Θ̂, to keep estimation error Θ̂ − θ small

θ̂_ML = n/(x_1 + · · · + x_n)     Θ̂_n = n/(X_1 + · · · + X_n)
Estimate a mean
• X_1, . . . , X_n: i.i.d., mean θ, variance σ²
X_i = θ + W_i
W_i: i.i.d., mean 0, variance σ²
Θ̂_n = sample mean = M_n = (X_1 + · · · + X_n)/n
Properties:

Confidence intervals (CIs)
• An estimate Θ̂_n may not be informative enough
• A 1 − α confidence interval is a (random) interval [Θ̂_n⁻, Θ̂_n⁺],
s.t. P(Θ̂_n⁻ ≤ θ ≤ Θ̂_n⁺) ≥ 1 − α,  ∀ θ
– often α = 0.05, or 0.25, or 0.01
– interpretation is subtle
LECTURE 24  Review

• Reference: Section 9.3
• Course Evaluations (until 12/16)
http://web.mit.edu/subjectevaluation

• Maximum likelihood estimation
– Have model with unknown parameters:  X ∼ p_X(x; θ)
– Pick θ that “makes data most likely”:
max_θ p_X(x; θ)
The world of linear regression The world of regression (ctd.)
LECTURE 25  Simple binary hypothesis testing

Outline
• Reference: Section 9.4
• Course Evaluations (until 12/16)
http://web.mit.edu/subjectevaluation
• Review of simple binary hypothesis tests
– examples
• Testing composite hypotheses
– is my coin fair?
– is my die fair?

Simple binary hypothesis testing
– null hypothesis H0:
X ∼ p_X(x; H0)  [or f_X(x; H0)]
– alternative hypothesis H1:
X ∼ p_X(x; H1)  [or f_X(x; H1)]
– Choose a rejection region R; reject H0 iff data ∈ R

• Likelihood ratio test: reject H0 if
p_X(x; H1) / p_X(x; H0) > ξ   or   f_X(x; H1) / f_X(x; H0) > ξ
– fix false rejection probability α (e.g., α = 0.05)
• Likelihood ratio test; rejection region (testing the mean of a normal):
(1/√(2π))^n exp{−∑_i (X_i − 1)²/2} / (1/√(2π))^n exp{−∑_i X_i²/2} > ξ
– algebra: reject H0 if ∑_i X_i > ξ′
• Find ξ′ such that
P(∑_{i=1}^n X_i > ξ′; H0) = α
– use normal tables

• Likelihood ratio test; rejection region (testing the variance of a normal):
(1/(2√(2π)))^n exp{−∑_i X_i²/(2 · 4)} / (1/√(2π))^n exp{−∑_i X_i²/2} > ξ
– algebra: reject H0 if ∑_i X_i² > ξ′
• Find ξ′ such that
P(∑_{i=1}^n X_i² > ξ′; H0) = α
– the distribution of ∑_i X_i² is known (derived distribution problem)
– “chi-square” distribution; tables are available
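For the first test, ∑ X_i is N(0, n) under H0, so the threshold is ξ′ = z_{1−α} √n. A sketch of mine (not from the slides) that computes ξ′ by inverting the standard normal CDF numerically and then checks the false-rejection probability by simulation:

```python
import math, random

def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_quantile(q, lo=-10.0, hi=10.0):
    """Invert the standard normal CDF by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if Phi(mid) < q else (lo, mid)
    return (lo + hi) / 2

n, alpha = 25, 0.05
xi = z_quantile(1 - alpha) * math.sqrt(n)       # reject H0 if sum(X_i) > xi
print(xi)                                       # ~ 1.645 * 5 = 8.22

rng = random.Random(0)
trials = 100_000
false_rejects = sum(sum(rng.gauss(0, 1) for _ in range(n)) > xi
                    for _ in range(trials))
print(false_rejects / trials)                   # ~ alpha = 0.05
```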
Composite hypotheses / Is my die fair?
• Partition the range into bins
– np_i: expected incidence of bin i (from the pdf)
– N_i: observed incidence of bin i
– Use chi-square test (as in die problem)
• Kolmogorov-Smirnov test: form empirical CDF, F̂_X, from data
• Systematic methods for coming up with shape of rejection regions
• Methods to estimate an unknown PDF (e.g., form a histogram and “smooth” it out)
• Efficient and recursive signal processing
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.