Statistical Foundation for Electrical Engineers
(EE343)
Unit-8 Classical Statistical Inference
Tutorial 8: Classical Statistical Inference
Krishnan C.M.C
Assistant Professor, E&E,
NITK Surathkal
MLE
T8-P1: Estimate the mean of a Bernoulli RV – Using the MLE, find the probability of getting heads for a biased coin based on multiple, independent coin flips.

Let there be $n$ coin flips and let $X_i$ represent the indicator RV for the $i$-th flip, with the unknown parameter $\theta = P(X_i = 1)$.
Then the likelihood function is given by
$$f_X(x;\theta) = \binom{n}{k}\,\theta^k (1-\theta)^{n-k}$$
where $k = \sum_{i=1}^{n} x_i$ and $X = (X_1, X_2, \dots, X_n)$.

The log-likelihood function is given by
$$\log f_X(x;\theta) = \log\binom{n}{k} + k\log\theta + (n-k)\log(1-\theta)$$

Differentiating w.r.t. $\theta$ and equating to zero, we have
$$\frac{\partial}{\partial\theta}\log f_X(x;\theta) = 0 + \frac{k}{\theta} - \frac{n-k}{1-\theta} = 0 \quad\Rightarrow\quad \hat\theta_n = \frac{k}{n}$$

Thus the ML estimator is given by
$$\hat\Theta_n = \frac{\mathcal{K}}{n}, \qquad \mathcal{K} = \sum_{i=1}^{n} X_i$$

To note:
- $\hat\Theta_n$ is in fact the sample mean.
- It is unbiased and consistent.
HW: Compare this with the similar scenario in the Bayesian framework with a flat prior (fair coin). You will obtain the posterior-mean (LMS) estimate $\hat\Theta = (k+1)/(n+2)$; the MAP estimate under a flat prior coincides with the ML estimate $k/n$. So both converge asymptotically!
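A quick numerical sketch of this convergence, assuming an arbitrary true value $\theta = 0.7$:

```python
import numpy as np

# T8-P1: the ML estimate k/n versus the Bayesian posterior-mean estimate
# (k+1)/(n+2) under a flat prior -- the two agree as n grows.
rng = np.random.default_rng(0)
theta = 0.7                                 # true P(heads); arbitrary choice
for n in (10, 100, 10_000):
    k = rng.binomial(n, theta)              # number of heads in n flips
    print(n, k / n, (k + 1) / (n + 2))
```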
T8-P2: Consider a sequence of independent tosses and let $\theta$ be the probability of heads at each toss.

(a) Fix some $k$ and let $N$ be the number of tosses until the $k$-th head occurs. Find the ML estimate of $\theta$ (call it $\hat\Theta_1$) based on the observation $N$.

(b) Compare it with the case where you fix $n$ and let $\mathcal{K}$ be the number of heads in these $n$ tosses. Call this estimate of $\theta$ $\hat\Theta_2$. See the previous problem P1.
$N$ is an RV, and we say that at the observation $N = n$ we obtain the $k$-th head. This is like a binomial RV, except that the number $\binom{n}{k}$ is replaced by $\binom{n-1}{k-1}$.

Then the log-likelihood function is given by
$$\log f_N(n;\theta) = \log\binom{n-1}{k-1} + k\log\theta + (n-k)\log(1-\theta)$$

Differentiating w.r.t. $\theta$ and equating to zero, we get the same result as P1, that is,
$$\hat\Theta_1 = \frac{k}{N}$$
whereas from P1 we have
$$\hat\Theta_2 = \frac{\mathcal{K}}{n}, \qquad \mathcal{K} = \sum_{i=1}^{n} X_i$$

Comparison:
- For $\hat\Theta_1$ the number of tosses $N$ is the RV, whereas for $\hat\Theta_2$ it is the number of heads $\mathcal{K}$!
- $\hat\Theta_1$ is biased and $\hat\Theta_2$ is unbiased.
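A minimal Monte Carlo sketch of the bias comparison (true $\theta = 0.3$; $k$, $n$, and the trial count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, k, n, trials = 0.3, 5, 50, 200_000

# Theta_1-hat: fix k, observe N = tosses until the k-th head.
# numpy's negative_binomial returns failures before k successes, so N = k + failures.
N = k + rng.negative_binomial(k, theta, size=trials)
print((k / N).mean())        # noticeably above theta -> biased

# Theta_2-hat: fix n, observe K = heads in n tosses.
K = rng.binomial(n, theta, size=trials)
print((K / n).mean())        # ~theta -> unbiased
```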
T8-P3: Let the PDF of a random variable $X$ be the mixture of $m$ components:
$$f_X(x) = \sum_{j=1}^{m} p_j f_{Y_j}(x); \qquad \sum_{j=1}^{m} p_j = 1 \;\text{ and }\; p_j \ge 0 \text{ for } 1 \le j \le m$$
Assume that each $Y_j \sim \mathcal{N}(\mu_j, \sigma_j^2)$ and that we have a set of observations $X = (X_1, X_2, \dots, X_n)$, each entry independent with PDF $f_X(x)$.
(a) Write down the likelihood and log-likelihood functions.

Likelihood function:
$$f_X(x;\mu,\sigma^2) = \prod_{i=1}^{n} \sum_{j=1}^{m} p_j\, \frac{1}{\sigma_j\sqrt{2\pi}}\, e^{-(x_i-\mu_j)^2 / 2\sigma_j^2}$$

Log-likelihood function:
$$\log f_X(x;\mu,\sigma^2) = \sum_{i=1}^{n} \log\left( \sum_{j=1}^{m} p_j\, \frac{1}{\sigma_j\sqrt{2\pi}}\, e^{-(x_i-\mu_j)^2 / 2\sigma_j^2} \right)$$
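The log-likelihood above translates directly into a few lines of NumPy; a sketch (the function name `mixture_loglik` and the example values are my own):

```python
import numpy as np

def mixture_loglik(x, p, mu, sigma):
    # x: n samples; p, mu, sigma: length-m mixture weights, means, std devs
    x, p, mu, sigma = map(np.asarray, (x, p, mu, sigma))
    comp = p / (sigma * np.sqrt(2 * np.pi)) * np.exp(
        -(x[:, None] - mu) ** 2 / (2 * sigma ** 2))   # shape (n, m)
    return np.log(comp.sum(axis=1)).sum()             # sum_i log sum_j

# Example: a two-component mixture evaluated on two samples
print(mixture_loglik([164.0, 183.0], [0.5, 0.5], [156.0, 174.0], [5.0, 5.0]))
```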
(b) Consider the case $m = 2$ and $n = 1$, and assume that $\mu_1, \mu_2, \sigma_1$ and $\sigma_2$ are known. Find the ML estimates of $p_1$ and $p_2$.

Writing $c_1 = \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-(x-\mu_1)^2/2\sigma_1^2}$ and $c_2 = \frac{1}{\sigma_2\sqrt{2\pi}}\, e^{-(x-\mu_2)^2/2\sigma_2^2}$, the log-likelihood function is
$$\log f_X(x;\mu,\sigma^2) = \log\bigl(p_1 c_1 + (1-p_1)\, c_2\bigr) \qquad (\because p_2 = 1 - p_1)$$
The argument of the log, $p_1(c_1 - c_2) + c_2$, is linear in $p_1$, so the maximum over $p_1 \in [0,1]$ is attained at an endpoint: either $\hat p_1^{MLE} = 0$ or $\hat p_1^{MLE} = 1$. Hence
$$\hat p_1^{MLE} = \begin{cases} 1 & \text{if } \dfrac{1}{\sigma_1\sqrt{2\pi}}\, e^{-(x-\mu_1)^2/2\sigma_1^2} > \dfrac{1}{\sigma_2\sqrt{2\pi}}\, e^{-(x-\mu_2)^2/2\sigma_2^2} \\ 0 & \text{otherwise} \end{cases}$$
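A one-observation sketch of this endpoint rule (the helper name `p1_mle` and the test values are illustrative):

```python
import numpy as np

def p1_mle(x, mu1, mu2, s1, s2):
    # ML estimate of p1 from a single observation x: 1 if the first
    # component's density at x beats the second's, else 0.
    c1 = np.exp(-(x - mu1) ** 2 / (2 * s1 ** 2)) / (s1 * np.sqrt(2 * np.pi))
    c2 = np.exp(-(x - mu2) ** 2 / (2 * s2 ** 2)) / (s2 * np.sqrt(2 * np.pi))
    return 1.0 if c1 > c2 else 0.0

print(p1_mle(170.0, mu1=174.0, mu2=156.0, s1=5.0, s2=5.0))   # -> 1.0
```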
(c) Consider the case $m = 2$ and $n = 1$, and assume that $p_1, p_2, \sigma_1$ and $\sigma_2$ are known. Find the ML estimates of $\mu_1$ and $\mu_2$.

The log-likelihood function is
$$\log f_X(x;\mu_1,\mu_2) = \log\left( p_1\, \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-(x-\mu_1)^2/2\sigma_1^2} + p_2\, \frac{1}{\sigma_2\sqrt{2\pi}}\, e^{-(x-\mu_2)^2/2\sigma_2^2} \right)$$
We need to maximize the term inside the parentheses w.r.t. $\mu_1$ and then $\mu_2$. By inspection (without doing the differentiation), each exponent $(x-\mu_j)^2/2\sigma_j^2$ has to be minimized, and it is driven to zero by setting $\mu_j = x$. Thus the ML estimates are nothing but
$$\hat\mu_1^{MLE} = \hat\mu_2^{MLE} = x$$
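A quick numeric check of this, scanning $\mu_1$ on a grid with everything else held fixed (all values below are arbitrary):

```python
import numpy as np

x, p1, p2, s1, s2, mu2 = 170.0, 0.4, 0.6, 5.0, 8.0, 160.0
mu1_grid = np.linspace(140.0, 200.0, 6001)          # 0.01 spacing
f = (p1 * np.exp(-(x - mu1_grid) ** 2 / (2 * s1 ** 2)) / (s1 * np.sqrt(2 * np.pi))
     + p2 * np.exp(-(x - mu2) ** 2 / (2 * s2 ** 2)) / (s2 * np.sqrt(2 * np.pi)))
print(mu1_grid[np.argmax(f)])                       # -> 170.0, i.e. x itself
```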
T8-P4: Consider a study of student heights in a batch. Assume that the height of an S1 student is normally distributed with mean $\mu_1$ and variance $\sigma_1^2$, and that the height of an S2 student is normally distributed with mean $\mu_2$ and variance $\sigma_2^2$. Assume that a student is equally likely to be from S1 or S2. A sample of size $n = 10$ was collected and the following values were recorded (in centimeters): 164, 167, 163, 158, 170, 183, 176, 159, 170, 167.

(a) Assume that $\mu_1, \mu_2, \sigma_1$ and $\sigma_2$ are unknown and write down the likelihood function.
$$f_X(x;\mu_1,\mu_2,\sigma_1,\sigma_2) = \prod_{i=1}^{10} \left[ 0.5\, \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-(x_i-\mu_1)^2 / 2\sigma_1^2} + 0.5\, \frac{1}{\sigma_2\sqrt{2\pi}}\, e^{-(x_i-\mu_2)^2 / 2\sigma_2^2} \right]$$
(b) Assume that we know $\sigma_1^2 = \sigma_2^2 = $ [Link]. Find the ML estimates of $\mu_1$ and $\mu_2$ numerically.

We need to write a program that searches for this maximum. Using MATLAB with a brute-force (fine-grid) search,
$$\hat\mu_1^{MLE} = 174, \qquad \hat\mu_2^{MLE} = 156$$
[Plot: log-likelihood surface over $(\mu_1, \mu_2)$, maximum at $\mu_1 \approx 174$, $\mu_2 \approx 156$]
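A Python sketch of the same brute-force search (the slide used MATLAB). The common standard deviation is an assumption here, since its actual value sits behind the link above; $\sigma = 5$ cm is used purely for illustration:

```python
import numpy as np

x = np.array([164, 167, 163, 158, 170, 183, 176, 159, 170, 167], dtype=float)
sigma = 5.0                                   # assumed common std dev (illustrative)

grid = np.arange(140.0, 200.0, 0.1)           # fine grid over candidate means
mu1, mu2 = np.meshgrid(grid, grid, indexing="ij")

# log-likelihood on the grid: sum_i log(0.5 N(x_i; mu1, s^2) + 0.5 N(x_i; mu2, s^2))
c = 1.0 / (sigma * np.sqrt(2 * np.pi))
ll = np.zeros_like(mu1)
for xi in x:
    ll += np.log(0.5 * c * np.exp(-(xi - mu1) ** 2 / (2 * sigma ** 2))
                 + 0.5 * c * np.exp(-(xi - mu2) ** 2 / (2 * sigma ** 2)))

i, j = np.unravel_index(np.argmax(ll), ll.shape)
print(mu1[i, j], mu2[i, j])                   # slide (with its sigma) reports 174 and 156
```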
(c) Treating the estimates obtained in part (b) as exact values, describe the MAP rule for deciding a student's section based on the student's height.

Bayesian framework with $\Theta \in \{S1, S2\}$ and prior $p_\Theta = 0.5$ each (equally likely):
$$f_{X|\Theta}(x|S1) = \mathcal{N}(\mu_1, \sigma_1^2), \qquad f_{X|\Theta}(x|S2) = \mathcal{N}(\mu_2, \sigma_2^2)$$
$$p_{\Theta|X}(S1|x) = \frac{f_{X|\Theta}(x|S1)}{f_{X|\Theta}(x|S1) + f_{X|\Theta}(x|S2)}, \qquad p_{\Theta|X}(S2|x) = \frac{f_{X|\Theta}(x|S2)}{f_{X|\Theta}(x|S1) + f_{X|\Theta}(x|S2)}$$
$$\hat\Theta^{MAP} = \begin{cases} S1 & \text{if } f_{X|\Theta}(x|S1) > f_{X|\Theta}(x|S2) \\ S2 & \text{if } f_{X|\Theta}(x|S1) < f_{X|\Theta}(x|S2) \end{cases}$$
Since $\sigma_1^2 = \sigma_2^2$ and the priors are equal, the rule reduces to a threshold midway between the two means: decide $S1$ ($\mu_1 \approx 174$) if
$$x > \frac{174 + 156}{2} = 165$$
and $S2$ ($\mu_2 \approx 156$) otherwise.
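The reduced rule in code (the helper name `map_section` is illustrative), applied to the recorded sample:

```python
def map_section(x, mu1=174.0, mu2=156.0):
    # Equal priors and equal variances reduce the MAP rule to a
    # midpoint threshold between the two class means.
    return "S1" if x > (mu1 + mu2) / 2 else "S2"

heights = [164, 167, 163, 158, 170, 183, 176, 159, 170, 167]
print([map_section(h) for h in heights])      # threshold at 165 cm
```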