Maximum Likelihood Estimation - I
Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT STUDIES
Agenda
• This lecture will provide the intuition behind MLE using theory and
examples.
Maximum Likelihood Estimation
• The method of maximum likelihood was first introduced by R. A.
Fisher, a geneticist and statistician, in the 1920s.
• Most statisticians recommend this method, at least when the
sample size is large, since the resulting estimators have certain
desirable efficiency properties.
• Maximum likelihood estimation (MLE) is a method for finding the density
function that is most likely to have generated the data.
• MLE requires one to make a distributional assumption first.
An intuitive view on likelihood
[Figure: three density curves with parameters μ = −2, σ² = 1; μ = 0, σ² = 1; and μ = 0, σ² = 4]
Maximum Likelihood Estimation: Problem
• A sample of ten new bike helmets manufactured by a certain company is
obtained. Upon testing, it is found that the first, third, and tenth helmets
are flawed, whereas the others are not.
• Let p = P(flawed helmet), i.e., p is the proportion of all such helmets that
are flawed.
• Define (Bernoulli) random variables X1, X2, . . . , X10 by Xi = 1 if the ith helmet is flawed and Xi = 0 otherwise.
Source: Probability and Statistics for Engineering and the Sciences, Jay L Devore, 8th Ed, Cengage
Maximum Likelihood Estimation: Problem
• Then for the obtained sample, X1 = X3 = X10 = 1 and the other seven Xi’s are
all zero
• The probability mass function of any particular Xi is f(xi; p) = p^xi (1 – p)^(1 – xi),
which becomes p if xi = 1 and 1 – p when xi = 0
• Now suppose that the conditions of various helmets are independent of
one another
• This implies that the Xi’s are independent, so their joint probability mass
function is the product of the individual pmf’s.
Maximum Likelihood Estimation: Binomial Distribution
• Joint pmf evaluated at the observed Xi’s is
f(x1, . . . , x10; p) = p(1 – p)p · · · p = p^3(1 – p)^7    (1)
• Suppose that p = .25. Then the probability of observing the sample that
we actually obtained is (.25)^3(.75)^7 = .002086.
• If instead p = .50, then this probability is (.50)^3(.50)^7 = .000977.
• For what value of p is the obtained sample most likely to have occurred?
• That is, for what value of p is the joint pmf (eq 1) as large as it can be?
• What value of p maximizes (eq 1)? (A numerical check is sketched below.)
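As a quick illustration (not part of the original slides), here is a minimal Python sketch that evaluates the joint pmf p^3(1 – p)^7 at the candidate values quoted above and at p = .30:

```python
def likelihood(p):
    """Joint pmf of the observed sample: 3 flawed and 7 unflawed helmets."""
    return p**3 * (1 - p)**7

print(likelihood(0.25))  # ~0.002086, as quoted on the slide
print(likelihood(0.50))  # ~0.000977, as quoted on the slide
print(likelihood(0.30))  # ~0.002224, larger than both candidates
```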
Maximum Likelihood Estimation: Binomial Distribution
• Figure shows a graph of the likelihood (eq 1) as a function of p.
• It appears that the graph reaches its peak above p = .3 = the proportion of
flawed helmets in the sample.
[Figure: graph of the likelihood (joint pmf) (eq 1) as a function of p]
Graph of the natural logarithm of the likelihood
• Figure shows a graph of the
natural logarithm of (eq 1)
• Since ln[g(u)] is a strictly
increasing function of g(u),
finding u to maximize the
function g(u) is the same as
finding u to maximize ln[g(u)].
Maximum Likelihood Estimation: Binomial Distribution
• We can verify our visual impression by using calculus to find the value of p
that maximizes (eq 1).
• Working with the natural log of the joint pmf is often easier than working
with the joint pmf itself, since the joint pmf is typically a product so its
logarithm will be a sum.
• Here ln[f(x1, . . . , x10; p)] = ln[p^3(1 – p)^7]
= 3 ln(p) + 7 ln(1 – p)
Maximum Likelihood Estimation: Binomial Distribution
Thus
d/dp {ln[f(x1, . . . , x10; p)]} = d/dp [3 ln(p) + 7 ln(1 – p)] = 3/p – 7/(1 – p)
Interpretation
• Equating this derivative to 0 and solving for p gives
3(1 – p) = 7p, from which 3 = 10p and so p = 3/10 = .30 as conjectured
• That is, our point estimate is p̂ = .30.
• It is called the maximum likelihood estimate because it is the parameter
value that maximizes the likelihood (joint pmf) of the observed sample
• In general, the second derivative should be examined to make sure a
maximum has been obtained, but here this is obvious from the figure.
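As an illustration of the calculus argument above, here is a small Python sketch (assuming NumPy is available; not part of the original slides) that evaluates the log-likelihood 3 ln(p) + 7 ln(1 – p) and checks that its derivative 3/p – 7/(1 – p) vanishes at p̂ = 0.3:

```python
import numpy as np

def log_lik(p):
    # log of the joint pmf p^3 (1 - p)^7
    return 3 * np.log(p) + 7 * np.log(1 - p)

def d_log_lik(p):
    # derivative of the log-likelihood with respect to p
    return 3 / p - 7 / (1 - p)

p_hat = 3 / 10
print(d_log_lik(p_hat))               # 0.0: the first-order condition holds
print(log_lik(0.30) > log_lik(0.25))  # True
print(log_lik(0.30) > log_lik(0.50))  # True
```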
Maximum Likelihood Estimation: Binomial Distribution
• Suppose that rather than being told the condition of every helmet, we had
only been informed that three of the ten were flawed.
• Then we would have the observed value of a binomial random variable X =
the number of flawed helmets.
• The pmf of X is b(x; 10, p) = C(10, x) p^x(1 – p)^(10 – x). For x = 3, this becomes
b(3; 10, p) = C(10, 3) p^3(1 – p)^7
• The binomial coefficient C(10, 3) is irrelevant to the maximization, so again p̂ = 0.30 (see the sketch below).
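A short sketch (illustrative, assuming NumPy; not from the original slides) showing why the binomial coefficient does not matter: multiplying the likelihood by the constant C(10, 3) = 120 rescales the curve but leaves its maximizer unchanged.

```python
import numpy as np
from math import comb

grid = np.linspace(0.001, 0.999, 999)        # candidate values of p
bernoulli_lik = grid**3 * (1 - grid)**7      # joint pmf of the ten Bernoulli trials
binomial_lik = comb(10, 3) * bernoulli_lik   # binomial pmf of X = 3 flawed helmets

# The constant factor 120 does not move the peak of the curve
print(grid[np.argmax(bernoulli_lik)])  # ~0.30
print(grid[np.argmax(binomial_lik)])   # ~0.30
```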
Maximum Likelihood Function Definition
• Let X1, X2, …, Xn have joint pmf or pdf
f(x1, x2, …, xn; θ1, …, θm)    (a)
where the parameters θ1, …, θm have unknown values.
• When x1, …, xn are the observed sample values and (a) is regarded as a function of θ1, …, θm, it is called the likelihood function.
• The maximum likelihood estimates (mle's) θ̂1, …, θ̂m are those values of the θi's that maximize the likelihood function, so that
f(x1, x2, …, xn; θ̂1, …, θ̂m) ≥ f(x1, x2, …, xn; θ1, …, θm)  for all θ1, …, θm
• When the Xi's are substituted in place of the xi's, the maximum likelihood estimators result.
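In practice the maximization in this definition is usually carried out numerically on the log scale. A hedged sketch (assuming SciPy is available; the data and variable names are illustrative) that recovers p̂ = 0.30 for the helmet example by minimizing the negative log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([1, 0, 1, 0, 0, 0, 0, 0, 0, 1])   # observed helmet sample (1 = flawed)

def neg_log_lik(p):
    # negative Bernoulli log-likelihood; minimizing it maximizes the likelihood
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

result = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(result.x)   # ~0.30
```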
Interpretation
• The likelihood function tells us how likely the observed sample is as a
function of the possible parameter values.
• Maximizing the likelihood gives the parameter values for which the
observed sample is most likely to have been generated—that is, the
parameter values that “agree most closely” with the observed data.
Estimation of Poisson Parameter
• Suppose we have data generated from a Poisson distribution. We want to
estimate the parameter λ of the distribution.
• The probability of observing a particular value X is
P(X; λ) = e^(–λ) λ^X / X!
• The joint likelihood is obtained by multiplying the individual probabilities together:
P(X1, X2, …, Xn; λ) = [e^(–λ) λ^X1 / X1!] [e^(–λ) λ^X2 / X2!] · · · [e^(–λ) λ^Xn / Xn!]
• Dropping the factorials, which do not depend on λ, gives
L(λ; X) = e^(–nλ) λ^(Σ Xi) = e^(–nλ) λ^(n X̄)
Estimation of Poisson Parameter
• Note that in the likelihood function the factorials have disappeared.
• This is because they are constant with respect to λ and so do not influence
the relative likelihood of different values of the parameter.
• It is usual to work with the log likelihood rather than the likelihood.
• Note that maximising the log likelihood is equivalent to maximising the likelihood.
• Take the natural log of the likelihood function:
L(λ; X) = e^(–nλ) λ^(n X̄)
ℓ(λ; X) = –nλ + n X̄ log(λ)
• Find where the derivative of the log likelihood is zero:
dℓ/dλ = –n + n X̄/λ = 0, which gives λ̂ = X̄
• Note that here the MLE is the same as the moment estimator.
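A small numerical check (simulated data, not from the slides; assuming NumPy) that the maximizer of the Poisson log-likelihood ℓ(λ; X) = –nλ + n X̄ log(λ) is indeed the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.poisson(lam=4.2, size=500)     # simulated Poisson data, true lambda = 4.2
n, xbar = len(x), x.mean()

def log_lik(lam):
    # Poisson log-likelihood with the constant factorial terms dropped
    return -n * lam + n * xbar * np.log(lam)

grid = np.linspace(0.1, 10.0, 2000)
print(grid[np.argmax(log_lik(grid))])  # maximizer on the grid, close to xbar
print(xbar)                            # lambda-hat = sample mean
```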
Estimation of Exponential Distribution Parameter
• Suppose X1, X2, . . . , Xn is a random sample from an exponential
distribution with parameter λ. Because of independence, the likelihood
function is a product of the individual pdf's:
f(x1, . . . , xn; λ) = (λ e^(–λ x1)) · · · (λ e^(–λ xn)) = λ^n e^(–λ Σ xi)
• The natural logarithm of the likelihood function is
ln[f(x1, . . . , xn; λ)] = n ln(λ) – λ Σ xi
Estimation of Exponential Distribution Parameter
• Equating (d/dλ)[ln(likelihood)] to zero results in
n/λ – Σ xi = 0, or λ = n/Σ xi = 1/x̄
• Thus the MLE is λ̂ = 1/X̄
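A minimal sketch (simulated data; assuming NumPy; not from the slides) comparing the closed-form MLE λ̂ = 1/X̄ with a direct grid maximization of the log-likelihood n ln(λ) – λ Σ xi:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 0.5, size=1000)   # exponential data with rate lambda = 0.5

def log_lik(lam):
    # n ln(lambda) - lambda * sum(x_i)
    return len(x) * np.log(lam) - lam * x.sum()

grid = np.linspace(0.01, 5.0, 5000)
print(grid[np.argmax(log_lik(grid))])  # grid maximizer, close to 0.5
print(1 / x.mean())                    # closed-form MLE: 1 / sample mean
```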
Estimation of Parameters of Normal Distribution
• Let X1, . . . , Xn be a random sample from a normal distribution.
• The likelihood function is
f(x1, . . . , xn; μ, σ²) = (2πσ²)^(–n/2) exp[–Σ(xi – μ)²/(2σ²)]
• so
ln[f(x1, . . . , xn; μ, σ²)] = –(n/2) ln(2πσ²) – Σ(xi – μ)²/(2σ²)
Estimation of Parameters of Normal Distribution
• To find the maximizing values of μ and σ², we must take the partial derivatives
of ln(f) with respect to μ and σ², equate them to zero, and solve the resulting
two equations.
• Omitting the details, the resulting MLE's are
μ̂ = X̄ and σ̂² = Σ(Xi – X̄)²/n
• The MLE of σ² is not the unbiased estimator S² (which divides by n – 1), so two
different principles of estimation (unbiasedness and maximum likelihood) yield
two different estimators.
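A short sketch (simulated data; assuming NumPy; not from the slides) computing the normal MLEs μ̂ = X̄ and σ̂² = Σ(Xi – X̄)²/n, and contrasting the latter with the unbiased estimator S² that divides by n – 1:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=10.0, scale=3.0, size=20)    # simulated normal sample

mu_hat = x.mean()                                          # MLE of mu
sigma2_mle = np.sum((x - mu_hat)**2) / len(x)              # MLE of sigma^2 (divides by n)
sigma2_unbiased = np.sum((x - mu_hat)**2) / (len(x) - 1)   # unbiased S^2 (divides by n - 1)

print(mu_hat, sigma2_mle, sigma2_unbiased)
# np.var reproduces both: ddof=0 gives the MLE, ddof=1 the unbiased estimator
print(np.var(x, ddof=0), np.var(x, ddof=1))
```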
Thank you