Maximum Likelihood and Bayesian Parameter Estimation: Chapter 3, DHS
The Challenges we face
General Approach
• The number of available samples always seems too small, and serious problems arise
when the dimensionality of the feature vector 𝒙 is large.
• If we know the number of parameters in advance and our general knowledge about the problem permits us to parameterize the conditional densities, then the severity of these problems can be reduced significantly.
• Suppose, for example, that we can reasonably assume that 𝑝(𝑥|𝜔𝑖 ) is a normal
density with mean 𝜇𝑖 and covariance matrix Σ𝑖 , although we do not know the exact
values of these quantities. This knowledge simplifies the problem from one of
estimating an unknown function 𝑝(𝑥|𝜔𝑖 ) to one of estimating the parameters 𝜇𝑖 and
Σ𝑖 .
• We will consider only the supervised learning case where the true class label for each
sample is known.
Estimation Process we will study
Maximum Likelihood Estimation
• We are given a set 𝐷 = {𝑥1, . . . , 𝑥𝑛 } of independent and identically distributed (i.i.d.) samples drawn from the density 𝑝(𝑥|𝜃).
• We would like to use the training samples in 𝐷 to estimate the unknown parameter vector 𝜃.
• Define the likelihood function 𝐿(𝜃|𝐷) of 𝜃 with respect to 𝐷 as:
$$L(\theta \mid D) = p(D \mid \theta) = p(x_1, \dots, x_n \mid \theta) = \prod_{x_i \in D} p(x_i \mid \theta)$$
• The maximum likelihood estimate (MLE) of 𝜃 is, by definition, the value 𝜃̂ that maximizes 𝐿(𝜃|𝐷) and can be computed as:
$$\hat{\theta} = \arg\max_{\theta} L(\theta \mid D)$$
• Equivalently, and usually more conveniently, we can maximize the log-likelihood:
$$\hat{\theta} = \arg\max_{\theta} \log L(\theta \mid D) = \arg\max_{\theta} \sum_{i} \log p(x_i \mid \theta)$$
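To make the definition concrete, here is a minimal numerical sketch (the exponential model, sample size, and variable names are illustrative assumptions, not from the slides): it maximizes the log-likelihood of an assumed exponential density over a set of i.i.d. samples.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative data: i.i.d. samples from an exponential density p(x|theta) = theta * exp(-theta * x)
rng = np.random.default_rng(0)
D = rng.exponential(scale=1 / 2.5, size=500)  # true theta = 2.5 (illustrative)

def neg_log_likelihood(theta):
    # -log L(theta|D) = -sum_i log p(x_i|theta), with log p(x|theta) = log(theta) - theta * x
    return -np.sum(np.log(theta) - theta * D)

# theta_hat = arg max log L(theta|D), found by minimizing the negative log-likelihood
result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100), method="bounded")
theta_hat = result.x
print(theta_hat)  # should be close to the closed-form MLE, 1 / mean(D)
```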
Maximum Likelihood Estimation
• If the number of parameters is p, i.e., 𝜽 = (𝜃1, . . . , 𝜃𝑝 ), define the gradient operator as:
$$\nabla_{\theta} = \left[\frac{\partial}{\partial \theta_1}, \dots, \frac{\partial}{\partial \theta_p}\right]^{T}$$
• The maximum likelihood estimate of 𝜽 then satisfies the necessary condition:
$$\nabla_{\theta} \log L(\theta \mid D) = \sum_{i} \nabla_{\theta} \log p(x_i \mid \theta) = 0$$
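As a worked instance of this condition, the standard textbook case of a univariate normal density with known variance 𝜎² and unknown mean reduces to the sample mean:

```latex
% Necessary condition \nabla_\theta \log L(\theta|D) = 0 for p(x|\mu) = N(\mu, \sigma^2), \sigma^2 known
\[
\log L(\mu \mid D) = \sum_{k=1}^{n} \log p(x_k \mid \mu)
                   = -\frac{n}{2}\log(2\pi\sigma^2)
                     - \frac{1}{2\sigma^2}\sum_{k=1}^{n}(x_k - \mu)^2
\]
\[
\frac{\partial}{\partial \mu}\log L(\mu \mid D)
  = \frac{1}{\sigma^2}\sum_{k=1}^{n}(x_k - \mu) = 0
  \quad\Longrightarrow\quad
  \hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k .
\]
```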
Observations on MLE
• The MLE is the parameter point for which the observed sample is the most
likely.
• The procedure with partial derivatives may result in several local extrema. We should check each solution individually to identify the global optimum.
• Boundary conditions must also be checked separately for extrema.
• Invariance property: if 𝜃̂ is the MLE of 𝜃, then for any function 𝑓(𝜃), the MLE of 𝑓(𝜃) is 𝑓(𝜃̂).
The Gaussian Case: Unknown 𝜇
• Suppose that the samples are drawn from a multivariate normal population with mean 𝜇 and
covariance matrix Σ
• In this case, we have 𝜽 = {𝝁}, and the maximum likelihood estimate is the sample mean:
$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k$$
• In the more general (and more typical) multivariate normal case, neither the mean 𝝁 nor the covariance matrix 𝚺 is known.
• Consider first the univariate case with 𝜃1 = 𝜇 and 𝜃2 = 𝜎 2. The maximum likelihood estimates are:
$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad \hat{\sigma}^{2} = \frac{1}{n}\sum_{k=1}^{n} (x_k - \hat{\mu})^{2}$$
The Gaussian Case: Unknown 𝜇 and Σ
• For the multivariate case, the maximum likelihood estimates of 𝝁 and 𝚺 are:
$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad \hat{\Sigma} = \frac{1}{n}\sum_{k=1}^{n} (x_k - \hat{\mu})(x_k - \hat{\mu})^{T}$$
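A minimal sketch of these closed-form estimates in Python (the synthetic data and variable names are illustrative):

```python
import numpy as np

def gaussian_mle(X):
    """Maximum likelihood estimates for a multivariate normal.

    X: (n, d) array of i.i.d. samples.
    Returns the sample mean and the (biased, 1/n) ML covariance estimate.
    """
    n = X.shape[0]
    mu_hat = X.mean(axis=0)                 # mu_hat = (1/n) sum_k x_k
    centered = X - mu_hat
    sigma_hat = centered.T @ centered / n   # Sigma_hat = (1/n) sum_k (x_k - mu_hat)(x_k - mu_hat)^T
    return mu_hat, sigma_hat

# Example usage on synthetic data
rng = np.random.default_rng(1)
X = rng.multivariate_normal(mean=[0.0, 2.0], cov=[[1.0, 0.3], [0.3, 2.0]], size=1000)
print(gaussian_mle(X))
```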
Bias of Estimators
• Bias of an estimator 𝜃መ is the difference between the expected value of 𝜃መ and 𝜃.
• The MLE of 𝜇 is an unbiased estimator for 𝜇 because 𝐸[𝜇̂] = 𝜇.
• The MLE of 𝜎 2 is not an unbiased estimator for 𝜎 2, because the expected value over all data sets of size 𝑛 of the sample variance is not equal to the true variance:
$$E\!\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^{2}\right] = \frac{n-1}{n}\,\sigma^{2} \neq \sigma^{2}$$
• To measure how well a fitted distribution resembles the sample data (goodness-of-fit), we can
use the Kolmogorov-Smirnov test statistic.
• It is defined as the maximum value of the absolute difference between the cumulative distribution function estimated from the sample and the one calculated from the fitted distribution.
• After estimating the parameters for different distributions, we can compute the Kolmogorov-
Smirnov statistic for each distribution and choose the one with the smallest value as the best fit
to our sample.
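For instance, `scipy.stats.kstest` computes this statistic directly; the sketch below fits two candidate distributions by maximum likelihood and compares their K-S statistics (the data and the candidate set are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(loc=5.0, scale=2.0, size=400)   # illustrative sample

candidates = {"norm": stats.norm, "expon": stats.expon}

for name, dist in candidates.items():
    params = dist.fit(data)                       # MLE fit of the candidate distribution
    res = stats.kstest(data, name, args=params)   # K-S statistic against the fitted CDF
    print(name, res.statistic, res.pvalue)        # smaller statistic => better fit
```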
MLE Examples
Bayesian Estimation
• Although the answers we get by this method will generally be nearly identical to those
obtained by maximum likelihood, there is a conceptual difference:
• In maximum likelihood methods we view the true parameter vector we seek, θ, to be fixed.
• In Bayesian learning we consider θ to be a random variable, and training data allows
us to convert a distribution on this variable into a posterior probability density.
Bayesian Estimation
• The computation of the posterior probabilities 𝑃(𝜔𝑖 |𝒙) lies at the heart of Bayesian classification.
Bayes’ formula allows us to compute these probabilities from the prior probabilities 𝑃(𝜔𝑖 ) and the
class-conditional densities 𝑝(𝒙|𝜔𝑖 ).
• Given the sample collection 𝐷, the Bayes formula becomes:
$$P(\omega_i \mid \mathbf{x}, D) = \frac{p(\mathbf{x} \mid \omega_i, D)\, P(\omega_i \mid D)}{\sum_{j=1}^{c} p(\mathbf{x} \mid \omega_j, D)\, P(\omega_j \mid D)}$$
• We will assume that the a priori probabilities are known or can be obtained from a trivial calculation, so that 𝑃(𝜔𝑖 |𝐷) = 𝑃(𝜔𝑖 ).
• In fact, we can separate the training samples by class into 𝑐 subsets {𝐷1, … , 𝐷𝑐 } and treat each class independently, because samples from class 𝑖 have no influence on 𝑝(𝒙|𝜔𝑗 , 𝐷) for 𝑖 ≠ 𝑗.
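A minimal numerical sketch of this per-class computation, assuming the class-conditional densities have already been estimated from the subsets 𝐷𝑖 (the Gaussian parameters and priors below are placeholders):

```python
import numpy as np
from scipy.stats import norm

# Assumed, illustrative class-conditional densities p(x|omega_i, D_i) and priors P(omega_i)
class_conditionals = [norm(loc=0.0, scale=1.0), norm(loc=3.0, scale=1.5)]
priors = np.array([0.6, 0.4])

def posterior(x):
    # Bayes formula: P(omega_i|x, D) is proportional to p(x|omega_i, D_i) * P(omega_i)
    likelihoods = np.array([d.pdf(x) for d in class_conditionals])
    unnormalized = likelihoods * priors
    return unnormalized / unnormalized.sum()

print(posterior(1.2))   # posterior probabilities for each class at x = 1.2
```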
Parameter Distribution
• Although the desired density 𝑝(𝒙) is unknown, we assume it has a known parametric form; that is, the function 𝑝(𝒙|𝜽) is completely known once 𝜽 is specified.
• Any information we might have about 𝜽 prior to observing the samples is assumed to be contained in a known prior density 𝑝(𝜽).
• Observation of the samples converts this to a posterior density 𝑝(𝜽|𝐷), which, we
hope, is sharply peaked about the true value of 𝜽.
MLE vs. Bayes Estimates
Gaussian Case
• We want to calculate the posterior density 𝑝(𝜽|𝐷) and, from it, the desired density 𝑝(𝒙|𝐷).
• Assume that 𝑝(𝒙|𝜽) is normal, 𝑁(𝝁, 𝚺), with 𝜽 = [𝝁, 𝚺].
• Univariate case: 𝑝(𝑥|𝜇) ∼ 𝑁(𝜇, 𝜎²) with 𝜎² known; only 𝜇 is unknown, and its prior 𝑝(𝜇) ∼ 𝑁(𝜇0 , 𝜎0²) is known (both 𝜇0 and 𝜎0² are given).
• Intuitively, 𝜇0 represents our best prior guess for 𝜇 and 𝜎0² represents the uncertainty in this guess.
• Suppose now that {𝑥1, … , 𝑥𝑛 } are independently drawn from the resulting population. Using the Bayes formula,
$$p(\mu \mid D) = \frac{p(D \mid \mu)\, p(\mu)}{\int p(D \mid \mu)\, p(\mu)\, d\mu} = \alpha \prod_{k=1}^{n} p(x_k \mid \mu)\, p(\mu),$$
where 𝛼 is a normalization factor that does not depend on 𝜇.
Gaussian Case
• Since 𝑝(𝑥𝑘 |𝜇) ∼ 𝑁(𝜇, 𝜎²) and 𝑝(𝜇) ∼ 𝑁(𝜇0 , 𝜎0²), we have
$$p(\mu \mid D) = \alpha \prod_{k=1}^{n} p(x_k \mid \mu)\, p(\mu).$$
• Collecting terms in the exponent, we compute that 𝑝(𝜇|𝐷) is again a normal density, 𝑁(𝜇𝑛 , 𝜎𝑛²).
• Identifying the coefficients, with sample mean $\bar{x}_n = \frac{1}{n}\sum_{k=1}^{n} x_k$, and simplifying:
$$\mu_n = \left(\frac{n\sigma_0^{2}}{n\sigma_0^{2} + \sigma^{2}}\right)\bar{x}_n + \frac{\sigma^{2}}{n\sigma_0^{2} + \sigma^{2}}\,\mu_0, \qquad \sigma_n^{2} = \frac{\sigma_0^{2}\,\sigma^{2}}{n\sigma_0^{2} + \sigma^{2}}$$
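These closed-form expressions are easy to check numerically; a minimal sketch, assuming a known 𝜎² and illustrative prior values:

```python
import numpy as np

def gaussian_mean_posterior(x, mu0, sigma0_sq, sigma_sq):
    """Posterior N(mu_n, sigma_n^2) over the mean, for known variance sigma_sq
    and prior N(mu0, sigma0_sq)."""
    n = len(x)
    x_bar = np.mean(x)
    mu_n = (n * sigma0_sq / (n * sigma0_sq + sigma_sq)) * x_bar \
         + (sigma_sq / (n * sigma0_sq + sigma_sq)) * mu0
    sigma_n_sq = sigma0_sq * sigma_sq / (n * sigma0_sq + sigma_sq)
    return mu_n, sigma_n_sq

# Illustrative data with sigma^2 = 1 assumed known
rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.0, size=50)
print(gaussian_mean_posterior(x, mu0=0.0, sigma0_sq=4.0, sigma_sq=1.0))
```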
Gaussian Distribution: Observations
• 𝜇0 is our best prior guess and 𝜎0² is the uncertainty about this guess.
• 𝜇𝑛 is our best guess after observing D and 𝜎𝑛2 is the uncertainty about this guess.
• 𝜇𝑛 always lies between the sample mean 𝑥̄𝑛 and 𝜇0 , with coefficients that are non-negative and sum to one.
• If 𝜎0 ≠ 0, then 𝜇𝑛 approaches the sample mean as 𝑛 approaches infinity.
• If 𝜎0 = 0, we have a degenerate case in which our a priori certainty that 𝜇 = 𝜇0 is so strong that
no number of observations can change our opinion
• If instead 𝜎0 ≫ 𝜎, we are so uncertain about our a priori guess that we take 𝜇𝑛 ≈ 𝑥̄𝑛 , using only the samples to estimate 𝜇.
• In general, the relative balance between prior knowledge and empirical data is set by the ratio of
𝜎 2 to 𝜎02, which is sometimes called the dogmatism. If the dogmatism is not infinite, after enough
samples are taken the exact values assumed for 𝜇0 and 𝜎02 will be unimportant, and 𝜇𝑛 will
converge to the sample mean.
Univariate Case: 𝑝(𝑥|𝐷)
• Having obtained the a posteriori density for the mean, 𝑝(𝜇|𝐷), we obtain the “class-conditional” density 𝑝(𝑥|𝐷):
$$p(x \mid D) = \int p(x \mid \mu)\, p(\mu \mid D)\, d\mu \;\sim\; N(\mu_n,\; \sigma^{2} + \sigma_n^{2})$$
• That is, 𝑝(𝑥|𝐷) is normal with mean 𝜇𝑛 and variance 𝜎² + 𝜎𝑛²: the known variance is increased by the remaining uncertainty about the mean.
Multivariate Case
• The analysis for the multivariate case with known 𝚺 and unknown 𝝁 parallels the univariate case. With prior 𝑝(𝝁) ∼ 𝑁(𝝁0 , 𝚺0 ), the posterior 𝑝(𝝁|𝐷) is again normal, 𝑁(𝝁𝑛 , 𝚺𝑛 ), with
$$\mu_n = \Sigma_0\!\left(\Sigma_0 + \tfrac{1}{n}\Sigma\right)^{-1}\bar{x}_n + \tfrac{1}{n}\Sigma\!\left(\Sigma_0 + \tfrac{1}{n}\Sigma\right)^{-1}\mu_0, \qquad \Sigma_n = \Sigma_0\!\left(\Sigma_0 + \tfrac{1}{n}\Sigma\right)^{-1}\tfrac{1}{n}\Sigma$$
• The resulting density for 𝒙 is again normal: 𝑝(𝒙|𝐷) ∼ 𝑁(𝝁𝑛 , 𝚺 + 𝚺𝑛 ).
Recursive Bayesian Learning
• Bayesian learning can be carried out incrementally. Letting 𝐷𝑛 = {𝑥1, … , 𝑥𝑛 } denote the first 𝑛 samples, with 𝑝(𝜽|𝐷0 ) = 𝑝(𝜽),
$$p(\theta \mid D_n) = \frac{p(x_n \mid \theta)\, p(\theta \mid D_{n-1})}{\int p(x_n \mid \theta)\, p(\theta \mid D_{n-1})\, d\theta}$$
• In principle, it requires that we preserve all the training points in 𝐷𝑛−1 in order to calculate p(θ|Dn)
but for some distributions, just a few parameters associated with 𝑝(𝜽|𝐷𝑛−1) contain all the
information needed. Such parameters are the sufficient statistics of those distributions.
• This incremental approach is sometimes also called true recursive Bayesian learning.
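For the Gaussian-mean case above, the running posterior (𝜇𝑛 , 𝜎𝑛²) is exactly such a sufficient summary, so the recursion can be carried out one sample at a time without storing past data; a minimal sketch under the same known-variance assumption:

```python
def recursive_gaussian_update(mu0, sigma0_sq, sigma_sq, stream):
    """Update the posterior N(mu_n, sigma_n^2) over the mean one sample at a time.

    The current (mu_n, sigma_n_sq) summarizes D_{n-1}; no past samples are stored."""
    mu_n, sigma_n_sq = mu0, sigma0_sq
    for x in stream:
        # Treat the current posterior as the prior for the next observation
        denom = sigma_n_sq + sigma_sq
        mu_n = (sigma_n_sq * x + sigma_sq * mu_n) / denom
        sigma_n_sq = sigma_n_sq * sigma_sq / denom
    return mu_n, sigma_n_sq

# Illustrative stream of observations with known sigma^2 = 1 and prior N(0, 4)
print(recursive_gaussian_update(mu0=0.0, sigma0_sq=4.0, sigma_sq=1.0,
                                stream=[1.8, 2.4, 2.1, 1.9]))
```

Iterating this one-sample update reproduces the batch formulas for 𝜇𝑛 and 𝜎𝑛² given earlier.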
When do Maximum Likelihood and Bayes methods differ?
• In virtually every case, maximum likelihood and Bayes solutions are equivalent in the asymptotic limit of infinite training data.
• However, since practical pattern recognition problems invariably have a limited set of training data, it is natural to ask when maximum likelihood and Bayes solutions may be expected to differ, and then which we should prefer.
Observation
Problems of Dimensionality
• In practical multicategory applications, it is not at all unusual to encounter problems involving
fifty or a hundred features, particularly if the features are binary valued.
• Two issues to deal with:
• how classification accuracy depends upon the dimensionality (and amount of training data);
• the computational complexity of designing the classifier.
• Different sources of classification error:
• Bayes error: due to overlapping class-conditional densities (related to the features used).
• Model error: due to incorrect model.
• Estimation error: due to estimation from a finite sample (can be reduced by increasing the
amount of training data).
Bayes Error
• For the two-class multivariate normal case with equal covariance matrices and equal priors, the Bayes error is
$$P(e) = \frac{1}{\sqrt{2\pi}}\int_{r/2}^{\infty} e^{-u^{2}/2}\, du, \qquad r^{2} = (\mu_1 - \mu_2)^{T}\Sigma^{-1}(\mu_1 - \mu_2),$$
where 𝑟 is the Mahalanobis distance between the class means; the error decreases monotonically as 𝑟 increases.
Model Error
• Exponential algorithms are generally so complex that for reasonable size cases we avoid them
altogether and resign ourselves to approximate solutions that can be found by polynomially
complex algorithms.
Estimation Error
Overfitting: It frequently happens that the number of available samples is inadequate, and the
question of how to proceed arises.
• Reduce the dimensionality, either by redesigning the feature extractor, by selecting an appropriate subset of the existing features, or by combining the existing features in some way.
• assume that all 𝑐 classes share the same covariance matrix, and pool the available data.
• look for a better estimate for Σ. If any reasonable a priori estimate Σ0 is available, a
Bayesian or pseudo-Bayesian estimate of the form λΣ0 + (1 − λ)Σ might be employed.
If Σ0 is diagonal, this diminishes the troublesome effects of “accidental” correlations.
• remove chance correlations heuristically by thresholding the sample covariance matrix. For example, one might assume that all covariances for which the magnitude of the correlation coefficient is not near unity are actually zero. An extreme of this approach is to assume statistical independence, thereby setting all the off-diagonal elements to zero regardless of empirical evidence to the contrary (an 𝑂(𝑛𝑑) calculation).
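A small sketch of the pseudo-Bayesian shrinkage estimate 𝜆Σ0 + (1 − 𝜆)Σ with a diagonal Σ0 built from the sample variances (the value of 𝜆 and the data are illustrative assumptions):

```python
import numpy as np

def shrinkage_covariance(X, lam=0.3):
    """Shrink the sample covariance toward a diagonal prior estimate Sigma_0.

    X: (n, d) samples; lam: shrinkage weight in [0, 1] (illustrative choice)."""
    sigma_hat = np.cov(X, rowvar=False)             # sample covariance Sigma_hat
    sigma_0 = np.diag(np.diag(sigma_hat))           # diagonal prior: keep variances, drop covariances
    return lam * sigma_0 + (1.0 - lam) * sigma_hat  # lambda * Sigma_0 + (1 - lambda) * Sigma_hat

# Few samples relative to the dimension, where "accidental" correlations are troublesome
rng = np.random.default_rng(4)
X = rng.multivariate_normal(mean=np.zeros(3), cov=np.eye(3), size=20)
print(shrinkage_covariance(X))
```

With few samples the shrunken estimate is better conditioned than the raw sample covariance, which is the point of the pseudo-Bayesian form above.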
Overfitting Issue
In fitting the points in Fig. 3.4, then, we might consider beginning with a high-order polynomial (e.g., 10th order), and successively smoothing or simplifying our model by eliminating the highest-order terms. While this would in virtually all cases lead to greater error on the “training data,” we might expect the generalization to improve.
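The effect is easy to reproduce; the sketch below uses synthetic points standing in for Fig. 3.4 and compares training error with error on held-out points as the polynomial order is reduced:

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0.0, 1.0, 12)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=x.size)   # noisy points, stand-in for Fig. 3.4
x_test = np.linspace(0.0, 1.0, 100)
y_test = np.sin(2 * np.pi * x_test)

for degree in (10, 6, 3):                                   # successively simpler models
    coeffs = np.polyfit(x, y, deg=degree)
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, train_err, test_err)   # training error grows, generalization typically improves
```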