1-MS2 (Intro Bayes)
1 Introduction to Bayesian inference
1.1 Introduction
Both philosophies have pros and cons. Those will not be discussed in this
course.
Bayes’ rule
P(B|A) = P(A|B)P(B) / P(A)
       = P(A|B)P(B) / [P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ)].
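The second form of Bayes' rule can be checked numerically. A minimal sketch for a hypothetical diagnostic-test scenario (the probabilities below are illustrative, not from the course):

```python
# Numerical illustration of Bayes' rule (hypothetical numbers):
# B = "patient has the disease", A = "test is positive".
p_B = 0.01          # prior P(B)
p_A_given_B = 0.95  # sensitivity P(A|B)
p_A_given_Bc = 0.05 # false-positive rate P(A|B^c)

# Denominator: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)

# Bayes' rule: P(B|A) = P(A|B)P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A
print(round(p_B_given_A, 3))
```

Even with a 95% sensitive test, the low prior P(B) keeps the posterior P(B|A) modest.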
1.2 Bayesian approach
P(θ = 1|Y = y) = P(Y = y|θ = 1)P(θ = 1) / [P(Y = y|θ = 1)P(θ = 1) + P(Y = y|θ = 0)P(θ = 0)]
Terminology:
• P(θ = 1), P(θ = 0) are prior probabilities
P(θ = θ_k|Y = y) = P(Y = y|θ = θ_k)P(θ = θ_k) / Σ_{i=1}^K P(Y = y|θ = θ_i)P(θ = θ_i)
Short-hand notation
P(θ_k|y) = P(y|θ_k)P(θ_k) / Σ_{i=1}^K P(y|θ_i)P(θ_i)
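In the discrete case the posterior is a straightforward normalization. A sketch assuming a Binomial(n, θ) likelihood and K = 3 hypothetical candidate values (all numbers illustrative):

```python
from math import comb

# Posterior over K candidate values of θ:
# Y ~ Binomial(n, θ), observed y successes out of n trials.
thetas = [0.2, 0.5, 0.8]   # candidate parameter values θ_1, ..., θ_K
prior = [1/3, 1/3, 1/3]    # prior P(θ_k), uniform here
n, y = 10, 7               # data

# Likelihood P(y|θ_k) for each k
lik = [comb(n, y) * t**y * (1 - t)**(n - y) for t in thetas]

# P(θ_k|y) = P(y|θ_k)P(θ_k) / Σ_i P(y|θ_i)P(θ_i)
evidence = sum(l * p for l, p in zip(lik, prior))
posterior = [l * p / evidence for l, p in zip(lik, prior)]
print([round(p, 3) for p in posterior])
```

With 7 successes in 10 trials, the posterior mass concentrates on the larger candidate values of θ.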
P(θ|y) = P(θ)L(θ|Y = y) / ∫ P(θ)L(θ|Y = y)dθ
P(θ|y) ∝ P(θ)L(θ|Y = y)
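When the normalizing integral has no closed form, the proportionality can still be exploited numerically on a grid. A minimal sketch, assuming a Beta(2, 2) prior and a Binomial(10, θ) likelihood with y = 7 (both choices are illustrative):

```python
# Grid approximation of P(θ|y) ∝ P(θ)L(θ|Y = y).
# Assumed setup: Beta(2, 2) prior and Binomial(n = 10, θ) likelihood, y = 7.
n_grid = 10001
thetas = [i / (n_grid - 1) for i in range(n_grid)]

def prior(t):
    # Beta(2, 2) density, up to its normalizing constant
    return t * (1 - t)

def likelihood(t, n=10, y=7):
    # binomial likelihood, up to the binomial coefficient
    return t**y * (1 - t)**(n - y)

unnorm = [prior(t) * likelihood(t) for t in thetas]
z = sum(unnorm) / (n_grid - 1)        # Riemann approximation of the integral
posterior = [u / z for u in unnorm]   # density values on the grid

# sanity check: the density integrates to ≈ 1
print(round(sum(posterior) / (n_grid - 1), 3))
```

In this conjugate setup the exact posterior is Beta(9, 5), so the grid density peaks near its mode 8/12.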
1.3 Prior & posterior distribution
Examples
• Binomial case
• Poisson case
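For both examples the posterior is available in closed form via conjugacy (Beta prior for the binomial case, Gamma prior for the Poisson case). A sketch of the standard update rules, with illustrative numbers:

```python
# Conjugate posterior updates for the two examples (standard results).

def beta_binomial_update(a, b, y, n):
    """Beta(a, b) prior, y successes in n Binomial trials -> Beta posterior."""
    return a + y, b + n - y

def gamma_poisson_update(a, b, ys):
    """Gamma(a, b) prior (rate parametrization), Poisson counts ys -> Gamma posterior."""
    return a + sum(ys), b + len(ys)

print(beta_binomial_update(1, 1, 7, 10))         # -> (8, 4)
print(gamma_poisson_update(2, 1, [3, 0, 2, 4]))  # -> (11, 5)
```

In both cases the prior parameters act as pseudo-observations that the data simply augment.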
1.4 Bayesian inference
• Posterior mode
• Posterior mean
• Posterior median
• Posterior variance
• Credible intervals
Posterior mode
Proposition 1.2
Let Y = (Y₁, ..., Yₙ)|θ be a conditionally iid sample, and θ a random variable on Θ with prior distribution P(θ) and posterior distribution P(θ|y) := P(θ|Y = y). Let θ̂_M be the posterior mode. The following property holds:
• if P(θ) is constant for all possible values of θ, then θ̂_M = θ̂_MLE
Posterior mean
θ̂ = E[θ|y] = ∫ θ P(θ|y) dθ
Proposition 1.3
Let Y = (Y₁, ..., Yₙ)|θ be a conditionally iid sample, and θ a random variable on Θ with prior distribution P(θ) and posterior distribution P(θ|y) := P(θ|Y = y). Let θ̂ be the posterior mean. The following property holds:
• θ̂ = arg min_{θ* ∈ Θ} ∫ (θ − θ*)² P(θ|y) dθ
Posterior median
θ̂_Med = ½ (inf {θ* : P(θ ≤ θ*|y) ≥ 1/2} + sup {θ* : P(θ ≥ θ*|y) ≥ 1/2})
Proposition 1.4
Let Y = (Y₁, ..., Yₙ)|θ be a conditionally iid sample, and θ a random variable on Θ with prior distribution P(θ) and posterior distribution P(θ|y) := P(θ|Y = y). Let θ̂_Med be the posterior median. The following property holds:
• θ̂_Med = arg min_{θ* ∈ Θ} ∫ |θ − θ*| P(θ|y) dθ
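Propositions 1.3 and 1.4 say the posterior mean and median minimize expected squared and absolute loss, respectively. A numerical sanity check on a grid posterior (the Beta(9, 5)-shaped density below is an illustrative choice):

```python
# Check: posterior mean minimizes expected squared loss, posterior median
# minimizes expected absolute loss, on a grid posterior.
n_grid = 801
thetas = [i / (n_grid - 1) for i in range(n_grid)]
unnorm = [t**8 * (1 - t)**4 for t in thetas]   # Beta(9, 5)-shaped density
z = sum(unnorm)
post = [u / z for u in unnorm]                 # probabilities on the grid

post_mean = sum(t * p for t, p in zip(thetas, post))

def expected_loss(est, power):
    # E[|θ - est|^power | y], approximated on the grid
    return sum(abs(t - est) ** power * p for t, p in zip(thetas, post))

best_sq = min(thetas, key=lambda e: expected_loss(e, 2))   # ≈ posterior mean
best_abs = min(thetas, key=lambda e: expected_loss(e, 1))  # ≈ posterior median
print(round(post_mean, 3), round(best_sq, 3), round(best_abs, 3))
```

The squared-loss minimizer lands on the grid point nearest the mean 9/14, and the absolute-loss minimizer near the Beta(9, 5) median.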
Posterior variance
Proposition 1.5
Let Y = (Y₁, ..., Yₙ)|θ be a conditionally iid sample, and θ a random variable on Θ with prior distribution P(θ) and posterior distribution P(θ|y) := P(θ|Y = y). Let θ̂ be the posterior mean and Var(θ|y) the posterior variance. The following properties hold:
• Var(θ|y) = E[θ²|y] − θ̂²
• Var(θ) = E[Var(θ|y)] + Var(θ̂)
(The posterior variance is on average smaller than the prior variance.)
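The second identity (the law of total variance) can be checked by simulation. A sketch in the Beta-Binomial model with a uniform Beta(1, 1) prior (an illustrative choice), for which Var(θ) = 1/12:

```python
import random

# Monte Carlo check of Var(θ) = E[Var(θ|y)] + Var(θ̂) for a Beta(1, 1)
# prior and a Binomial(n, θ) likelihood, using the conjugate posterior
# Beta(1 + y, 1 + n - y).
random.seed(0)
n, reps = 10, 100_000

def beta_moments(a, b):
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

post_means, post_vars = [], []
for _ in range(reps):
    theta = random.random()                             # θ ~ Beta(1, 1)
    y = sum(random.random() < theta for _ in range(n))  # y|θ ~ Binomial(n, θ)
    m, v = beta_moments(1 + y, 1 + n - y)
    post_means.append(m)
    post_vars.append(v)

e_post_var = sum(post_vars) / reps
mean_of_means = sum(post_means) / reps
var_of_means = sum((m - mean_of_means) ** 2 for m in post_means) / reps

print(round(e_post_var + var_of_means, 3))   # should be close to 1/12 ≈ 0.083
```

The two summands recover the prior variance, and E[Var(θ|y)] alone is much smaller than Var(θ), illustrating the remark in parentheses.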
Credible sets
Definition 1.6
Let Y = (Y₁, ..., Yₙ)|θ be a conditionally iid sample, and θ a random variable on Θ with prior distribution P(θ) and posterior distribution P(θ|y) := P(θ|Y = y). The set Ĉ ⊂ Θ is a (1 − α)-credible set if it satisfies
P(θ ∈ Ĉ|y) ≥ 1 − α.
Two special types of credible set:
• Highest posterior density set Ĉ_HPD: for all θ ∈ Ĉ_HPD and θ′ ∉ Ĉ_HPD, we have P(θ|y) ≥ P(θ′|y)
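On a grid posterior, an HPD set can be read off by accumulating the highest-density points until they hold mass 1 − α. A sketch on an illustrative Beta(9, 5)-shaped posterior:

```python
# Sketch of a (1 - α) HPD set on a grid: take grid points in decreasing
# order of posterior density until their total mass reaches 1 - α.
n_grid = 2001
thetas = [i / (n_grid - 1) for i in range(n_grid)]
unnorm = [t**8 * (1 - t)**4 for t in thetas]   # Beta(9, 5)-shaped posterior
z = sum(unnorm)
post = [u / z for u in unnorm]                 # probabilities on the grid

alpha = 0.05
order = sorted(range(n_grid), key=lambda i: post[i], reverse=True)
mass, hpd = 0.0, []
for i in order:
    hpd.append(thetas[i])
    mass += post[i]
    if mass >= 1 - alpha:
        break

# for a unimodal posterior the HPD set is an interval
print(round(min(hpd), 2), round(max(hpd), 2))
```

By construction no point outside the set has higher density than any point inside it, which is exactly the defining property above.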
Bayes factor
Definition 1.7
Let Y = (Y₁, ..., Yₙ)|θᵢ be a conditionally iid sample, and θ₁, θ₂ two random variables on Θ₁, Θ₂ respectively, with prior distributions P₁(θ₁) and P₂(θ₂). The Bayes factor B₁₂ is the marginal likelihood ratio
B₁₂ = ∫ L(θ₁|Y = y) P₁(θ₁) dθ₁ / ∫ L(θ₂|Y = y) P₂(θ₂) dθ₂.
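For conjugate models the two marginal likelihoods have closed forms, so B₁₂ is direct to compute. A sketch comparing two hypothetical Beta priors under a Binomial(n, θ) likelihood (priors and data are illustrative):

```python
from math import comb, exp, lgamma, log

def log_beta_fn(a, b):
    # log of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(y, n, a, b):
    # log ∫ Binomial(y|n, θ) Beta(θ|a, b) dθ, in closed form
    return log(comb(n, y)) + log_beta_fn(a + y, b + n - y) - log_beta_fn(a, b)

# Model 1: uniform Beta(1, 1) prior; Model 2: Beta(5, 1) prior favouring
# large θ. Data: y = 7 successes in n = 10 trials.
y, n = 7, 10
B12 = exp(log_marginal(y, n, 1, 1) - log_marginal(y, n, 5, 1))
print(round(B12, 3))   # B12 < 1: these data slightly favour model 2
```

Working on the log scale avoids underflow when n is large; here B₁₂ = 91/110 exactly.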
1.5 Choice of prior
... but choosing a prior without thinking about the posterior can make the posterior computationally intractable.
Conjugate priors
Definition 1.8
Let P be a family of probability distributions. Let Y = (Y₁, ..., Yₙ)|θ be a conditionally iid sample, and θ a random variable on Θ with prior distribution P(θ) and posterior distribution P(θ|y) := P(θ|Y = y). We say that the prior and posterior distributions are conjugate distributions of P for the likelihood L(θ|Y = y) if P(θ), P(θ|y) ∈ P. Moreover, we call the prior P(θ) a conjugate prior.
Exponential family
Definition 1.9
A family of probability distributions defined by its likelihood L(θ|Y = y) depending on a parameter θ is called a k-dimensional exponential family if there exist functions c, h, Q_j and V_j such that
L(θ|Y = y) = c(θ) h(y) exp( Σ_{j=1}^k Q_j(θ) V_j(y) ).
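As a worked illustration (not from the slides), the Poisson model is a 1-dimensional exponential family: with Y₁, ..., Yₙ iid Poisson(θ),

```latex
L(\theta \mid Y = y)
  = \prod_{i=1}^{n} \frac{e^{-\theta}\,\theta^{y_i}}{y_i!}
  = \underbrace{e^{-n\theta}}_{c(\theta)}
    \underbrace{\prod_{i=1}^{n} \frac{1}{y_i!}}_{h(y)}
    \exp\Big( \underbrace{\log\theta}_{Q_1(\theta)} \cdot \underbrace{\sum_{i=1}^{n} y_i}_{V_1(y)} \Big).
```

so k = 1, with sufficient statistic V₁(y) = Σᵢ yᵢ.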
Proposition 1.10
Let a k-dimensional exponential family be defined by its likelihood L(θ|Y = y). All distributions of the family
P_{α,β} = { P(θ) ∝ c(θ)^β exp( Σ_{j=1}^k Q_j(θ) α_j ) }
are conjugate priors for this likelihood.
Definition 1.11
Let Y = (Y₁, ..., Yₙ)|θ be a conditionally iid sample with likelihood L(θ|Y = y). The non-informative Jeffreys prior is the prior verifying
P_J(θ) ∝ √|I(θ)|,
where
I(θ) = Var_θ( ∂ log L(θ|Y₁ = y₁) / ∂θ ) = −E_θ( ∂² log L(θ|Y₁ = y₁) / ∂θ² ).
1 Introduction to Bayesian inference
1.5 Choice of prior
Definition 1.11
Let Y = (Y1 , ..., Yn )|θ a conditional iid sample with likelihood
L(θ|Y = y ). The non-informative Jeffreys prior is the prior verifying
p
PJ (θ) ∝ |I (θ)|,
∂ 2 log(L(θ|Y1 = y1 ))
∂ log(L(θ|Y1 = y1 ))
I (θ) = Varθ = Eθ .
∂θ ∂θ2
Careful:
The Fisher information uses the log-likelihood of one observation
Under a reparametrization φ of θ, the Jeffreys prior transforms as
P_J(φ) = P_J(θ) |dθ/dφ|
• Most Jeffreys priors are improper (i.e. they cannot be probability distributions because ∫ P_J(θ) dθ is not well-defined)
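As a worked example (not from the slides), for the Bernoulli model the Jeffreys prior is in fact proper: with log L(θ|Y₁ = y₁) = y₁ log θ + (1 − y₁) log(1 − θ),

```latex
I(\theta)
  = \mathbb{E}_\theta\!\left[-\frac{\partial^2 \log L(\theta \mid Y_1 = y_1)}{\partial \theta^2}\right]
  = \frac{1}{\theta(1-\theta)},
\qquad
P_J(\theta) \propto \theta^{-1/2} (1-\theta)^{-1/2},
```

i.e. a Beta(1/2, 1/2) distribution.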