
Chapter 3

Week 3

L3: Bayesian Theory—Priors, Part 2

3.1 Specifying hyperparameters

Hyperparameters are sometimes calculated on the basis of subjective specifications of summary measures for
the prior distribution. One common procedure is to equate a priori guesses of expected values, variances,
or coefficients of variation to the algebraic formulas for those quantities under the prior distribution1. Another is
to specify the probability that θ lies in some range [θL, θU]; e.g., Pr(3 ≤ θ ≤ 9) = 0.8.

Beta prior for Binomial. For example, suppose a Beta(α, β) prior will be used for a binomial data distribution,
Binomial(n, θ), and the desired expected value of θ is 0.8 with a coefficient of variation (√(V[θ])/E[θ]) of 0.25.
This implies a desired variance of (0.25 × 0.8)² = 0.04. One can calculate α and β by solving the following
system of two equations in two unknowns2:

E[θ] = α/(α + β)

V[θ] = αβ/((α + β)²(α + β + 1))
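For concreteness, here is a small R sketch (not from the notes; the rearrangement α + β = E[θ](1 − E[θ])/V[θ] − 1 follows from the two equations above) that carries out this moment matching and reproduces the values in footnote 2:

#-- Moment matching for a Beta(alpha, beta) prior (sketch)
beta.match <- function(E, CV) {
  V <- (CV*E)^2                  # implied prior variance
  s <- E*(1 - E)/V - 1           # s = alpha + beta
  list(alpha = E*s, beta = (1 - E)*s)
}
beta.match(E = 0.8, CV = 0.25)   # alpha = 2.4, beta = 0.6 (footnote 2)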

Gamma prior for Poisson. A similar exercise can be carried out when specifying a Gamma(α, β) prior
for a Poisson data distribution, Poisson(θ). The desired characteristics of the prior distribution are that the
expected value of θ is 12 with a coefficient of variation (√(V[θ])/E[θ]) of 0.30. One can calculate α and β
using the following3:

E[θ] = α/β

V[θ] = α/β²
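Again as a sketch (not from the notes; it uses the fact that these two equations imply CV[θ] = 1/√α), the Gamma hyperparameters can be computed in R and checked against footnote 3:

#-- Moment matching for a Gamma(alpha, beta) prior (sketch)
gamma.match <- function(E, CV) {
  alpha <- 1/CV^2                # from CV = 1/sqrt(alpha)
  beta  <- alpha/E               # from E = alpha/beta
  list(alpha = alpha, beta = beta)
}
gamma.match(E = 12, CV = 0.3)    # alpha = 11.11111, beta = 0.9259259 (footnote 3)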
1 This is referred to as moment matching in Bayesian Models: A Statistical Primer for Ecologists, by Hobbs and Hooten (2015).
I really like this book and note that it is written for ecologists rather than statisticians. It is available online through the
University Library webpage.
2 Try this out: given E[θ] = 0.8 and CV = 0.25, show that α = 2.4 and β = 0.6.
3 Given E[θ] = 12 and CV = 0.3, show that α = 11.11111 and β = 0.9259259.

Normal prior for Normal µ. Instead of specifying moment-related measures of parameters, one might
specify desired quantiles. For example, suppose the data distribution is Normal(θ, σ²), where σ² is known, and a
normal distribution will be used as the prior for θ, i.e., θ ∼ Normal(µ0, σ0²). Specifying a 95% credible
interval, the 2.5th and 97.5th percentiles for the prior should be 15 and 30. One can calculate µ0 and σ0²
using the following equations4:

15 = µ0 − 1.960 σ0
30 = µ0 + 1.960 σ0

In general, letting zp denote the standard normal, Normal(0, 1), quantile for probability p, and yp denote the
quantile for probability p of Normal(µ0, σ0²):

yp1 = µ0 + zp1 σ0
yp2 = µ0 + zp2 σ0

where p1 < p2 (and yp1 < yp2).

4 With 2.5th and 97.5th percentiles equal to 15 and 30, µ0 = 22.5 and σ0 = 3.826531.
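A small R sketch of this quantile matching (not from the notes; qnorm supplies the standard normal quantiles), which reproduces footnote 4:

#-- Quantile matching for a Normal(mu0, sigma0^2) prior (sketch)
normal.match <- function(y, p) {
  z <- qnorm(p)                          # standard normal quantiles z_p1, z_p2
  sigma0 <- (y[2] - y[1])/(z[2] - z[1])
  mu0    <- y[1] - z[1]*sigma0
  list(mu0 = mu0, sigma0 = sigma0)
}
normal.match(y = c(15, 30), p = c(0.025, 0.975))   # mu0 = 22.5, sigma0 approx 3.83 (footnote 4)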

3.2 Normal Distribution Priors

Given the ubiquity and utility of the normal distribution, it is useful to thoroughly explore the posterior for
the parameters of that distribution for various priors. We begin with conjugate priors for just one parameter
of a univariate normal distribution, here denoted Normal(µ, σ 2 ), either µ or σ 2 . Later we’ll examine the
case of specifying joint prior distributions for µ and σ 2 .
To derive the conjugate posteriors, there is a fair amount of algebraic manipulation involved. It is worth
understanding and being able to reproduce the following for yourself as such techniques are useful in other
circumstances.
In all cases considered, the data distribution is Normal(µ, σ 2 ) from which there are n independent and
identically distributed (iid) observations, y = (y1 , y2 , . . ., yn ). The joint probability density function is
f(y | µ, σ²) = ∏_{i=1}^{n} (1/√(2πσ²)) exp(−(yᵢ − µ)²/(2σ²)) = (2πσ²)^(−n/2) exp(−Σ_{i=1}^{n} (yᵢ − µ)²/(2σ²))    (3.1)

Remarks.

• Re-expressing the Normal pdf. A useful re-expression of the normal pdf is the following.
f(y | µ, σ²) = (2πσ²)^(−n/2) exp(−Σ_{i=1}^{n} (yᵢ − ȳ)²/(2σ²)) exp(−n(ȳ − µ)²/(2σ²))    (3.2)

Check the validity of this re-expression. (Hint: rewrite (yᵢ − µ)² as ([yᵢ − ȳ] + [ȳ − µ])².)

• Sufficient statistics. Suppose that σ² is known. Note that the only term in Eq'n 3.2 containing µ also involves
ȳ; thus ȳ is the relevant data for inference about µ. We say that ȳ is a sufficient statistic for µ, as this
statistic contains all the information in the data that is useful for estimating the unknown parameter
(here, µ).

Now suppose that σ² is also unknown. Letting s² = Σ_{i=1}^{n} (yᵢ − ȳ)²/(n − 1), then

f(y | µ, σ²) = (2πσ²)^(−n/2) exp(−(n − 1)s²/(2σ²)) exp(−n(ȳ − µ)²/(2σ²))    (3.3)

and (ȳ, s²) is the joint sufficient statistic for (µ, σ²).


• Precision. Instead of writing the normal pdf in terms of µ and σ², it is sometimes written in terms of
µ and τ, where τ is called the precision and is the inverse of the variance, τ = 1/σ². Symbolically,
y ∼ Normal(µ, 1/τ), and mathematically

f(y | µ, τ) = √(τ/(2π)) exp(−τ(y − µ)²/2)    (3.4)

Some software packages for Bayesian inference, e.g., JAGS and WinBUGS, specify the normal distribution
in terms of the precision τ. The term precision has an intuitive interpretation: if a random variable is more
precise, it is less variable, i.e., as τ increases, σ² decreases.
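As a quick check of the two parameterizations (a sketch, not from the notes), the density in Eq'n (3.4) agrees with R's dnorm when sd = 1/√τ; in JAGS, dnorm(mu, tau) takes the precision directly:

#-- Precision vs. variance parameterization (sketch; values arbitrary)
mu <- 2; tau <- 4; y <- 1.3
sqrt(tau/(2*pi)) * exp(-tau*(y - mu)^2/2)   # Eq'n (3.4)
dnorm(y, mean = mu, sd = 1/sqrt(tau))       # same value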

3.2.1 Normal distribution: unknown µ and known σ 2

While assuming that σ 2 is known is usually not realistic, the methods used for inference for µ are helpful
building blocks for more realistic situations.
To begin we write the likelihood for µ, given that σ² is known, and examine the portion of the pdf in Eq'n
(3.2) that involves µ alone:

f(y | µ, σ²) ≡ L(µ | y, σ²) ∝ exp(−n(ȳ − µ)²/(2σ²)) = exp(−(µ − ȳ)²/(2σ²/n))    (3.5)

Note that the last expression in (3.5) can be seen as the kernel of a normal distribution. Thus the conjugate
prior distribution for µ (when σ², or τ, is known) is the normal distribution:

Prior for µ: µ ∼ Normal(µ0, σ0²)

where µ0 and σ0² are the hyperparameters of the prior. An alternative formulation for the prior (see Reich
and Ghosh, p. 47), which leads to a tidier posterior distribution, is to write σ0² = σ²/m, where m is some
positive number. Of course, if one specifies σ0² first, then m is σ²/σ0².

Prior for µ: µ ∼ Normal(µ0, σ²/m)    (3.6)

The posterior distribution for µ given a normal prior is then:

Posterior: p(µ | y, σ²) ∝ π(µ) p(y | µ)

= (2πσ²/m)^(−1/2) exp(−(µ − µ0)²/(2σ²/m)) × (2πσ²)^(−n/2) exp(−Σ_{i=1}^{n} (yᵢ − µ)²/(2σ²))

∝ exp(−m(µ − µ0)²/(2σ²)) exp(−n(ȳ − µ)²/(2σ²))

∝ exp(−[mµ² − 2mµ0µ + nµ² − 2nȳµ]/(2σ²))

= exp(−[(m + n)µ² − 2(mµ0 + nȳ)µ]/(2σ²))

∝ exp(−(µ − (mµ0 + nȳ)/(m + n))²/(2σ²/(m + n)))    (3.7)

where Eq'n 3.7 is the kernel of a normal distribution; in other words:

Posterior for µ | y, σ²: Normal((mµ0 + nȳ)/(m + n), σ²/(m + n))    (3.8)

The posterior mean for µ is thus a weighted combination of the prior mean, µ0, and the sample mean, ȳ:

E[µ | y, σ²] = (mµ0 + nȳ)/(m + n) = (m/(m + n)) µ0 + (n/(m + n)) ȳ = (1 − w)µ0 + w ȳ

where w = n/(m + n).
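The update in (3.8) is simple enough to wrap in a small helper; the following sketch (the function name is mine, not from the notes) returns the posterior mean, variance, and the weight w, and reproduces the food-expenditure example later in this section:

#-- Posterior for mu with known sigma^2 and Normal(mu0, sigma^2/m) prior (sketch)
post.mu <- function(mu0, m, ybar, n, sigma) {
  w <- n/(m + n)
  list(mean = (1 - w)*mu0 + w*ybar, var = sigma^2/(m + n), w = w)
}
post.mu(mu0 = 200, m = 0.4549, ybar = 165, n = 20, sigma = 50)   # mean 165.778, var 122.2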

Comments.

• As n increases, the posterior mean is dominated by the sample mean:

lim_{n→∞} E[µ | y, σ²] = lim_{n→∞} [(m/(m + n)) µ0 + (n/(m + n)) ȳ] = ȳ

and the posterior becomes concentrated at ȳ:

lim_{n→∞} V[µ | y, σ²] = lim_{n→∞} σ²/(m + n) = 0

• Conversely, as m increases (equivalently, as σ0² = σ²/m decreases), the posterior is dominated by the prior:

lim_{m→∞} E[µ | y, σ²] = lim_{m→∞} (mµ0 + nȳ)/(m + n) = µ0

lim_{m→∞} V[µ | y, σ²] = lim_{m→∞} σ²/(m + n) = 0

Thus the posterior approaches a point mass at µ0 as σ0² goes to 0.

• On the other hand, as m decreases, σ²/m increases (a "vaguer" prior) and the influence of the prior decreases:

lim_{m→0} E[µ | y, σ²] = lim_{m→0} (mµ0 + nȳ)/(m + n) = ȳ

lim_{m→0} V[µ | y, σ²] = lim_{m→0} σ²/(m + n) = σ²/n

Unknown µ and known τ

If we express the sampling distribution for y in terms of the precision, τ = 1/σ², then the prior for µ (given known
τ) is:

µ ∼ Normal(µ0, 1/(τm))    (3.9)

and the posterior for µ is:

Posterior for µ | y, τ: Normal((mµ0 + nȳ)/(m + n), 1/(τ(m + n)))    (3.10)

Posterior predictive distribution

As discussed in Lecture 1, the posterior predictive distribution is the marginal probability distribution for a
new scalar value, y_new, given the past data, y_old:

p(y_new | y_old) = ∫ p(y_new | θ, y_old) p(θ | y_old) dθ = ∫ p(y_new | θ) p(θ | y_old) dθ    (3.11)

The second equality results from y_new being conditionally independent of y_old given θ.

In this normal distribution case with known σ², letting µ1 and σ1² denote the posterior mean and variance
for µ:

p(y_new | y_old) = ∫ p(y_new | µ) p(µ | y_old) dµ
= ∫ (1/√(2πσ²)) exp(−(y_new − µ)²/(2σ²)) (1/√(2πσ1²)) exp(−(µ − µ1)²/(2σ1²)) dµ
= (1/√(2π(σ1² + σ²))) exp(−(y_new − µ1)²/(2(σ1² + σ²)))    (3.12)

Thus, y_new | y_old ∼ Normal(µ1, σ1² + σ²)5. Note that the variance of y_new is the sum of the variance for the
distribution of y if µ were known, namely σ², and the variance due to the uncertainty in µ.
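The variance decomposition can be checked by simulation; the sketch below (not from the notes) uses the posterior values from the food-expenditure example that follows:

#-- Monte Carlo check of y_new | y_old ~ Normal(mu1, sigma1^2 + sigma^2) (sketch)
set.seed(1)
mu1 <- 165.778; sigma1 <- 11.05; sigma <- 50
mu.draw <- rnorm(1e5, mu1, sigma1)      # draw mu from its posterior
y.new   <- rnorm(1e5, mu.draw, sigma)   # then draw y_new given each mu
c(mean(y.new), sd(y.new))               # approx (165.8, 51.2 = sqrt(11.05^2 + 50^2))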

Example: inference for µ

Assume that the amount of money spent during September on food by individual students, y, is normally
distributed with an unknown mean µ and known standard deviation, σ, of £50, or a precision, τ, of 1/50²
= 0.0004. You would like to estimate µ and will take a simple random sample of n=20 students.
Before doing so, you specify a Normal(µ0, 50²/m) prior for µ. You think that the average is around £200
and set µ0 = 200. Further, you guess that the 25th and 75th percentiles are 150 and 250. Given that
the corresponding standard normal percentiles are −0.6744898 and 0.6744898, you can solve for m using the
following equation:

−0.6744898 = (150 − 200)/(50/√m)

Thus m = 0.6744898² = 0.4549. Then the prior for µ is:

µ ∼ Normal(200, 50²/0.4549)

The simple random sample of n=20 was then taken and the average was £165.
The mean and variance for the posterior distribution for µ are based on (3.8):

E[µ | ȳ] = (0.4549 × 200 + 20 × 165)/(0.4549 + 20) = 165.778

V[µ | ȳ] = 50²/(0.4549 + 20) = 122.2199 = 11.05²

and the posterior for µ is

µ | ȳ ∼ Normal(165.778, 11.05²)

A weight of 0.4549/20.4549, about 2%, was given to the prior mean (and 98% to the sample mean), and the
posterior standard deviation for µ went from 74 in the prior (50/√0.4549) to 11.05 in the posterior.
Figure 3.1 shows the prior and posterior distributions for µ.

Posterior predictive distributions. If a student was randomly sampled, the prior and posterior predictive
distributions for that student's food expenditures are:

Prior: y_new ∼ Normal(200, 50²/0.4549 + 50²) = Normal(200, 89.4²)

Posterior: y_new | ȳ_old = 165 ∼ Normal(165.778, 11.05² + 50² = 51.22²)

5 For details on the derivation see Lecture 3A: Supplement Posterior Predictive For Normal Dist'n on Learn.

Figure 3.1: Prior and posterior distributions for µ, the average amount spent on food during September, assuming
a Normal(µ, 50²) distribution.

Comparing the posterior predictive distribution to the prior predictive distribution, the mean has decreased by
around £34 and the predictive standard deviation has dropped from about 89.4 to 51.22. The posterior predictive
standard deviation of 51.22 remains slightly larger than σ = 50, with the additional variance due to the remaining
uncertainty in the value of µ.

3.2.2 Normal distribution: unknown τ or σ 2 and known µ

Again this is usually not a realistic situation, but the results are useful for more complex and realistic models.

Conjugate prior for τ

Before presenting results for σ 2 , we begin with the precision, τ = 1/σ 2 .


We examine the likelihood for τ, keeping only terms that involve τ (see Eq'n 3.1):

f(y | µ, τ) ≡ L(τ | y, µ) ∝ τ^(n/2) exp(−τ Σ_{i=1}^{n} (yᵢ − µ)²/2) = τ^(n/2) exp(−τz²/2)    (3.13)

where, to reduce notation,

z² = Σ_{i=1}^{n} (yᵢ − µ)²

The likelihood (3.13) is the kernel of a Gamma distribution. Thus the conjugate prior for τ when µ is known
is

τ ∼ Gamma(α, β)    (3.14)

and the posterior for τ is:

Posterior: p(τ | y, µ) ∝ τ^(α−1) exp(−βτ) × τ^(n/2) exp(−z²τ/2)
= τ^(α + n/2 − 1) exp(−(β + z²/2)τ)    (3.15)

so that

τ | y, µ ∼ Gamma(α + n/2, β + z²/2)    (3.17)

Comments.

• Recall that if θ ∼ Gamma(α, β), then E[θ] = α/β and V[θ] = α/β².

• Examining the posterior mean:

E[τ | y] = (α + n/2)/(β + z²/2) = α/(β + z²/2) + (n/2)/(β + z²/2) = 2α/(2β + z²) + n/(2β + z²)    (3.18)

As n increases, z² increases, so the first term, 2α/(2β + z²), goes to zero. The second term, n/(2β + z²),
approaches n/Σ_{i=1}^{n} (yᵢ − µ)² = 1/σ̂², where σ̂² is the maximum likelihood estimate (mle) for σ².
Thus as n increases, E[τ | y] approaches 1/σ̂² = τ̂, the mle for τ.

• One approach for selecting the hyperparameters for the Gamma dist'n prior is to specify approximate
values for E[τ] and V[τ] and then solve for α and β. (Admittedly, specifying a value for V[τ] might
be a little involved.)
• An Aside: Re-expression as a χ² distribution. The χ² distribution with ν degrees of freedom has the
following pdf (see Appendix A of King and Ross's notes):

p(θ) = (2^(−ν/2)/Γ(ν/2)) θ^(ν/2 − 1) exp(−θ/2)

Note that this is the same pdf as for Gamma(ν/2, 1/2). Focusing on the kernel of the posterior for τ in
(3.15):

p(τ | y, µ) ∝ τ^(α + n/2 − 1) exp(−(β + z²/2)τ)
∝ (2β + z²)^((2α+n)/2 − 1) τ^((2α+n)/2 − 1) exp(−τ(2β + z²)/2)    (3.19)
= (τ(2β + z²))^((2α+n)/2 − 1) exp(−τ(2β + z²)/2)    (3.20)

where the multiplier (2β + z²)^((2α+n)/2 − 1) in (3.19) is a constant that does not affect the kernel. Then it
can be seen that (3.20) is the kernel of a χ² distribution for τ(2β + z²) with 2α + n degrees of freedom:

τ(2β + z²) ∼ χ²_{2α+n}    (3.21)

This can also be written as what is called a scaled χ² distribution6 (verified numerically in the sketch just after these comments):

τ ∼ (1/(2β + z²)) χ²_{2α+n}    (3.22)
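The equivalence in (3.22) is easy to verify numerically; a sketch (the values of α, β, n, and z² below are arbitrary, chosen only for illustration):

#-- Numerical check of Eq'n (3.22) (sketch; values arbitrary)
alpha <- 2; beta <- 1; n <- 10; z2 <- 3.7
p <- c(0.025, 0.5, 0.975)
qgamma(p, shape = alpha + n/2, rate = beta + z2/2)   # Gamma posterior quantiles for tau
qchisq(p, df = 2*alpha + n)/(2*beta + z2)            # scaled chi-squared quantiles (equal)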

Conjugate prior for σ²

Again examine the likelihood for σ², keeping only terms that involve σ²:

f(y | µ, σ²) ≡ L(σ² | y, µ) ∝ (σ²)^(−n/2) exp(−Σ_{i=1}^{n} (yᵢ − µ)²/(2σ²)) = (σ²)^(−n/2) exp(−z²/(2σ²))    (3.23)

6 If a random variable θ multiplied by a constant c follows a distribution D, i.e., cθ ∼ D, then θ follows a scaled distribution (1/c)D.

The term on the right-hand side of Eq'n (3.23) is the kernel of an Inverse Gamma distribution. The pdf for
an Inverse Gamma with parameters α and β is (see Appendix A of King and Ross's notes):

p(θ) = (β^α/Γ(α)) θ^(−(α+1)) exp(−β/θ)    (3.24)

Thus the Inverse Gamma(α, β) is the conjugate prior for σ² when µ is known.


Then the posterior for σ²:

Posterior: p(σ² | y, µ) ∝ (σ²)^(−(α+1)) exp(−β/σ²) (σ²)^(−n/2) exp(−z²/(2σ²))
= (σ²)^(−(α + n/2 + 1)) exp(−(β + z²/2)/σ²)    (3.25)

where Eq'n 3.25 is the kernel of the Inverse Gamma:

Posterior for σ² | y, µ: Inverse Gamma(α + n/2, β + z²/2)    (3.26)

Comments.

• If θ ∼ Inverse Gamma(α, β), then E[θ] = β/(α − 1) if α > 1, and V[θ] = β²/((α − 1)²(α − 2)) if α > 2.
Thus, values for the hyperparameters α and β can be deduced given prior notions about the mean
and the variance of σ² (a small sketch of this calculation follows these comments).

• Relatively small values for α and β, e.g., 0.01 or 0.001, are often used in practice. Such choices are not
universally considered a good idea; examining the sensitivity of the posterior to the choices of α
and β is good practice.

• As for τ, the posterior for σ² can be written as a scaled inverse χ² distribution (see Appendix A of
King and Ross).

• If x ∼ Gamma(α, β), then y = 1/x ∼ Inverse Gamma(α, β)7.
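A sketch of the first comment's calculation (the closed-form inversion α = 2 + E²/V, β = E(α − 1) follows from the stated moment formulas; the same calculation appears as inv.gamma.param.calc() in the R code section), using the prior mean and variance from the timber example below:

#-- Inverse Gamma hyperparameters from a prior mean E and variance V (sketch)
ig.match <- function(E, V) {
  alpha <- 2 + E^2/V
  beta  <- E*(alpha - 1)
  list(alpha = alpha, beta = beta)
}
ig.match(E = 0.05, V = 0.02)   # alpha = 2.125, beta = 0.05625 (used in the example below)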

Example: inference for σ 2

The construction timber called 2x4 has cross-piece dimensions of 1.5 inches by 3.5 inches on average. Let
Y be the longer dimension and assume that Y ∼ Normal(µ, σ²). The higher the quality control, the closer
the value of Y should be to 3.5 inches; in other words, σ² should be relatively small. A building company is
considering purchasing timber from a new supplier but before doing so would like to see just how precisely
the 2 by 4’s are cut. They will take a random sample of n=10 2x4s and measure the length of the longer
side. They are comfortable assuming that the average length µ is 3.5, thus the data distribution is Y ∼
Normal(3.5, σ 2 ).
Before taking the sample, they decide to use an Inverse Gamma(α, β) prior distribution for σ² and need to
specify the hyperparameters. They would like to be cautious and will assume a priori that the average value
of σ² is 0.05 with a variance of 0.02. This results in an Inverse Gamma(2.125, 0.05625) prior (check this). A simple
random sample of n=10 2x4s yielded the following measurements:

3.527 3.387 3.466 3.382 3.612 3.680 3.471 3.475 3.603 3.680
7 This can be shown using the so-called change of variable theorem.

where Σ_{i=1}^{10} (yᵢ − 3.5)² = 0.117997. The posterior distribution for σ² | y is

σ² | y ∼ Inverse Gamma(2.125 + 10/2, 0.05624 + 0.117997/2) = Inverse Gamma(7.125, 0.1152385)

Thus the posterior mean for σ² is 0.1152385/(7.125 − 1) = 0.0188 and the posterior variance is
0.1152385²/((7.125 − 1)²(7.125 − 2)) = 6.908e-05.
Figure 3.2 plots the prior and posterior distributions for σ 2 .

Figure 3.2: Prior and posterior distributions for σ², the variance of the longer side of 2x4s, assuming the sides are
Normal(3.5, σ²).

R does not have built-in Inverse Gamma distribution functions that could be used to calculate the
quantiles of the posterior distribution for σ². However, we can use the quantile function for the Gamma
distribution in R, namely qgamma, which will yield quantiles for τ = 1/σ², and invert the results. The
posterior distribution for τ is Gamma(7.125, 0.1152385), and the 2.5th and 97.5th percentiles can be found with
qgamma(c(0.025,0.975), shape=7.125, rate=0.1152385) = (25.10391, 114.81646). Thus

0.95 = Pr(25.104 ≤ τ ≤ 114.8) = Pr(25.104 ≤ 1/σ² ≤ 114.8) = Pr(1/114.8 ≤ σ² ≤ 1/25.104) = Pr(0.00871 ≤ σ² ≤ 0.03983)

Thus a 95% credible interval for σ² is (0.00871, 0.03983).

3.3 Bayes Theorem for multiple parameters

Often there will be q > 1 parameters:


Θ = {θ1 , θ2 , . . . , θq }
Given data y, Bayes theorem has the same form:

Pr(Θ | y) = Pr(y | Θ) Pr(Θ) / Pr(y) ∝ Pr(y | Θ) Pr(Θ)

3.3.1 Comments
• The posterior distribution for multiple parameters can be high dimensional, e.g., q parameters = q
dimensions, and summarising a high dimensional space can be complicated.
• Often one-dimensional summaries, namely marginal posterior distributions, are examined instead:

p(θi | y) = ∫ p(Θ | y) dθ1 dθ2 … dθi−1 dθi+1 … dθq

This is the posterior distribution for θi found by "averaging" over all the other parameters, thus
"projecting" (collapsing) the q-dimensional posterior distribution onto a single dimension. One can then
examine a single posterior density plot and, for example, calculate the posterior mean, variance, and credible
interval for θi.

• However, one-dimensional summaries can fail to capture important features of the joint posterior.

• Two-dimensional graphical summaries are useful: contour plots or perspective plots, or, in the case of
samples from the posterior distribution, pairwise scatterplots.

• Two-dimensional numerical summaries include correlations or covariances.

• With more than two dimensions, however, detecting patterns, if they exist, can be more difficult.

3.3.2 Normal Dist’n: Unknown µ and σ 2

Without deriving any results, we discuss two approaches to the situation where both µ and σ 2 are unknown.

Joint prior constructed with independent marginal priors

One approach to arriving at a joint prior for both µ and σ 2 is to specify independent informative marginal
priors for µ and σ 2 and multiply the two to yield a joint prior.
The joint posterior distribution is:

p(µ, σ² | y) = π(µ) π(σ²) (2πσ²)^(−n/2) exp(−Σ_{i=1}^{n} (yᵢ − µ)²/(2σ²)) / ∫∫ π(µ) π(σ²) (2πσ²)^(−n/2) exp(−Σ_{i=1}^{n} (yᵢ − µ)²/(2σ²)) dµ dσ²    (3.27)

In general, (3.27) will not be something that can be calculated analytically, depending on the choices of π(µ)
and π(σ²), because of the integral in the denominator. Numerical or simulation-based integration methods,
which we will discuss later, can be used to yield approximate results.
Consider the 2x4 timber example, but now assume that both µ and σ² are unknown. Suppose one specified
that the prior for µ is Lognormal(µ0, σ0²) and the prior for σ² is Gamma(α, β). The denominator of (3.27)
in this case is:

∫₀^∞ ∫₀^∞ (1/(µ√(2πσ0²))) exp(−(ln(µ) − µ0)²/(2σ0²)) × (β^α/Γ(α)) (σ²)^(α−1) exp(−βσ²) × (2πσ²)^(−n/2) exp(−Σ_{i=1}^{n} (yᵢ − µ)²/(2σ²)) dµ dσ²

which is "probably" not analytically tractable (no closed-form solution; I'm guessing, as I've not tried to
solve it).
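One simple numerical route is a grid approximation of the unnormalised posterior; the following is only a sketch (the hyperparameter values and grid ranges are mine, chosen for illustration, not values from the notes):

#-- Grid approximation of the joint posterior (3.27) for the 2x4 example (sketch)
y <- c(3.527, 3.387, 3.466, 3.382, 3.612, 3.680, 3.471, 3.475, 3.603, 3.680)
mu0 <- log(3.5); sigma0 <- 0.05     # Lognormal prior for mu (illustrative values)
a <- 2; b <- 20                     # Gamma prior for sigma^2 (illustrative values)
mu.grid <- seq(3.3, 3.7, length = 200)
s2.grid <- seq(0.002, 0.2, length = 200)
log.post <- matrix(NA, length(mu.grid), length(s2.grid))
for (i in seq_along(mu.grid)) {
  for (j in seq_along(s2.grid)) {
    log.post[i, j] <- dlnorm(mu.grid[i], mu0, sigma0, log = TRUE) +
      dgamma(s2.grid[j], shape = a, rate = b, log = TRUE) +
      sum(dnorm(y, mu.grid[i], sqrt(s2.grid[j]), log = TRUE))
  }
}
post <- exp(log.post - max(log.post))
post <- post/sum(post)              # normalise over the grid (the denominator of 3.27)
marg.mu <- rowSums(post)            # approximate marginal posterior for mu on mu.grid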

Comment. While the joint prior distribution for µ and σ 2 was constructed by multiplying two independent
marginal pdfs, the joint posterior distribution is not the product of two independent marginal pdfs. This
is common: joint priors constructed as products of independent distributions do not usually yield a joint
posterior that can be written as products of independent distributions.

Conjugate prior for µ and σ²

There is a joint conjugate prior density for µ and σ². It is defined in terms of a marginal prior density for σ²,
which is an Inverse Gamma, and a conditional prior density for µ given σ², which is a Normal whose
variance hyperparameter is a function of σ². Namely,

µ, σ² ∼ Inverse Gamma(α, β) × Normal(µ0, σ²/κ)    (3.28)

There are four hyperparameters: α, β, µ0, and κ. As can be seen, conditional on the value of σ², the prior
variance of µ around the prior mean µ0 increases as σ² increases and as κ decreases. Thus
decreasing the values of α and β and the value of κ leads to a more dispersed prior for µ.

The resulting joint posterior density is then the product of an Inverse Gamma density (for σ²) and a Normal density
(for µ, conditional on σ²):

µ, σ² | y ∼ Inverse Gamma(α + n/2, β + (n − 1)s²/2 + κn(ȳ − µ0)²/(2(κ + n))) × Normal((κµ0 + nȳ)/(κ + n), σ²/(κ + n))    (3.29)

The marginal posterior density for σ² is Inverse Gamma, and the conditional posterior density for µ given σ² is Normal;
integrating over σ², the marginal posterior density for µ is a Student's t distribution (see "Applied Bayesian Statistics", 2013, Cowles, M.K.).
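Because (3.29) factors into an Inverse Gamma for σ² and a Normal for µ given σ², one can sample from the joint posterior directly; a sketch (the hyperparameter values are illustrative, borrowing the timber data, and are not choices made in the notes):

#-- Direct sampling from the joint posterior (3.29) (sketch; hyperparameters illustrative)
set.seed(1)
y <- c(3.527, 3.387, 3.466, 3.382, 3.612, 3.680, 3.471, 3.475, 3.603, 3.680)
n <- length(y); ybar <- mean(y); s2 <- var(y)
alpha <- 2.125; beta <- 0.05625; mu0 <- 3.5; kappa <- 1
a.post <- alpha + n/2
b.post <- beta + (n - 1)*s2/2 + kappa*n*(ybar - mu0)^2/(2*(kappa + n))
sigma2.draw <- 1/rgamma(1e4, shape = a.post, rate = b.post)   # sigma^2 | y ~ Inverse Gamma
mu.draw <- rnorm(1e4, (kappa*mu0 + n*ybar)/(kappa + n),
                 sqrt(sigma2.draw/(kappa + n)))               # mu | sigma^2, y ~ Normal
quantile(mu.draw, c(0.025, 0.975))       # 95% credible interval for mu
quantile(sigma2.draw, c(0.025, 0.975))   # 95% credible interval for sigma^2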

3.3.3 Example: Multiple Parameter Inference

As a demonstration of multiple parameter inference, we fit a Bayesian linear regression of the lengths of
dugongs8 (sea cows, a type of marine mammal; see Figure 3.3) against their age in years. There were n=27
dugongs measured and the sampling model was the following:

Lengthᵢ ∼ Normal(β0 + β1 ln(ageᵢ), σ²), independently for i = 1, …, 27.

The relationship is shown in Figure 3.4.

Figure 3.3: Dugong (image from Wikipedia).

Figure 3.4: Dugong lengths plotted against log(age).

8 Example motivated by Bayesian Methods for Data Analysis, 2009, Carlin and Louis.


There are three parameters, Θ = {β0, β1, σ²}. The following independent marginal prior distributions were
chosen to construct a joint prior distribution:

β0, β1 ∼ Uniform(−50, 50),  σ² ∼ Uniform(0.01, 20)

The resulting (estimated) posterior distribution was found using JAGS (code in the Appendix). The posterior
means and standard deviations are shown below, along with the maximum likelihood estimates (except that σ̂² is
the bias-corrected version of the mle).

           Bayesian              Frequentist
           Mean      SD          mle      std error
  β0       1.76098   0.053089    1.762    0.0424
  β1       0.27757   0.023531    0.277    0.0188
  σ²       0.01277   0.002719    0.0081   —

Figure 3.5 shows the marginal posterior distributions for the three parameters as well as a scatterplot of the
samples of β0 and β1 . The scatterplot shows that there is a negative relationship between β0 and β1 .

Figure 3.5: Marginal posterior distributions for β0 , β1 , σ 2 , and the joint distribution of β0 and β1

3.4 R Code

3.4.1 Example: Posterior µ with known σ 2


#-- Posterior mu given known sigma
sigma <- 50; m <- 0.6744898^2; n <- 20; ybar <- 165; mu.0<-200
w <- n/(m+n)
post.E <- w*ybar+ (1-w)*mu.0;
post.V <- sigma^2/(m+n)
cat("w=",w,"1-w=",1-w,"post.E=",post.E, "post.V=",post.V,"post.sd=",sqrt(post.V),"\n")

#--plot prior and posterior


x.seq <- 10:450
y.prior <- dnorm(x.seq,mean=mu.0,sd=sigma/sqrt(m))
y.post <- dnorm(x.seq,mean=post.E,sd=sqrt(post.V))
my.ylim <- range(c(y.prior,y.post))
plot(x.seq,y.prior,xlab=expression(mu),ylab="",ylim=my.ylim,type="l")
lines(x.seq,y.post,lty=2,col=2)
legend("topright",legend=c("Prior","Posterior"),col=1:2,lty=1:2)

3.4.2 Example: Posterior σ 2 with known µ


library(MCMCpack) #this has dinvgamma() function

inv.gamma.param.calc <- function(mu,V) {


alpha <- 2+mu^2/V
beta <- mu*(alpha-1)
out <- list(alpha=alpha,beta=beta)
return(out)
}
set.seed(742)
n <- 10
y <- rnorm(n=n,mean=3.5,sd=sqrt(0.01))
y <- round(y,3)
sse <- sum((y-3.5)^2)
cat("sse=",sse,"\n")

temp <- inv.gamma.param.calc(0.05,0.02)


prior.alpha <- temp$alpha
prior.beta <- temp$beta

post.alpha <- prior.alpha+n/2


post.beta <- prior.beta + sse/2

post.mean <- post.beta/(post.alpha-1)


post.var <- post.beta^2/((post.alpha-1)^2*(post.alpha-2))

cat("Prior alpha and beta=",prior.alpha,prior.beta,"\n")


cat("Post alpha and beta=", post.alpha,post.beta,"\n")
cat("Post mean=",post.mean,"var=",post.var,"\n")

theta.seq <- seq(0.01,0.4,length=100)


prior.density <- dinvgamma(x=theta.seq, shape=prior.alpha, scale = prior.beta)
post.density <- dinvgamma(x=theta.seq, shape=post.alpha, scale = post.beta)
my.ylim <- range(c(prior.density,post.density))
plot(theta.seq,prior.density,type="l",xlab=expression(sigma^2),ylab="",ylim=my.ylim)
lines(theta.seq,post.density,col=2,lty=2)
legend("topright",legend=c("Prior","Posterior"),lty=1:2,col=1:2)

# 95% credible interval


x <- qgamma(c(0.025,0.975),shape=7.125, rate=0.1152385)

print(x)
print(1/x)

3.4.3 Example: Linear regression with Dugong data
The R code for fitting the Dugong data and the call to JAGS are shown below.

library(rjags)

dugong.data <-
list(age = c( 1.0, 1.5, 1.5, 1.5, 2.5, 4.0, 5.0, 5.0, 7.0,
8.0, 8.5, 9.0, 9.5, 9.5, 10.0, 12.0, 12.0, 13.0,
13.0, 14.5, 15.5, 15.5, 16.5, 17.0, 22.5, 29.0, 31.5),
length = c(1.80, 1.85, 1.87, 1.77, 2.02, 2.27, 2.15, 2.26, 2.47,
2.19, 2.26, 2.40, 2.39, 2.41, 2.50, 2.32, 2.32, 2.43,
2.47, 2.56, 2.65, 2.47, 2.64, 2.56, 2.70, 2.72, 2.57), n = 27)

log.age <- log(dugong.data$age)


plot(dugong.data$length ~ log.age,xlab="Log(Age)",ylab="",main="Dugong length vs log(age)")

# Initial values for running 3 MCMC chains in JAGS


num.chains <- 3
beta0.set <- c(-1,0,1)
beta1.set <- c(-1,0,1)
sigma2.set <- c(3,10,15)
dugong.inits <- list()
for(i in 1:num.chains) {
dugong.inits[[i]] <- list(beta0=beta0.set[i],beta1=beta1.set[i],
sigma2=sigma2.set[i])
}

dugong.model <- "model {


# data that will be read in are age and length and n
# derived quantity: precision corresponding to the variance
tau <- 1/sigma2

#priors
beta0 ~ dunif(-50,50)
beta1 ~ dunif(-50,50)
sigma2 ~ dunif(0.01,20)

#Likelihood
for(i in 1:n) {
logage[i] <- log(age[i])
mu[i] <- beta0 + beta1 * logage[i]
length[i] ~ dnorm(mu[i], tau)
}
}"

set.seed(742)
burnin <- 2000
inference.length <- 10000
dugong.results.initial <- jags.model(file=textConnection(dugong.model),
data=dugong.data, inits=dugong.inits,
n.chains=num.chains)
update(dugong.results.initial, n.iter=burnin)
dugong.results.final <- coda.samples(model=dugong.results.initial,
variable.names=c("beta0","beta1","sigma2"),
n.iter=inference.length,thin=10)
summary(dugong.results.final)

#--- for looking at the entire combined results convert the mcmc.list object to a data frame
dugong.results.df <- as.data.frame(as.matrix(dugong.results.final))
head(dugong.results.df)
par(mfrow=c(2,2),oma=c(0,0,3,0))

plot(density(dugong.results.df$beta0),xlab=expression(beta[0]),ylab="",
main=expression(beta[0]))
plot(density(dugong.results.df$beta1),xlab=expression(beta[1]),ylab="",
main=expression(beta[1]))
plot(density(dugong.results.df$sigma2),xlab=expression(sigma^2),ylab="",
main=expression(sigma^2))
plot( dugong.results.df$beta0,dugong.results.df$beta1,xlab=expression(beta[0]),ylab="",
main= expression(paste("Joint dist ",beta[0]," ",beta[1])))
par(mfrow=c(1,1))
#if(plot.it) dev.copy2pdf(file=paste0(output,"L5_F_dugong_posterior_plots.pdf"))

#Frequentist results
dugong.freq.lm <- lm(length ~ log.age,data=dugong.data)
summary(dugong.freq.lm)
