
Introduction to Data Science

Bayesian statistics II

Applications of Bayes theorem


The concept of probability
in the two frameworks
• Frequentist conception: Relative frequency of the
outcome of interest as a proportion of the whole
sample space, in the long run. (“Objective” probability)

• Bayesian conception: Degree of belief, plausibility.
Beliefs are constantly updated with new information.
(“Subjective” probability)

Probability vs. Likelihood: Probability

p( z > 1.65 | normal distribution with mean = 0, SD = 1) = 0.05


Probability vs. Likelihood: Probability

p( z > 1.96 | normal distribution with mean = 0, SD = 1) = 0.025
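
As a quick check of these two tail probabilities, here is a minimal sketch of my own (not part of the slides) using SciPy's standard normal survival function:

from scipy.stats import norm

# Tail probabilities under a standard normal distribution (mean = 0, SD = 1)
p_165 = norm.sf(1.65)   # P(z > 1.65) ≈ 0.0495, i.e. roughly 0.05
p_196 = norm.sf(1.96)   # P(z > 1.96) ≈ 0.025

print(p_165, p_196)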


Probability vs. Likelihood: Likelihood
[Figure: probability density of a normal distribution (mean = 0, SD = 1.3) over x from -5 to 5]

L(normal distribution with mean = 0, SD = 1.3 | x = 2) = 0.094


Probability vs. Likelihood: Likelihood
[Figure: probability density of a normal distribution (mean = 1, SD = 1.3) over x from -5 to 5]

L(normal distribution with mean = 1, SD = 1.3 | x = 2) = 0.2283
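
These likelihood values can be reproduced with a short sketch (my own illustration, not from the slides): the likelihood of a candidate distribution given a datum is just that distribution's density evaluated at the datum.

from scipy.stats import norm

# Likelihood of each candidate distribution, given the observed datum x = 2
x = 2
L_mean0 = norm.pdf(x, loc=0, scale=1.3)   # ≈ 0.094
L_mean1 = norm.pdf(x, loc=1, scale=1.3)   # ≈ 0.2283

print(L_mean0, L_mean1)   # the mean = 1 distribution makes x = 2 more likely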


Maximum likelihood estimates:
Introducing Bayes Factors
• NHST has a somewhat convoluted (yet solid) logic:
Assume that something makes no difference, show that
the observed results are unlikely given that
assumption, reject the assumption on these grounds,
and conclude that there probably was a difference
after all.
• Also somewhat weak: What is the difference?
• Bayes Factors provide an alternative/complement to
classical null hypothesis significance testing that
addresses both of these issues:
• p(D|H) → p(H|D)
• Bayes' Theorem allows us to invert conditional
probabilities if (and only if) we know the prior
probabilities of the hypotheses.
The Bayes Factor
p(H1 | D) / p(H0 | D)  =  [ p(D | H1) / p(D | H0) ]  ×  [ p(H1) / p(H0) ]

Posterior odds  =  Bayes Factor  ×  Prior odds
So what are Bayes factors?
• In essence, likelihood ratios:

Bayes' theorem (posterior on the left; prior and likelihood on the right):

p(A | B) = p(A) × p(B | A) / p(B)

The Bayes factor compares the likelihood of the data under the two hypotheses:

BF = p(D | H1) / p(D | H0)
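
To make "likelihood ratio" concrete, here is a minimal sketch of my own (with made-up numbers) for two point hypotheses about a binomial success probability; for point hypotheses the Bayes factor reduces to a plain likelihood ratio:

from scipy.stats import binom

# Hypothetical data: 14 successes in 20 trials
k, n = 14, 20

# Two point hypotheses about the success probability
p_H0 = 0.5   # H0: chance performance
p_H1 = 0.7   # H1: a specific alternative (chosen for illustration)

# Bayes factor: likelihood of the data under H1 over its likelihood under H0
BF = binom.pmf(k, n, p_H1) / binom.pmf(k, n, p_H0)
print(BF)   # values > 1 favor H1, values < 1 favor H0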
The history of Bayes factors
• Developed by Jeffreys (1935).
• Called “significance tests”.
• This alternative was not really appreciated until
about the 1990s.
• They are still “catching on”.
What do Bayes Factors give us?
• They quantify the strength of the evidence in favor
of a hypothesis, given the data.
• Can be used for model comparison (which model is
more likely, given the data).
Deriving the Bayes factor
Strength of evidence for H1 or H0, given that they are
equally likely a priori (from Lee & Wagenmakers, 2013).
Example: Determining the most likely effectiveness of
a medication using maximum likelihood estimation
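
The slide's worked example is not reproduced here, but the idea can be sketched as follows (my own illustration; the patient counts are made up): treat effectiveness as a binomial success probability and pick the value that makes the observed data most likely.

import numpy as np
from scipy.stats import binom

# Hypothetical trial: 60 of 100 patients improve on the medication
k, n = 60, 100

# Evaluate the likelihood of the data over a grid of candidate effectiveness values
theta = np.linspace(0.01, 0.99, 99)
likelihood = binom.pmf(k, n, theta)

# The maximum likelihood estimate is the candidate that maximizes the likelihood
theta_mle = theta[np.argmax(likelihood)]
print(theta_mle)   # ≈ 0.6, i.e. k/n, as expected for a binomial model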
Bayes theorem also allows us to assess
the veracity of the published literature
• What is the probability that a published result
is actually true?
• If we set the significance level α to 0.05, does
this mean that the false positive rate in the
field is 5%?
• No!
• This is a common misconception.
• The answer also depends on the prior probability of
a true effect in a given field, as well as on the
statistical power of the study.
Introducing PPV (Ioannidis, 2005):
• PPV = “positive predictive value”: the *post-study*
probability that a result is true.
• R: Ratio of true to false effects in a field.
• PPV links power, alpha and R:

PPV = (1 − β)R / (R − βR + α) = (1 − β)R / ((1 − β)R + α)
Deriving PPV

Probabilities of a significant vs. non-significant result, conditional on
whether the effect is true or false (each column sums to 1):

RESEARCH/REALITY    TRUE        FALSE        TOTAL
SIG                 1 − β       α            1 − β + α
NONSIG              β           1 − α        β + 1 − α
TOTAL               1           1            2

With NT true and NF false effects being tested in a field (c = NT + NF):

RESEARCH/REALITY    TRUE          FALSE          TOTAL
SIG                 NT·(1 − β)    NF·α           NT·(1 − β) + NF·α
NONSIG              NT·β          NF·(1 − α)     NT·β + NF·(1 − α)
TOTAL               NT            NF             NT + NF = c

Since R = NT/NF, we have NT = R·NF. Substituting into NT + NF = c gives
R·NF + NF = c, so NF = c/(R + 1) and NT = c·R/(R + 1).
Deriving PPV

Substituting NT = c·R/(R + 1) and NF = c/(R + 1) into the table of counts:

RESEARCH/REALITY    TRUE                  FALSE               TOTAL
SIG                 c·(1 − β)·R/(R + 1)   c·α/(R + 1)         c·(R·(1 − β) + α)/(R + 1)
NONSIG              c·β·R/(R + 1)         c·(1 − α)/(R + 1)   c·(1 + R·β − α)/(R + 1)
TOTAL               c·R/(R + 1)           c/(R + 1)           c
Deriving PPV

PPV = p(effect true | significant) = p(sig | true)·p(true) / p(sig)

p(true) = NT/c = R/(R + 1)
p(sig | true) = 1 − β
p(sig) = p(sig | true)·p(true) + p(sig | false)·p(false)
       = (1 − β)·R/(R + 1) + α/(R + 1)

PPV = [(1 − β)·R/(R + 1)] / [(1 − β)·R/(R + 1) + α/(R + 1)]
    = (1 − β)·R / ((1 − β)·R + α)
A study converts the prior odds of the
effect being true (R) into the posterior –
PPV, as a function of statistical power:
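
As a rough numerical illustration of that relationship (my own sketch; the α, power, and R values are chosen only for illustration), PPV can be computed directly from the formula derived above:

# PPV = (1 - beta) * R / ((1 - beta) * R + alpha), where power = 1 - beta
def ppv(power, alpha, R):
    return power * R / (power * R + alpha)

alpha = 0.05
for R in (0.1, 0.5, 1.0):            # ratio of true to false effects in the field
    for power in (0.2, 0.5, 0.8):    # statistical power of the study
        print(f"R = {R}, power = {power}: PPV = {ppv(power, alpha, R):.2f}")

Even with α = 0.05, the PPV falls well below 95% when power is low or when true effects are rare in the field (small R).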
Confidence interval vs. credible interval
• Frequentist approach: The value of a parameter 𝜽 is
unknown, but fixed.
• We can estimate it by taking samples from the
population. This yields a (tychenic, i.e. chance-generated)
distribution of sample means, which we can use to
calculate the CI.
• Example: You are an epidemiologist and want to
estimate the prevalence of Herpes Simplex in the
population using a frequentist approach:
𝜽: The prevalence of Herpes simplex in
the population
[Figure: distribution of sample estimates of the prevalence, x-axis from 0.44 to 0.56, y-axis: Proportion]
In a Bayesian framework, 𝜽 is a random variable (RV)
and has a probability distribution
• This probability distribution corresponds to our
degree of belief.
• If this is our prior belief about the value of the
parameter, this is called the prior distribution.
• The prior distribution can take any shape. A
completely flat prior distribution is called an
“uninformative prior”.
• The sharper the prior distribution, the more
informative it is.
• Often used to model the prior distribution of a
proportion: The Beta distribution
Prior distributions of varying degrees of
informativeness:
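
The slide's figure is not reproduced here, but Beta priors of varying informativeness are easy to sketch (my own illustration; the specific α and β values are arbitrary):

import numpy as np
from scipy.stats import beta

theta = np.linspace(0.001, 0.999, 999)

# A flat ("uninformative") prior and two increasingly informative priors
priors = {"Beta(1, 1), flat": (1, 1),
          "Beta(5, 5), mildly informative": (5, 5),
          "Beta(50, 50), sharply peaked near 0.5": (50, 50)}

for label, (a, b) in priors.items():
    density = beta.pdf(theta, a, b)
    # The sharper (more informative) the prior, the higher its peak density
    print(label, "-> peak density:", round(float(density.max()), 2))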
So what?
• In Bayesian analysis, we can use the data from a
study (which yields the likelihood) in combination
with the prior distribution to compute a posterior
distribution:
• Posterior ∝ Prior × Likelihood
• For something like Herpes simplex, we could model
the likelihood with a binomial distribution (how
many people are infected, as a proportion of the
sample size):
• p(𝜽 | y) ∝ p(𝜽) × p(y | 𝜽)
• Data = y
Example: We are epidemiologists and want to
know the likely value of 𝜽 in a certain location.
• We have a somewhat informative prior from the
literature (say a Beta distribution with α = β = 10).
• We take a local sample of 100 people and see how
many are infected with the virus.
• This yields a posterior distribution of 𝜽:
What does our study yield?

[Figure: posterior distribution of 𝜽, axis from 0 to 1]

Posterior ∝ Prior × Likelihood
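
A minimal sketch of this update (my own illustration; the infection count is hypothetical) uses the fact that the Beta prior is conjugate to the binomial likelihood, so the posterior is again a Beta distribution:

# Prior from the literature: Beta(10, 10)
a_prior, b_prior = 10, 10

# Hypothetical local sample: 35 of 100 people are infected
n, k = 100, 35

# Conjugate update: posterior is Beta(a_prior + k, b_prior + n - k)
a_post, b_post = a_prior + k, b_prior + (n - k)

posterior_mean = a_post / (a_post + b_post)
print(f"Posterior: Beta({a_post}, {b_post}), mean = {posterior_mean:.3f}")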


How do we get a new estimate of 𝜽
from the credible interval?
The credible interval spans a specified share (e.g. 95%)
of the area under the posterior distribution:
𝜽 = 0.373, at 95% credibility

[Figure: posterior distribution of 𝜽, axis from 0 to 1, with the 95% credible interval indicated]
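
Continuing the sketch above (same hypothetical counts), a central 95% credible interval can be read off the posterior Beta distribution directly:

from scipy.stats import beta

# Posterior from the sketch above: Beta(45, 75) (hypothetical counts)
a_post, b_post = 45, 75

# Central 95% credible interval: the middle 95% of the posterior's area
lower, upper = beta.interval(0.95, a_post, b_post)
print(f"95% credible interval for theta: [{lower:.3f}, {upper:.3f}]")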


How strongly the prior distribution determines the
location and shape of the posterior distribution, at a
fixed likelihood:

The impact of larger samples (stronger likelihood) on
the posterior distribution, at a fixed prior:
At small sample sizes, the prior matters a lot:
[Figure: 3 × 3 grid of panels, each based on n = 20, 𝜽 axis from 0 to 1]


At large sample sizes, the prior doesn’t
matter much:
[Figure: 3 × 3 grid of panels, each based on n = 500, 𝜽 axis from 0 to 1]
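
A small numerical sketch of this point (my own illustration; the priors and the observed proportion are made up): with the same observed proportion of 40%, the posterior means under quite different Beta priors nearly coincide once n is large.

# Three rather different Beta(a, b) priors (hypothetical choices)
priors = [(1, 1), (10, 10), (2, 8)]

# Same observed proportion (40% "successes") at a small and a large sample size
for n in (20, 500):
    k = int(0.4 * n)
    # Posterior mean of Beta(a + k, b + n - k) for each prior
    means = [round((a + k) / (a + b + n), 3) for a, b in priors]
    print(f"n = {n}: posterior means under the three priors = {means}")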
