0% found this document useful (0 votes)
11 views

What Is Bayesian

This document provides an introduction to Bayesian statistics through an example of a gambling game between Alice and Bob. It explains that in Bayesian statistics, we deal with uncertainty by considering all possible values of unknown parameters, rather than estimating single point values. The key is writing the probability we want to infer in terms of known data and unknown parameters, and solving the resulting equation while integrating over all possible parameter values. This allows us to more rigorously account for uncertainty in unknowns.

Uploaded by

apuntesjmzn
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views

What Is Bayesian

This document provides an introduction to Bayesian statistics through an example of a gambling game between Alice and Bob. It explains that in Bayesian statistics, we deal with uncertainty by considering all possible values of unknown parameters, rather than estimating single point values. The key is writing the probability we want to infer in terms of known data and unknown parameters, and solving the resulting equation while integrating over all possible parameter values. This allows us to more rigorously account for uncertainty in unknowns.

Uploaded by

apuntesjmzn
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 2

PRIMER

What is Bayesian statistics?


© 2004 Nature Publishing Group http://www.nature.com/naturebiotechnology

Sean R Eddy
There seem to be a lot of computational biology papers with ‘Bayesian’ in their titles these days. What’s distinctive
about ‘Bayesian’ methods?

There are excellent introductory books on If p were known, this would be easy Inferring p from the data
Bayesian analysis1–3, but the key ideas behind Because Alice just needs one more point to The problem is that Alice and Bob don’t know
the buzzword can be grasped quickly. Con- win, Bob only wins the game if he takes the p. The very fact that Alice is ahead 5-3 is evi-
sider the following gambling puzzle—one next three points in a row. The probability of dence that the unknown position of the mark
that has ancient roots in the origins of both this is (1 – p)3; Alice will win on any other is probably giving Alice an advantage, but the
classical and Bayesian probability theory. outcome, so the probability of her winning is numbers are small, and she can’t be sure.
[1 – (1 – p)3]. If Alice knew p, it would be easy Maybe the mark is in Bob’s favor and he’s just
The table game for her to calculate fair odds. For instance, if been unlucky so far.
Alice and Bob are playing a game in which the the mark were exactly in the middle of the This sets up a scientific inference problem in
first person to get 6 points wins. The way each table (or if this were the ‘coin game,’ where microcosm. We have a limited amount of data:
point is decided is a little strange. The Casino points are decided by flipping a fair coin), Alice is winning 5-3. We are interested in infer-
has a pool table that Alice and Bob can’t see. p would be 0.5; the probability that Bob would ring an unknown ‘hypothesis’: the value of p.
Before the game begins, the Casino rolls an win would be (1 – 0.5)3, or 1/8; and the proba- We want to use this inference to predict future
initial ball onto the table, which comes to rest bility that Alice would win would be 7/8; fair events: how probable is it that Alice will win?
at a completely random position, which the odds would be 7:1. One approach would be to make a maxi-
Casino marks. Then, each point is decided by What we’re doing here is calculating the mum likelihood estimate of the unknown
the Casino rolling another ball onto the table probability of observing some data (the out- parameter p. This is the frequency at which
randomly. If it comes to rest to the left of the comes of up to the next three points) given a Alice has won so far, 5/8. From this, we esti-
initial mark, Alice wins the point; to the right probability model (the probability p). The mate that Bob’s probability of winning is
of the mark, Bob wins the point. The Casino general notation for such a probability is (3/8)3 = 27/512, and Alice’s probability of win-
reveals nothing to Alice and Bob except who P(data|model), where the ‘|’ sign means ‘given’ ning is 485/512; fair odds would be about 18:1.
won each point. or ‘conditional upon.’ But, as we will see, this is way off.
Clearly, the probability that Alice wins a Calculating the probability of an observed
point is the fraction of the table to the left of outcome given known parameters and known The Bayesian solution
the mark—call this probability p; and Bob’s hypotheses tends to be a familiar process, The Bayesian approach is to write down
probability of winning a point is 1 – p. Because especially if we’re talking about outcomes of exactly the probability we want to infer, in
the Casino rolled the initial ball to a random flipping coins, rolling dice or drawing white terms only of the data we know, and directly
position, before any points were decided every and black balls from urns. Interestingly, solve the resulting equation — which forces us
value of p was equally probable. The mark is though, the ‘share problem’ (A leads B 5:3 in a to deal explicitly with all mathematical diffi-
only set once per game, so p is the same for coin-flipping game to 6; the game is inter- culties, additional assumptions and uncertain-
every point. rupted; how to fairly split the pot?) was con- ties that may arise. One distinctive feature of a
Imagine Alice is already winning 5 points to troversial for centuries after it was first Bayesian approach is that if we need to invoke
3, and now she bets Bob that she’s going to proposed in the 1300s. Published solutions uncertain parameters in the problem, we do
win. What are fair betting odds for Alice to included 2:1 and 3:1 odds, and one mathe- not attempt to make point estimates of these
offer Bob? That is, what is the expected proba- matician sniffed at another’s solution,“there is parameters; instead, we deal with uncertainty
bility that Alice will win? an evident error in the determination of the more rigorously, by integrating over all possi-
shares that even a child should recognize”— ble values that a parameter might assume.
but gave no answer himself4. (Statistics has Here, for instance, what we want to know is
Sean R. Eddy is at Howard Hughes Medical changed since the Renaissance; peer review is the expected probability that Bob will win
Institute and Department of Genetics, much the same.) Blaise Pascal’s mid-1600s (call it E). By definition, this is the weighted
Washington University School of Medicine, correspondence with Fermat describing his average of (1 – p)3 over all possible values of p:
4444 Forest Park Blvd., Box 8510, Saint Louis, reasoning in deriving a correct 7:1 solution is
Missouri 63108, USA. considered to be one of the origins of proba- 1

e-mail: [email protected] bility theory.


E(Bob wins)= ∫ (1–p)
0
3
P (p | A = 5, B = 3) dp

NATURE BIOTECHNOLOGY VOLUME 22 NUMBER 9 SEPTEMBER 2004 1177


PRIMER

where the (1 – p)3 term is the probability that It happens that these integrals have analytic My ‘table game’ is adapted from the key
Bob wins given a particular choice of p and the solutions. A ‘beta integral’ is example in the landmark, posthumous 1763
P(p | A=5, B=3) term is the probability that paper by the Reverend Thomas Bayes. The
that particular choice of p is the correct one, 1 beauty of Bayes’ table-and-balls analogy is that
given the observed data that the score is
Alice 5, Bob 3.
∫ 0 pm–1(1 – p)n–1 dp =
Γ(n)Γ(m)
Γ(m + n)
it circumvented all three difficulties in one
stroke, making it possible to think clearly
What is P(p | A=5, B=3)? The probability of about a verifiable inverse probability problem.
the parameter p given the data is not the same where Γ(x) is a gamma function, a generaliza- Bayes’ example provided a physical mecha-
thing as the more familiar calculation of tion of the better-known factorial function to nism for drawing a probability from a uni-
real numbers: Γ(n + 1) = n! for an integer n. So,
© 2004 Nature Publishing Group http://www.nature.com/naturebiotechnology

P(A=5, B=3 | p), the probability of the data form prior; the resulting integrals have
given a known parameter p. It is a so-called plugging in and solving, we get an answer of analytic solutions; and every term has a fre-
inverse probability problem. Rather than (5!6!/12!)/(5!3!/9!) = 1/11 for Bob’s expected quentist interpretation, because we can repeat
P(data | model), we need P(model | data). probability of winning, and Alice’s expected the physical process of rolling a trial ball to
The solution to inverse probability prob- probability is 10/11. Thus, the Bayesian calcula- choose p. Indeed, it is easy to verify that the
lems is the grandiosely named “Bayes’ theo- tion estimates fair odds to be 10:1—which is correct answer to the table game problem is
rem”, which actually is a trivial algebraic verifiably correct, as we’ll see below. 10:1—write a computer program to simulate
truism for two random variables X and Y: the table game many times, and count the fre-
Difficulties with Bayesian statistics quency with which Alice versus Bob ends up
Bayesian analysis (explicit probabilistic winning after a match reaches a 5-3 score in
P(Y | X)P(X) P(Y | X)P(X)
P(X | Y) = = inference) is an attractively direct, formal Alice’s favor.
P(Y)
Σ X′ P(Y | X′)P(X′) means of dealing with uncertainty in scientific
inference, but there are three important Applications in computational biology
or, in this case, difficulties. There is no shortage of problems in biology
One difficulty is computational. Bayesian where we want to infer something from
P(A = 5, B = 3 | p) P(p) calculations almost invariably require integra- observed data, but the inference depends on
P(p | A = 5, B = 3) =
1 tions over uncertain parameters. These inte- uncertain parameters or missing data in a
∫ 0
P(A = 5, B = 3 | p) P(p) dp
grations often have no analytical solution, probability model. For example, in phyloge-
and instead require computationally intensive netic analysis, the probability of an evolution-
That is, the probability of a particular numerical integration (such as Markov- ary tree given some observed DNA sequences
choice of p given the data (the ‘posterior prob- chain Monte Carlo methods). Until the advent is conditional on a multiple alignment, an
ability’ of p) is proportional to the probability of computers, Bayesian approaches often evolutionary model, and branch lengths on
that we would get the observed data if that p weren’t feasible. the tree, all of which are subject to substantial
were true (the ‘likelihood’ of p) multiplied by Second, Bayesian methods require specify- uncertainty, but for which traditional meth-
the a priori probability of this p relative to all ing prior probability distributions, which are ods try to make single point estimates. Using
other possible values of p (the ‘prior probabil- often themselves unknown. Bayesian analyses Bayesian methods, we can instead integrate
ity’ of p). To make this come out as a probabil- generally assume so-called ‘uninformative’ over varying degrees of uncertainty in differ-
ity, we divide by a summation over all possible (often uniform) priors in such cases. Intro- ent aspects of the analysis. The robustness of
values of p; because p is a continuous variable, ducing subjective assumptions into an infer- Bayesian methods in the face of partial infor-
this means an integration from p = 0 to p = 1. ence is unpalatable to some statisticians. The mation and poorly determined parameters
The use of inverse probability calculations and usual counterargument is that non-Bayesian lets us use more complicated, more realistic
Bayes’ theorem is a second distinctive feature methods make comparable assumptions probability models. This is proving to be
of Bayesian approaches. implicitly, and it’s probably better to have highly useful in the ‘post-genomic’ world of
The likelihood term is the term we know one’s assumptions out in the open. analyzing large, noisy biological data sets.
how to calculate; P(A=5,B=3 | p) is a binomial Third, though Bayes’ theorem is trivially
(8!/5!3!)p5(1 – p)3. The prior term P(p) is true for random variables X and Y, it is 1. Gelman, A., Carlin, J.B., Stern, H.S. & Rubin, D.B.
Bayesian Data Analysis (Chapman & Hall/CRC, Boca
potentially problematic. By definition, P(p) is not clear to everyone that parameters or Raton, Florida, USA, 1995).
a probability of p before any data have been hypotheses should be treated as random vari- 2. MacKay, D.J.C. Information Theory, Inference, and
Learning Algorithms (Cambridge Univ. Press,
observed. How do we know anything about p ables. Everyone accepts that we can talk about Cambridge, UK, 2003).
before we’ve seen any data? the probability of observed data given a 3. Jaynes, E.T. Probability Theory: The Logic of Science
A crucial feature of the ‘table game’ is that model, where we mean the frequency with (Cambridge Univ. Press, Cambridge, UK, 2003).
4. Hacking, I. The Emergence of Probability (Cambridge
P(p) is well-defined: the game is contrived which we would obtain those data in the limit Univ. Press, Cambridge, UK, 1975).
such that p is picked from a uniform distri- of infinite trials. But if we talk about the ‘prob-
bution. Because it’s uniform, it’s a constant, ability’ of a one-time, nonrepeatable event
and it cancels out of the Bayes equation; that is either true or false, there is no frequency
after some algebraic rearrangement, we’re interpretation, and we are using probability
left with: in the sense of a confidence or a degree of
belief. This seems common sense, but it Wondering how some other
1 mathematical technique really works?

E(Bob wins) =
∫ 0
5
p (1 – p) dp 6 remains controversial amongst good statisti-
cians. Using probability to represent a degree Send suggestions for future primers to
1
of belief is a third distinctive feature of [email protected].
∫ 0
p 5 (1 – p)3 dp
Bayesian approaches.

1178 VOLUME 22 NUMBER 9 SEPTEMBER 2004 NATURE BIOTECHNOLOGY

You might also like