Bayesian Statistics
Foo Lee Kien (PhD)
Motivating Example
A and B are playing a game:
• A will write down a number and then flip a coin.
• If the flip is heads, A will honestly tell B whether the number is even or odd.
• If the flip is tails, A will lie.
• B will then guess whether the number is odd or even.
Motivating Example
Before A and B start their game
Motivating Example
After you have observed A and B playing a few rounds
Contents
• Frequentist Approach
• Bayesian Approach
• Review on Probability
• Bayesian Inference
Frequentist Approach
Frequentist Approach
• Assumes a hypothetical infinite sequence of repetitions of an experiment.
• Looks at the relative frequency of the event of interest.
• Example:
P(getting a head when tossing a fair coin)
Example
P(getting a head when tossing a fair coin)

No. of tosses    No. of heads    Relative frequency
10               3               0.300
100              51              0.510
500              249             0.498
1000             503             0.503
5000             2521            0.504
10000            5067            0.507
50000            25101           0.502
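The relative frequency settles near 0.5 as the number of tosses grows. A minimal Python sketch of the same experiment (simulated counts will of course differ from the table's):

import random

# Simulate tosses of a fair coin and print the running relative frequency of heads
random.seed(1)
heads = 0
for n in range(1, 50001):
    heads += random.random() < 0.5   # heads with probability 0.5
    if n in (10, 100, 500, 1000, 5000, 10000, 50000):
        print(n, heads, heads / n)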
Frequentist Approach
To test a hypothesis, the frequentist approach calculates the probability of an event in the long run of repeated experiments. This interpretation underlies tools such as:
• p-value
• Confidence interval
Bayesian Approach
Thomas Bayes (1702–1761) and Pierre-Simon Laplace (1749–1827)
Bayesian Approach
• Subjective approach to probability.
• Allows us to reason about our beliefs under conditions of uncertainty.
• Provides the tools to update our beliefs in the light of new data.
• Under the subjective or "degrees of belief" definition of probability, we can ask questions such as:
• What is the probability that Malaysia will rank number 1 in the 2017 SEA Games?
Example
Suppose that out of 4 marathon races between two players, A and B, A won 3 times while B won only once. There is a race coming up: which player would you bet your money on?
Example
What if you are told that it rained once when B won and once when A won, and that it is definitely going to rain during the next marathon race?
So, which player would you bet your money on now?
Bayesian Approach
Review on Probability
Axioms of Probability
• 0 ≤ P(A) ≤ 1
• Certainty: P(A) = 1
• Impossible: P(A) = 0
• Uncertainty: 0 < P(A) < 1
Examples
1) P(getting a 4 when rolling a die)
2) P(getting a tail when tossing a coin)
3) P(getting a sum of 4 when rolling a die two times)
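For reference, counting equally likely outcomes gives:
1) P(getting a 4 when rolling a die) = 1/6
2) P(getting a tail when tossing a coin) = 1/2
3) P(getting a sum of 4 in two rolls) = 3/36 = 1/12, since only (1,3), (2,2) and (3,1) sum to 4 among the 36 equally likely pairs.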
Joint Probability
• Probability that two events, say A and B, are both true.
• P(A∩B)
Example
Assume that the engine component of a spacecraft consists of two engines in parallel. If the main engine is 95% reliable (M), the backup is 80% reliable (B), and the engine component as a whole is 99% reliable, what is the probability that both engines will be operable?
P(M) = 0.95
P(B) = 0.80
P(M ∪ B) = 0.99
P(M ∩ B) = P(M) + P(B) − P(M ∪ B) = 0.95 + 0.80 − 0.99 = 0.76
Conditional Probability
• Probability based on some background information.
• P(A|B): probability of A given that B is true.
P(A|B) = P(A∩B) / P(B)
Example
In studying the causes of power failures, the following data have been gathered: 5% are due to transformer damage (T), 80% are due to line damage (L), and 1% involve both problems. Based on these percentages, what is the probability that a power failure involves line damage given that there is transformer damage?
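Applying the definition of conditional probability to these figures:
P(L|T) = P(L∩T) / P(T) = 0.01 / 0.05 = 0.20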
More on Conditional Probability
From the equation of conditional probability:
P(A|B) = P(A∩B) / P(B)  ⟹  P(A∩B) = P(A|B) P(B)
P(B|A) = P(B∩A) / P(A)  ⟹  P(B∩A) = P(B|A) P(A)
Bayes' Theorem
P(A∩B) = P(A|B) P(B) and P(B∩A) = P(B|A) P(A)
Since P(A∩B) = P(B∩A),
P(A|B) P(B) = P(B|A) P(A)
P(A|B) = P(B|A) P(A) / P(B)
Example
There are two bowls of cookies. Bowl 1 contains 30 vanilla cookies and
10 chocolate cookies. Bowl 2 contains 20 of each. Now suppose you
choose one of the bowls at random and, without looking, select a
cookie at random. The cookie is vanilla (V). What is the probability that
it came from Bowl 1 (B1)?
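The answer follows from Bayes' theorem: the bowls are equally likely a priori, and the likelihoods are the proportions of vanilla cookies in each bowl (3/4 for Bowl 1, 1/2 for Bowl 2):
P(B1|V) = P(V|B1) P(B1) / P(V)
= (3/4 × 1/2) / (3/4 × 1/2 + 1/2 × 1/2)
= 0.375 / 0.625 = 0.6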
Practice Questions
Bayesian Inference
Bayesian Inference
In general:
1. Represent prior uncertainty about the model parameters with a probability distribution.
2. Update the prior with information learned from the current data (the likelihood).
3. Produce a posterior probability distribution for the parameters that contains less uncertainty.
Example
• Suppose there are 2 balls in a bag and we know that at least one of them is black. There are then two hypotheses:
BB: Both black
BW: One black and one white
Prior
• Bayesian analysis starts by choosing a prior distribution:
P(BB) = 0.5
P(BW) = 0.5
Likelihood
• The probability that you would have observed the data, if that
hypothesis is true:
• Likelihood for BB =
• Likelihood for BW =
Bayes’ Box
• We can now represent our calculation in a Bayes’ box:
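A worked version, assuming the observed datum is that one ball drawn at random from the bag turns out to be black (the slides leave the observation blank, so this datum is an assumption):

Hypothesis   Prior   Likelihood   Prior × Likelihood   Posterior
BB           0.5     1            0.5                  2/3
BW           0.5     0.5          0.25                 1/3

Each posterior entry is the Prior × Likelihood entry divided by the column total, 0.75.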
Bayes’ Box
• Rewriting Bayes' theorem in terms of the Bayes' box:
P(H|D) = P(D|H) P(H) / P(D)
Example
In 1995, the company that produces M&M's introduced blue M&M's. Before then, the color mix in a bag of M&M's was 30% Brown, 20% Yellow, 20% Red, 10% Green, 10% Orange, 10% Tan. After 1995, it was 24% Blue, 20% Green, 16% Orange, 14% Yellow, 13% Red, 13% Brown.
Suppose Jack has two bags of M&M's, and he tells you that one is from 1994 and one from 1996. He won't tell you which is which, but he gives you one M&M from each bag. One is yellow and one is green. What is the probability that the yellow one came from the 1994 bag?
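This works as a two-hypothesis Bayes' theorem calculation. Let H1 be "the yellow came from the 1994 bag (so the green came from the 1996 bag)" and H2 the reverse; each has prior 1/2, which cancels:
P(data|H1) = 0.20 × 0.20 = 0.040
P(data|H2) = 0.14 × 0.10 = 0.014
P(H1|data) = 0.040 / (0.040 + 0.014) = 20/27 ≈ 0.74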
M&M
What if Jack gives you one brown M&M and one yellow M&M instead?
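By the same reasoning, with H1 now "the brown came from the 1994 bag and the yellow from the 1996 bag":
P(H1|data) = (0.30 × 0.14) / (0.30 × 0.14 + 0.13 × 0.20) = 0.042 / 0.068 ≈ 0.62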
Example
You change your job and have a phone installed in your new office. You can't remember the number but suspect it might be 555-3226. To test whether this is the correct number, you carry out an experiment: you pick up the phone and dial 555-3226.
If you are correct about this phone number, you will definitely get a busy signal. If you are incorrect, the probability of getting a busy signal is 0.01. However, all of the above is only true if the phone is working. If the phone is broken, you will always get a busy signal.
When you do the experiment, you do actually get a busy signal. Let's consider the following 4 hypotheses and their prior probabilities:
H1: The phone is working and the number is correct, P(H1) = 0.4
H2: The phone is working and the number is incorrect, P(H2) = 0.4
H3: The phone is broken and the number is correct, P(H3) = 0.1
H4: The phone is broken and the number is incorrect, P(H4) = 0.1
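A Bayes' box ties this together. The likelihood of a busy signal is 1 under H1, H3, and H4, and 0.01 under H2:

Hypothesis   Prior   Likelihood   Prior × Likelihood   Posterior
H1           0.4     1            0.4                  0.662
H2           0.4     0.01         0.004                0.007
H3           0.1     1            0.1                  0.166
H4           0.1     1            0.1                  0.166

Dividing by the normalising constant 0.604 gives the posterior column: observing the busy signal raises the probability that the number is correct (H1 or H3) from 0.5 to about 0.83.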
Bayesian Inference
• An important part of Bayesian inference is the establishment
of parameters and models.
• Models: the mathematical formulation of the observed events.
• Parameters: the factors in the models affecting the observed data.
Mathematical Model
Example: The Normal distribution
f(x) = (1 / (σ√(2π))) × exp( −(1/2) × ((x−μ)/σ)² )
Parameters: the mean μ and the standard deviation σ.
Normal Distribution
Mathematical Model
Example: The Beta distribution
f(x) = x^(α−1) × (1−x)^(β−1) / B(α, β)
Parameters: the shape parameters α and β.
Beta Distribution
Bayesian Inference
• Rewriting Bayes' theorem in terms of probability distributions:
P(θ|data) = P(data|θ) P(θ) / P(data)
Bayesian Inference
• Example:
In tossing a coin, the fairness of the coin may be defined as a parameter, denoted by θ. The outcome of a series of coin tosses is the data.
• Given an outcome, what is the probability that the coin is fair (θ = 0.5)?
P(θ|data) = P(data|θ) P(θ) / P(data)  ⇒  P(θ|data) ∝ P(data|θ) P(θ)
Fairness of Coin
P(θ|data) ∝ P(data|θ) P(θ)
Fairness of Coin - Likelihood
• Likelihood: the probability of observing a particular number of heads in a particular number of flips, for a given fairness of the coin.
• The probability of observing heads or tails depends on the fairness of the coin (θ).
• If we toss a coin once, what is the probability of observing a head?
Fairness of Coin - Bernoulli Likelihood
• Let y be the outcome of tossing the coin once.
• Represent heads as 1 and tails as 0, so that
P(y = 1|θ) = θ and P(y = 0|θ) = 1 − θ
• These combine into the single expression
P(y|θ) = θ^y × (1 − θ)^(1−y)
Fairness of Coin - Binomial Likelihood
• If we toss the coin many times, say n independent times, the likelihood is the product of the individual Bernoulli terms:
P(y₁, …, yₙ|θ) = θ^(Σyᵢ) × (1 − θ)^(n − Σyᵢ)
Fairness of Coin - Binomial Likelihood
• If we are interested in the probability of getting k heads in N tosses, the probability is given by the binomial distribution:
P(k, N|θ) = C(N, k) × θ^k × (1 − θ)^(N−k)
where the binomial coefficient C(N, k) counts the possible orderings and does not depend on θ.
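As a quick numeric check, for a fair coin (θ = 0.5) the probability of getting 7 heads in 10 tosses is:
P(7, 10|0.5) = C(10, 7) × 0.5⁷ × 0.5³ = 120 / 1024 ≈ 0.117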
Fairness of Coin – Prior distribution
• Used to represent our beliefs about the parameters, based on previous experience.
• No previous experience? A vague (e.g. uniform) prior can be used.
• For this example, we can use the beta distribution.
• The probability density function of the beta distribution is of the form
P(θ) = θ^(α−1) × (1 − θ)^(β−1) / B(α, β)
Fairness of Coin – Posterior distribution
Let's calculate the posterior distribution with Bayesian inference.
P(θ|data) ∝ P(data|θ) P(θ)
P(θ|k, N) ∝ P(k, N|θ) P(θ)
• Likelihood: P(k, N|θ) ∝ θ^k × (1 − θ)^(N−k)
• Prior: P(θ) = θ^(α−1) × (1 − θ)^(β−1) / B(α, β)
• Multiplying: P(θ|k, N) ∝ θ^(k+α−1) × (1 − θ)^(N−k+β−1), which is a Beta(k + α, N − k + β) distribution.
Fairness of Coin
• Suppose you think that the coin is biased, with a mean (μ) of 0.4 and a standard deviation (σ) of 0.1.
• For a beta distribution:
β = (1 − μ)(μ − μ² − σ²) / σ²  and  α = βμ / (1 − μ)
Prior Distribution
Fairness of Coin
• Suppose you observed 70 heads out of 100 tosses.
P(θ|k, N) ~ Beta(k + α, N − k + β)
N = 100, k = 70, α = 9.2, β = 13.8, so the posterior is Beta(79.2, 43.8).
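A minimal Python sketch of this update, assuming SciPy is available; the numbers match the slides above:

from scipy.stats import beta

# Convert the prior's mean and standard deviation into beta parameters
mu, sigma = 0.4, 0.1
b = (1 - mu) * (mu - mu**2 - sigma**2) / sigma**2   # beta  = 13.8
a = b * mu / (1 - mu)                               # alpha = 9.2

# Conjugate update with the observed data: k = 70 heads in N = 100 tosses
N, k = 100, 70
posterior = beta(a + k, b + N - k)                  # Beta(79.2, 43.8)

# Posterior mean ~ 0.644, between the prior mean 0.4 and the data proportion 0.7
print(posterior.mean())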
Posterior Distribution
Bayesian Inference
Where can a prior come from?
• Previous studies or published work.
• Researcher intuition.
• Substantive experts.
• Convenience (conjugacy, vagueness).
• Nonparametrics.
• Other data sources.
Other Topics in Bayesian Statistics
1. Credible intervals
2. Hypothesis testing and model selection
3. Bayes factors
4. Bayesian modelling
5. Computational techniques
6. Regression
and many others.
The End