
Introduction to

Bayesian Statistics
Foo Lee Kien (PhD)

1
Motivating Example
A and B are playing a game:
A will write down a number and then flip a coin.
If the flip is heads, A will honestly tell B whether the number is even or odd.
If the flip is tails, A will lie.
B will then guess if the number is odd or even.

Let θ be the probability that B correctly guesses whether the number is even or odd.

2
Motivating Example
Before A and B start their game:

1. What’s your best guess about 𝜃?

2. What’s the probability that 𝜃 is greater than half?

3
Motivating Example
After you have observed A and B play a few rounds:

1. What’s your best guess about 𝜃?

2. What’s the probability that 𝜃 is greater than half?

4
Contents
• Frequentist Approach
• Bayesian Approach
• Review on Probability
• Bayesian Inference

5
Frequentist Approach

6
Frequentist Approach
• Assumes a hypothetical infinite sequence of events.
• Looks at the relative frequency.
• Example:
P(getting a head when tossing a fair coin)

7
Example
P(getting a head when tossing a fair coin)

No. of tosses   No. of heads
10              3
100             51
500             249
1000            503
5000            2521
10000           5067
50000           25101
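A table like the one above can be produced by simulation. The sketch below (hypothetical data, not the slide's actual tosses) shows the relative frequency of heads settling near 0.5 as the number of tosses grows:

```python
import random

random.seed(1)  # reproducible sketch

# Relative frequency of heads over increasingly many simulated tosses.
for n in [10, 100, 1000, 10000, 100000]:
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:>6} tosses: {heads} heads, relative frequency {heads / n:.4f}")
```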

8
Frequentist Approach
To test whether a hypothesis holds, it calculates the probability of an event over the long run of repeated experiments.
• p-value
• Confidence Interval

9
Bayesian Approach

10
Thomas Bayes (1702–1761)
Pierre-Simon Laplace (1749–1827)

11
Bayesian Approach
• Subjective approach to probability.
• Allows us to reason about our beliefs under conditions of uncertainty.
• Provides the tools to update our beliefs in the light of new data.
• Under the subjective or “degrees of belief” definition of probability, we can ask questions such as:
• What is the probability that Malaysia will rank number 1 in the 2017 SEA Games?

12
Example
Suppose that, out of 4 marathon races between two players, A and B,
A won 3 times while B won only once. There is an upcoming race;
which player would you bet your money on?

13
Example
What if you are told that it rained once when B won and once when A won,
and that it is definitely going to rain during the next marathon race?
Which player would you bet your money on now?

14
Bayesian Approach

15
Bayesian Approach

16
Review on Probability

17
Axioms of Probability
• 0 ≤ P(A) ≤ 1
• Certainty: P(A) = 1
• Impossible: P(A) = 0
• Uncertainty: 0 < P(A) < 1

18
Examples
1) P(get a 4 when rolling a die)
2) P(get a tail when tossing a coin)
3) P(get a sum of 4 when rolling a die two times)

19
Joint Probability
• Probability that two events, say A and B, are both true.
• P(A∩B)

• P(A∪B) = P(A) + P(B) – P(A∩B)

20
Example
Assume that the engine component of a spacecraft consists of two engines
in parallel. If the main engine is 95% reliable (M), the backup is 80%
reliable (B), and the engine component as a whole is 99% reliable, what is
the probability that both engines will be operable?

P(M) = 0.95
P(B) = 0.80
P(M ∪ B) = 0.99
P(M ∩ B) = P(M) + P(B) − P(M ∪ B) = 0.95 + 0.80 − 0.99 = 0.76
Conditional Probability
• Probability based on some background information.
• P(A|B): probability of A given that B is true.

P(A|B) = P(A ∩ B) / P(B)

22
Example
In studying the causes of power failures, these data have been gathered:
5% are due to transformer damage (T), 80% are due to line damage (L), and
1% involve both problems. Based on these percentages, what is the
probability that a power failure involves line damage given that there is
transformer damage?

P(L|T) = P(L ∩ T) / P(T) = 0.01 / 0.05 = 0.2

23
More on Conditional Probability
From the equation of conditional probability:

P(A|B) = P(A ∩ B) / P(B)  ⟹  P(A ∩ B) = P(A|B) P(B)

P(B|A) = P(B ∩ A) / P(A)  ⟹  P(B ∩ A) = P(B|A) P(A)

24
Bayes Theorem
P(A ∩ B) = P(A|B) P(B)  and  P(B ∩ A) = P(B|A) P(A)

Since P(A ∩ B) = P(B ∩ A):

P(A|B) P(B) = P(B|A) P(A)

P(A|B) = P(B|A) P(A) / P(B)

25
Example
There are two bowls of cookies. Bowl 1 contains 30 vanilla cookies and
10 chocolate cookies. Bowl 2 contains 20 of each. Now suppose you
choose one of the bowls at random and, without looking, select a
cookie at random. The cookie is vanilla (V). What is the probability that
it came from Bowl 1 (B1)?
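The cookie question can be worked through directly with Bayes' theorem, P(B1|V) = P(V|B1) P(B1) / P(V). A quick sketch (the variable names are mine, not from the slides):

```python
# Bowls chosen at random, so equal priors.
prior_b1, prior_b2 = 0.5, 0.5
like_b1 = 30 / 40                  # P(vanilla | Bowl 1)
like_b2 = 20 / 40                  # P(vanilla | Bowl 2)

# Total probability of drawing a vanilla cookie, P(V).
evidence = like_b1 * prior_b1 + like_b2 * prior_b2

posterior_b1 = like_b1 * prior_b1 / evidence
print(posterior_b1)  # 0.6
```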

26
Practice Questions

27
Bayesian Inference

28
Bayesian Inference
In general:
1. Represent prior uncertainty about model parameter with a probability
distribution.
2. Update the prior with information learned from the current data (the likelihood).
3. Produce a posterior probability distribution for the parameter that
contains less uncertainty.

29
Example
• Suppose there are 2 balls in a bag, and we know that at least one of
them is black. Thus there are two hypotheses:
BB: Both black
BW: One black and one white

• Suppose a ball is drawn from the bag, and it is black.

D: The ball drawn from the bag is black.

30
Prior
• Bayesian analysis starts by choosing a prior distribution:
P(BB) = 0.5
P(BW) = 0.5

31
Likelihood
• The probability that you would have observed the data, if that
hypothesis were true:
• Likelihood for BB =
• Likelihood for BW =

32
Bayes’ Box
• We can now represent our calculation in a Bayes’ box:

Hypothesis   prior   likelihood      prior × likelihood   posterior
                     (conditional)   (intersection)
BB           0.5
BW           0.5
Total                                (marginal)
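One way to fill in the Bayes' box is shown below. This is a sketch; the likelihood values are my own reading of the example (under BB a drawn ball is certainly black, under BW it is black with probability 0.5):

```python
prior = {"BB": 0.5, "BW": 0.5}
likelihood = {"BB": 1.0, "BW": 0.5}   # P(D | hypothesis)

# prior × likelihood column, then normalize by the marginal P(D).
joint = {h: prior[h] * likelihood[h] for h in prior}
marginal = sum(joint.values())
posterior = {h: joint[h] / marginal for h in joint}

print(marginal)   # 0.75
print(posterior)  # BB: 2/3, BW: 1/3
```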

33
Bayes’ Box
• Rewriting Bayes theorem in terms of Bayes’ box:

P(H|D) = P(D|H) P(H) / P(D)

• P(H): prior distribution of the hypotheses.
• P(D|H): sampling density of the data, the likelihood.
• P(D): marginal probability of the data.
• P(H|D): posterior distribution of the hypotheses.

34
Example
In 1995, the company that produces M&M chocolate introduced blue
M&M’s. Before then, the color mix in a bag of M&M’s was 30% Brown,
20% Yellow, 20% Red, 10% Green, 10% Orange, 10% Tan. After 1995, it
was 24% Blue, 20% Green, 16% Orange, 14% Yellow, 13% Red, 13%
Brown.
Suppose Jack has two bags of M&M’s, and he tells you that one is from
1994 and one from 1996. He won’t tell you which is which, but he gives
you one M&M from each bag. One is yellow and one is green. What is
the probability that the yellow one came from the 1994 bag?
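There are two hypotheses: the yellow came from the 1994 bag (and the green from 1996), or vice versa. A minimal sketch of the update, using the color percentages from the problem:

```python
prior = {"A": 0.5, "B": 0.5}   # A: yellow from 1994; B: yellow from 1996
likelihood = {
    "A": 0.20 * 0.20,   # P(yellow | 1994 mix) * P(green | 1996 mix)
    "B": 0.14 * 0.10,   # P(yellow | 1996 mix) * P(green | 1994 mix)
}

joint = {h: prior[h] * likelihood[h] for h in prior}
posterior_a = joint["A"] / sum(joint.values())
print(posterior_a)  # 20/27 ≈ 0.7407
```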

35
M&M
What if Jack gives you one brown M&M and one yellow M&M instead?

36
Example
When you do the experiment, you do actually get a busy signal. Let’s
consider the following 4 hypotheses and their corresponding probabilities:

H1: The phone is working and the number is correct, P(H1) = 0.4
H2: The phone is working and the number is incorrect, P(H2) = 0.4
H3: The phone is broken and the number is correct, P(H3) = 0.1
H4: The phone is broken and the number is incorrect, P(H4) = 0.1

37
Example
You change your job and have a phone installed in your new office. You
can’t remember the number but suspect it might be 555-3226. To test
if this is the correct number, you carry out an experiment by picking up
the phone and dialing 555-3226.

If you are correct about this phone number, you will definitely get a
busy signal. If you are incorrect, the probability of getting a busy signal
is 0.01. However, all of the above is only true if the phone is working. If
the phone is broken, you will always get a busy signal.

David MacKay “Information Theory, Inference and Learning algorithm” 38


Phone Example

Hypothesis   prior   likelihood          prior × likelihood   posterior
                     (get busy signal)
H1           0.4     1                   0.4                  0.66
H2           0.4     0.01                0.004                0.0066
H3           0.1     1                   0.1                  0.17
H4           0.1     1                   0.1                  0.17
Total        1       3.01                0.604                1

39
Bayesian Inference
• An important part of Bayesian inference is the establishment
of parameters and models.
• Models: the mathematical formulation of the observed events.
• Parameters: the factors in the models affecting the observed data.

40
Mathematical Model
Example: The Normal distribution

f(x) = 1/(σ√(2π)) × exp( −(1/2) × ((x − μ)/σ)² )

Parameters: mean μ and standard deviation σ

41
Normal Distribution

42
Mathematical Model
Example: The Beta distribution

f(x) = x^(α−1) × (1 − x)^(β−1) / B(α, β)

Parameters: shape parameters α and β

43
Beta Distribution

44
Bayesian Inference
• Rewriting Bayes theorem in terms of probability distribution:

P(θ|data) = P(data|θ) P(θ) / P(data)

• P(θ): prior distribution of the parameter.
• P(data|θ): sampling density of the data, proportional to the likelihood.
• P(data): marginal probability of the data.
• P(θ|data): posterior distribution of the parameter.

45
Bayesian Inference
• Example:
In tossing a coin, the fairness of the coin may be defined as a
parameter, denoted by θ. The outcomes of a series of coin
tosses are the data.
• Given an outcome, what is the probability that the coin is fair
(θ = 0.5)?

P(θ|data) = P(data|θ) P(θ) / P(data)  ⇒  P(θ|data) ∝ P(data|θ) P(θ)

46
Fairness of Coin
P(θ|data) ∝ P(data|θ) P(θ)

• To define our model correctly, we need two mathematical models beforehand:
• One to represent the prior distribution, P(θ).
• Another to represent the likelihood, P(data|θ).
• Note: we will choose the prior so that the prior and posterior have the
same mathematical form (a conjugate prior).

47
Fairness of Coin - Likelihood
• Likelihood: the probability of observing a particular number of
heads in a particular number of flips, for a given fairness of the coin.
• The probability of observing heads/tails depends on the fairness
of the coin (θ).
• If we toss a coin 1 time, what is the probability of observing a head?

48
Fairness of Coin - Bernoulli Likelihood
• Let y be the outcome of tossing the coin one time.
• Represent head as 1 and tail as 0.

P(y = 1 | θ) = θ   and   P(y = 0 | θ) = 1 − θ

• Combine the above into a single definition that covers both outcomes:

P(y | θ) = θ^y × (1 − θ)^(1−y)
49
Fairness of Coin - Binomial Likelihood
• If we toss the coin many times, say n times, the likelihood is

P(y1, y2, ⋯, yn | θ) = P(y1|θ) P(y2|θ) ⋯ P(yn|θ)
                     = ∏ᵢ P(yᵢ|θ)
                     = ∏ᵢ θ^(yᵢ) × (1 − θ)^(1−yᵢ)

50
Fairness of Coin - Binomial Likelihood
• If we are interested in the probability of getting k heads in N tosses,
then (up to a binomial coefficient that does not depend on θ) the
likelihood is:

P(k, N | θ) = θ^k × (1 − θ)^(N−k)
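The two likelihoods above are one-liners in code. A sketch (function names are mine), with the binomial coefficient omitted as on the slide, since it cancels when the posterior is normalized:

```python
def bernoulli_likelihood(y, theta):
    """P(y | theta) for one toss: theta if y == 1 (head), 1 - theta if y == 0."""
    return theta ** y * (1 - theta) ** (1 - y)

def binomial_likelihood(k, n, theta):
    """theta^k * (1 - theta)^(n - k): k heads in n tosses, coefficient omitted."""
    return theta ** k * (1 - theta) ** (n - k)

print(bernoulli_likelihood(1, 0.7))    # 0.7
print(binomial_likelihood(2, 3, 0.5))  # 0.125
```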

51
Fairness of Coin – Prior distribution
• Used to represent our beliefs about the parameter based on
previous experience.
• No previous experience?
• For this example, we can use the beta distribution.
• The probability density function of the beta distribution is

P(θ) = θ^(α−1) × (1 − θ)^(β−1) / B(α, β),   i.e. θ ~ Beta(α, β)

52
Fairness of Coin – Posterior distribution
Let’s calculate the posterior distribution with Bayesian inference.

P(θ | data) ∝ P(data | θ) P(θ)
P(θ | k, N) ∝ P(k, N | θ) P(θ)

• Likelihood: P(k, N | θ) = θ^k × (1 − θ)^(N−k)
• Prior: P(θ) = θ^(α−1) × (1 − θ)^(β−1) / B(α, β)

P(θ | k, N) = θ^(k+α−1) × (1 − θ)^(N−k+β−1) / B(k + α, N − k + β)

P(θ | k, N) ~ Beta(k + α, N − k + β)

53
Fairness of Coin

54
Fairness of Coin
• Suppose you think that a coin is biased: it has a mean (μ) of 0.4 with
standard deviation (σ) of 0.1.
• For a beta distribution:

β = (1 − μ)(μ − μ² − σ²) / σ²   and   α = βμ / (1 − μ)

• μ = 0.4 and σ = 0.1 ⇒ α = 9.2, β = 13.8
55
Prior Distribution

56
Fairness of Coin
• Suppose you observed 70 heads out of 100 tosses.

P(θ | k, N) ~ Beta(k + α, N − k + β)
N = 100, k = 70, α = 9.2, β = 13.8

P(θ | 70, 100) ~ Beta(79.2, 43.8)
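The whole example can be reproduced in a few lines. A sketch (variable names are mine): match the beta prior to the stated mean and standard deviation, then apply the conjugate update:

```python
mu, sigma = 0.4, 0.1                               # prior mean and sd
b = (1 - mu) * (mu - mu**2 - sigma**2) / sigma**2  # beta  -> 13.8
a = b * mu / (1 - mu)                              # alpha -> 9.2

k, n = 70, 100                                     # 70 heads in 100 tosses
post_a, post_b = k + a, n - k + b                  # posterior Beta(79.2, 43.8)
post_mean = post_a / (post_a + post_b)

print(round(a, 1), round(b, 1))  # 9.2 13.8
print(round(post_mean, 3))       # posterior mean ≈ 0.644
```

Note how the posterior mean sits between the prior mean (0.4) and the observed proportion (0.7), pulled toward the data.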

57
Posterior Distribution

58
Bayesian Inference

As more and more coin tosses are made

⇒ new data are observed
⇒ the distribution of the parameter gets updated.

59
Prior in Bayesian Inference
• Previous studies or published work.
• Researcher intuition.
• Substantive experts.
• Convenience (conjugacy, vagueness).
• Nonparametrics.
• Other data sources.

60
The Frequentist paradigm
• Defines probability as a long-run frequency on independent
identical trials.
• Looks at parameters as fixed quantities.
The Bayesian paradigm
• Defines probability as a subjective belief (which must be consistent
with all of one’s other beliefs).
• Looks at parameters as random quantities.
Differences Between Bayesians and Non-Bayesians
What is Fixed?
• Frequentist:
• Data are a repeatable random sample; there is a frequency.
• Underlying parameters remain constant during this repeatable process.
• Parameters are fixed.
• Bayesian:
• Data are observed from the realized sample.
• Parameters are unknown and described probabilistically.
• Data are fixed.

63
Other topics in Bayesian
1. Credible interval
2. Hypothesis testing and model selection
3. Bayes factor
4. Bayesian modelling
5. Computational techniques
6. Regression
And many others

64
The End

65
