
8. Distributions of Probabilities

Chris Piech and Mehran Sahami
May 2017

In this chapter we are going to have a very meta discussion about how we represent probabilities. Until
now probabilities have just been numbers in the range 0 to 1. However, if we have uncertainty about our
probability, it would make sense to represent the probability itself as a random variable (and thus articulate
the relative likelihood of each possible value).

1 Estimating Probabilities
Imagine we have a coin and we would like to know its probability of coming up heads (p). We flip the
coin (n + m) times and it comes up heads n times. One way to calculate the probability is to assume that it
is exactly p = n/(n + m). That number, however, is a coarse estimate, especially if n + m is small. Intuitively it
doesn’t capture our uncertainty about the value of p. Just like with other random variables, it often makes
sense to hold a distributed belief about the value of p.
To formalize the idea that we want a distribution for p we are going to use a random variable X to represent
the probability of the coin coming up heads. Before flipping the coin, we could say that our belief about the
coin’s success probability is uniform: X ∼ Uni(0, 1).
If we let N be the number of heads that came up, then since the coin flips are independent, (N | X = x) ∼
Bin(n + m, x). We want to calculate the probability density function for X|N. We can start by applying
Bayes’ Theorem:
\[
\begin{aligned}
f_{X|N}(x \mid n) &= \frac{P(N = n \mid X = x)\, f_X(x)}{P(N = n)} && \text{Bayes' Theorem} \\
&= \frac{\binom{n+m}{n} x^n (1-x)^m \cdot 1}{P(N = n)} && \text{Binomial PMF, Uniform PDF} \\
&= \frac{\binom{n+m}{n}}{P(N = n)} \, x^n (1-x)^m && \text{Moving terms around} \\
&= \frac{1}{c} \, x^n (1-x)^m && \text{where } c = \int_0^1 x^n (1-x)^m \, dx
\end{aligned}
\]
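As a side note, for non-negative integers n and m the normalizing constant c has a closed form (a standard
Beta-function identity, included here as a worked detail rather than part of the original derivation):

\[
c = \int_0^1 x^n (1-x)^m \, dx = \frac{n! \, m!}{(n+m+1)!}
\]

For example, with n = 4 and m = 2 this gives c = 4! · 2!/7! = 48/5040 = 1/105.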

2 Beta Distribution
The equation that we arrived at when using a Bayesian approach to estimating our probability defines a
probability density function, and thus a random variable. That random variable is said to follow a Beta
distribution, which is defined as follows:
The Probability Density Function (PDF) for a Beta X ∼ Beta(a, b) is:

\[
f(x) =
\begin{cases}
\dfrac{1}{B(a,b)} \, x^{a-1} (1-x)^{b-1} & \text{if } 0 < x < 1 \\
0 & \text{otherwise}
\end{cases}
\qquad \text{where } B(a, b) = \int_0^1 x^{a-1} (1-x)^{b-1} \, dx
\]

A Beta distribution has

\[
E[X] = \frac{a}{a+b} \qquad \text{and} \qquad \mathrm{Var}(X) = \frac{ab}{(a+b)^2 (a+b+1)}.
\]

All modern programming languages have a
package for calculating Beta CDFs. You will not be expected to compute the CDF by hand in CS109.
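As a quick sketch of what such a package looks like, here is the jStat JavaScript library (the same one the
assignment example below calls), assuming Node.js with jStat installed via npm install jstat:

// A minimal sketch using the jStat library (assumes: npm install jstat).
const { jStat } = require("jstat");

const a = 5, b = 3; // e.g. Beta(5, 3)

// Closed-form mean and variance, matching the formulas above
console.log(jStat.beta.mean(a, b));     // a / (a + b) = 0.625
console.log(jStat.beta.variance(a, b)); // ab / ((a+b)^2 (a+b+1)) ≈ 0.026

// The CDF has no elementary closed form; the library computes it numerically
console.log(jStat.beta.cdf(0.5, a, b)); // P(X <= 0.5) ≈ 0.2266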
To model our estimate of the probability of a coin coming up heads as a Beta, set a = n + 1 and b = m + 1. Beta
is used as a random variable to represent a belief distribution of probabilities in contexts beyond estimating
coin flips. It has many desirable properties: its support is exactly (0, 1), matching the values
that probabilities can take on, and it has the expressive capacity to capture many different forms of belief
distributions.
Let’s imagine that we had observed n = 4 heads and m = 2 tails. The probability density function for X ∼
Beta(5, 3) is shown below.

[Figure: PDF of X ∼ Beta(5, 3), peaking at x = 4/6.]
Notice how the most likely belief for the probability of our coin is when the random variable, which represents
the probability of getting heads, equals 4/6, the fraction of heads observed. The distribution also shows that we hold
a non-zero belief that the probability could be something other than 4/6. It is unlikely that the probability is
0.01 or 0.99, but reasonably likely that it could be 0.5.
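Since the plot itself is not reproduced here, a few lines of jStat (same assumed setup as above) trace out the
shape of the Beta(5, 3) density:

// Evaluate the Beta(5, 3) density on a grid to see the shape of the curve.
const { jStat } = require("jstat");

for (let i = 1; i <= 9; i++) {
  const x = i / 10;
  const density = jStat.beta.pdf(x, 5, 3);
  // Crude text plot: one '#' per 0.1 units of density
  console.log(x.toFixed(1), "#".repeat(Math.round(density * 10)));
}
// The tallest bar appears near x = 0.7, close to the mode 4/6 ≈ 0.67.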
It works out that Beta(1, 1) = Uni(0, 1). As a result, the distribution of our belief about p before (“prior”)
and after (“posterior”) observing data can both be represented using a Beta distribution. When that happens we call Beta a
“conjugate” distribution. Practically, conjugacy means the update is easy: observing new data simply changes
the parameters of the distribution.

Beta as a Prior
You can set X ∼ Beta(a, b) as a prior to reflect how biased you think the coin is a priori to flipping it. This is
a subjective judgment that represents a + b − 2 “imaginary” trials with a − 1 heads and b − 1 tails. If you then
observe n + m real trials with n heads, you can update your belief. Your new belief would be X|(n heads in n +
m trials) ∼ Beta(a + n, b + m). Using the prior Beta(1, 1) = Uni(0, 1) is the same as saying we haven’t seen
any “imaginary” trials, so a priori we know nothing about the coin. This way of thinking about probabilities is
representative of the “Bayesian” school of thought, where computer scientists explicitly represent probabilities
as distributions (with prior beliefs). That school of thought is separate from the “frequentist” school, which
tries to calculate probabilities as single numbers evaluated by the ratio of successes to experiments.
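To make the update rule concrete, here is a small sketch (the updateBelief helper and the specific counts are
illustrative, not from the notes):

// Conjugate update: prior Beta(a, b) plus (n heads, m tails) gives Beta(a + n, b + m).
const { jStat } = require("jstat");

// Hypothetical helper implementing the update rule described above
function updateBelief(prior, heads, tails) {
  return { a: prior.a + heads, b: prior.b + tails };
}

let belief = { a: 1, b: 1 };         // Beta(1, 1) = Uni(0, 1): no imaginary trials
belief = updateBelief(belief, 4, 2); // observe 4 heads and 2 tails

console.log(belief);                              // { a: 5, b: 3 }
console.log(jStat.beta.mean(belief.a, belief.b)); // posterior mean = 0.625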

Assignment Example
In class we talked about reasons why grade distributions might be well suited to be described as a Beta
distribution. Let’s say that we are given a set of student grades for a single exam and we find that it is best
fit by a Beta distribution: X ∼ Beta(a = 8.28, b = 3.16). What is the probability that a student is below the
mean (i.e. expectation)?
The answer to this question requires two steps. First calculate the mean of the distribution, then calculate the
probability that the random variable takes on a value less than the expectation.
\[
E[X] = \frac{a}{a+b} = \frac{8.28}{8.28 + 3.16} \approx 0.7238
\]

Now we need to calculate P(X < E[X]). That is exactly the CDF of X evaluated at E[X]. We don’t have
a closed-form formula for the CDF of a Beta distribution, but all modern programming languages will have a Beta CDF
function. In JavaScript we can call jStat.beta.cdf, which takes the x parameter first, followed by the alpha
and beta parameters of your Beta distribution.

\[
P(X < E[X]) = F_X(0.7238) = \texttt{jStat.beta.cdf(0.7238, 8.28, 3.16)} \approx 0.46
\]
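Putting the whole calculation together, a runnable version of the two steps above might look like this (same
jStat setup as before):

const { jStat } = require("jstat");

const a = 8.28, b = 3.16; // best-fit Beta parameters for the exam grades

// Step 1: the mean of the distribution
const mean = a / (a + b);
console.log(mean.toFixed(4)); // 0.7238

// Step 2: P(X < mean) is the CDF evaluated at the mean
console.log(jStat.beta.cdf(mean, a, b).toFixed(2)); // ≈ 0.46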
