8 - Probability
• Unobserved variables: Agent needs to reason about other aspects (e.g. where an object is or what
disease is present)
• Model: Agent knows something about how the known variables relate to the unknown variables
• Probabilistic reasoning gives us a framework for managing our beliefs and knowledge
Uncertainty
Let action At = leave for airport t minutes before flight
Will At get me there on time?
Some Problems:
1. partial observability (road state, other drivers' plans, etc.)
2. noisy sensors (traffic reports)
3. uncertainty in action outcomes (flat tire, etc.)
4. immense complexity of modeling and predicting traffic
If restricted to just TRUE/FALSE statements, a conclusion either:
1. risks falsehood: "A25 will get me there on time", or
2. is too weak for decision making:
"A25 will get me there on time if there's no accident on the bridge and it doesn't rain and my tires remain intact, etc. etc." (see also the qualification problem)
"A1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport"
Probability
• Cannot list all possible conditions for a given statement
• Cannot deduce the truth value for all the statements for sure
• Instead of absolute statements, use probability to summarize uncertainty
• Probabilities relate to the degree that an agent believes a statement to be true
P(A25 | no reported accidents) = 0.06
• The probability changes with new information (evidence)
P(A25 | no reported accidents, 5 AM) = 0.15
• For this class, we treat probability statements not as assertions about the world but as assertions about the agent's state of knowledge
Decision Making Under Uncertainty
• Which action would you choose given the following?
P(A25 gets me there on time | …) = 0.04
P(A90 gets me there on time | …) = 0.70
P(A120 gets me there on time | …) = 0.95
P(A1440 gets me there on time | …) = 0.9999
• Depends on preferences for missing flight vs. time spent waiting, etc. and willingness to
take risk
• How to represent these? Utilities! Not going to go into detail
• Utility theory is used to represent and infer preferences
• Decision theory = probability theory + utility theory
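As a concrete toy illustration, here is a minimal sketch of maximum-expected-utility decision making for the airport example. The success probabilities come from the list above; the utility values (U_MISS, WAIT_COST_PER_MIN) are invented for illustration only:

```python
# Toy maximum-expected-utility calculation for the airport example.
# The success probabilities are from the slide; the utilities are
# invented for illustration only.
p_on_time = {"A25": 0.04, "A90": 0.70, "A120": 0.95, "A1440": 0.9999}

U_MISS = -1000.0          # utility of missing the flight (assumption)
WAIT_COST_PER_MIN = -0.5  # utility per minute of slack spent waiting (assumption)

def expected_utility(action: str) -> float:
    minutes_early = int(action[1:])              # e.g. "A120" -> 120
    p = p_on_time[action]
    u_catch = WAIT_COST_PER_MIN * minutes_early  # caught flight, but waited
    return p * u_catch + (1 - p) * U_MISS

for a in sorted(p_on_time, key=expected_utility, reverse=True):
    print(f"{a:>5}: EU = {expected_utility(a):8.2f}")
# With these (made-up) utilities, A120 maximizes expected utility.
```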
[Figure: Venn diagram of the sample space Ω, with the event A shown as the set of worlds where A is true]
Kolmogorov's axioms: (1) P(A) ≥ 0 for every event A; (2) P(Ω) = 1; (3) P(A ∪ B) = P(A) + P(B) for disjoint events A, B.
Immediate consequences:
• Σ_{ω ∈ Ω} P(ω) = 1 (from 2 and 3)
• 0 ≤ P(E) ≤ 1 (from 1, 2 and 3)
• If A ⊆ B, then P(A) ≤ P(B) (from 3)
Any set is disjoint from the empty set, including other empty sets, so:
Ω = Ω ∪ ∅ ∪ ∅ ∪ ∅ ∪ …
P(Ω) = P(Ω ∪ ∅ ∪ ∅ ∪ ∅ ∪ …) = 1
P(Ω) + P(∅) + P(∅) + P(∅) + ⋯ = 1
1 + P(∅) + P(∅) + P(∅) + ⋯ = 1
∴ P(∅) = 0
• Temperature:

T     P
hot   0.5
cold  0.5

• Weather:

W       P
sun     0.6
rain    0.1
fog     0.3
meteor  0.0
Prior Probability
• Prior or unconditional probabilities reflect the agent's beliefs prior to the arrival of any (new) evidence
T     P
hot   0.5
cold  0.5

W       P
sun     0.6
rain    0.1
fog     0.3
meteor  0.0
• A probability distribution, given as a table, assigns a value to every possible assignment
• The values must sum to 1
• Note that distributions can be continuous as well!
Continuous Variables
• Express distribution as a parameterized
function of value.
• Let f be a probability density function that
integrates to 1.
E.g. f(x) = U[18, 26](x): uniform density between 18 and 26
E.g. f(x) = (1 / √(2πσ²)) e^(−(x−μ)² / (2σ²)): Gaussian density function
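A minimal sketch of these two densities in Python, using the standard formulas (the default parameters of the uniform are just the bounds from the example above):

```python
import math

def uniform_pdf(x: float, a: float = 18.0, b: float = 26.0) -> float:
    """Uniform density U[a, b]: constant 1/(b - a) on [a, b], 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

print(uniform_pdf(20.0))  # 0.125 = 1/(26 - 18)
print(gaussian_pdf(0.0))  # ~0.3989, the standard normal at its mean
```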
Joint Probability Distributions
• A joint distribution over a set of random variables X1, …, Xn specifies a real number P(X1 = x1, …, Xn = xn) for each assignment (or outcome):

T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

• Must obey: P(x1, …, xn) ≥ 0 for every assignment, and the entries must sum to 1
• Every question about a domain can be answered by the joint distribution because every event
is a sum of sample points
• Size of the distribution for n variables with domain size d? dⁿ entries
• For all but the smallest distributions, impractical to write out!
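As a sketch, a small joint table can be stored as a Python dict keyed by assignments; the asserts check the two constraints above, and the last lines illustrate the dⁿ blow-up:

```python
# The P(T, W) table from above, stored as a dict keyed by assignments.
joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

# The two constraints: nonnegative entries that sum to 1.
assert all(p >= 0 for p in joint.values())
assert abs(sum(joint.values()) - 1.0) < 1e-9

# The d**n blow-up: n variables with domain size d need d**n rows.
n, d = 20, 2
print(d ** n)  # 1048576 rows already for 20 binary variables
```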
Exercise: Joint Probabilities and Events
X    Y    P
+x   +y   0.2
+x   -y   0.3
-x   +y   0.4
-x   -y   0.1

• P(+x, +y)?
0.2
• P(+x)?
0.2 + 0.3 = 0.5
• P(-y OR +x)?
P(-y ∪ +x) = P(-y) + P(+x) − P(-y ∩ +x) = (0.3 + 0.1) + (0.2 + 0.3) − 0.3 = 0.6
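A sketch of the exercise in code: since every event is a set of outcomes, its probability is a sum over rows of the joint (the predicates below are just the three queries above):

```python
# Every event is a set of outcomes, so P(event) is a sum over rows of the joint.
joint = {
    ("+x", "+y"): 0.2,
    ("+x", "-y"): 0.3,
    ("-x", "+y"): 0.4,
    ("-x", "-y"): 0.1,
}

def prob(event) -> float:
    """Sum the joint over all outcomes satisfying the event predicate."""
    return sum(p for outcome, p in joint.items() if event(outcome))

print(prob(lambda o: o == ("+x", "+y")))             # 0.2
print(prob(lambda o: o[0] == "+x"))                  # 0.5
print(prob(lambda o: o[1] == "-y" or o[0] == "+x"))  # 0.6 (up to float rounding)
```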
Marginal Distributions
• Marginal distributions are sub-tables obtained by summing out (marginalizing) the other variables, e.g. P(t) = Σ_w P(t, w)

Joint P(T, W):
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

P(T):
T     P
hot   0.5
cold  0.5

P(W):
W     P
sun   0.6
rain  0.4
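A minimal marginalization sketch over the same joint, summing out one variable at a time:

```python
from collections import defaultdict

joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

def marginal(joint, index):
    """Sum out every variable except the one at the given tuple position."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[outcome[index]] += p
    return dict(m)

print(marginal(joint, 0))  # P(T): {'hot': 0.5, 'cold': 0.5}
print(marginal(joint, 1))  # P(W): {'sun': 0.6, 'rain': 0.4} (up to float rounding)
```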
Conditional Probabilities
• Conditional or posterior probabilities
e.g., P(cavity | toothache) = 0.8
i.e., given that toothache is all I know; NOT "if toothache, then 80% chance of cavity"
• Definition of conditional probability:
P(a | b) = P(a, b) / P(b)        P(b | a) = P(a, b) / P(a)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

E.g. P(W = sun | T = hot) = P(hot, sun) / P(hot) = 0.4 / 0.5 = 0.8
Exercise: Conditional Probabilities
X    Y    P
+x   +y   0.2
+x   -y   0.3
-x   +y   0.4
-x   -y   0.1

• P(+x | +y)?
0.2 / (0.2 + 0.4) = 1/3
• P(-x | +y)?
0.4 / (0.2 + 0.4) = 2/3
• P(-y | +x)?
0.3 / (0.2 + 0.3) = 0.6
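The same exercise as a sketch in code: conditioning selects the rows consistent with the evidence and renormalizes (the queries are expressed as predicates):

```python
joint = {
    ("+x", "+y"): 0.2,
    ("+x", "-y"): 0.3,
    ("-x", "+y"): 0.4,
    ("-x", "-y"): 0.1,
}

def conditional(joint, query, evidence) -> float:
    """P(query | evidence) = P(query, evidence) / P(evidence), both predicates."""
    p_evidence = sum(p for o, p in joint.items() if evidence(o))
    p_both = sum(p for o, p in joint.items() if evidence(o) and query(o))
    return p_both / p_evidence

print(conditional(joint, lambda o: o[0] == "+x", lambda o: o[1] == "+y"))  # 1/3
print(conditional(joint, lambda o: o[0] == "-x", lambda o: o[1] == "+y"))  # 2/3
print(conditional(joint, lambda o: o[1] == "-y", lambda o: o[0] == "+x"))  # 0.6
```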
Conditional Distributions
• Conditional distributions are probability distributions over some variables, given fixed values of others; each is obtained by selecting the rows of the joint consistent with the condition and renormalizing

Joint P(T, W):
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

P(W | T = hot):
W     P
sun   0.8
rain  0.2

P(W | T = cold):
W     P
sun   0.4
rain  0.6
Normalization Trick
▪ Step 1: Select the entries consistent with the evidence
▪ Step 2: Sum out H to get the joint of the query and the evidence (marginalize)
▪ Step 3: Normalize
▪ Obvious problems:
▪ Worst-case time complexity O(dⁿ)
▪ Space complexity O(dⁿ) to store the joint distribution
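A sketch of the three steps on the P(T, W) joint from above (in this small example there happen to be no hidden variables left to sum out in Step 2):

```python
joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

def infer(joint, query_index, evidence_index, evidence_value):
    # Step 1: select the entries consistent with the evidence.
    selected = {o: p for o, p in joint.items() if o[evidence_index] == evidence_value}
    # Step 2: sum out everything except the query variable (marginalize).
    unnorm = {}
    for o, p in selected.items():
        unnorm[o[query_index]] = unnorm.get(o[query_index], 0.0) + p
    # Step 3: normalize so the result sums to 1.
    z = sum(unnorm.values())
    return {v: p / z for v, p in unnorm.items()}

print(infer(joint, query_index=1, evidence_index=0, evidence_value="hot"))
# {'sun': 0.8, 'rain': 0.2} = P(W | T=hot)
```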
The Product Rule
• P(x, y) = P(x | y) P(y)
• Example: P(D, W) = P(D | W) P(W)

P(W):
W     P
sun   0.8
rain  0.2

P(D | W):
D     W     P
wet   sun   0.1
dry   sun   0.9
wet   rain  0.7
dry   rain  0.3

P(D, W):
D     W     P
wet   sun   0.08
dry   sun   0.72
wet   rain  0.14
dry   rain  0.06
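A one-line reconstruction of the joint from the two tables, as a sketch in Python:

```python
p_w = {"sun": 0.8, "rain": 0.2}  # P(W)
p_d_given_w = {                  # P(D | W)
    ("wet", "sun"): 0.1, ("dry", "sun"): 0.9,
    ("wet", "rain"): 0.7, ("dry", "rain"): 0.3,
}

# P(D, W) = P(D | W) P(W), row by row.
joint = {(d, w): p_d_given_w[(d, w)] * p_w[w] for (d, w) in p_d_given_w}
for row, p in sorted(joint.items()):
    print(row, round(p, 2))
# ('dry', 'rain') 0.06, ('dry', 'sun') 0.72, ('wet', 'rain') 0.14, ('wet', 'sun') 0.08
```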
The Chain Rule
• More generally, any joint distribution can always be written as an incremental product of conditional distributions:
P(x1, x2, …, xn) = P(x1) P(x2 | x1) P(x3 | x1, x2) ⋯ P(xn | x1, …, xn−1)
• The chain rule is the product rule applied repeatedly, turning a joint probability into a product of conditional probabilities
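A small numeric sanity check of the two-variable case P(t, w) = P(t) P(w | t); it is circular by construction, since it just unwinds the definition of conditional probability given above:

```python
joint = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
         ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}
p_t = {"hot": 0.5, "cold": 0.5}  # P(T), from summing out W

for (t, w), p in joint.items():
    p_w_given_t = p / p_t[t]  # definition of conditional probability
    assert abs(p - p_t[t] * p_w_given_t) < 1e-12
    print(f"P({t},{w}) = P({t}) P({w}|{t}) = {p_t[t]} * {p_w_given_t:.1f} = {p}")
```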
Bayes’ Rule
• Two ways to factor a joint distribution over two variables:
P(x, y) = P(x | y) P(y) = P(y | x) P(x)
• Dividing, we get Bayes' rule:
P(x | y) = P(y | x) P(x) / P(y)
• Example:
• M: meningitis, S: stiff neck
P(m | s) = P(s | m) P(m) / P(s), where the causal probability P(s | m) and the prior P(m) are typically given
• Example: computing P(W | dry) by multiplying the likelihood P(dry | W) by the prior P(W) and normalizing (here P(dry | sun) = 0.9 and P(dry | rain) = 0.4):

D     W      P(dry | W) · P(W)    P (normalized)
dry   sun    0.9 · 0.8 = 0.72     0.9
dry   rain   0.4 · 0.2 = 0.08     0.1
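The same computation as a sketch in code, multiplying the likelihood by the prior and normalizing:

```python
p_w = {"sun": 0.8, "rain": 0.2}            # prior P(W)
p_dry_given_w = {"sun": 0.9, "rain": 0.4}  # likelihood P(dry | W)

unnorm = {w: p_dry_given_w[w] * p_w[w] for w in p_w}  # P(dry | w) P(w)
z = sum(unnorm.values())                              # = P(dry)
posterior = {w: p / z for w, p in unnorm.items()}     # Bayes' rule
print(posterior)  # ~{'sun': 0.9, 'rain': 0.1}
```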
▪ Product rule: P(x, y) = P(x | y) P(y)
▪ Chain rule: P(x1, …, xn) = P(x1) P(x2 | x1) ⋯ P(xn | x1, …, xn−1)
▪ Bayes' rule: P(x | y) = P(y | x) P(x) / P(y)
Probability Summary
• Probability is a rigorous formalism for uncertain knowledge
• Joint probability distribution specifies probability of every atomic event
𝑃(𝑋, 𝑌)
• Queries can be answered by summing over atomic events
P(q, e) = Σ_h P(q, e, h),   Z = Σ_q P(q, e),   P(q | e) = P(q, e) / Z
• For nontrivial domains, we must find a way to reduce the joint size
• How?
Independence
• Two variables are independent in a joint distribution if:
P(X, Y) = P(X) P(Y), i.e. for all x, y: P(x, y) = P(x) P(y)
• This says the joint distribution factors into a product of two simpler distributions
• Absolute independence is very powerful but not so common
• Example: compare the actual joint P(T, W) with the product of its marginals:

P(T):
T     P
hot   0.5
cold  0.5

P(W):
W     P
sun   0.6
rain  0.4

Actual joint P(T, W):
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Product P(T) P(W):
T     W     P
hot   sun   0.3
hot   rain  0.2
cold  sun   0.3
cold  rain  0.2

Since P(T, W) ≠ P(T) P(W), T and W are not independent here.
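A sketch of this independence test in code, comparing each joint entry against the product of the marginals:

```python
import itertools

actual = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
          ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}
p_t = {"hot": 0.5, "cold": 0.5}
p_w = {"sun": 0.6, "rain": 0.4}

def is_independent(joint, p_t, p_w, tol=1e-9):
    """True iff every joint entry equals the product of the marginals."""
    return all(abs(joint[(t, w)] - p_t[t] * p_w[w]) < tol
               for t, w in itertools.product(p_t, p_w))

print(is_independent(actual, p_t, p_w))   # False: 0.4 != 0.5 * 0.6
product = {(t, w): p_t[t] * p_w[w] for t, w in itertools.product(p_t, p_w)}
print(is_independent(product, p_t, p_w))  # True by construction
```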
Example: Independence
• n fair, independent coin flips: P(X1, X2, …, Xn) = P(X1) P(X2) ⋯ P(Xn), with each P(Xi) a two-entry table
• 32 entries reduced to 12
• Dentistry is a large field with hundreds of variables, none of which are independent.
What to do?
Conditional Independence
• P(Toothache, Cavity, Catch) has 2³ − 1 = 7 independent entries
• If I have a cavity, the probability that the probe catches it doesn't depend on whether I have a toothache:
(1) P(catch | toothache, cavity) = P(catch | cavity)
• The same independence holds if I do not have a cavity:
(2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
• Equivalent statements:
P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
One can be derived from the other easily
Conditional Independence
• Write out full joint distribution using chain rule:
P(Toothache, Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity) (conditional independence)
I.e., 2 + 2 + 1 = 5 independent numbers
• In most cases, the use of conditional independence reduces the size of the
representation of the joint distribution from exponential in n to linear in n.
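A sketch of how conditional independence shrinks the representation: the full 2³-entry joint is assembled from just the 2 + 2 + 1 = 5 free numbers of the derivation above. The factorization follows the slides; the numeric values are invented for illustration:

```python
import itertools

# Assembling P(Toothache, Catch, Cavity) from 2 + 2 + 1 = 5 free numbers.
# The factorization follows the derivation above; the numeric values
# are invented for illustration.
p_cavity = 0.2                           # P(+cavity), assumption
p_catch_given = {True: 0.9, False: 0.2}  # P(+catch | Cavity), assumption
p_ache_given = {True: 0.6, False: 0.1}   # P(+toothache | Cavity), assumption

joint = {}
for ache, catch, cavity in itertools.product([True, False], repeat=3):
    p = p_cavity if cavity else 1 - p_cavity
    p *= p_catch_given[cavity] if catch else 1 - p_catch_given[cavity]
    p *= p_ache_given[cavity] if ache else 1 - p_ache_given[cavity]
    joint[(ache, catch, cavity)] = p

assert abs(sum(joint.values()) - 1.0) < 1e-9  # still a valid distribution
print(len(joint), "joint entries built from 5 free numbers")
```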
Conditional Independence
• Unconditional (absolute) independence is very rare
• Conditional independence is our most basic and robust form of knowledge about
uncertain environments.
• X is conditionally independent of Y given Z iff:
P(X, Y | Z) = P(X | Z) P(Y | Z), or equivalently, P(X | Y, Z) = P(X | Z)
▪ Product rule: P(x, y) = P(x | y) P(y)
▪ Chain rule: P(x1, …, xn) = P(x1) P(x2 | x1) ⋯ P(xn | x1, …, xn−1)