8 - Probability

The document discusses the representation and quantification of uncertainty in AI, highlighting the importance of probabilistic reasoning to manage beliefs and knowledge. It covers various sources of uncertainty, decision-making under uncertainty, and the foundational concepts of probability, including random variables, joint distributions, and conditional probabilities. The material emphasizes the need for probability theory and utility theory in decision-making processes, particularly in complex real-world scenarios.


COMP 341 Intro to AI

Representing and Quantifying Uncertainty

Asst. Prof. Barış Akgün


Koç University
COMP341 So Far
• Agents
• Search
• Uninformed
• Informed
• Adversarial
• Local Search
• CSPs
• Adversarial Search
Uncertainty
• Real world is uncertain
• Where is this uncertainty coming from?
• Partial Observability (e.g. fog of war, opponents' hand in poker, traffic)
• Noisy sensors (e.g. GPS, cameras in low light, traffic reports, sonar)
• Uncertain action outcomes (e.g. flat tire, wheel slip, object too heavy to lift)
• Unexpected Events (e.g. sudden car accident, earthquake, meteor hitting)
• Inherent Stochasticity (e.g. quantum physics) - this affects sensors and actuators as well
• Complexity of Modelling (e.g. market behavior, predicting traffic) – related to unexpected events and
other agents

• We need to represent and quantify uncertainty to be able to solve problems!


Uncertainty
• General situation:
• Observed variables (evidence): Agent knows certain things about the state of the world (e.g., sensor
readings or symptoms)

• Unobserved variables: Agent needs to reason about other aspects (e.g. where an object is or what
disease is present)

• Model: Agent knows something about how the known variables relate to the unknown variables

• Probabilistic reasoning gives us a framework for managing our beliefs and knowledge
Uncertainty
Let action At = leave for airport t minutes before flight
Will At get me there on time?
Some Problems:
1. partial observability (road state, other drivers' plans, etc.)
2. noisy sensors (traffic reports)
3. uncertainty in action outcomes (flat tire, etc.)
4. immense complexity of modeling and predicting traffic
If just TRUE/FALSE
1. risks falsehood: “A25 will get me there on time”, or
2. leads to conclusions that are too weak for decision making:
“A25 will get me there on time if there's no accident on the bridge and it doesn't rain and
my tires remain intact etc etc.” (also look up qualification problem)
“ A1440 might reasonably be said to get me there on time but I'd have to stay overnight in
the airport ”
Probability
• Cannot list all possible conditions for a given statement
• Cannot deduce the truth value for all the statements for sure
• Instead of absolute statements, use probability to summarize uncertainty
• Probabilities relate to the degree that an agent believes a statement to be true
P(A25 | no reported accidents) = 0.06
• The probability changes with new information (evidence)
P(A25 | no reported accidents, 5 AM) = 0.15
• For this class, we treat probability statements not as assertions about the world but as
assertions about the agent's state of knowledge
Decision Making Under Uncertainty
• Which action would you choose given the following?
P(A25 gets me there on time | …) = 0.04
P(A90 gets me there on time | …) = 0.70
P(A120 gets me there on time | …) = 0.95
P(A1440 gets me there on time | …) = 0.9999

• Depends on preferences for missing flight vs. time spent waiting, etc. and willingness to
take risk
• How to represent these? Utilities! Not going to go into detail
• Utility theory is used to represent and infer preferences
• Decision theory = probability theory + utility theory

Example: Utility is exp(-t/500)


Will cover basics
Random Variables
• A random variable is some aspect of the world about which we are
uncertain
• Cavity: Do I have a tooth cavity?
• Weather: How is the weather today?
• A: How long will it take me to drive to the airport?
• D: Dice roll
• Random variables have domains (remember CSP variables!)
• Cavity: {true, false}
• Weather: {sunny, rain, cloudy, snow}
• A: [0, ∞)
• D: {1,2,3,4,5,6}
• The domain of a random variable is also called the sample space

Leaving this slide for self-study for Spring2025


A Simple Notion of Probability
P(A): Fraction of all possible worlds where A is true

[Figure: the space of all possible worlds, divided into the worlds where A is true and the worlds where A is false]


Leaving this slide for self-study for Spring2025
Notation
• Let the set Ω be the sample space (e.g. the 6 possible rolls of a die)
• Let ω ∈ Ω be a sample point / possible world / atomic event (e.g. a roll of 3)
• A probability space / probability model is a sample space with an
assignment P(D = ω) for every ω ∈ Ω
• Shorthand: P(D = ω) = P(ω) if all elements are unique
• An event A is any subset of Ω, with P(A) = Σ_{ω ∈ A} P(ω) (e.g. rolling 3 or 6)
• The event space, F, is the power set of Ω
• Random variables are or start with capital letters (e.g. Weather)
• The values are or start with lower case letters (e.g. cloudy)

Leaving this slide for self-study for Spring2025


Probability Axioms
1. The probability of an event is a non-negative real number:
   P(E) ∈ ℝ, P(E) ≥ 0, ∀E ∈ F
2. The probability of the entire sample space is 1:
   P(Ω) = 1
3. The probability of mutually exclusive events (aka disjoint sets) is additive:
   P(⋃_{i=1}^∞ Ei) = Σ_{i=1}^∞ P(Ei),
   where the events Ei are mutually exclusive (i.e. they are disjoint sets)

Immediate consequences:
• Σ_{ω ∈ Ω} P(ω) = 1 (from 2 and 3)
• 0 ≤ P(E) ≤ 1 (from 1, 2 and 3)
• If A ⊆ B, then P(A) ≤ P(B) (from 3)

Leaving this slide for self-study for Spring2025


Exercise
• Show that P(∅) = 0
Axiom 2: P(Ω) = 1
Axiom 3: P(⋃_{i=1}^∞ Ei) = Σ_{i=1}^∞ P(Ei) for mutually exclusive events Ei

Any set is disjoint with the empty set, including other empty sets:
Ω ∪ ∅ ∪ ∅ ∪ ∅ ∪ … = Ω
P(Ω ∪ ∅ ∪ ∅ ∪ ∅ ∪ …) = P(Ω) = 1
P(Ω) + P(∅) + P(∅) + P(∅) + … = 1
1 + P(∅) + P(∅) + P(∅) + … = 1
P(∅) = 0

Leaving this slide for self-study for Spring2025


Exercise
• Show that P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
A and B\(A ∩ B) are mutually exclusive, where "\" is
set subtraction: B\A = {a ∈ B | a ∉ A}.
Then (axiom 3):
P(A ∪ (B\(A ∩ B))) = P(A) + P(B\(A ∩ B))
Note that A ∪ (B\(A ∩ B)) = A ∪ B.
We also have:
P(B) = P(B\(A ∩ B)) + P(A ∩ B)
P(B\(A ∩ B)) = P(B) − P(A ∩ B)
Plug in:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Leaving this slide for self-study for Spring2025


Exercise
• Show that P(Aᶜ) = P(Ω\A) = 1 − P(A)
The event A and its complement Aᶜ = Ω\A are mutually exclusive, and Aᶜ ∪ A = Ω
It is also easy to see that P(Ω) = 1
P(Ω) = P(Aᶜ ∪ A) = P(Aᶜ) + P(A)
1 = P(Aᶜ) + P(A)
P(Aᶜ) = 1 − P(A)

Leaving this slide for self-study for Spring2025


Probability Distributions
• Associate a probability with each value

• Temperature:

  T     P
  hot   0.5
  cold  0.5

• Weather:

  W       P
  sun     0.6
  rain    0.1
  fog     0.3
  meteor  0.0
Prior Probability
• Prior or unconditional probabilities reflect agent’s belief prior to arrival of any (new) evidence

  T     P
  hot   0.5
  cold  0.5

  W       P
  sun     0.6
  rain    0.1
  fog     0.3
  meteor  0.0

• Probability distributions, in the form of a table, give values for all possible assignments
• They must sum up to 1
• Note that distributions can be continuous as well!
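
A minimal sketch (my own addition, not from the slides) of how such a table can be held in Python: a distribution is just a mapping from values to probabilities, and the two requirements above (non-negative entries that sum to 1) are easy to check. The names P_W, P_T and is_valid_distribution are made up for this illustration.

```python
# Discrete distributions as plain dicts (illustrative sketch).
P_W = {"sun": 0.6, "rain": 0.1, "fog": 0.3, "meteor": 0.0}
P_T = {"hot": 0.5, "cold": 0.5}

def is_valid_distribution(dist, tol=1e-9):
    """Check the table requirements: non-negative entries that sum to 1."""
    return all(p >= 0 for p in dist.values()) and abs(sum(dist.values()) - 1.0) < tol

assert is_valid_distribution(P_W) and is_valid_distribution(P_T)
```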
Continuous Variables
• Express distribution as a parameterized
function of value.
• Let f be a probability density function that
integrates to 1.
E.g. f(x) = U[18, 26](x): uniform density between 18 and 26
E.g. f(x) = (1 / √(2πσ²)) · e^(−(x−μ)² / (2σ²)): Gaussian distribution density function
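
A small sketch of both densities in Python (my addition; the function names and default parameters are arbitrary):

```python
import math

def uniform_pdf(x, a=18.0, b=26.0):
    """Uniform density U[a, b]: constant 1/(b-a) inside the interval, 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density: (1 / sqrt(2*pi*sigma^2)) * exp(-(x - mu)^2 / (2*sigma^2))."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

print(uniform_pdf(20.0))   # 0.125
print(gaussian_pdf(0.0))   # ~0.3989
```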
Joint Probability Distributions
• A joint distribution over a set of random variables X1, X2, …, Xn
specifies a real number for each assignment (or outcome): P(X1 = x1, X2 = x2, …, Xn = xn)

  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

• Must obey: P(x1, x2, …, xn) ≥ 0 for every assignment, and the entries sum to 1

• Every question about a domain can be answered by the joint distribution because every event
is a sum of sample points
• Size of the distribution if there are n variables, each with domain size d? dⁿ entries
• For all but the smallest distributions, impractical to write out!
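
As a concrete sketch (mine, not course code), a small joint table can be stored as a dict keyed by assignment tuples; the dⁿ growth shows up directly as the number of keys.

```python
from itertools import product

# Joint distribution P(T, W) from the table above, keyed by (t, w) assignments.
P_TW = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}

# Size check: n variables with domain size d -> d**n entries.
domains = {"T": ["hot", "cold"], "W": ["sun", "rain"]}
assert len(list(product(*domains.values()))) == 2 ** 2 == len(P_TW)
```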
Exercise: Joint Probabilities and Events
  X    Y    P
  +x   +y   0.2
  +x   -y   0.3
  -x   +y   0.4
  -x   -y   0.1

• P(+x, +y)?
  0.2
• P(+x)?
  0.2 + 0.3 = 0.5
• P(-y OR +x)?
  P(-y ∪ +x) = P(-y) + P(+x) − P(-y ∩ +x)
  = (0.3 + 0.1) + (0.2 + 0.3) − 0.3 = 0.6
• P(-y XOR +x)? (exclusive OR)
Leaving this slide for self-study for Spring2025


Marginal Distributions
• Marginal distributions are sub-tables which eliminate variables
• Marginalization (summing out): Combine collapsed rows by adding

  Joint P(T, W):
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

  Marginal P(T), P(T = t) = Σ_w P(T = t, W = w):
  T     P
  hot   0.5
  cold  0.5

  Marginal P(W), P(W = w) = Σ_t P(T = t, W = w):
  W     P
  sun   0.6
  rain  0.4
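
A small sketch of summing out a variable (my own illustration, building on the dict-based joint above; marginalize is a made-up helper name):

```python
from collections import defaultdict

# Joint P(T, W) keyed by (t, w), as in the slide's table.
P_TW = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}

def marginalize(joint, keep_index):
    """Sum out every variable except the one at position keep_index of the key tuple."""
    marginal = defaultdict(float)
    for assignment, p in joint.items():
        marginal[assignment[keep_index]] += p
    return dict(marginal)

print(marginalize(P_TW, 0))  # {'hot': 0.5, 'cold': 0.5}      -> P(T)
print(marginalize(P_TW, 1))  # {'sun': ~0.6, 'rain': ~0.4}    -> P(W)
```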
Conditional Probabilities
• Conditional or posterior probabilities
e.g., P(cavity | toothache) = 0.8
i.e., given that toothache is all I know NOT “if toothache then 80% chance of cavity”

• If we know more, e.g., cavity is also given, then we have


P(cavity | toothache,cavity) = 1

• New evidence may be irrelevant, allowing simplification, e.g.,


P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8

• This kind of inference, sanctioned by domain knowledge, is crucial


Conditional Probabilities
• Definition of conditional probability (requires P(b) ≠ 0):

  P(a | b) = P(a, b) / P(b)

  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

  e.g. P(W = sun | T = cold) = P(W = sun, T = cold) / P(T = cold) = 0.2 / 0.5 = 0.4
Exercise: Conditional Probabilities
  X    Y    P
  +x   +y   0.2
  +x   -y   0.3
  -x   +y   0.4
  -x   -y   0.1

• P(+x | +y)?
  0.2 / (0.2 + 0.4) = 1/3
• P(-x | +y)?
  0.4 / (0.2 + 0.4) = 2/3
• P(-y | +x)?
  0.3 / (0.2 + 0.3) = 0.6

Leaving this slide for self-study for Spring2025


Conditional Distributions
• Conditional distributions are probability distributions over
some variables given fixed values of others

  Joint distribution P(T, W):
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

  Conditional distribution P(W | T = hot):
  W     P
  sun   0.8
  rain  0.2

  Conditional distribution P(W | T = cold):
  W     P
  sun   0.4
  rain  0.6
Conditional Distributions

  Joint P(T, W):
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

  Conditional distribution P(W | T = cold):
  W     P
  sun   0.4   (= 0.2 / 0.5)
  rain  0.6   (= 0.3 / 0.5)
Normalization Trick
• SELECT the joint probabilities matching the evidence, then NORMALIZE the selection (make it sum to one)

  Joint P(T, W):
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

  SELECT the entries matching the evidence T = cold:
  T     W     P (unnormalized)
  cold  sun   0.2
  cold  rain  0.3

  NORMALIZE the selection (make it sum to one):
  W     P
  sun   0.4
  rain  0.6

• Why does this work? The sum of the selection is P(evidence)! (P(T = cold), here)
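
A minimal sketch of the trick in Python (my own illustration, reusing the dict-based joint from earlier; select_and_normalize is a hypothetical helper name):

```python
# Joint P(T, W) keyed by (t, w).
P_TW = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}

def select_and_normalize(joint, var_index, value):
    """SELECT entries consistent with the evidence, then NORMALIZE them.
    The normalizing constant is exactly P(evidence)."""
    selected = {k: p for k, p in joint.items() if k[var_index] == value}
    z = sum(selected.values())            # = P(evidence)
    return {k: p / z for k, p in selected.items()}

print(select_and_normalize(P_TW, 0, "cold"))
# {('cold', 'sun'): 0.4, ('cold', 'rain'): 0.6}  -> P(W | T = cold)
```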


Exercise: Normalization Trick
• P(X | Y = -y)?

  Joint P(X, Y):
  X    Y    P
  +x   +y   0.2
  +x   -y   0.3
  -x   +y   0.4
  -x   -y   0.1

  SELECT the entries matching the evidence Y = -y:
  X    P (unnormalized)
  +x   0.3
  -x   0.1

  NORMALIZE the selection (make it sum to one):
  X    P
  +x   0.75
  -x   0.25
Exercise
• Joint P(X, Y, Z):
  X    Y    Z    P
  +x   +y   +z   0.12
  +x   +y   -z   0.18
  +x   -y   +z   0.04
  +x   -y   -z   0.16
  -x   +y   +z   0.18
  -x   +y   -z   0.12
  -x   -y   +z   0.07
  -x   -y   -z   0.13

• P(X | +y, -z)?
  X    P (unnormalized)   P
  +x   0.18               0.6
  -x   0.12               0.4

• P(Y, Z | +x)?
  Y    Z    P (unnormalized)   P
  +y   +z   0.12               0.24
  +y   -z   0.18               0.36
  -y   +z   0.04               0.08
  -y   -z   0.16               0.32

Leaving this slide for self-study for Spring2025


Probabilistic Inference
• Problem relevant things are represented as random variables: X = {𝑋1 , 𝑋2 , … , 𝑋𝑛 }
• Observed variables (evidence): Some of the variables are observed (e.g., sensor
readings, symptoms): E = 𝐸1 , 𝐸2 , … , 𝐸𝑘 , E ⊂ X
• Unobserved variables: Remaining are unobserved
• Query variables: Variables of importance (e.g. where an object is, what disease is
present): Q = 𝑄1 , 𝑄2 , … , 𝑄𝑙 , Q ⊂ X (we will focus on a single query variable)
• Hidden variables: Other unobserved variables that are not the focus (e.g.
environment lighting, patient genetics) H = 𝐻1 , 𝐻2 , … , 𝐻𝑟 , H ⊂ X
• Model: Relationship between variables represented as a joint probability distribution:
P(𝑋1 , 𝑋2 , … , 𝑋𝑛 )
• Inference: Using the model and the evidence variables to reason about query variables:
Assuming a single query variable: P(𝑄|𝐸1 , 𝐸2 , … , 𝐸𝑘 )
Probabilistic Inference
• Problem relevant things are represented as random variables: X = {𝑋1 , 𝑋2 , … , 𝑋𝑛 }
• Observed variables (evidence): E = 𝐸1 , 𝐸2 , … , 𝐸𝑘 , E ⊂ X
• Query variable: 𝑄 ∈ X
• Hidden variables: H = 𝐻1 , 𝐻2 , … , 𝐻𝑟 , H ⊂ X
• Model: P(𝑋1 , 𝑋2 , … , 𝑋𝑛 )
• Inference: P(𝑄|𝐸1 , 𝐸2 , … , 𝐸𝑘 )
• How do we perform inference using probability rules?
Inference by Enumeration
• General case:
  • Evidence variables:  E1 … Ek = e1 … ek
  • Query* variable:     Q
  • Hidden variables:    H1 … Hr
  (together, these are all the variables X1, X2, …, Xn)
• We want: P(Q | e1 … ek)

▪ Step 1: Select the entries consistent with the evidence
▪ Step 2: Sum out H to get the joint of the query and the evidence (marginalize):
  P(Q, e1 … ek) = Σ_{h1 … hr} P(Q, h1 … hr, e1 … ek)
▪ Step 3: Normalize:
  Z = Σ_q P(q, e1 … ek),   P(Q | e1 … ek) = P(Q, e1 … ek) / Z
* Works fine with multiple query variables, too


Inference by Enumeration
• Joint P(S, T, W):
  S       T     W     P
  summer  hot   sun   0.30
  summer  hot   rain  0.05
  summer  cold  sun   0.10
  summer  cold  rain  0.05
  winter  hot   sun   0.10
  winter  hot   rain  0.05
  winter  cold  sun   0.15
  winter  cold  rain  0.20

• P(W)?
  • Step 1: All of the entries (no evidence)
  • Step 2: Marginalize out S and T (all hidden, no evidence):
    W     P (unnormalized)          P
    sun   0.3+0.1+0.1+0.15          0.65
    rain  0.05+0.05+0.05+0.2        0.35
  • Step 3: Normalize (actually no need for this case)
Inference by Enumeration
• P(W | winter)?  (using the same joint P(S, T, W) as above)
  • Step 1: Rows with S = winter
  • Step 2: Marginalize out T (hidden)
  • Step 3: Normalize

    W     P (unnormalized)   P
    sun   0.1 + 0.15         0.5
    rain  0.05 + 0.2         0.5

  P(W = sun | S = winter) = P(W = sun, T = hot | S = winter) + P(W = sun, T = cold | S = winter) = α(0.1 + 0.15)
  P(W = rain | S = winter) = P(W = rain, T = hot | S = winter) + P(W = rain, T = cold | S = winter) = α(0.05 + 0.2)
  (where α = 1 / P(S = winter))

• P(W | winter, hot)?
  • Step 1: Rows with S = winter, T = hot
  • Step 2: No hidden variables
  • Step 3: Normalize

    W     P (unnormalized)   P
    sun   0.1                2/3
    rain  0.05               1/3
Inference by Enumeration

▪ Obvious problems:
▪ Worst-case time complexity O(dⁿ)
▪ Space complexity O(dⁿ) to store the joint distribution
The Product Rule

• Sometimes we have the conditional distributions but we want the joint:

  P(y) P(x | y) = P(x, y)


The Product Rule

• Example: P(D, W) = P(D | W) P(W)

  P(W):
  W     P
  sun   0.8
  rain  0.2

  P(D | W):
  D     W     P
  wet   sun   0.1
  dry   sun   0.9
  wet   rain  0.7
  dry   rain  0.3

  P(D, W) = P(D | W) P(W):
  D     W     P
  wet   sun   0.08
  dry   sun   0.72
  wet   rain  0.14
  dry   rain  0.06
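
A short sketch (mine) of building the joint from these two tables with the product rule:

```python
# P(W) and P(D | W) from the example tables above.
P_W = {"sun": 0.8, "rain": 0.2}
P_D_given_W = {("wet", "sun"): 0.1, ("dry", "sun"): 0.9,
               ("wet", "rain"): 0.7, ("dry", "rain"): 0.3}

# Product rule: P(d, w) = P(d | w) * P(w)
P_DW = {(d, w): p * P_W[w] for (d, w), p in P_D_given_W.items()}
print(P_DW)
# values ≈ 0.08, 0.72, 0.14, 0.06 (up to floating-point rounding)
```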
The Chain Rule
• More generally, we can always write any joint distribution as an incremental product of
conditional distributions:

  P(x1, x2, …, xn) = Π_i P(xi | x1, …, xi−1)
  e.g. P(x1, x2, x3) = P(x1) P(x2 | x1) P(x3 | x1, x2)

• The chain rule is the product rule applied multiple times, turning a joint probability into
conditional probabilities
Bayes’ Rule
• Two ways to factor a joint distribution over two variables:

  P(x, y) = P(x | y) P(y) = P(y | x) P(x)

• Dividing, we get:

  P(x | y) = P(y | x) P(x) / P(y)

• Why is this helpful?
  • Lets us build one conditional from its reverse
  • Often one conditional is tricky but the other one is simple
  • Foundation of many systems we'll see later
Inference with Bayes’ Rule
• Example: Diagnostic probability from causal probability:

  P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)

• Example:
  • M: meningitis, S: stiff neck
  • The causal probability P(s | m) and the priors P(m) and P(s) are given; Bayes' rule
    yields the diagnostic probability P(m | s) = P(s | m) P(m) / P(s)
• Note: posterior probability of meningitis still very small


• Note: you should still get stiff necks checked out! Why?
Exercise: Bayes’ Rule
• Given:

  P(W):
  W     P
  sun   0.8
  rain  0.2

  P(D | W):
  D     W     P
  wet   sun   0.1
  dry   sun   0.9
  wet   rain  0.6
  dry   rain  0.4

• What is P(W | dry)?

  D     W     P (unnormalized)    P
  dry   sun   0.9 × 0.8 = 0.72    0.9
  dry   rain  0.4 × 0.2 = 0.08    0.1

Leaving this slide for self-study for Spring2025
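
A tiny sketch (my addition) that reproduces this answer with Bayes' rule; the variable names are just for illustration:

```python
# Prior P(W) and likelihood P(dry | W), from the exercise tables.
P_W = {"sun": 0.8, "rain": 0.2}
P_dry_given_W = {"sun": 0.9, "rain": 0.4}

# Bayes' rule: P(w | dry) ∝ P(dry | w) * P(w); normalize by P(dry).
unnormalized = {w: P_dry_given_W[w] * P_W[w] for w in P_W}
P_dry = sum(unnormalized.values())                       # 0.72 + 0.08 = 0.8
P_W_given_dry = {w: p / P_dry for w, p in unnormalized.items()}
print(P_W_given_dry)   # {'sun': 0.9, 'rain': 0.1}
```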


Probability Rules
▪ Conditional probability:  P(a | b) = P(a, b) / P(b)

▪ Product rule:  P(a, b) = P(a | b) P(b)

▪ Chain rule:  P(x1, x2, …, xn) = Π_i P(xi | x1, …, xi−1)

▪ Bayes' rule:  P(a | b) = P(b | a) P(a) / P(b)
Probability Summary
• Probability is a rigorous formalism for uncertain knowledge
• Joint probability distribution specifies probability of every atomic event
  e.g. P(X, Y)
• Queries can be answered by summing over atomic events:

  P(q, e) = Σ_h P(q, e, h),   Z = Σ_q P(q, e),   P(q | e) = P(q, e) / Z

• For nontrivial domains, we must find a way to reduce the joint size
• How?
Independence
• Two variables are independent in a joint distribution if:

  P(X, Y) = P(X) P(Y), i.e. P(x, y) = P(x) P(y) for all x, y

• Says the joint distribution factors into a product of two simple ones
• Absolute independence is very powerful but not so common

• Can use independence as a modeling assumption


• Independence can be a simplifying assumption
• Empirical joint distributions: at best “close” to independent
• What could we assume for {Weather, Traffic, Cavity}?
Example: Independence?

  P(T):
  T     P
  hot   0.5
  cold  0.5

  P(W):
  W     P
  sun   0.6
  rain  0.4

  Candidate joint 1:
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

  Candidate joint 2:
  T     W     P
  hot   sun   0.3
  hot   rain  0.2
  cold  sun   0.3
  cold  rain  0.2
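
A quick sketch (my own helper, not from the slides) that tests which candidate factors into the product of its marginals:

```python
def is_independent(joint, tol=1e-9):
    """Check whether P(T, W) == P(T) * P(W) for every assignment."""
    P_T, P_W = {}, {}
    for (t, w), p in joint.items():
        P_T[t] = P_T.get(t, 0.0) + p
        P_W[w] = P_W.get(w, 0.0) + p
    return all(abs(p - P_T[t] * P_W[w]) < tol for (t, w), p in joint.items())

joint1 = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1, ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}
joint2 = {("hot", "sun"): 0.3, ("hot", "rain"): 0.2, ("cold", "sun"): 0.3, ("cold", "rain"): 0.2}
print(is_independent(joint1))  # False
print(is_independent(joint2))  # True: 0.3 = 0.5*0.6, 0.2 = 0.5*0.4, ...
```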
Example: Independence
• N fair, independent coin flips:

  P(X1, X2, …, Xn) = P(X1) P(X2) ⋯ P(Xn), with each P(Xi): H 0.5, T 0.5

  From O(2ⁿ) entries in the joint down to O(n) numbers


Independence

P(Toothache, Catch, Cavity, Weather)= P(Toothache, Catch, Cavity) P(Weather)

• 32 entries reduced to 12

• Dentistry is a large field with hundreds of variables, none of which are independent.
What to do?
Conditional Independence
• P(Toothache, Cavity, Catch) has 2³ − 1 = 7 independent entries
• If I have a cavity, the probability that the probe catches it doesn't depend on whether I have a toothache:
(1) P(catch | toothache, cavity) = P(catch | cavity)
• The same independence holds if I do not have a cavity:
(2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity)

• Catch is conditionally independent of Toothache given Cavity:


P(Catch | Toothache, Cavity) = P(Catch | Cavity)

• Equivalent statements:
P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
One can be derived from the other easily
Conditional Independence
• Write out full joint distribution using chain rule:
P(Toothache, Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity) (conditional independence)
I.e., 2 + 2 + 1 = 5 independent numbers

• In most cases, the use of conditional independence reduces the size of the
representation of the joint distribution from exponential in n to linear in n.
Conditional Independence
• Unconditional (absolute) independence is very rare
• Conditional independence is our most basic and robust form of knowledge about
uncertain environments.
• X is conditionally independent of Y given Z if and only if:

  P(x, y | z) = P(x | z) P(y | z) for all x, y, z

  or, equivalently, if and only if

  P(x | y, z) = P(x | z) for all x, y, z


A Special Case of Conditional Independence
• Let’s assume “effects” are conditionally independent given the cause

  P(Cavity | toothache ∧ catch)
  = α P(toothache ∧ catch | Cavity) P(Cavity)
  = α P(toothache | Cavity) P(catch | Cavity) P(Cavity)
  (where 1/α = P(toothache ∧ catch))

• The above is an example of a naïve Bayes model:

  P(Cause, Effect1, …, Effectn) = P(Cause) Π_i P(Effecti | Cause)
• Total number of parameters is linear in n
• Note that not all models are like this
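
A short sketch of naïve Bayes inference (my own illustration with made-up placeholder numbers, since the slide does not give P(Cavity) or the conditionals):

```python
# Hypothetical numbers purely for illustration -- not from the slides.
P_cavity = {True: 0.2, False: 0.8}
P_toothache_given_cavity = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
P_catch_given_cavity = {True: 0.9, False: 0.2}       # P(catch | Cavity)

# Naive Bayes: P(Cavity | toothache, catch) ∝ P(Cavity) P(toothache | Cavity) P(catch | Cavity)
unnormalized = {c: P_cavity[c] * P_toothache_given_cavity[c] * P_catch_given_cavity[c]
                for c in P_cavity}
alpha = 1.0 / sum(unnormalized.values())   # 1/alpha is the sum of the unnormalized values
posterior = {c: alpha * p for c, p in unnormalized.items()}
print(posterior)   # roughly {True: 0.87, False: 0.13} with these made-up numbers
```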
Probability Recap
▪ Conditional probability:  P(a | b) = P(a, b) / P(b)

▪ Product rule:  P(a, b) = P(a | b) P(b)

▪ Chain rule:  P(x1, x2, …, xn) = Π_i P(xi | x1, …, xi−1)

▪ X, Y independent if and only if:  P(x, y) = P(x) P(y) for all x, y

▪ X and Y are conditionally independent given Z if and only if:  P(x, y | z) = P(x | z) P(y | z) for all x, y, z


Probability Summary
• Probability is a rigorous formalism for uncertain knowledge
• Joint probability distribution specifies probability of every atomic event
• Queries can be answered by summing over atomic events
• For nontrivial domains, we must find a way to reduce the joint size
• Independence and conditional independence provide the tools
