Module-5 complete notes-Quantifying Uncertainty 20th February 2024

Fundamentals of Artificial Intelligence

Chapter 13: Quantifying Uncertainty

Dr AJ KAMESWARA PRASAD
PROFESSOR
Department of Acharya Institute of Technology
Bangalore
Copyright notice: Most examples and images displayed in the slides of this course are taken from [Russell & Norvig, “Artificial Intelligence, a Modern Approach”, 3rd ed., Pearson], including explicitly figures from the above-mentioned book, so their copyright is held by the authors. For the few other materials included, AJK holds the copyright.
These slides cannot be displayed in public without the permission of the author. Following the fair use doctrine, these slides shall be shared with students.

1 / 44

Module-5
Uncertain Knowledge and Reasoning: Quantifying Uncertainty: Acting under Uncertainty, Basic Probability Notation, Inference using Full Joint Distributions, Independence, Bayes’ Rule and its use, Wumpus World Revisited.
Text Book 1: Chapter 13: 13.1, 13.2, 13.3, 13.4, 13.5, 13.6

2 / 44
Outline

1 Acting Under Uncertainty

2 Basics on Probability

3 Probabilistic Inference via Enumeration

4 Independence and Conditional Independence

5 Applying Bayes’ Rule

6 An Example: The Wumpus World Revisited

4 / 44
Acting Under Uncertainty
Agents often make decisions based on incomplete information
partial observability
nondeterministic actions
Partial solution (see previous chapters): maintain belief states
represent the set of all possible world states the agent might be in
generate a contingency plan handling every possible eventuality
Several drawbacks:
must consider every possible explanation for the observation (even
very-unlikely ones)
=⇒ impossibly complex belief-states
contingent plans handling every eventuality grow arbitrarily large
sometimes there is no plan that is guaranteed to achieve the
goal
Agent’s knowledge cannot guarantee a successful outcome ...
... but can provide some degree of belief (likelihood) on it
A rational decision depends on both the relative importance of (sub)goals and the likelihood that they will be achieved
6 / 44
Acting Under Uncertainty: Example
Automated taxi to Airport
Goal: deliver a passenger to the airport on time
Action At : leave for airport t minutes before flight
How can we be sure that A90 will succeed?
Too many sources of uncertainty:
partial observability (ex: road state, other drivers’ plans, etc.)
uncertainty in action outcome (ex: flat tire, etc.)
noisy sensors (ex: unreliable traffic reports)
complexity of modelling and predicting traffic
=⇒ With a purely logical approach it is difficult to anticipate everything that can go wrong. Such an approach either
risks falsehood: “A25 will get me there on time”, or
leads to conclusions that are too weak for decision making:
“A25 will get me there on time if there’s no accident on the bridge, and it doesn’t rain, and my tires remain intact, and ...”
7 / 44
Acting Under Uncertainty: Example (2)

A medical diagnosis
Given the symptoms (toothache) infer the cause (cavity)
How to encode this relation in logic?
diagnostic rules:
Toothache → Cavity (wrong)
Toothache → (Cavity ∨ GumProblem ∨ Abscess
∨ ...) (too many possible causes, some very unlikely)
causal rules:
Cavity → Toothache (wrong)
(Cavity ∧ ...) → Toothache (many possible
(con)causes)
Problems in specifying the correct logical rules:
Complexity: too many possible antecedents or consequents
Theoretical ignorance: no complete theory for the domain
Practical ignorance: no complete knowledge of the patient
8 / 44
Summarizing Uncertainty

Probability allows us to summarize the uncertainty arising from
laziness: failure to enumerate exceptions, qualifications, etc.
ignorance: lack of relevant facts, initial conditions, etc.
Probability can be derived from
statistical data (ex: 80% of toothache patients so far had cavities)
general knowledge (ex: 80% of toothache patients have cavities)
a combination thereof
Probability statements are made with respect to a state of knowledge (aka evidence), not
with respect to the real world
e.g., “The probability that the patient has a cavity, given that she has a toothache, is 0.8”:
P(HasCavity (patient ) | hasToothAche(patient)) = 0.8
Probabilities of propositions change with new evidence:
“The probability that the patient has a cavity, given that she has a toothache and a history of gum
disease, is 0.4”:
P(HasCavity (patient ) | hasToothAche(patient ) ∧ HistoryOfGum(patient )) = 0.4

9 / 44
Making Decisions Under Uncertainty

Ex: Suppose I believe:


P(A25 gets me there on time |...) = 0.04
P(A90 gets me there on time |...) = 0.70
P(A120 gets me there on time |...) = 0.95
P(A1440 gets me there on time |...) = 0.9999
Which action to choose?
=⇒ Depends on tradeoffs among preferences:
missing flight vs. costs (airport cuisine, sleep overnight in airport)
When there are conflicting goals the agent may express preferences among them by means
of a utility function.
Utilities are combined with probabilities in the general theory of rational decisions, aka
decision theory:
Decision theory = Probability theory + Utility theory
Maximum Expected Utility (MEU): an agent is rational if and only if it chooses the action that
yields the maximum expected utility, averaged over all the possible outcomes of the action. 10 / 44
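The MEU principle can be sketched numerically with the four probabilities listed above. The utility values below (value of catching the flight, cost of missing it, cost per minute of waiting) are hypothetical numbers chosen only for illustration; they are not part of the slides.

```python
# Probabilities of arriving on time, as given on the slide.
p_on_time = {"A25": 0.04, "A90": 0.70, "A120": 0.95, "A1440": 0.9999}

# Hypothetical utilities (assumptions, not from the slides).
U_CATCH_FLIGHT = 1000.0   # utility of making the flight
U_MISS_FLIGHT = -500.0    # utility of missing the flight
COST_PER_MINUTE = 0.5     # cost of each minute of lead time (waiting, airport food, ...)


def expected_utility(action: str) -> float:
    """Expected utility of action A_t, averaged over the two outcomes (on time / late)."""
    minutes = int(action[1:])          # "A90" -> 90
    p = p_on_time[action]
    return p * U_CATCH_FLIGHT + (1 - p) * U_MISS_FLIGHT - COST_PER_MINUTE * minutes


# MEU: choose the action with the highest expected utility.
best_action = max(p_on_time, key=expected_utility)
```

With these assumed utilities the agent prefers A120: A1440 is almost certain to succeed, but its waiting cost outweighs the small gain in probability. Different utilities can change the choice.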
Probabilities Basics: an Artificial Intelligentish Introduction
Probabilistic assertions: state how likely possible worlds are
Sample space Ω: the set of all possible worlds
ω ∈ Ω is a possible world (aka sample point or atomic event)
ex: the dice roll (1,4)
the possible worlds are mutually exclusive and exhaustive
ex: the 36 possible outcomes of rolling two dice: (1,1), (1,2), ...
A probability model (aka probability space) is a sample space with an assignment P(ω) for every ω ∈ Ω s.t.
0 ≤ P(ω) ≤ 1, for every ω ∈ Ω
Σ_{ω∈Ω} P(ω) = 1
Ex: 1-die roll: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
An event A is any subset of Ω, s.t. P(A) = Σ_{ω∈A} P(ω)
events can be described by propositions in some formal language
12 / 44
Random Variables

Factored representation of possible worlds: sets of ⟨variable, value⟩ pairs


Variables in probability theory: Random variables
domain: the set of possible values a variable can take on
ex: Die: {1, 2, 3, 4, 5, 6}, Weather: {sunny, rain, cloudy, snow}, Odd:
{true, false}
a r.v. can be seen as a function from sample points to the domain:
ex: Die(ω), Weather(ω), ... (“(ω)” typically omitted)
Probability Distribution gives the probabilities of all the possible values of a random variable X:
P(X = xi) =def Σ_{ω: X(ω)=xi} P(ω)
ex: P(Odd = true) = P(1) + P(3) + P(5) = 1/6 + 1/6 + 1/6 = 1/2
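The definitions of sample space, event, and random variable can be sketched for the two-dice example. The helper names (`omega`, `prob`, `total`) are illustrative choices, and the dice are assumed fair.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 possible worlds of rolling two fair dice.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}   # P(w) for every sample point


def prob(event):
    """P(A) = sum of P(w) over the sample points w where the event holds."""
    return sum(P[w] for w in omega if event(w))


def total(w):
    """Random variable Total: a function from sample points to {2, ..., 12}."""
    return w[0] + w[1]


p_total_11 = prob(lambda w: total(w) == 11)   # (5,6) and (6,5)
p_doubles = prob(lambda w: w[0] == w[1])      # (1,1), ..., (6,6)

# Probability distribution of Total: one probability per value in its domain.
dist_total = {v: prob(lambda w, v=v: total(w) == v) for v in range(2, 13)}
```

Exact fractions are used so that the results match the hand computation (1/18 and 1/6) without rounding.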

13 / 44
Propositions and Probabilities

We think of a proposition a as the event A (set of sample points) where the proposition is true
Odd is a propositional random variable of range {true, false}
notation: a ⇐⇒ “A = true”
Given Boolean random variables A and B:
a: set of sample points where A(ω) = true
¬a: set of sample points where A(ω) = false
a ∧ b: set of sample points where A(ω) = true, B(ω) = true
=⇒ with Boolean random variables, sample points are propositional-logic (PL) models
Proposition: disjunction of the sample points in which it is true
ex: (a ∨ b) ≡ (¬a ∧ b) ∨ (a ∧ ¬b) ∨ (a ∧ b)
=⇒ P(a ∨ b) = P(¬a ∧ b) + P(a ∧ ¬b) + P(a ∧ b)
Some derived facts:
P(¬a) = 1 − P(a)
14 / 44
Probability Distributions

Probability Distribution gives the probabilities of all the possible values of a random variable
ex: Weather: {sunny, rain, cloudy, snow}
P(Weather = sunny) = 0.6
P(Weather = rain) = 0.1
P(Weather = cloudy) = 0.29
P(Weather = snow) = 0.01
⇐⇒ P(Weather) = ⟨0.6, 0.1, 0.29, 0.01⟩
normalized: their sum is 1
Joint Probability Distribution for multiple variables gives the probability of every sample point
ex: P(Weather, Cavity) =
Weather =         sunny   rain    cloudy  snow
Cavity = true     0.144   0.02    0.016   0.02
Cavity = false    0.576   0.08    0.064   0.08
Every event is a sum of sample points,
=⇒ its probability is determined by the joint distribution

15 / 44
Probability for Continuous Variables
Express continuous probability distributions:
density functions f(x) ≥ 0 s.t. ∫_{−∞}^{+∞} f(x) dx = 1
P(x ∈ [a, b]) = ∫_a^b f(x) dx
=⇒ P(x ∈ [val, val]) = 0, P(x ∈ [−∞, +∞]) = 1
ex: P(x ∈ [20, 22]) = ∫_20^22 0.125 dx = 0.25
Density: P(x) = P(X = x) =def lim_{dx→0} P(X ∈ [x, x + dx])/dx
ex: P(20.1) = lim_{dx→0} P(X ∈ [20.1, 20.1 + dx])/dx = 0.125
note: P(v) ≠ P(x ∈ [v, v]) = 0
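The difference between a density value and an interval probability can be checked numerically. The sketch below assumes the density is uniform on [18, 26] (height 0.125), which matches the numbers in the example; the interval itself is an assumption, since the figure is not reproduced in the text. The integral is approximated with a midpoint Riemann sum.

```python
def uniform_density(x: float, lo: float = 18.0, hi: float = 26.0) -> float:
    """Density of a uniform distribution on [lo, hi] (assumed interval)."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0


def interval_probability(a: float, b: float, steps: int = 10_000) -> float:
    """Approximate P(x in [a, b]) as the integral of the density (midpoint rule)."""
    width = (b - a) / steps
    return sum(uniform_density(a + (i + 0.5) * width) for i in range(steps)) * width


p_20_22 = interval_probability(20.0, 22.0)   # about 0.25
density_at_20_1 = uniform_density(20.1)      # 0.125, a density, not a probability
p_single_point = interval_probability(20.1, 20.1)   # 0.0
```

The density at 20.1 is 0.125, while the probability of the single point 20.1 is 0, as the note on the slide states.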

(© S. Russell & P. Norvig, AIMA)


16 / 44
Conditional Probabilities
Unconditional or prior probabilities refer to degrees of belief in propositions in the absence of
any other information (evidence)
ex: P(cavity ) = 0.2, P(Total = 11) = 1/18, P(double) = 1/6
Conditional or posterior probabilities refer to degrees of belief in proposition a given some
evidence b: P(a|b)
evidence: information already revealed
ex: P(cavity|toothache) = 0.6: p. of a cavity given a toothache (assuming no other information
is provided!)
ex: P(Total = 11|die1 = 5) = 1/6: p. of total 11 given first die is 5
=⇒ restricts the set of possible worlds to those where the first die is 5
Note: P(a|... ∧ a) = 1, P(a|... ∧ ¬a) = 0
ex: P(cavity|toothache ∧ cavity ) = 1, P(cavity|toothache ∧ ¬cavity ) = 0
Less specific belief still valid after more evidence arrives
ex: P(cavity ) = 0.2 holds even if P(cavity|toothache) = 0.6
New evidence may be irrelevant, allowing for simplification
ex: P(cavity|toothache, 49ersWin) = P(cavity|toothache) = 0.6
17 / 44
Conditional Probabilities [cont.]
Conditional probability: P(a|b) =def P(a ∧ b) / P(b), s.t. P(b) > 0
ex: P(Total = 11|die1 = 5) = P(Total = 11 ∧ die1 = 5) / P(die1 = 5) = (1/6 · 1/6) / (1/6) = 1/6
observing b restricts the possible worlds to those where b is true
Product rule: P(a ∧ b) = P(a|b) · P(b) = P(b|a) · P(a)
Product rule for whole distributions: P(X, Y) = P(X|Y) · P(Y)
ex: P(Weather, Cavity) = P(Weather|Cavity)P(Cavity), that is:
P(sunny, cavity) = P(sunny|cavity)P(cavity)
...
P(snow, ¬cavity) = P(snow|¬cavity)P(¬cavity)
a 4 × 2 set of equations, not matrix multiplication!
Chain rule is derived by successive application of the product rule:
P(X1, ..., Xn)
= P(X1, ..., Xn−1)P(Xn|X1, ..., Xn−1)
= P(X1, ..., Xn−2)P(Xn−1|X1, ..., Xn−2)P(Xn|X1, ..., Xn−1)
= ...
= Π_{i=1}^{n} P(Xi|X1, ..., Xi−1)
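The definition of conditional probability and the product rule can be checked on the two-dice example. This is a minimal sketch assuming fair dice; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair dice; every world has probability 1/36.
omega = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) by counting equally likely sample points."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b), defined only when P(b) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(b) must be positive")
    return prob(lambda w: a(w) and b(w)) / p_b


def total_is_11(w):
    return w[0] + w[1] == 11


def die1_is_5(w):
    return w[0] == 5


p_total11_given_die1_5 = cond_prob(total_is_11, die1_is_5)   # 1/6

# Product rule, both directions: P(a and b) = P(a|b) P(b) = P(b|a) P(a).
p_joint = prob(lambda w: total_is_11(w) and die1_is_5(w))
```

Observing die1 = 5 restricts the 36 worlds to 6, of which exactly one, (5,6), has total 11.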
18 / 44
Logic vs. Probability

Logic                                  Probability
a                                      P(a) = 1
¬a                                     P(a) = 0
a → b                                  P(b|a) = 1
from (a, a → b) infer b                from P(a) = 1, P(b|a) = 1 infer P(b) = 1
from (a → b, b → c) infer a → c        from P(b|a) = 1, P(c|b) = 1 infer P(c|a) = 1
Proof of P(b|a) = 1, P(c|b) = 1 =⇒ P(c|a) = 1:
P(b|a) = 1 =⇒ P(¬b, a) = P(¬b|a)P(a) = 0
P(c|b) = 1 =⇒ P(¬c, b) = P(¬c|b)P(b) = 0
P(¬c, a) = P(¬c, a, b) + P(¬c, a, ¬b) ≤ P(¬c, b) + P(a, ¬b) = 0
P(¬c|a) = P(¬c, a)/P(a) = 0
P(c|a) = 1 − P(¬c|a) = 1
(Courtesy of Maria Simi, UniPI)

19 / 44
Probabilistic Inference via Enumeration

Basic Ideas
Start with the joint distribution P(Toothache, Catch, Cavity )
For any proposition ϕ, sum the atomic events where ϕ is true: P(ϕ) = Σ_{ω: ω ⊨ ϕ} P(ω)

21 / 44
Probabilistic Inference via Enumeration: Example

Example: Generic Inference


Start with the joint distribution P(Toothache, Catch, Cavity)
For any proposition ϕ, sum the atomic events where ϕ is true: P(ϕ) = Σ_{ω: ω ⊨ ϕ} P(ω)
Ex: P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
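Inference by enumeration can be sketched as a sum over the entries of the full joint table. The eight entries below are those of the Toothache/Catch/Cavity table in AIMA (Fig. 13.3), which the slides show as an image; the dictionary layout and the function name `prob` are illustrative choices.

```python
# Full joint distribution P(Toothache, Catch, Cavity).
# Keys are (toothache, catch, cavity); values are taken from AIMA Fig. 13.3.
JOINT = {
    (True, True, True): 0.108,
    (True, False, True): 0.012,
    (True, True, False): 0.016,
    (True, False, False): 0.064,
    (False, True, True): 0.072,
    (False, False, True): 0.008,
    (False, True, False): 0.144,
    (False, False, False): 0.576,
}


def prob(phi):
    """P(phi): sum the probabilities of the atomic events in which phi is true."""
    return sum(p for world, p in JOINT.items() if phi(*world))


p_cavity_or_toothache = prob(lambda toothache, catch, cavity: cavity or toothache)
```

The six worlds in which `cavity or toothache` holds are exactly the six terms of the sum in the example, giving about 0.28.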

(© S. Russell & P. Norvig, AIMA)

20 / 44
Marginalization

Start with the joint distribution P(Toothache, Catch, Cavity )


Marginalization (aka summing out):
sum up the probabilities for each possible value of the other variables:
P(Y) = Σ_{z∈Z} P(Y, z)
Ex: P(Toothache) = Σ_{z∈{Catch,Cavity}} P(Toothache, z)
Conditioning: variant of marginalization, involving conditional probabilities instead of joint probabilities (using the product rule)
P(Y) = Σ_{z∈Z} P(Y|z)P(z)
Ex: P(Toothache) = Σ_{z∈{Catch,Cavity}} P(Toothache|z)P(z)

21 / 44
Marginalization: Example
Start with the joint distribution P(Toothache, Catch, Cavity )
Marginalization (aka summing out):
sum up the probabilities for each possible value of the other variables:
P(Y) = Σ_{z∈Z} P(Y, z)
Ex: P(Toothache) = Σ_{z∈{Catch,Cavity}} P(Toothache, z)
P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
P(¬toothache) = 1 − P(toothache) = 1 − 0.2 = 0.8
=⇒ P(Toothache) = ⟨0.2, 0.8⟩
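Marginalization and conditioning can be sketched on the same joint table (values from AIMA Fig. 13.3, shown as an image in the slides). For brevity, the conditioning step below sums over the single variable Cavity rather than over both Catch and Cavity; that simplification, like the function names, is an assumption of this sketch.

```python
VARS = ("Toothache", "Catch", "Cavity")

# Full joint distribution; keys are (toothache, catch, cavity).
JOINT = {
    (True, True, True): 0.108, (True, False, True): 0.012,
    (True, True, False): 0.016, (True, False, False): 0.064,
    (False, True, True): 0.072, (False, False, True): 0.008,
    (False, True, False): 0.144, (False, False, False): 0.576,
}


def marginal(var):
    """P(var): sum the joint entries over all values of the other variables."""
    idx = VARS.index(var)
    dist = {True: 0.0, False: 0.0}
    for world, p in JOINT.items():
        dist[world[idx]] += p
    return dist


def conditional(var, value, given_var, given_value):
    """P(var = value | given_var = given_value) from the joint table."""
    i, g = VARS.index(var), VARS.index(given_var)
    joint = sum(p for w, p in JOINT.items() if w[i] == value and w[g] == given_value)
    return joint / marginal(given_var)[given_value]


p_toothache = marginal("Toothache")   # about {True: 0.2, False: 0.8}

# Conditioning: P(toothache) = sum over c of P(toothache | Cavity = c) P(Cavity = c).
p_cavity = marginal("Cavity")
p_toothache_by_conditioning = sum(
    conditional("Toothache", True, "Cavity", c) * p_cavity[c] for c in (True, False)
)
```

Both routes give the same value for P(toothache), as the product rule requires.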

(© S. Russell & P. Norvig, AIMA)

22 / 44
Conditional Probability via Enumeration: Example

Start with the joint distribution P(Toothache, Catch, Cavity)
Conditional Probability:
Ex: P(¬cavity|toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.4
Ex: P(cavity|toothache) = P(cavity ∧ toothache) / P(toothache) = ... = 0.6

(© S. Russell & P. Norvig, AIMA)

23 / 44
Normalization

Let X be all the variables. Typically, we want P(Y|E = e):
the conditional joint distribution of the query variables Y given specific values e for the evidence variables E
let the hidden variables be H =def X \ (Y ∪ E)
The summation of joint entries is done by summing out the hidden variables:
P(Y|E = e) = αP(Y, E = e) = α Σ_{h∈H} P(Y, E = e, H = h)
where α =def 1/P(E = e) (different α’s for different values of e)
=⇒ it is easy to compute α by normalization
note: the terms in the summation are joint entries, because Y, E, H together exhaust the set of random variables X
Idea: compute whole distribution on query variable by:
fixing evidence variables and summing over hidden variables
normalize the final distribution, so that Σ ... = 1
Complexity: O(2^n), n number of propositions =⇒ impractical for large n’s


24 / 44
Normalization: Example
α =def 1/P(toothache) can be viewed as a normalization constant
Idea: compute whole distribution on query variable by:
fixing evidence variables and summing over hidden variables
normalize the final distribution, so that Σ ... = 1
Ex:
P(Cavity|toothache) = αP(Cavity ∧ toothache)
= α[P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α[⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
= α⟨0.12, 0.08⟩ = (normalization) = ⟨0.6, 0.4⟩ [α = 5]
P(Cavity|¬toothache) = ... = α⟨0.08, 0.72⟩ = ⟨0.1, 0.9⟩ [α = 1.25]
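The normalization procedure (fix the evidence, sum out the hidden variables, normalize) can be sketched as one small function. The joint values are those of AIMA Fig. 13.3; the function `query` and its argument format are hypothetical names for this sketch, restricted to Boolean variables.

```python
VARS = ("Toothache", "Catch", "Cavity")

# Full joint distribution; keys are (toothache, catch, cavity).
JOINT = {
    (True, True, True): 0.108, (True, False, True): 0.012,
    (True, True, False): 0.016, (True, False, False): 0.064,
    (False, True, True): 0.072, (False, False, True): 0.008,
    (False, True, False): 0.144, (False, False, False): 0.576,
}


def query(var, evidence):
    """P(var | evidence) as {True: p, False: p}.

    Keeps only the joint entries consistent with the evidence, sums out
    every other (hidden) variable, then normalizes with alpha.
    """
    idx = VARS.index(var)
    unnormalized = {True: 0.0, False: 0.0}
    for world, p in JOINT.items():
        if all(world[VARS.index(name)] == value for name, value in evidence.items()):
            unnormalized[world[idx]] += p
    alpha = 1.0 / sum(unnormalized.values())   # alpha = 1 / P(evidence)
    return {value: alpha * p for value, p in unnormalized.items()}


p_cavity_given_toothache = query("Cavity", {"Toothache": True})
p_cavity_given_no_toothache = query("Cavity", {"Toothache": False})
```

Note that α is never looked up separately: it falls out of the requirement that the two entries sum to 1.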

(© S. Russell & P. Norvig, AIMA)


25 / 44
Independence
Variables X and Y are independent iff P(X, Y ) = P(X )P(Y )
(or equivalently, iff P(X|Y ) = P(X ) or P(Y|X ) = P(Y ))
ex: P(Toothache, Catch, Cavity, Weather ) = P(Toothache, Catch,
Cavity )P(Weather )
=⇒ e.g. P(toothache, catch, cavity, cloudy ) = P(toothache, catch, cavity )P(cloudy )
typically based on domain knowledge
May drastically reduce the number of entries and computation
=⇒ ex: 32-element table decomposed into one 8-element and one 4-element table
Unfortunately, absolute independence is quite rare

(© S. Russell & P. Norvig, AIMA)

27 / 44
Conditional Independence

Variables X and Y are conditionally independent given Z iff P(X, Y|Z) = P(X|Z)P(Y|Z)
(or equivalently, iff P(X|Y, Z) = P(X|Z) or P(Y|X, Z) = P(Y|Z))
Consider P(Toothache, Cavity, Catch)
if I have a cavity, the probability that the probe catches in it doesn’t depend on whether I have
a toothache: P(catch|toothache, cavity ) = P(catch|cavity )
the same independence holds if I haven’t got a cavity:
P(catch|toothache, ¬cavity ) = P(catch|¬cavity )
=⇒ Catch is conditionally independent of Toothache given Cavity:
P(Catch|Toothache, Cavity ) = P(Catch|Cavity )
or, equivalently: P(Toothache|Catch, Cavity ) = P(Toothache|Cavity ), or
P(Toothache, Catch|Cavity ) = P(Toothache|Cavity )P(Catch|Cavity )
Hint: Toothache and Catch are two effects of the same cause Cavity, conditionally independent of each other given that cause

28 / 44
Conditional Independence [cont.]

In many cases, the use of conditional independence reduces the size of the representation of the joint distribution dramatically
even from exponential to linear!
Ex: P(Toothache, Catch, Cavity)
= P(Toothache|Catch, Cavity)P(Catch, Cavity)
= P(Toothache|Catch, Cavity)P(Catch|Cavity)P(Cavity)
= P(Toothache|Cavity)P(Catch|Cavity)P(Cavity)
=⇒ Passes from 7 to 2+2+1=5 independent numbers
P(Toothache, Catch, Cavity) contains 7 independent entries (the 8th can be obtained as 1 − Σ ...)
P(Toothache|Cavity), P(Catch|Cavity) contain 2 independent entries each (2 × 2 matrix, each row sums to 1)
P(Cavity) contains 1 independent entry
General Case: if one cause has n independent effects:
P(Cause, Effect1, ..., Effectn) = P(Cause) Π_i P(Effecti|Cause)
=⇒ reduces from 2^(n+1) − 1 to 2n + 1 independent entries
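The conditional-independence claim and the entry counts can be checked numerically. The joint values are those of AIMA Fig. 13.3 (shown as an image in the slides); the helper names are illustrative.

```python
# Full joint distribution; keys are (toothache, catch, cavity).
JOINT = {
    (True, True, True): 0.108, (True, False, True): 0.012,
    (True, True, False): 0.016, (True, False, False): 0.064,
    (False, True, True): 0.072, (False, False, True): 0.008,
    (False, True, False): 0.144, (False, False, False): 0.576,
}


def prob(phi):
    """P(phi) by summing the joint entries where phi holds."""
    return sum(p for world, p in JOINT.items() if phi(*world))


def cond(phi, given):
    """P(phi | given) = P(phi and given) / P(given)."""
    return prob(lambda t, c, cav: phi(t, c, cav) and given(t, c, cav)) / prob(given)


# Catch is conditionally independent of Toothache given Cavity:
p_catch_given_toothache_cavity = cond(lambda t, c, cav: c, lambda t, c, cav: t and cav)
p_catch_given_cavity = cond(lambda t, c, cav: c, lambda t, c, cav: cav)


def full_joint_entries(n: int) -> int:
    """Independent entries of the full joint over one Boolean cause and n Boolean effects."""
    return 2 ** (n + 1) - 1


def factored_entries(n: int) -> int:
    """Independent entries of P(Cause) plus n tables P(Effect_i | Cause)."""
    return 2 * n + 1
```

Both conditional probabilities come out as about 0.9. For n = 2 the counts are 7 and 5, matching the example; for n = 10 they are 2047 versus 21.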

29 / 44
Exercise

Consider the joint probability distribution described in the table in previous section (slide 20
onwards): P(Toothache, Catch, Cavity )
Consider the example in previous slide:
P(Toothache, Catch, Cavity )
= P(Toothache|Catch, Cavity )P(Catch, Cavity )
= P(Toothache|Catch, Cavity )P(Catch|Cavity )P(Cavity )
= P(Toothache|Cavity )P(Catch|Cavity )P(Cavity )
Compute separately the distributions P(Toothache|Catch, Cavity ), P(Catch|Cavity ),
P(Cavity ), P(Toothache|Cavity ).
Recompute P(Toothache, Catch, Cavity) in two ways:
P(Toothache|Catch, Cavity)P(Catch|Cavity)P(Cavity)
P(Toothache|Cavity)P(Catch|Cavity)P(Cavity)
and compare the result with P(Toothache, Catch, Cavity)

30 / 44
Bayes’ Rule

Bayes’ Rule/Theorem/Law
Bayes’ rule: P(a|b) = P(a ∧ b) / P(b) = P(b|a)P(a) / P(b)
In distribution form: P(Y|X) = P(X|Y)P(Y) / P(X) = αP(X|Y)P(Y)
α =def 1/P(X): normalization constant to make P(Y|X) entries sum to 1 (different α’s for different values of X)
A version conditionalized on some background evidence e:
P(Y|X, e) = P(X|Y, e)P(Y|e) / P(X|e)

32 / 44
Using Bayes’ Rule: The Simple Case
Used to assess diagnostic probability from causal probability:
P(cause|effect) = P(effect|cause)P(cause) / P(effect)
P(cause|effect) goes from effect to cause (diagnostic direction)
P(effect|cause) goes from cause to effect (causal direction)

Example
An expert doctor is likely to have causal knowledge ... P(symptoms|disease) (i.e., P(effect|cause))
... and needs to produce diagnostic knowledge P(disease|symptoms) (i.e., P(cause|effect))
Ex: let m be meningitis, s be stiff neck
P(m) = 1/50000, P(s) = 0.01 (prior knowledge, from statistics)
“meningitis causes the patient a stiff neck in 70% of cases”: P(s|m) = 0.7 (doctor’s experience)
=⇒ P(m|s) = P(s|m)P(m) / P(s) = (0.7 · 1/50000) / 0.01 = 0.0014
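The meningitis computation is a direct application of Bayes’ rule and can be sketched in a few lines. The function name `bayes` is illustrative; the three input numbers are the ones given on the slide.

```python
def bayes(p_effect_given_cause: float, p_cause: float, p_effect: float) -> float:
    """P(cause | effect) = P(effect | cause) P(cause) / P(effect)."""
    if p_effect <= 0:
        raise ValueError("P(effect) must be positive")
    return p_effect_given_cause * p_cause / p_effect


p_m = 1 / 50000        # prior probability of meningitis
p_s = 0.01             # prior probability of a stiff neck
p_s_given_m = 0.7      # causal knowledge: stiff neck given meningitis

p_m_given_s = bayes(p_s_given_m, p_m, p_s)   # about 0.0014
```

Even though meningitis causes a stiff neck in 70% of cases, the posterior stays small because the prior P(m) is far smaller than P(s).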
33 / 44
Using Bayes’ Rule: Combining Evidence
A naive Bayes model is a probability model that assumes the effects are conditionally independent, given the cause
=⇒ P(Cause, Effect1, ..., Effectn) = P(Cause) Π_i P(Effecti|Cause)
total number of parameters is linear in n
ex: P(Cavity, Toothache, Catch) = P(Cavity)P(Toothache|Cavity)P(Catch|Cavity)
Q: How can we compute P(Cause|Effect1, ..., Effectk )?
ex P(Cavity|toothache ∧ catch)?

(© S. Russell & P. Norvig, AIMA)

34 / 44
Using Bayes’ Rule: Combining Evidence [cont.]

Q: How can we compute P(Cause|Effect1, ..., Effectk)?
ex: P(Cavity|toothache ∧ catch)?
A: Apply Bayes’ Rule
P(Cavity|toothache ∧ catch)
= P(toothache ∧ catch|Cavity)P(Cavity) / P(toothache ∧ catch)
= αP(toothache ∧ catch|Cavity)P(Cavity)
= αP(toothache|Cavity)P(catch|Cavity)P(Cavity)
α =def 1/P(toothache ∧ catch) not computed explicitly
General case: P(Cause|Effect1, ..., Effectn) = αP(Cause) Π_i P(Effecti|Cause)
α =def 1/P(Effect1, ..., Effectn) not computed explicitly (one α value for every value of Effect1, ..., Effectn)
=⇒ reduces from 2^(n+1) − 1 to 2n + 1 independent entries
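The naive Bayes update can be sketched with the factored model P(Cavity)P(Toothache|Cavity)P(Catch|Cavity). The conditional tables below are not stated on the slides: they are derived from the AIMA joint table (for example P(toothache|cavity) = 0.12/0.2 = 0.6), and the function name is an illustrative choice.

```python
# Factored model; numbers derived from the AIMA joint table (an assumption of this sketch).
P_CAVITY = {True: 0.2, False: 0.8}               # P(Cavity)
P_TOOTHACHE_GIVEN = {True: 0.6, False: 0.1}      # P(toothache | Cavity)
P_CATCH_GIVEN = {True: 0.9, False: 0.2}          # P(catch | Cavity)


def naive_bayes_posterior(prior, likelihoods):
    """P(Cause | observed effects) = alpha * P(Cause) * prod_i P(effect_i | Cause).

    `likelihoods` is a list of tables, one per observed effect, each mapping
    a value of the cause to P(effect_i observed | cause value).
    """
    unnormalized = {}
    for cause_value, p_cause in prior.items():
        weight = p_cause
        for table in likelihoods:
            weight *= table[cause_value]
        unnormalized[cause_value] = weight
    alpha = 1.0 / sum(unnormalized.values())     # 1 / P(effect_1, ..., effect_n)
    return {value: alpha * w for value, w in unnormalized.items()}


posterior = naive_bayes_posterior(P_CAVITY, [P_TOOTHACHE_GIVEN, P_CATCH_GIVEN])
```

The unnormalized values are 0.108 and 0.016, the same two joint entries that enumeration would pick out, so the posterior for cavity is about 0.871.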

35 / 44
An Example: The Wumpus World
A probability model of the Wumpus World
Consider again the Wumpus World (restricted to pit detection)
Evidence: no pit in (1,1), (1,2), (2,1), breezy in (1,2), (2,1)
Q. Given the evidence, what is the probability of having a pit in (1,3), (2,2) or (3,1)?
Two groups of variables:
Pij = true iff [i, j] contains a pit (“causes”)
Bij = true iff [i, j] is breezy (“effects”, consider only B1,1, B1,2, B2,1)
Joint Distribution: P(P1,1, ..., P4,4, B1,1, B1,2, B2,1)
Known facts (evidence):
b∗ =def ¬b1,1 ∧ b1,2 ∧ b2,1
p∗ =def ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1
Queries: P(P1,3|b∗, p∗)? P(P2,2|b∗, p∗)? (P(P3,1|b∗, p∗) symmetric)
(© S. Russell & P. Norvig, AIMA)

37 / 44
An Example: The Wumpus World [cont.]

Specifying the probability model


Apply the product rule to the joint distribution:
P(P1,1, ..., P4,4, B1,1, B1,2, B2,1) = P(B1,1, B1,2, B2,1|P1,1, ..., P4,4) P(P1,1, ..., P4,4)
P(B1,1, B1,2, B2,1|P1,1, ..., P4,4): 1 if the breeze values are consistent with the pit locations (a square is breezy iff it is adjacent to a pit), 0 otherwise
P(P1,1, ..., P4,4): pits are placed randomly except in (1,1):
P(P1,1, ..., P4,4) = Π_{i=1}^{4} Π_{j=1}^{4} P(Pi,j)
P(Pi,j) = 0.2 if (i, j) ≠ (1, 1), 0 otherwise
ex: P(P1,1, ..., P4,4) = 0.2^3 · 0.8^(15−3) ≈ 0.00055 if 3 pits

38 / 44
An Example: The Wumpus World [cont.]
Inference by enumeration
Case P1,3:
General form of query: P(Y|E = e) = αP(Y, E = e) = α Σ_h P(Y, E = e, H = h)
Y: query vars; E,e: evidence vars/values; H,h: hidden vars/values
Our case: P(P1,3|p∗, b∗), s.t. the evidence is
b∗ =def ¬b1,1 ∧ b1,2 ∧ b2,1
p∗ =def ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1
Sum over hidden variables:
P(P1,3|p∗, b∗) = α Σ_unknown P(P1,3, p∗, b∗, unknown)
unknown are all Pij’s s.t. (i, j) ∉ {(1, 1), (1, 2), (2, 1), (1, 3)} (“∉” is read as “not an element of” or “not in”)
=⇒ 2^(16−4) = 4096 terms of the sum!
Grows exponentially in the number of hidden variables H!
(© S. Russell & P. Norvig, AIMA)
=⇒ Inefficient
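The brute-force sum can be written out directly, which makes the 4096-term cost concrete. This is a sketch under the model stated on the slides (4 × 4 grid, pit probability 0.2, a square is breezy iff a neighbor has a pit); all function and constant names are illustrative. The constant factor P(p∗) is omitted because it is absorbed into α.

```python
from itertools import product

P_PIT = 0.2
SQUARES = [(i, j) for i in range(1, 5) for j in range(1, 5)]
KNOWN_NO_PIT = {(1, 1), (1, 2), (2, 1)}                      # p*
BREEZE_EVIDENCE = {(1, 1): False, (1, 2): True, (2, 1): True}  # b*


def prior_probability(n_pits: int, n_squares: int = 15) -> float:
    """Prior of one specific pit configuration with n_pits pits among n_squares squares."""
    return P_PIT ** n_pits * (1 - P_PIT) ** (n_squares - n_pits)


def neighbors(square):
    i, j = square
    return [s for s in [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)] if s in SQUARES]


def breezes_consistent(pits) -> bool:
    """P(b* | pits) as a Boolean: each observed square is breezy iff a neighbor has a pit."""
    return all(
        any(n in pits for n in neighbors(sq)) == breezy
        for sq, breezy in BREEZE_EVIDENCE.items()
    )


def posterior_pit(query):
    """P(P_query | p*, b*) by summing over all assignments to the unknown squares."""
    unknown = [s for s in SQUARES if s not in KNOWN_NO_PIT and s != query]
    unnormalized = {True: 0.0, False: 0.0}
    for query_value in (True, False):
        for assignment in product((True, False), repeat=len(unknown)):
            pits = {s for s, has_pit in zip(unknown, assignment) if has_pit}
            if query_value:
                pits.add(query)
            if breezes_consistent(pits):
                # Prior over the query square plus the unknown squares (13 in total).
                unnormalized[query_value] += prior_probability(len(pits), len(unknown) + 1)
    alpha = 1.0 / sum(unnormalized.values())
    return {value: alpha * p for value, p in unnormalized.items()}


p_pit_13 = posterior_pit((1, 3))   # about {True: 0.31, False: 0.69}
```

Each of the two values of P1,3 requires a loop over 2^12 = 4096 assignments, which is already noticeable for a 4 × 4 board and hopeless for larger ones.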
39 / 44
An Example: The Wumpus World [cont.]
Inference by enumeration
General form of query: P(Y|E = e) = αP(Y, E = e) = α Σ_h P(Y, E = e, H = h)
Here's how to read the equation:
• P(Y|E = e): the probability distribution of the query variables Y given that the evidence E = e has been observed. It is read as "the posterior probability of Y given E = e".
• =: This symbol represents "equals".
• α: the normalization constant 1/P(E = e). It does not depend on Y, so it need not be computed explicitly: it is recovered at the end by making the entries of the distribution sum to 1.
• P(Y, E = e): the joint probability of Y and the observed evidence E = e. It is read as "the joint probability of Y and E = e".
• Σ_h P(Y, E = e, H = h): the sum of the full-joint entries over all value combinations h of the hidden variables H.
(© S. Russell & P. Norvig, AIMA)

39 / 44
An Example: The Wumpus World [cont.]
Using conditional independence
Basic insight: Given the fringe squares (see below), b∗ is conditionally independent of the other hidden squares
Unknown =def Fringe ∪ Other
=⇒ P(b∗|p∗, P1,3, Unknown) = P(b∗|p∗, P1,3, Fringe, Other) = P(b∗|p∗, P1,3, Fringe)
Next: manipulate the query into a form
where this equation can be used

(© S. Russell & P. Norvig, AIMA)

40 / 44
An Example: The Wumpus World [cont.]
Using conditional independence

Fringe in the Wumpus World:

In this example, the fringe is the set of unknown squares (other than the query square) that are adjacent to visited squares: for the query P1,3 these are [2,2] and [3,1]. All remaining unknown squares are the "other" squares.

Fringe in Quantifying Uncertainty:

More generally, in probabilistic or statistical contexts, the term fringe is sometimes used for the regions of a distribution where the likelihood of occurrence is relatively low but not impossible, i.e. the tails or outskirts of the distribution.
For example, in a normal distribution, the fringe would refer to the regions far away from the mean where the probability density is low. Even though events in the fringe are less likely to occur, they are still within the realm of possibility according to the distribution.
Understanding the fringe is crucial in risk assessment and decision-making, as it helps account for extreme or rare events that may have significant consequences.

(© S. Russell & P. Norvig, AIMA)

40 / 44
An Example: The Wumpus World [cont.]
Using conditional independence

Fringe in Normalization:
In the context of normalization, which often involves scaling data to fit within a certain
range or distribution, the fringe refers to the extreme values of the original data that
may lie outside the normalized range.
For instance, if you're normalizing data to a range between 0 and 1, the fringe would
consist of the original data points that are closest to the minimum and maximum
values. Normalization techniques like min-max scaling or z-score
normalization aim to handle these fringe values effectively to ensure that
they are appropriately represented in the normalized data.
Handling fringe values properly during normalization is important to prevent them
from unduly skewing the analysis or the performance of machine learning algorithms
trained on the data.
In both cases, understanding and appropriately dealing with the fringe are essential for accurate modeling, analysis, and decision-making in uncertain or data-driven contexts.
(© S. Russell & P. Norvig, AIMA)
40 / 44
An Example: The Wumpus World [cont.]
Using conditional independence

Min-max scaling, also known as min-max normalization, transforms the original data
into a new range, typically between 0 and 1. It does this by linearly scaling each
feature in the dataset based on the minimum and maximum values observed in that
feature.
The formula for min-max scaling is as follows:
x_scaled = (x − x_min) / (x_max − x_min)
Where:
• x is an original data point.
• x_min is the minimum value of the feature.
• x_max is the maximum value of the feature.
• x_scaled is the scaled value of x within the range [0, 1].
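The min-max formula can be sketched for a single feature given as a list of numbers. The function name and the choice to reject a constant feature (where x_max = x_min would divide by zero) are assumptions of this sketch.

```python
def min_max_scale(values):
    """Linearly rescale one feature to [0, 1]: (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("feature is constant; min-max scaling is undefined")
    return [(x - x_min) / (x_max - x_min) for x in values]


scaled = min_max_scale([2.0, 4.0, 6.0])   # [0.0, 0.5, 1.0]
```

The smallest value always maps to 0 and the largest to 1, so a single extreme value compresses all the others toward one end of the range.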

(© S. Russell & P. Norvig, AIMA)

40 / 44
An Example: The Wumpus World [cont.]
Steps of the derivation (the equations appear as figures in the original slides):
1. P(p∗, b∗) is a scalar; use it as a normalization constant
2. Sum over the unknowns
3. Use the product rule
4. Separate unknown into fringe and other
5. b∗ is conditionally independent of other given fringe
6. Move P(b∗|p∗, P1,3, fringe) outward
7. All of the pit locations are independent
8. Move P(p∗), P(P1,3), and P(fringe) outward
9. Remove Σ_other P(other) because it equals 1
10. P(p∗) is a scalar, so make it part of the normalization constant
(© of Dana Nau, CMSC21, U. Maryland, Licensed under Creative Commons)
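The captions of these derivation slides correspond to the following chain of equalities, one line per step. It is reconstructed from the captions, from the conditional-independence equation stated earlier, and from the final formula on the next slide, so it should be read as a sketch in the slides' notation rather than a copy of the original figures.

```latex
\begin{align*}
\mathbf{P}(P_{1,3} \mid p^*, b^*)
 &= \mathbf{P}(P_{1,3}, p^*, b^*) / P(p^*, b^*)
  = \alpha\, \mathbf{P}(P_{1,3}, p^*, b^*) \\
 &= \alpha \sum_{\mathit{unknown}} \mathbf{P}(P_{1,3}, \mathit{unknown}, p^*, b^*) \\
 &= \alpha \sum_{\mathit{unknown}} \mathbf{P}(b^* \mid p^*, P_{1,3}, \mathit{unknown})\,
      \mathbf{P}(p^*, P_{1,3}, \mathit{unknown}) \\
 &= \alpha \sum_{\mathit{fringe}} \sum_{\mathit{other}}
      \mathbf{P}(b^* \mid p^*, P_{1,3}, \mathit{fringe}, \mathit{other})\,
      \mathbf{P}(p^*, P_{1,3}, \mathit{fringe}, \mathit{other}) \\
 &= \alpha \sum_{\mathit{fringe}} \sum_{\mathit{other}}
      \mathbf{P}(b^* \mid p^*, P_{1,3}, \mathit{fringe})\,
      \mathbf{P}(p^*, P_{1,3}, \mathit{fringe}, \mathit{other}) \\
 &= \alpha \sum_{\mathit{fringe}} \mathbf{P}(b^* \mid p^*, P_{1,3}, \mathit{fringe})
      \sum_{\mathit{other}} \mathbf{P}(p^*, P_{1,3}, \mathit{fringe}, \mathit{other}) \\
 &= \alpha \sum_{\mathit{fringe}} \mathbf{P}(b^* \mid p^*, P_{1,3}, \mathit{fringe})
      \sum_{\mathit{other}} P(p^*)\, \mathbf{P}(P_{1,3})\, P(\mathit{fringe})\, P(\mathit{other}) \\
 &= \alpha\, P(p^*)\, \mathbf{P}(P_{1,3})
      \sum_{\mathit{fringe}} \mathbf{P}(b^* \mid p^*, P_{1,3}, \mathit{fringe})\, P(\mathit{fringe})
      \sum_{\mathit{other}} P(\mathit{other}) \\
 &= \alpha\, P(p^*)\, \mathbf{P}(P_{1,3})
      \sum_{\mathit{fringe}} \mathbf{P}(b^* \mid p^*, P_{1,3}, \mathit{fringe})\, P(\mathit{fringe}) \\
 &= \alpha'\, \mathbf{P}(P_{1,3})
      \sum_{\mathit{fringe}} \mathbf{P}(b^* \mid p^*, P_{1,3}, \mathit{fringe})\, P(\mathit{fringe})
\end{align*}
```

Here α = 1/P(p∗, b∗) and α′ = α · P(p∗). The sum over the other squares disappears because their probabilities sum to 1, which is what removes the exponential cost.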

41 / 44
An Example: The Wumpus World [cont.]

We have obtained: P(P1,3|p∗, b∗) = α′P(P1,3) Σ_fringe P(b∗|p∗, P1,3, fringe)P(fringe)
We know that P(P1,3) = ⟨0.2, 0.8⟩ (see slide 38)
We can compute the normalization coefficient α′ afterwards
Σ_fringe P(b∗|p∗, P1,3, fringe)P(fringe): only 4 possible fringes
Start by rewriting as two separate equations:
P(p1,3|p∗, b∗) = α′P(p1,3) Σ_fringe P(b∗|p∗, p1,3, fringe)P(fringe)
P(¬p1,3|p∗, b∗) = α′P(¬p1,3) Σ_fringe P(b∗|p∗, ¬p1,3, fringe)P(fringe)

(© S. Russell & P. Norvig, AIMA)

42 / 44
An Example: The Wumpus World [cont.]
Start by rewriting as two separate equations:
P(p1,3|p∗, b∗) = α′P(p1,3) Σ_fringe P(b∗|p∗, p1,3, fringe)P(fringe)
P(¬p1,3|p∗, b∗) = α′P(¬p1,3) Σ_fringe P(b∗|p∗, ¬p1,3, fringe)P(fringe)
For each of them, P(b∗|...) is 1 if the breezes occur, 0 otherwise:
Σ_fringe P(b∗|p∗, p1,3, fringe)P(fringe) = 1 · 0.04 + 1 · 0.16 + 1 · 0.16 + 0 · 0.64 = 0.36
Σ_fringe P(b∗|p∗, ¬p1,3, fringe)P(fringe) = 1 · 0.04 + 1 · 0.16 + 0 · 0.16 + 0 · 0.64 = 0.2
=⇒ P(P1,3|p∗, b∗) = α′P(P1,3) Σ_fringe P(b∗|p∗, P1,3, fringe)P(fringe)
= α′⟨0.2, 0.8⟩⟨0.36, 0.2⟩ = α′⟨0.072, 0.16⟩ = (normalization, s.t. α′ ≈ 4.31) ≈ ⟨0.31, 0.69⟩
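The fringe-based computation needs only the four fringe configurations instead of 4096 terms. The sketch below hard-codes the two breeze conditions for the query P1,3 (b1,2 needs a pit in [1,3] or [2,2]; b2,1 needs a pit in [2,2] or [3,1]); the function names are illustrative.

```python
from itertools import product

P_PIT = 0.2


def breezes_hold(p13: bool, p22: bool, p31: bool) -> bool:
    """P(b* | p*, P1,3, fringe) as a Boolean, with fringe = (P2,2, P3,1)."""
    return (p13 or p22) and (p22 or p31)


def fringe_sum(p13: bool) -> float:
    """Sum over the 4 fringe configurations of P(b* | ...) * P(fringe)."""
    total = 0.0
    for p22, p31 in product((True, False), repeat=2):
        p_fringe = (P_PIT if p22 else 1 - P_PIT) * (P_PIT if p31 else 1 - P_PIT)
        if breezes_hold(p13, p22, p31):
            total += p_fringe
    return total


# alpha' * P(P1,3) * fringe sum, for P1,3 = true and P1,3 = false.
unnormalized = {
    True: P_PIT * fringe_sum(True),          # 0.2 * 0.36
    False: (1 - P_PIT) * fringe_sum(False),  # 0.8 * 0.2
}
alpha_prime = 1.0 / sum(unnormalized.values())
posterior = {value: alpha_prime * p for value, p in unnormalized.items()}
```

The result, about ⟨0.31, 0.69⟩ with α′ ≈ 4.31, agrees with the value on the slide while evaluating 8 terms in total.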

(© S. Russell & P. Norvig, AIMA)


43 / 44
Exercise

Compute P(P2,2|p∗, b∗) in the same way.

44 / 44
