
ARTIFICIAL INTELLIGENCE

Uncertain Knowledge
and Reasoning

1
ARTIFICIAL INTELLIGENCE
2020/2021 Semester 2
Quantifying Uncertainty:
Chapter 13

2
• The course so far has rested on the following "perfect" assumptions:
  – The state of the world (the problem) is fully observable:
    the current state is known with certainty
  – The description of every action (decision) is precise:
    the next state reached can be predicted with certainty
• Now we study how an agent copes with such "imperfect" information
• Sometimes we must also consider a dynamic problem world

3
Introductory Example

A robot with
imperfect
sensing must
reach a goal
location among
moving obstacles
(dynamic world)

Goal

4
Outline
• Acting under Uncertainty
• Probability
• Inference using Full Joint Distribution
• Independence
• Bayes' Rule and Its Use

5
Acting under Uncertainty
Let action At = leave for airport t minutes before flight
Will At get me there on time?

Problems:
1. partial observability (road state, other drivers' plans, etc.)
2. noisy sensors (traffic reports)
3. uncertainty in action outcomes (flat tire, etc.)
4. immense complexity of modeling and predicting traffic

Hence a purely logical approach either
1. risks falsehood: "A25 will get me there on time", or
2. leads to conclusions that are too weak for decision making:
   "A25 will get me there on time if there's no accident on the bridge
   and it doesn't rain and my tires remain intact, etc."

6
Methods for handling uncertainty
• Default or nonmonotonic logic:
– Assume my car does not have a flat tire
– Assume A25 works unless contradicted by evidence
• Issues: What assumptions are reasonable? How to handle
contradiction?
• Rules with fudge factors:
– A25 |→0.3 get there on time
– Sprinkler |→ 0.99 WetGrass
– WetGrass |→ 0.7 Rain
• Issues: Problems with combination, e.g., Sprinkler causes Rain??

• Probability
– Models the agent's degree of belief, given the available evidence
– A25 will get me there on time with probability 0.04

7
Probability
• Probabilistic assertions summarize effects of
– laziness: failure to enumerate exceptions,
qualifications, etc.
– ignorance: lack of relevant facts, initial conditions, etc.
• Theoretical ignorance: Medical science has no complete
theory for the domain.
• Practical ignorance: Even if we know all the rules, we might
be uncertain about a particular patient because not all the
necessary tests have been or can be run.

degree of belief
8
Probability
• Probability provides a way of summarizing the
uncertainty that comes from our laziness and
ignorance.
• Subjective probability:
– Probabilities relate propositions to agent's own state
of knowledge
e.g., P(A25 | no reported accidents) = 0.06
– These are not assertions about the world
– Probabilities of propositions change with new
evidence:
e.g., P(A25 | no reported accidents, 5 a.m.) = 0.15

9
Making decisions under uncertainty
Suppose I believe the following:
P(A25 gets me there on time | …) = 0.04
P(A90 gets me there on time | …) = 0.70
P(A120 gets me there on time | …) = 0.95
P(A1440 gets me there on time | …) = 0.9999
• Which action to choose?
Depends on my preferences for missing flight vs. time spent
waiting, etc.
– Utility theory is used to represent and infer preferences
– Decision theory = probability theory + utility theory

10
Making decisions under uncertainty

11
Basic concept of probability
• Basic element: random variable

• Similar to propositional logic: possible worlds defined by
  assignment of values to random variables.

• Boolean random variables
  e.g., Cavity (do I have a cavity?)
• Discrete random variables
e.g., Weather is one of <sunny, rainy, cloudy, snow>
– Domain values must be exhaustive and mutually exclusive
• Elementary proposition constructed by assignment of a value
to a random variable:
– e.g., Weather = sunny, Cavity = false (abbreviated as cavity)

• Complex propositions formed from elementary propositions
  and standard logical connectives
  – e.g., Weather = sunny ∧ Cavity = false

12
Basic concept of probability
• Atomic event: A complete specification of the state of
the world about which the agent is uncertain
E.g., if the world consists of only two Boolean variables Cavity
and Toothache, then there are 4 distinct atomic events:

Cavity = false Toothache = false


Cavity = false  Toothache = true
Cavity = true  Toothache = false
Cavity = true  Toothache = true

• Atomic events are mutually exclusive and exhaustive


13
Axioms of probability

• For any propositions A, B:
  – 0 ≤ P(A) ≤ 1
  – P(true) = 1 and P(false) = 0
  – P(A ∨ B) = P(A) + P(B) − P(A ∧ B)

14
Using the axioms of probability
• P(a ∨ ¬a) = P(a) + P(¬a) − P(a ∧ ¬a)   (by axiom 3 with b = ¬a)
• P(true) = P(a) + P(¬a) − P(false)       (by logical equivalence)
• 1 = P(a) + P(¬a)                        (by axiom 2)
• P(¬a) = 1 − P(a)                        (by algebra)

15
Prior probability
• Prior or unconditional probabilities of propositions
e.g., P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72 correspond to
belief prior to arrival of any (new) evidence

• Probability distribution gives values for all possible assignments:
  P(Weather) = <0.72, 0.1, 0.08, 0.1> (normalized, i.e., sums to 1)

• Joint probability distribution for a set of random variables gives
  the probability of every atomic event on those random variables
  P(Weather, Cavity) = a 4 × 2 matrix of values:

  Weather =        sunny   rainy   cloudy   snow
  Cavity = true    0.144   0.02    0.016    0.02
  Cavity = false   0.576   0.08    0.064    0.08

• Every question about a domain can be answered by the joint
  distribution
16
Conditional probability
• Conditional or posterior probabilities
e.g., P(cavity | toothache) = 0.8

i.e., given that toothache is all I know

• (Notation for conditional distributions:
  P(Cavity | Toothache) = 2-element vector of 2-element vectors)

• If we know more, e.g., cavity is also given, then we have
  P(cavity | toothache, cavity) = 1

• New evidence may be irrelevant, allowing simplification, e.g.,
  P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8

• This kind of inference, sanctioned by domain knowledge, is crucial

17
Conditional probability
• Definition of conditional probability:
P(a | b) = P(a ∧ b) / P(b)   if P(b) > 0
• Product rule gives an alternative formulation:
P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
• A general version holds for whole distributions, e.g.,
P(Weather,Cavity) = P(Weather | Cavity) P(Cavity)
• (View as a set of 4 × 2 equations, not matrix mult.)
• Chain rule is derived by successive application of product rule:
P(X1, …, Xn) = P(X1, …, Xn−1) P(Xn | X1, …, Xn−1)
= P(X1, …, Xn−2) P(Xn−1 | X1, …, Xn−2) P(Xn | X1, …, Xn−1)
= …
= Π_{i=1}^{n} P(Xi | X1, …, Xi−1)
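As a quick numeric sanity check, the chain rule can be verified on the small Cavity/Toothache joint used later in these slides; the helper-function names below are illustrative, not part of any library:

```python
# Verify the chain rule P(c, t) = P(c) * P(t | c) on a two-variable joint
# (the Cavity/Toothache table used later in these slides).
joint = {  # (cavity, toothache) -> probability
    (False, False): 0.80, (False, True): 0.10,
    (True,  False): 0.05, (True,  True): 0.05,
}

def p_cavity(c):
    # Marginal: sum the joint over the other variable
    return sum(p for (cav, _), p in joint.items() if cav == c)

def p_toothache_given_cavity(t, c):
    # Conditional via the product rule: P(t | c) = P(c, t) / P(c)
    return joint[(c, t)] / p_cavity(c)

for (c, t), p in joint.items():
    # The factorization must reproduce every atomic-event probability
    assert abs(p - p_cavity(c) * p_toothache_given_cavity(t, c)) < 1e-12
```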
18
Conditional probability
P(Cavity, Toothache)
Cavity = false Toothache = false 0.8
Cavity = false  Toothache = true 0.1
Cavity = true  Toothache = false 0.05
Cavity = true  Toothache = true 0.05

P(Cavity) P(Toothache)
Cavity = false 0.9 Toothache = false 0.85
Cavity = true 0.1 Toothache = true 0.15

• What is P(Cavity = true | Toothache = false)?
  0.05 / 0.85 ≈ 0.059
• What is P(Cavity = false | Toothache = true)?
  0.1 / 0.15 ≈ 0.667
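Both answers follow directly from the definition P(a | b) = P(a ∧ b) / P(b); a minimal sketch (variable names are illustrative):

```python
# Conditional probabilities from the joint table on this slide.
joint = {  # (cavity, toothache) -> probability
    (False, False): 0.80, (False, True): 0.10,
    (True,  False): 0.05, (True,  True): 0.05,
}
p_no_toothache = sum(p for (_, t), p in joint.items() if not t)      # 0.85
p_toothache = 1.0 - p_no_toothache                                   # 0.15

# P(a | b) = P(a ∧ b) / P(b)
p_cavity_given_no_toothache = joint[(True, False)] / p_no_toothache  # ~0.059
p_no_cavity_given_toothache = joint[(False, True)] / p_toothache     # ~0.667
```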

19
Conditional distributions
• A conditional distribution is a distribution over the values of
one variable given fixed values of other variables
P(Cavity, Toothache)
Cavity = false Toothache = false 0.8
Cavity = false  Toothache = true 0.1
Cavity = true  Toothache = false 0.05
Cavity = true  Toothache = true 0.05

P(Cavity | Toothache = true)      P(Cavity | Toothache = false)
Cavity = false  0.667             Cavity = false  0.941
Cavity = true   0.333             Cavity = true   0.059

P(Toothache | Cavity = true)      P(Toothache | Cavity = false)
Toothache = false  0.5            Toothache = false  0.889
Toothache = true   0.5            Toothache = true   0.111
20
Inference by enumeration
• Start with the joint probability distribution:

                 toothache            ¬toothache
                 catch    ¬catch      catch    ¬catch
  cavity         0.108    0.012      0.072    0.008
  ¬cavity        0.016    0.064      0.144    0.576

• For any proposition φ, sum the atomic events where it is true:
  P(φ) = Σ_{ω : ω ⊨ φ} P(ω)

21
Inference by enumeration
• Start with the joint probability distribution:

• For any proposition φ, sum the atomic events where it is true:
  P(φ) = Σ_{ω : ω ⊨ φ} P(ω)

• P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2



22
Inference by enumeration
• Start with the joint probability distribution:

• Can also compute conditional probabilities:
  P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
  = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
  = 0.4
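Enumeration is mechanical once the joint is in hand. In the sketch below, the four toothache entries appear in these slides; the four ¬toothache values are the standard ones that make the distribution sum to 1:

```python
# Inference by enumeration: P(phi) = sum of P(omega) over atomic events
# omega where phi holds. Keys are (toothache, catch, cavity).
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def prob(phi):
    """Sum the probabilities of the atomic events where phi is true."""
    return sum(p for omega, p in joint.items() if phi(omega))

p_toothache = prob(lambda e: e[0])                    # 0.2
p_no_cavity_given_toothache = (
    prob(lambda e: e[0] and not e[2]) / p_toothache)  # 0.4
```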
23
Normalization

• Denominator can be viewed as a normalization constant α
  P(Y | X) = α P(X | Y) P(Y)

  P(Cavity | toothache) = α P(Cavity, toothache)
  = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
  = α [<0.108, 0.016> + <0.012, 0.064>]
  = α <0.12, 0.08> = <0.6, 0.4>

• General idea: compute distribution on query variable by fixing
  evidence variables and summing over hidden variables
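The constant α never has to be computed from P(X) explicitly; rescaling the unnormalized vector so it sums to 1 does the same job. A minimal sketch (the function name is illustrative):

```python
def normalize(vec):
    """Rescale unnormalized probabilities so they sum to 1 (this is alpha)."""
    alpha = 1.0 / sum(vec)
    return [alpha * p for p in vec]

# P(Cavity | toothache) proportional to <0.108 + 0.012, 0.016 + 0.064>
posterior = normalize([0.108 + 0.012, 0.016 + 0.064])  # [0.6, 0.4]
```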
24
Inference by enumeration
Typical Problem:
Given evidence variables E=e, estimate the posterior joint
distribution of the query variables Y

Let the hidden variables be H,

P(Y | E = e) = α P(Y,E = e) = α Σh P(Y, E= e, H = h)

• Obvious problems:
1. Worst-case time complexity O(d^n), where d is the largest arity
2. Space complexity O(d^n) to store the joint distribution
3. How to find the numbers for the O(d^n) entries?
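The formula P(Y | E=e) = α Σ_h P(Y, E=e, H=h) can be sketched as a general enumeration routine. The representation here (a dict over full value tuples, variables addressed by index) and the function name are assumptions of this sketch, and the loop over all events makes the O(d^n) worst case explicit:

```python
def enumerate_ask(query_index, query_values, evidence, joint):
    """P(Y | E=e) = alpha * sum_h P(Y, E=e, H=h).

    joint:    dict mapping full value tuples to probabilities
    evidence: dict mapping variable index -> observed value
    Hidden variables are summed out implicitly by iterating all events.
    """
    unnorm = []
    for y in query_values:
        total = 0.0
        for event, p in joint.items():
            if event[query_index] == y and all(
                    event[i] == v for i, v in evidence.items()):
                total += p
        unnorm.append(total)
    alpha = 1.0 / sum(unnorm)
    return [alpha * t for t in unnorm]

# With the (toothache, catch, cavity) joint from the earlier slides:
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}
result = enumerate_ask(2, [True, False], {0: True}, joint)  # P(Cavity | toothache)
```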

25
Independence
• Two events A and B are independent if and only if
P(A  B) = P(A) P(B)
– In other words, P(A | B) = P(A) and P(B | A) = P(B)
– This is an important simplifying assumption for modeling,
e.g., Toothache and Weather can be assumed to be
independent
• Are two mutually exclusive events independent?
– No, but for mutually exclusive events we have
P(A ∨ B) = P(A) + P(B)

• Conditional independence: A and B are conditionally
  independent given C iff P(A ∧ B | C) = P(A | C) P(B | C)
26
Independence
• A and B are independent iff
P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A) P(B)

P(Toothache, Catch, Cavity, Weather)
= P(Toothache, Catch, Cavity) P(Weather)

• 32 entries reduced to 12; for n independent biased coins,
  O(2^n) → O(n)

• Absolute independence is powerful but rare

• Dentistry is a large field with hundreds of variables, none of
  which are independent. What to do?

27
Conditional independence
• If I have a cavity, the probability that the probe catches in it
doesn't depend on whether I have a toothache:
(1) P(catch | toothache, cavity) = P(catch | cavity)

• The same independence holds if I haven't got a cavity:
  (2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity)

• Catch is conditionally independent of Toothache given Cavity:
  P(Catch | Toothache, Cavity) = P(Catch | Cavity)

• Equivalent statements:
  P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
  P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)

28
Conditional independence
• Write out full joint distribution using chain rule:
P(Toothache, Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)

I.e., 2 + 2 + 1 = 5 independent numbers

• In most cases, the use of conditional independence reduces the
  size of the representation of the joint distribution from
  exponential in n to linear in n.

• Conditional independence is our most basic and robust form of
  knowledge about uncertain environments.
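The 5-number decomposition can be checked in code: given P(Cavity) and the two conditionals, the factorization regenerates the full 8-entry joint. The parameter values below are the conditionals consistent with the joint table used on the enumeration slides:

```python
# Build the full joint from 5 numbers using
# P(t, c, cav) = P(t | cav) * P(c | cav) * P(cav).
p_cavity = 0.2
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity=cav)
p_catch_given     = {True: 0.9, False: 0.2}   # P(catch | Cavity=cav)

joint = {}
for cav in (True, False):
    p_cav = p_cavity if cav else 1.0 - p_cavity
    for t in (True, False):
        p_t = p_toothache_given[cav] if t else 1.0 - p_toothache_given[cav]
        for c in (True, False):
            p_c = p_catch_given[cav] if c else 1.0 - p_catch_given[cav]
            joint[(t, c, cav)] = p_t * p_c * p_cav

# 5 parameters generate all 8 atomic-event probabilities
assert abs(sum(joint.values()) - 1.0) < 1e-12
```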
29
Conditional independence:
Example
• Toothache: boolean variable indicating whether the patient has a toothache
• Cavity: boolean variable indicating whether the patient has a cavity
• Catch: whether the dentist’s probe catches in the cavity

• If the patient has a cavity, the probability that the probe catches in it doesn't
depend on whether he/she has a toothache
P(Catch | Toothache, Cavity) = P(Catch | Cavity)
• Therefore, Catch is conditionally independent of Toothache given Cavity
• Likewise, Toothache is conditionally independent of Catch given Cavity
P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
• Equivalent statement:
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)

30
Bayes' Rule
• Bayes' rule

P(Y|X) = P(X|Y) P(Y) / P(X) = αP(X|Y) P(Y)

P(disease | symptom) = P(symptom | disease) P(disease) / P(symptom)

31
Bayes' Rule
• Useful for assessing diagnostic probability from causal probability:
– P(Cause|Effect) = P(Effect|Cause) P(Cause) / P(Effect)

– E.g., let M be meningitis, S be stiff neck:


– We know:
P(s|m)=0.5; P(m)=1/5000; P(s)=1/20;
– Query:
P(m|s) = P(s|m) P(m) / P(s) = 0.5 × 0.0002 / 0.05 = 0.002

– Note: posterior probability of meningitis still very small!
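The calculation above is three multiplications; a minimal sketch with the slide's numbers (variable names are illustrative):

```python
# Diagnostic probability from causal probability:
# P(m | s) = P(s | m) * P(m) / P(s)
p_s_given_m = 0.5
p_m = 1 / 5000
p_s = 1 / 20
p_m_given_s = p_s_given_m * p_m / p_s  # 0.002
```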

32
Bayes Rule example
• Marie is getting married tomorrow, at an outdoor
ceremony in the desert. In recent years, it has rained only
5 days each year (5/365 = 0.014). Unfortunately, the
weatherman has predicted rain for tomorrow. When it
actually rains, the weatherman correctly forecasts rain
90% of the time. When it doesn't rain, he incorrectly
forecasts rain 10% of the time. What is the probability
that it will rain on Marie's wedding?

33
Bayes Rule example
• Marie is getting married tomorrow, at an outdoor ceremony in the
desert. In recent years, it has rained only 5 days each year (5/365 =
0.014). Unfortunately, the weatherman has predicted rain for
tomorrow. When it actually rains, the weatherman correctly
forecasts rain 90% of the time. When it doesn't rain, he incorrectly
forecasts rain 10% of the time. What is the probability that it will
rain on Marie's wedding?
P(Rain | Predict) = P(Predict | Rain) P(Rain) / P(Predict)
= P(Predict | Rain) P(Rain)
  / [P(Predict | Rain) P(Rain) + P(Predict | ¬Rain) P(¬Rain)]
= (0.9 × 0.014) / (0.9 × 0.014 + 0.1 × 0.986)
≈ 0.111
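Here the denominator P(Predict) is not given directly, so it is expanded by total probability over Rain and ¬Rain; the steps above can be sketched as:

```python
# P(Rain | Predict) via Bayes' rule with the total-probability denominator
p_rain = 5 / 365
p_predict_given_rain = 0.9  # correct forecast when it rains
p_predict_given_dry = 0.1   # false alarm when it doesn't

numerator = p_predict_given_rain * p_rain
p_predict = numerator + p_predict_given_dry * (1.0 - p_rain)
p_rain_given_predict = numerator / p_predict  # ~0.111
```

Despite the gloomy forecast, the low prior keeps the posterior near 11%.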
34
Bayes rule: Another example
• 1% of women at age forty who participate in routine
screening have breast cancer. 80% of women with
breast cancer will get positive mammographies. 9.6% of women without breast
cancer will also get positive mammographies. A
woman in this age group had a positive
mammography in a routine screening. What is the
probability that she actually has breast cancer?

35
Bayes rule: Another example
• 1% of women at age forty who participate in routine screening
have breast cancer. 80% of women with breast cancer will get
positive mammographies. 9.6% of women without breast
cancer will also get positive mammographies. A woman in
this age group had a positive mammography in a routine
screening. What is the probability that she actually has breast
cancer?
P(Cancer | Positive) = P(Positive | Cancer) P(Cancer) / P(Positive)
= P(Positive | Cancer) P(Cancer)
  / [P(Positive | Cancer) P(Cancer) + P(Positive | ¬Cancer) P(¬Cancer)]
= (0.8 × 0.01) / (0.8 × 0.01 + 0.096 × 0.99)
≈ 0.0776

36
Bayes' Rule and conditional independence
P(Cavity | toothache  catch)
= α P(toothache  catch | Cavity) P(Cavity)
= α P(toothache | Cavity) P(catch | Cavity) P(Cavity)

• This is an example of a naï


ve Bayes model:

• P(Cause,Effect1, … ,Effectn) = P(Cause) ПiP(Effecti|Cause)

• Total number of parameters is linear in n
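The naïve Bayes computation above can be sketched as one small function; the function name is illustrative, and the parameters are the conditionals consistent with the joint table used on the enumeration slides:

```python
def naive_bayes_posterior(prior, likelihoods):
    """P(Cause | effects) proportional to P(Cause) * prod_i P(effect_i | Cause).

    prior:       dict cause -> P(cause)
    likelihoods: one dict per observed effect, cause -> P(effect_i | cause)
    """
    unnorm = {}
    for cause, p in prior.items():
        for lik in likelihoods:
            p *= lik[cause]
        unnorm[cause] = p
    alpha = 1.0 / sum(unnorm.values())
    return {cause: alpha * p for cause, p in unnorm.items()}

# P(Cavity | toothache, catch)
posterior = naive_bayes_posterior(
    prior={True: 0.2, False: 0.8},
    likelihoods=[{True: 0.6, False: 0.1},    # P(toothache | Cavity)
                 {True: 0.9, False: 0.2}],   # P(catch | Cavity)
)
```

Note the parameter count: one prior plus two numbers per effect, linear in the number of effects.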

37
Summary
• Probability is a rigorous formalism for uncertain
knowledge
• Joint probability distribution specifies probability of
every atomic event
• Queries can be answered by summing over atomic
events
• Independence and conditional independence
provide the basic tools to reduce the joint size

38
Exercises
• 13.8
• 13.15

39
