
Uncertainty

Chapter 5

Artificial Intelligence
This Lecture
• Probability
• Syntax
• Semantics
• Inference rules
• How to build a Naïve Bayes classifier
Probability
Probabilistic assertions summarize effects of
Ignorance: lack of relevant facts, initial conditions, etc.
Laziness: failure to enumerate exceptions, qualifications, etc.

Subjective or Bayesian probability:
Probabilities relate propositions to one's own state of knowledge
e.g., P(A90 succeeds | no reported accidents) = 0.97

These are NOT assertions about the world, but represent belief about whether
the assertion is true.

Probabilities of propositions change with new evidence, e.g.,
P(A90 | no reported accidents, 5 a.m.) = 0.99

(Analogous to logical entailment status, i.e., does KB ⊨ α)


Making decisions under uncertainty
• Suppose I believe the following:
P(A30 gets me there on time | ...) = 0.05
P(A60 gets me there on time | ...) = 0.70
P(A100 gets me there on time | ...) = 0.95
P(A1440 gets me there on time | ...) = 0.9999

• Which action to choose?

• Depends on my preferences for missing flight vs. airport cuisine, etc.

• Utility theory is used to represent and infer preferences

• Decision theory = utility theory + probability theory

Unconditional Probability
• Let A be a proposition; P(A) denotes the unconditional probability that A is true.

• Example: if Male denotes the proposition that a particular person is male, then P(Male) = 0.5 means that, without any other information, the probability of that person being male is 0.5 (a 50% chance).

• Alternatively, if a population is sampled, then 50% of the people will be male.

• Of course, with additional information (e.g. that the person is a CS151 student), the “posterior probability” will likely be different.
Axioms of probability
For any propositions A, B:
1. 0 ≤ P(A) ≤ 1
2. P(True) = 1 and P(False) = 0
3. P(A ∨ B) = P(A) + P(B) − P(A ∧ B)

de Finetti (1931): an agent who bets according to probabilities that violate these axioms can be forced to bet so as to lose money regardless of the outcome.
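A quick way to see the axioms in action is to check them on a tiny world model. The sketch below is illustrative only; the two propositions and their probabilities are made-up assumptions, not from the lecture.

# Sketch: check the probability axioms on an assumed toy model with two propositions A and B.
# The four possible worlds and their probabilities are illustrative assumptions.
worlds = {
    ("a", "b"): 0.10,    # A true,  B true
    ("a", "~b"): 0.30,   # A true,  B false
    ("~a", "b"): 0.20,   # A false, B true
    ("~a", "~b"): 0.40,  # A false, B false
}

p_a = sum(p for (a, b), p in worlds.items() if a == "a")
p_b = sum(p for (a, b), p in worlds.items() if b == "b")
p_a_and_b = worlds[("a", "b")]
p_a_or_b = sum(p for (a, b), p in worlds.items() if a == "a" or b == "b")

assert abs(sum(worlds.values()) - 1.0) < 1e-9            # axiom 2: probabilities of all worlds sum to 1
assert abs(p_a_or_b - (p_a + p_b - p_a_and_b)) < 1e-9    # axiom 3: P(A or B) = P(A) + P(B) - P(A and B)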
Syntax
Similar to PROPOSITIONAL LOGIC: possible worlds defined by assignment of values to random variables.

Note: to make things confusing, variables have an upper-case first letter, and symbol values are lower-case.

Propositional or Boolean random variables
e.g., Cavity (do I have a cavity?)
Include propositional logic expressions
e.g., Burglary ∨ Earthquake

Multivalued random variables
e.g., Weather is one of <sunny, rain, cloudy, snow>
Values must be exhaustive and mutually exclusive

A proposition is constructed by assignment of a value:
e.g., Weather = sunny; also Cavity = true for clarity
Priors, Distribution, Joint
Prior or unconditional probabilities of propositions
e.g., P(Cavity) = P(Cavity = true) = 0.1
P(Weather = sunny) = 0.72
correspond to belief prior to the arrival of any (new) evidence.

Probability distribution gives probabilities of all possible values of the random variable:
Weather is one of <sunny, rain, cloudy, snow>
P(Weather) = <0.72, 0.1, 0.08, 0.1>
(normalized, i.e., sums to 1)
Joint probability distribution
Joint probability distribution for a set of variables gives values for each possible assignment to all the variables.

P(Toothache, Cavity) is a 2 by 2 matrix:

                   Toothache = true   Toothache = false
Cavity = true           0.04               0.06
Cavity = false          0.01               0.89

NOTE: Elements in the table sum to 1, so there are only 3 independent numbers.
The importance of independence
• P(Weather, Cavity) is a 4 by 2 matrix of values:

                   Weather = sunny   rain    cloudy   snow
Cavity = true          .072           .01     .008     .01
Cavity = false         .648           .09     .072     .09

• But these are independent (usually):
  – Recall that if X and Y are independent then:
  – P(X=x, Y=y) = P(X=x) * P(Y=y)

Weather is one of <sunny, rain, cloudy, snow>
P(Weather) = <0.72, 0.1, 0.08, 0.1> and P(Cavity) = 0.1
The importance of independence
If we compute the marginals (sums of rows or columns), we see:

                   Weather = sunny   rain    cloudy   snow    marginals
Cavity = true          .072           .01     .008     .01       .1
Cavity = false         .648           .09     .072     .09       .9
marginals              .72            .1      .08      .1

Each entry is just the product of the marginals (because they’re independent)!
So, instead of using 7 numbers, we can represent these by just recording 4 numbers, 3 from Weather and 1 from Cavity:
Weather is one of <sunny, rain, cloudy, snow>
P(Weather) = <0.72, 0.1, 0.08, (1 − sum of the others)>
P(Cavity) = <.1, (1 − .1)>
This can be much more efficient (more later…)
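As a small sketch of why this matters (illustrative Python, not from the slides): with independence, the full 4 × 2 table can be rebuilt on demand from the two marginals, so only the marginals need to be stored.

# Rebuild the Weather x Cavity joint from its marginals, assuming independence.
p_weather = {"sunny": 0.72, "rain": 0.1, "cloudy": 0.08, "snow": 0.1}
p_cavity = {True: 0.1, False: 0.9}

joint = {(w, c): pw * pc                 # P(Weather=w, Cavity=c) = P(Weather=w) * P(Cavity=c)
         for w, pw in p_weather.items()
         for c, pc in p_cavity.items()}

print(round(joint[("sunny", True)], 3))   # 0.072, matching the table above
print(round(sum(joint.values()), 6))      # 1.0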
Conditional Probabilities
• Conditional or posterior probabilities
e.g., P(Cavity | Toothache) = 0.8
What is the probability of having a cavity given that the patient has a toothache? (As another example: P(Male | CS151Student) = ??)

• If we know more, e.g., the dental probe catches, then we have
P(Cavity | Toothache, Catch) = 0.9 (say)

• If we know even more, e.g., Cavity is also given, then we have
P(Cavity | Toothache, Cavity) = 1
Note: the less specific belief remains valid after more evidence arrives, but is not always useful.

• New evidence may be irrelevant, allowing simplification, e.g.,
P(Cavity | Toothache, PadresWin) = P(Cavity | Toothache) = 0.8
(Again, this is because cavities and baseball are independent.)
Conditional Probabilities
• Conditional or posterior probabilities
e.g., P(Cavity | Toothache) = 0.8
What is the probability of having a cavity given that the patient has a toothache?

• Definition of conditional probability:
P(A | B) = P(A, B) / P(B)   if P(B) ≠ 0

• I.e., P(A | B) means our universe has shrunk to the one in which B is true.

• P(A | B) is the ratio of the red area to the yellow area.
Conditional Probabilities
• Conditional or posterior probabilities
e.g., P(Cavity | Toothache) = 0.8
What is the probability of having a cavity given that the patient has a toothache?

• Definition of conditional probability:
P(A | B) = P(A, B) / P(B)   if P(B) ≠ 0

                   Toothache = true   Toothache = false
Cavity = true           0.04               0.06
Cavity = false          0.01               0.89

P(Cavity | Toothache) = .04 / (.04 + .01) = 0.8


Conditional probability cont.
Definition of conditional probability:
P(A | B) = P(A, B) / P(B)   if P(B) ≠ 0

Product rule gives an alternative formulation:
P(A, B) = P(A | B) P(B) = P(B | A) P(A)

Example:
P(Male) = 0.5
P(CS151Student) = 0.0037
P(Male | CS151Student) = 0.9

What is the probability of being a male CS151 student?
P(Male, CS151Student)
= P(Male | CS151Student) P(CS151Student)
= 0.9 * 0.0037 = 0.0033
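For completeness, the same product-rule computation in a couple of lines (values copied from the example above):

# Product rule: P(Male, CS151Student) = P(Male | CS151Student) * P(CS151Student)
p_cs151 = 0.0037
p_male_given_cs151 = 0.9
print(round(p_male_given_cs151 * p_cs151, 4))   # 0.0033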
Conditional probability cont.
A general version holds for whole distributions, e.g.,
P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
(View this as a 4 × 2 set of equations, not matrix multiplication; the book calls this a pointwise product.)

The chain rule is derived by successive application of the product rule:
P(X1, …, Xn) = P(X1, …, Xn−1) P(Xn | X1, …, Xn−1)
             = P(X1, …, Xn−2) P(Xn−1 | X1, …, Xn−2) P(Xn | X1, …, Xn−1)
             = …
             = ∏i=1..n P(Xi | X1, …, Xi−1)
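A sketch of the chain rule in code (illustrative only; the joint distribution is randomly generated, and the helper simply sums the joint to obtain each factor):

import itertools
import random

# Check P(x1, x2, x3) = P(x1) * P(x2 | x1) * P(x3 | x1, x2) on a random joint
# over three Boolean variables.
random.seed(0)
assignments = list(itertools.product([True, False], repeat=3))
weights = [random.random() for _ in assignments]
joint = {a: w / sum(weights) for a, w in zip(assignments, weights)}

def prob(partial):
    """Probability of a partial assignment {index: value}, by summing the joint."""
    return sum(p for a, p in joint.items()
               if all(a[i] == v for i, v in partial.items()))

x1, x2, x3 = True, False, True
chain = (prob({0: x1})
         * prob({0: x1, 1: x2}) / prob({0: x1})                   # P(x2 | x1)
         * prob({0: x1, 1: x2, 2: x3}) / prob({0: x1, 1: x2}))    # P(x3 | x1, x2)
assert abs(chain - joint[(x1, x2, x3)]) < 1e-9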
Bayes Rule
From product rule P(A B) = P(A|B)P(B) = P(B|A)P(A), we can
obtain Bayes' rule

P(B | A)P( A)
P( A | B) 
P(B)

Why is this useful???


For assessing diagnostic probability from causal probability:
e.g. Cavities as the cause of toothaches
Sleeping late as the cause for missing class
Meningitis as the cause of a stiff neck

P(Effect |
P(Cause | Effect) 
Cause)P(Cause)
P(Effect)
Bayes Rule: Example
P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)

Let M be meningitis, S be stiff neck:
P(M) = 0.0001
P(S) = 0.1
P(S | M) = 0.8

P(M | S) = P(S | M) P(M) / P(S) = 0.8 × 0.0001 / 0.1 = 0.0008
Bayes Rule: Another Example
Let A be AIDS, T be positive test result for AIDS
P(T|A)=0.99 “true positive” rate of test (aka a “hit”)
P(T|~A)= 0.005 “false positive” rate of test (aka a “false alarm”)
P(~T | A) = 1-0.99=.01 “false negative” or “miss”
P(~T|~A) = 0.995 “correct rejection”

Seems like a pretty good test, right?


Bayes Rule: Another Example
Let A be AIDS, T be positive test result for AIDS
P(T|A)=0.99 “true positive” rate of test (aka a “hit”)
P(T|~A)= 0.005 “false positive” rate of test (aka a “false alarm”)
P(~T | A) = 1-0.99=.01 “false negative” or “miss”
P(~T|~A) = 0.995 “correct rejection”
Now, suppose you are from a low-risk group, so P(A) = .001, and you get a
positive test result. Should you worry?
Let’s compute it using Bayes’ rule: P(A|T) = P(T|A)*P(A)/P(T)
Hmmm. Wait a minute: How do we know P(T)?
Recall there are two possibilities: Either you have AIDS or you don’t.
Therefore: P(A|T) + P(~A|T) = 1.
P(~A|T) = P(T|~A)*P(~A)/P(T)
So, P(A|T) + P(~A|T) = P(T|A)*P(A)/P(T) + P(T|~A)*P(~A)/P(T) = 1
Bayes Rule: Another Example
P(A|T) = P(T|A)*P(A)/P(T)
Hmmm. Wait a minute: How do we know P(T)?
Recall there are two possibilities: Either you have AIDS or you don’t.
Therefore: P(A|T) + P(~A|T) = 1.
Now, by Bayes’ rule:
P(~A|T) = P(T|~A)*P(~A)/P(T)
So,
P(A|T) + P(~A|T) = P(T|A)*P(A)/P(T) + P(T|~A)*P(~A)/P(T) = 1
⇒ P(T|A)*P(A) + P(T|~A)*P(~A) = P(T)
⇒ The denominator is always the sum of the numerators if they are exhaustive of the possibilities - a normalizing constant to make the probabilities sum to 1!
⇒ We can compute the numerators and then normalize afterwards!
Bayes Rule: Back to the Example
Let A be AIDS, T be positive test result for AIDS
P(T|A)=0.99 “true positive” rate of test (aka a “hit”)
P(T|~A)= 0.005 “false positive” rate of test (aka a “false alarm”)
P(~T | A) = 1-0.99=.01 “false negative” or “miss”
P(~T|~A) = 0.995 “correct rejection”
Now, suppose you are from a low-risk group, so P(A) = .001, and you get a
positive test result. Should you worry?

P(A|T) ∝ P(T|A)*P(A) = 0.99 * .001 = .00099

P(~A|T) ∝ P(T|~A)*P(~A) = 0.005 * .999 = .004995

P(A|T) = .00099 / (.00099 + .004995) ≈ 0.165
Maybe you should worry, or maybe you should get re-tested….
Bayes Rule: Back to the Example II
Let A be AIDS, T be positive test result for AIDS
P(T|A)=0.99 “true positive” rate of test (aka a “hit”)
P(T|~A)= 0.005 “false positive” rate of test (aka a “false alarm”)
P(~T | A) = 1-0.99=.01 “false negative” or “miss”
P(~T|~A) = 0.995 “correct rejection”
Now, suppose you are from a high-risk group, so P(A) = .01, and you get
a positive test result. Should you worry?

P(A|T) P(T|A)*P(A) = 0.99*.01 = .0099

P(~A|T)  P(T|~A)*P(~A) = 0.005*.99 = 0.00495


P(A|T) = .0099/(.0099+.00495) = 0.667
=> You should worry!
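A small helper (a sketch, not from the slides) that follows the compute-the-numerators-then-normalize recipe above, run for both the low-risk and high-risk priors:

def posterior_given_positive_test(p_a, p_t_given_a=0.99, p_t_given_not_a=0.005):
    """P(A | T): compute the two unnormalized numerators, then normalize."""
    num_a = p_t_given_a * p_a                  # P(T | A)  * P(A)
    num_not_a = p_t_given_not_a * (1 - p_a)    # P(T | ~A) * P(~A)
    return num_a / (num_a + num_not_a)

print(round(posterior_given_positive_test(0.001), 3))   # low-risk prior  -> 0.165
print(round(posterior_given_positive_test(0.01), 3))    # high-risk prior -> 0.667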
A generalized Bayes Rule

• A more general version, conditionalized on some background evidence E:

P(A | B, E) = P(B | A, E) P(A | E) / P(B | E)
Full joint distributions
A complete probability model specifies every entry in the joint distribution for all the variables X = X1, ..., Xn,
i.e., a probability for each possible world X1 = x1, ..., Xn = xn.

E.g., suppose Toothache and Cavity are the random variables:

                   Toothache = true   Toothache = false
Cavity = true           0.04               0.06
Cavity = false          0.01               0.89

Possible worlds are mutually exclusive  ⇒  P(w1 ∧ w2) = 0
Possible worlds are exhaustive          ⇒  w1 ∨ … ∨ wn is True
hence Σi P(wi) = 1
Using the full joint distribution
                   Toothache = true   Toothache = false
Cavity = true           0.04               0.06
Cavity = false          0.01               0.89

What is the unconditional probability of having a Cavity?
P(Cavity) = P(Cavity ∧ Toothache) + P(Cavity ∧ ~Toothache)
          = 0.04 + 0.06 = 0.1

What is the probability of having either a cavity or a toothache?
P(Cavity ∨ Toothache)
= P(Cavity, Toothache) + P(Cavity, ~Toothache) + P(~Cavity, Toothache)
= 0.04 + 0.06 + 0.01 = 0.11
Using the full joint distribution
                   Toothache = true   Toothache = false
Cavity = true           0.04               0.06
Cavity = false          0.01               0.89

What is the probability of having a cavity given that you already have a toothache?

P(Cavity | Toothache) = P(Cavity ∧ Toothache) / P(Toothache)
                      = 0.04 / (0.04 + 0.01) = 0.8
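The three queries above can be answered mechanically from the 2 × 2 table; a short sketch in Python (numbers from the slides):

# Full joint over (Cavity, Toothache); keys are (cavity, toothache) truth values.
joint = {(True, True): 0.04, (True, False): 0.06,
         (False, True): 0.01, (False, False): 0.89}

p_cavity = sum(p for (c, t), p in joint.items() if c)
p_cavity_or_toothache = sum(p for (c, t), p in joint.items() if c or t)
p_toothache = sum(p for (c, t), p in joint.items() if t)
p_cavity_given_toothache = joint[(True, True)] / p_toothache

print(round(p_cavity, 2))                  # 0.1
print(round(p_cavity_or_toothache, 2))     # 0.11
print(round(p_cavity_given_toothache, 2))  # 0.8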
Normalization
Suppose we wish to compute a posterior distribution over random variable A given B=b, and suppose A has possible values a1, ..., am.

We can apply Bayes' rule for each value of A:
P(A=a1 | B=b) = P(B=b | A=a1) P(A=a1) / P(B=b)
...
P(A=am | B=b) = P(B=b | A=am) P(A=am) / P(B=b)

Adding these up, and noting that Σi P(A=ai | B=b) = 1:
P(B=b) = Σi P(B=b | A=ai) P(A=ai)

This is the normalization factor, denoted α = 1/P(B=b):
P(A | B=b) = α P(B=b | A) P(A)

Typically compute an unnormalized distribution, then normalize at the end:
e.g., suppose P(B=b | A) P(A) = <0.4, 0.2, 0.2>
then P(A | B=b) = α <0.4, 0.2, 0.2>
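The α trick is easy to carry out in code; a minimal sketch using the unnormalized vector from the example above:

# Normalize an unnormalized posterior: P(A | B=b) = alpha * <0.4, 0.2, 0.2>
unnormalized = [0.4, 0.2, 0.2]
alpha = 1 / sum(unnormalized)
print([round(alpha * p, 2) for p in unnormalized])   # [0.5, 0.25, 0.25]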
Marginalization

Given a conditional distribution P(X | Z), we can create the unconditional distribution P(X) by marginalization:

P(X) = Σz P(X | Z=z) P(Z=z) = Σz P(X, Z=z)

In general, given a joint distribution over a set of variables, the distribution over any subset (called a marginal distribution for historical reasons) can be calculated by summing out the other variables.
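A minimal summing-out helper as a sketch (reusing the Cavity/Toothache joint from earlier; the function name is ours, not from the slides):

def sum_out(joint, axis):
    """Marginalize a joint given as {assignment_tuple: p} by summing out the variable at position axis."""
    marginal = {}
    for assignment, p in joint.items():
        key = tuple(v for i, v in enumerate(assignment) if i != axis)
        marginal[key] = marginal.get(key, 0.0) + p
    return marginal

# Sum Cavity out of P(Cavity, Toothache) to obtain P(Toothache).
joint = {(True, True): 0.04, (True, False): 0.06,     # keys: (cavity, toothache)
         (False, True): 0.01, (False, False): 0.89}
print(sum_out(joint, axis=0))   # approximately {(True,): 0.05, (False,): 0.95}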
Conditioning
Introducing a variable as an extra condition:

P(X | Y) = Σz P(X | Y, Z=z) P(Z=z | Y)

Why is this useful??

Intuition: it is often easier to assess each specific circumstance, e.g.,
P(RunOver | Cross)
= P(RunOver | Cross, Light=green) P(Light=green | Cross)
+ P(RunOver | Cross, Light=yellow) P(Light=yellow | Cross)
+ P(RunOver | Cross, Light=red) P(Light=red | Cross)
Absolute Independence
• Two random variables A and B are (absolutely) independent iff
P(A, B) = P(A) P(B)

• Using the product rule for independent A and B, we can show:
P(A, B) = P(A | B) P(B) = P(A) P(B)
Therefore P(A | B) = P(A)

• If n Boolean variables are independent, the full joint is:
P(X1, …, Xn) = ∏i P(Xi)
The full joint is generally specified by 2^n − 1 numbers, but when the variables are independent only n numbers are needed.

• Absolute independence is a very strong requirement, seldom met!!
Conditional Independence
• Some evidence may be irrelevant, allowing simplification, e.g.,
P(Cavity | Toothache, PadresWin) = P(Cavity | Toothache) = 0.8

• This property is known as conditional independence and can be expressed as:
P(X | Y, Z) = P(X | Z)
which says that X and Y are independent given Z.

• If I have a cavity, the probability that the dentist's probe catches in it doesn't depend on whether I have a toothache:
1. P(Catch | Toothache, Cavity) = P(Catch | Cavity)
i.e., Catch is conditionally independent of Toothache given Cavity.
This doesn't say anything about P(Catch | Toothache).
The same independence holds if I haven't got a cavity:
2. P(Catch | Toothache, ~Cavity) = P(Catch | ~Cavity)
Conditional independence contd.
Equivalent statements to
P(Catch | Toothache, Cavity) = P(Catch | Cavity)   (*)

a. P(Toothache | Catch, Cavity) = P(Toothache | Cavity)

P(Toothache | Catch, Cavity)
= P(Catch | Toothache, Cavity) P(Toothache | Cavity) / P(Catch | Cavity)
= P(Catch | Cavity) P(Toothache | Cavity) / P(Catch | Cavity)   (from *)
= P(Toothache | Cavity)

b. P(Toothache, Catch | Cavity)
= P(Toothache | Catch, Cavity) P(Catch | Cavity)   (product rule)
= P(Toothache | Cavity) P(Catch | Cavity)   (from a)
Using Conditional Independence
The full joint distribution can now be written as
P(Toothache, Catch, Cavity)
= P(Toothache, Catch | Cavity) P(Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)   (from 1.b)

Specified by: 2 + 2 + 1 = 5 independent numbers,
compared to 7 for the general joint
or 3 for unconditionally independent variables.
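That factorization is exactly the structure behind the naïve Bayes classifier mentioned at the start of the lecture: one prior plus one conditional per symptom. Below is a minimal sketch that uses it to infer Cavity from the two symptoms; P(Cavity) = 0.1 is from the slides, while the per-symptom conditionals are illustrative assumptions.

# Naive Bayes for the dental domain:
# P(Cavity | toothache, catch) is proportional to P(Cavity) * P(toothache | Cavity) * P(catch | Cavity).
p_cavity = {True: 0.1, False: 0.9}                    # prior, from the slides
p_toothache_given_cavity = {True: 0.6, False: 0.05}   # P(Toothache=true | Cavity) -- assumed values
p_catch_given_cavity = {True: 0.9, False: 0.2}        # P(Catch=true | Cavity)     -- assumed values

def posterior_cavity(toothache, catch):
    unnormalized = {}
    for cavity in (True, False):
        pt = p_toothache_given_cavity[cavity] if toothache else 1 - p_toothache_given_cavity[cavity]
        pc = p_catch_given_cavity[cavity] if catch else 1 - p_catch_given_cavity[cavity]
        unnormalized[cavity] = p_cavity[cavity] * pt * pc
    alpha = 1 / sum(unnormalized.values())              # normalize at the end, as in the slides
    return {c: round(alpha * p, 3) for c, p in unnormalized.items()}

print(posterior_cavity(toothache=True, catch=True))   # {True: 0.857, False: 0.143}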
