Unit II Full Notes
PROBABILISTIC REASONING
Acting under uncertainty – Bayesian inference – naïve Bayes models. Probabilistic reasoning
– Bayesian networks – exact inference in BN – approximate inference in BN – causal networks.
● Consider trying to write a rule for dental diagnosis in propositional logic:
Toothache ⇒ Cavity
● The rule is wrong: not all patients with toothaches have cavities; some have gum
problems, an abscess, or one of several other causes.
● Unfortunately, in order to make the rule true, we have to add an almost unlimited list
of possible problems. We could instead try turning the rule into a causal rule:
Cavity ⇒ Toothache
● But this rule is not right either; not all cavities cause pain. The only way to fix the rule
is to make it logically exhaustive: to augment the left-hand side with all the
qualifications required for a cavity to cause a toothache.
Trying to use logic to cope with a domain like medical diagnosis fails for three main reasons:
1. Laziness: It is too much work to list the complete set of antecedents or consequents
needed to ensure an exceptionless rule, and too hard to use such rules.
2. Theoretical ignorance: Medical science has no complete theory for the domain.
3. Practical ignorance: Even if we know all the rules, we might be uncertain about a
particular patient because not all the necessary tests have been or can be run.
Degree of Belief
The agent’s knowledge can at best provide only a degree of belief in the relevant sentences.
Our main tool for dealing with degrees of belief is probability theory.
Use of probability for uncertainty
● Probability provides a way of summarizing the uncertainty that comes from our laziness
and ignorance, thereby solving the qualification problem.
● We might not know for sure what afflicts a particular patient, but we believe that there
is, say, an 80% chance—that is, a probability of 0.8—that the patient who has a
toothache has a cavity.
● It is derived from statistical data—80% of the toothache patients seen so far have had
cavities—or from some general dental knowledge, or from a combination of evidence
sources.
● But there is no uncertainty in the actual world: the patient either has a cavity or doesn’t.
So what does it mean to say the probability of a cavity is 0.8? Shouldn’t it be either 0
or 1? The answer is that probability statements are made with respect to a knowledge
state, not with respect to the real world.
● We say “The probability that the patient has a cavity, given that she has a toothache, is
0.8.”
Uncertainty and Rational Decisions
Automated Taxi to the Airport
Let A_t denote the plan of leaving for the airport t minutes before the flight departs. Suppose
the probabilities of getting to the flight on time are:
● A90: 0.97
● A180: 0.98
● A1440: 0.99
● If it is vital not to miss the flight, then it is worth taking A180 and risking the longer
wait at the airport.
● A1440 is not a good choice, because although it almost guarantees getting there on
time, it involves an intolerable wait—not to mention a possibly unpleasant diet of
airport food.
● To make such choices, an agent must first have preferences between the different
possible outcomes of the various plans.
● An outcome is a completely specified state, including such factors as whether the agent
arrives on time and the length of the wait at the airport.
Utility Theory
● Used to represent and reason with preferences. (The term utility is used here in the sense
of “the quality of being useful,” not in the sense of the electric company or water
works.)
● Utility theory says that every state has a degree of usefulness, or utility, to an agent and
that the agent will prefer states with higher utility.
● The utility of a state is relative to an agent.
Decision theory
● Preferences, as expressed by utilities, are combined with probabilities in the general
theory of rational decisions called decision theory:
Decision theory = probability theory + utility theory
● The fundamental idea of decision theory is that an agent is rational if and only if it
chooses the action that yields the highest expected utility, averaged over all the possible
outcomes of the action. This is called the principle of maximum expected utility
(MEU).
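To make the MEU computation concrete, here is a minimal sketch in Python. The plans and
success probabilities come from the airport example above; the utility numbers are invented
for illustration, with a heavy penalty for missing a vital flight.
```python
# MEU sketch: choose the plan with the highest expected utility.
# Success probabilities follow the airport example; the utilities are
# assumed values (missing a vital flight is heavily penalized, and the
# 24-hour wait of A1440 has strongly negative utility).
plans = {
    "A90":   {"on_time": 0.97, "miss": 0.03},
    "A180":  {"on_time": 0.98, "miss": 0.02},
    "A1440": {"on_time": 0.99, "miss": 0.01},
}
utility = {
    ("A90", "on_time"): 100,    ("A90", "miss"): -5000,
    ("A180", "on_time"): 90,    ("A180", "miss"): -5000,
    ("A1440", "on_time"): -200, ("A1440", "miss"): -5000,
}

def expected_utility(plan):
    return sum(p * utility[plan, outcome]
               for outcome, p in plans[plan].items())

for plan in plans:
    print(plan, expected_utility(plan))
print("MEU choice:", max(plans, key=expected_utility))   # A180
```
With these numbers the agent prefers A180, matching the reasoning above: when missing the
flight is very costly, the extra success probability outweighs the longer wait.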
Basics of Probability
In probability theory, the set of all possible worlds is called the sample space. The
possible worlds are mutually exclusive and exhaustive—two possible worlds cannot
both be the case, and one possible world must be the case.
For example, if we are about to roll two (distinguishable) dice, there are 36 possible
worlds to consider: (1,1), (1,2), ..., (6,6). The Greek letter Ω (uppercase omega) is used
to refer to the sample space, and ω (lowercase omega) refers to elements of the space,
that is, particular possible worlds.
A fully specified probability model associates a numerical probability P(ω) with each
possible world.
Basic Theorems
1. Range: The probability of each event is between 0 and 1, and the probabilities of the
possible worlds always sum to 1.
2. Addition rule: The probability that at least one of two events A and B occurs is
P(A or B) = P(A) + P(B) - P(A and B)
○ For example, the probability of rolling a six-sided die and getting either an even
number or a multiple of 3 is 1/2 + 1/3 - 1/6 = 2/3.
3. Multiplication rule: The multiplication rule of probability states that the probability of
the intersection of two independent events A and B is equal to the probability of A times
the probability of B.
P(A and B) = P(A) x P(B)
○ For example, the probability of flipping a fair coin and then rolling a six-sided
die and getting heads and a 3 is 1/2 x 1/6 = 1/12.
4. Conditional probability: Conditional probability is the probability of an event A given
that another event B has occurred. The formula for conditional probability is:
P(A | B) = P(A and B) / P(B)
○ For example, the probability of rolling a six-sided die and getting an even
number given that the number is greater than 2 is 2/4 = 1/2, since there are two
even numbers greater than 2 (4 and 6) out of a total of four possible outcomes.
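The conditional-probability example can be checked by brute-force enumeration of the sample
space; a small sketch:
```python
from fractions import Fraction

worlds = range(1, 7)            # the six equally likely worlds of one die

def prob(event):
    return Fraction(sum(1 for w in worlds if event(w)), 6)

def cond_prob(a, b):            # P(A | B) = P(A and B) / P(B)
    return prob(lambda w: a(w) and b(w)) / prob(b)

print(cond_prob(lambda w: w % 2 == 0, lambda w: w > 2))   # 1/2
```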
Probability: Worked Examples
Throwing two dice
1. P(Sum = 12) = 1/36, since only the world (6,6) out of the 36 equally likely worlds has
a sum of 12.
2. P(Sum > 10) = 3/36 = 1/12, from the three worlds (5,6), (6,5), and (6,6).
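The same counting argument, mechanized for the two-dice worlds (a quick sketch):
```python
from fractions import Fraction
from itertools import product

worlds = list(product(range(1, 7), repeat=2))   # the 36 worlds (1,1)..(6,6)

def prob(event):
    return Fraction(sum(1 for w in worlds if event(w)), len(worlds))

print(prob(lambda w: sum(w) == 12))   # 1/36, only (6,6)
print(prob(lambda w: sum(w) > 10))    # 1/12: (5,6), (6,5), (6,6)
```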
Deck of Cards (52 cards: 26 black, 26 red; 13 per suit; 4 of each rank)
1. P(Black card) = 26/52 = 1/2
2. P(Red card) = 26/52 = 1/2
3. P(Number 10) = 4/52 = 1/13
4. P(King) = 4/52 = 1/13
5. P(Spade) = 13/52 = 1/4
Types of Probability
1. Unconditional Probability
2. Conditional Probability
Conditional Probability: Examples from the Textbook
1. P(Cavity) = 0.2
2. P(cavity | toothache) = 0.6
a. Whenever Toothache is true and we have no other information, Cavity is true
with probability 0.6.
3. P(cavity | toothache ∧ ¬bleeding) = 0.4
Product Rule
P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
For example, P(cavity ∧ toothache) = P(cavity | toothache) P(toothache).
Random Variables - variables in probability theory; their names begin with an uppercase letter.
Every random variable has a domain, the set of values it can take on. For example, if the
domain of Weather is {sun, rain, cloudy, snow}, then
P(Weather) = <0.6, 0.1, 0.29, 0.01>
abbreviates the four equations P(Weather = sun) = 0.6, P(Weather = rain) = 0.1,
P(Weather = cloudy) = 0.29, and P(Weather = snow) = 0.01.
Continuous Variables have infinite domains (e.g., the real numbers), so their distributions are
given as parameterized functions (probability density functions) rather than as finite vectors.
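In code, the distribution vector is just a mapping from domain values to probabilities; a tiny
sketch using the Weather example above:
```python
# P(Weather) = <0.6, 0.1, 0.29, 0.01>: one probability per domain value.
P_weather = {"sun": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}

assert abs(sum(P_weather.values()) - 1.0) < 1e-12  # must sum to 1
print(P_weather["rain"])                           # P(Weather = rain) = 0.1
```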
Joint Probability Distribution
The full joint probability distribution assigns a probability to every combination of values of
all the random variables - one number for each possible world.
1. Introduction
2. Semantics
a. Representing Full Joint Distribution
b. Conditional Independence
3. Efficient Representation
4. Exact Inference
5. Approximate Inference
1. INTRODUCTION
STORY: You have a new burglar alarm at home. It is fairly reliable at detecting a burglary,
but it also sometimes responds to minor earthquakes. Two neighbors, John and Mary, have
promised to call you at work when they hear the alarm. This burglary story is the running
example for the networks below.
BAYESIAN NETWORKS
SEMANTICS OF BAYESIAN NETWORK
● A Bayesian network is a directed acyclic graph with some numeric parameters (a
conditional probability table, or CPT) attached to each node.
● A generic entry in the joint distribution is the probability of a conjunction of
particular assignments to each variable, such as P(X1 = x1 ∧ ... ∧ Xn = xn). The
semantics defines each such entry as the product
P(x1, ..., xn) = Π i=1..n P(xi | parents(Xi)),
where parents(Xi) denotes the values that the parents of Xi take in x1, ..., xn.
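As an illustration, one entry of the full joint distribution of the burglary network can be
read straight off the CPTs. A minimal sketch, using the standard CPT numbers of the textbook's
burglary example:
```python
# P(j, m, a, ¬b, ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
P_burglary, P_earthquake = 0.001, 0.002
P_alarm = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}  # P(Alarm=T | B, E)
P_john = {True: 0.90, False: 0.05}   # P(JohnCalls=T | Alarm)
P_mary = {True: 0.70, False: 0.01}   # P(MaryCalls=T | Alarm)

entry = (P_john[True] * P_mary[True] * P_alarm[(False, False)]
         * (1 - P_burglary) * (1 - P_earthquake))
print(entry)   # ≈ 0.000628
```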
How to construct Bayesian Networks?
● construct a Bayesian network in such a way that the resulting joint distribution
is a good representation of a given domain.
● By the chain rule, any joint distribution can be written as
P(x1, ..., xn) = Π i=1..n P(xi | xi−1, ..., x1).
● Comparing this with the semantics equation above, the network is a correct
representation of the domain provided that, for every variable,
P(Xi | Xi−1, ..., X1) = P(Xi | Parents(Xi)), with Parents(Xi) ⊆ {Xi−1, ..., X1};
that is, each variable is conditionally independent of its other predecessors in the
node ordering, given its parents.
Rules
The topological semantics specifies that each variable is conditionally independent of its
non-descendants, given its parents.
a node is conditionally independent of all other nodes in the network, given its parents,
children, and children's parents—that is, given its Markov blanket.
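The Markov blanket is easy to read off the graph structure. A sketch, representing the network
as a map from each node to its list of parents (the burglary network serves as the example):
```python
parents = {"Burglary": [], "Earthquake": [],
           "Alarm": ["Burglary", "Earthquake"],
           "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"]}

def markov_blanket(x):
    children = [n for n, ps in parents.items() if x in ps]
    blanket = set(parents[x]) | set(children)
    for c in children:
        blanket |= set(parents[c])       # children's other parents
    blanket.discard(x)
    return blanket

# Alarm's blanket: its parents, its children, and its children's parents.
print(markov_blanket("Alarm"))
```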
Efficient Representation of Bayesian Networks
● Even if the maximum number of parents k is smallish, filling in the CPT for a
node requires up to O(2^k) numbers.
● Such relationships are describable by a canonical distribution that fits some
standard pattern.
● Deterministic Node - has its value specified exactly by the values of its parents,
with no uncertainty
Noisy OR
● Uncertain relationships can often be characterized by so-called noisy logical
relationships.
● E.g., Fever is true if and only if Cold, Flu, or Malaria is true, but each causal link
may be inhibited: a patient could have a cold and yet show no fever.
Assumptions of Noisy OR
1. all the possible causes are listed.
a. (If some are missing, we can always add a so-called leak node that covers “miscellaneous
causes.”)
2. it assumes that inhibition of each parent is independent of inhibition of any
other parents
a. for example, whatever inhibits Malaria from causing a fever is independent of whatever inhibits
Flu from causing a fever
Inhibition probability
Suppose the individual inhibition probabilities are
P(¬fever | cold, ¬flu, ¬malaria) = 0.6,
P(¬fever | ¬cold, flu, ¬malaria) = 0.2,
P(¬fever | ¬cold, ¬flu, malaria) = 0.1.
Then P(¬fever) for any row of the CPT is the product of the inhibition probabilities of the
parents that are true:
Cold  Flu  Malaria  P(Fever)  P(¬Fever)
F     F    F        0.0       1.0
F     F    T        0.9       0.1
F     T    F        0.8       0.2
F     T    T        0.98      0.02 = 0.2 × 0.1
T     F    F        0.4       0.6
T     F    T        0.94      0.06 = 0.6 × 0.1
T     T    F        0.88      0.12 = 0.6 × 0.2
T     T    T        0.988     0.012 = 0.6 × 0.2 × 0.1
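The entire CPT above can be generated from just the three inhibition probabilities; a minimal
sketch:
```python
from itertools import product

causes = ["Cold", "Flu", "Malaria"]
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}   # inhibition probabilities

for combo in product([False, True], repeat=3):
    # P(¬fever) = product of q over the causes that are present;
    # the empty product is 1.0, so no cause present means no fever.
    p_not_fever = 1.0
    for cause, present in zip(causes, combo):
        if present:
            p_not_fever *= q[cause]
    print(dict(zip(causes, combo)),
          "P(fever) =", round(1 - p_not_fever, 3))
```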
Bayesian Nets with Continuous Variables
1. It is impossible to specify an explicit conditional probability for each of the infinitely
many values a continuous variable can take. Instead, we can discretize the values or
define the conditional distribution using a standard family of parameterized functions
(e.g., a Gaussian).
Exact Inference in Bayesian Networks
A typical query asks for the posterior probability distribution P(X | e), where X is the query
variable and e is the observed evidence. One useful observation concerns which variables are
relevant to such a query:
❖ remove any leaf node that is not a query variable or an evidence variable.
❖ After its removal, there may be some more leaf nodes, and these too may be
irrelevant.
❖ Continuing this process, we eventually find that every variable that is not an
ancestor of a query variable or evidence variable is irrelevant to the query.
❖ A variable elimination algorithm can therefore remove all these variables
before evaluating the query.
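A sketch of this pruning step: starting from the query and evidence variables, walk upward and
keep only their ancestors (the parent-map representation of the network is assumed):
```python
def relevant_variables(parents, query, evidence):
    """Keep the query/evidence variables and all of their ancestors;
    everything else is irrelevant to P(query | evidence)."""
    keep, frontier = set(), [query, *evidence]
    while frontier:
        v = frontier.pop()
        if v not in keep:
            keep.add(v)
            frontier.extend(parents[v])
    return keep

parents = {"Burglary": [], "Earthquake": [],
           "Alarm": ["Burglary", "Earthquake"],
           "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"]}
# For P(JohnCalls | Burglary), MaryCalls is pruned as an irrelevant leaf.
print(relevant_variables(parents, "JohnCalls", ["Burglary"]))
```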
3. Complexity of Exact Inference
● The burglary network of Figure 14.2 belongs to the family of networks in which
there is at most one undirected path between any two nodes in the network.
These are called singly connected networks or polytrees.
● The time and space complexity of exact inference in polytrees is linear in
the size of the network (size refers to the number of entries in the CPTs).
● For multiply connected networks, variable elimination can have exponential
time and space complexity in the worst case, even when the number of
parents per node is bounded.
4. Clustering Algorithms
● The idea is to join individual nodes of a multiply connected network together to form
cluster nodes in such a way that the resulting network is a polytree, on which exact
inference takes time linear in the size of the network.
CAUSAL NETWORKS
Causal Bayesian networks, sometimes called causal diagrams, were devised to permit us to
represent causal asymmetries and to leverage those asymmetries when reasoning with causal
information. The idea is to decide on arrow directionality by considerations that go beyond
probabilistic dependence and invoke a totally different type of judgment.
Joint Probability
Structural Equations
The U-variables in these equations represent unmodeled variables, also called error
terms or disturbances, that perturb the functional relationship between each variable
and its parents. For example, the U-variable in the equation for grass wetness may
represent another potential source of wetness, in addition to Sprinkler and Rain—
perhaps MorningDew or FirefightingHelicopter.
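A structural-equation model can be read as a small program: each variable is computed from
its parents plus its own U-term. A sketch in which the particular functions and noise
probabilities are invented for illustration:
```python
import random

def sample_world():
    # Exogenous U-terms: the unmodeled disturbances of each equation.
    u_c, u_s, u_r, u_w = (random.random() for _ in range(4))
    cloudy    = u_c < 0.5
    sprinkler = (not cloudy) and u_s < 0.5       # operator's unknown policy
    rain      = cloudy and u_r < 0.8
    # u_w stands in for MorningDew, FirefightingHelicopter, etc.
    wet       = sprinkler or rain or u_w < 0.01
    return {"Cloudy": cloudy, "Sprinkler": sprinkler,
            "Rain": rain, "WetGrass": wet}

print(sample_world())
```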
Backdoor Criterion
● we might know that the sprinkler's operator checks the weather before deciding whether
to turn it on, but we might not know how she makes her decision
● The specific reason this is problematic in this instance is that we would like to predict
the effect of turning on the sprinkler on a downstream variable such as
GreenerGrass, but the adjustment formula must take into account not only the direct
route from Sprinkler, but also the “back door” route via Cloudy and Rain.
● If we knew the value of Rain, this back-door path would be blocked—which suggests
that there might be a way to write an adjustment formula that conditions on Rain
instead of Cloudy.
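Numerically, the adjustment that conditions on Rain looks as follows. A minimal sketch in
which all probability numbers are invented placeholders:
```python
# Back-door adjustment, with Rain blocking the Sprinkler <- Cloudy -> Rain path:
# P(greener | do(sprinkler)) = sum over r of P(greener | sprinkler, r) * P(r)
P_rain = {True: 0.3, False: 0.7}                        # assumed marginal of Rain
P_greener = {(True, True): 0.95, (True, False): 0.80,   # assumed P(g | s, r)
             (False, True): 0.70, (False, False): 0.10}

p_do_sprinkler = sum(P_greener[True, r] * P_rain[r] for r in (True, False))
print(p_do_sprinkler)   # 0.845
```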