Unit II Full Notes

The document discusses probabilistic reasoning, focusing on how agents act under uncertainty using Bayesian inference and models like Bayesian networks. It highlights the challenges of managing belief states, the qualification problem, and the role of probability in decision-making, including utility and decision theory. Additionally, it covers the structure and semantics of Bayesian networks, emphasizing their efficiency and the concept of conditional independence.

Uploaded by

RAJASEKAR M

UNIT II

PROBABILISTIC REASONING

Acting under uncertainty – Bayesian inference – naïve Bayes models. Probabilistic reasoning
– Bayesian networks – exact inference in BN – approximate inference in BN – causal networks.

ACTING UNDER UNCERTAINTY

● Agents may need to handle uncertainty, whether due to partial observability,
nondeterminism, or a combination of the two.
● An agent may never know for certain what state it’s in or where it will end up after a
sequence of actions.
● Agents are designed to handle uncertainty by keeping track of a belief state—a
representation of the set of all possible world states that it might be in—and generating
a contingency plan that handles every possible eventuality that its sensors may report
during execution.
Drawbacks of Using Belief State
1. When interpreting partial sensor information, a logical agent must consider every
logically possible explanation for the observations, no matter how unlikely. This leads
to impossibly large and complex belief-state representations.
2. A correct contingent plan that handles every eventuality can grow arbitrarily large and
must consider arbitrarily unlikely contingencies.
3. Sometimes there is no plan that is guaranteed to achieve the goal—yet the agent must
act. It must have some way to compare the merits of plans that are not guaranteed.
Example: Qualification Problem - Automated Taxi to Airport
Task - the goal of delivering a passenger to the airport on time.
● The agent forms a plan, A90, that involves leaving home 90 minutes before the flight
departs and driving at a reasonable speed. Even though the airport is only about 5 miles
away, a logical taxi agent will not be able to conclude with certainty that “Plan A90
will get us to the airport in time.”
● Instead, it reaches the weaker conclusion “Plan A90 will get us to the airport in time,
as long as the car doesn’t break down or run out of gas, and I don’t get into an accident,
and there are no accidents on the bridge, and the plane doesn’t leave early, and no
meteorite hits the car, and ... .”
● None of these conditions can be deduced for sure, so the plan’s success cannot be
inferred. This is the qualification problem.
Best Rational Plan
● A90 is expected to maximize the agent’s performance measure (where the expectation
is relative to the agent’s knowledge about the environment).
● The performance measure includes getting to the airport in time for the flight, avoiding
a long, unproductive wait at the airport, and avoiding speeding tickets along the way.
● The agent’s knowledge cannot guarantee any of these outcomes for A90, but it can
provide some degree of belief that they will be achieved.
● The rational decision—therefore depends on both the relative importance of various
goals and the likelihood that, and degree to which, they will be achieved.
Summarizing uncertainty
Diagnosing a dental patient’s toothache. Diagnosis—whether for medicine, automobile repair,
or whatever—almost always involves uncertainty.
Simple Rule: Toothache ⇒ Cavity
● The problem is that this rule is wrong. Not all patients with toothaches have cavities;
some of them have gum disease, an abscess, or one of several other problems:
Toothache ⇒ Cavity ∨ GumProblem ∨ Abscess ...

● Unfortunately, in order to make the rule true, we have to add an almost unlimited list
of possible problems. We could try turning the rule into a causal rule:
Cavity ⇒ Toothache
● But this rule is not right either; not all cavities cause pain. The only way to fix the rule
is to make it logically exhaustive: to augment the left-hand side with all the
qualifications required for a cavity to cause a toothache.
Reasons for failure of this diagnosis:
1. Laziness: It is too much work to list the complete set of antecedents or consequents
needed to ensure an exceptionless rule, and too hard to use such rules.
2. Theoretical ignorance: Medical science has no complete theory for the domain.
3. Practical ignorance: Even if we know all the rules, we might be uncertain about a
particular patient because not all the necessary tests have been or can be run.
Degree of Belief
The agent’s knowledge can at best provide only a degree of belief in the relevant sentences.
Our main tool for dealing with degrees of belief is probability theory.
Use of probability for uncertainty
● Probability provides a way of summarizing the uncertainty that comes from our laziness
and ignorance, thereby solving the qualification problem.
● We might not know for sure what afflicts a particular patient, but we believe that there
is, say, an 80% chance—that is, a probability of 0.8—that the patient who has a
toothache has a cavity.
● It is derived from statistical data—80% of the toothache patients seen so far have had
cavities—or from some general dental knowledge, or from a combination of evidence
sources.
● But there is no uncertainty in the actual world: the patient either has a cavity or doesn’t.
So what does it mean to say the probability of a cavity is 0.8? Shouldn’t it be either 0
or 1? The answer is that probability statements are made with respect to a knowledge
state, not with respect to the real world.
● We say "The probability that the patient has a cavity, given that she has a toothache, is
0.8."
Uncertainty and Rational Decisions
Automated Taxi To Airport

Plan      Probability of catching flight
A90       0.97
A180      0.98
A1440     0.99

● If it is vital not to miss the flight, then it is worth taking A180 and risking the longer
wait at the airport.
● A1440 is not a good choice, because although it almost guarantees getting there on
time, it involves an intolerable wait—not to mention a possibly unpleasant diet of
airport food.
● To make such choices, an agent must first have preferences between the different
possible outcomes of the various plans.
● An outcome is a completely specified state, including such factors as whether the agent
arrives on time and the length of the wait at the airport.
Utility Theory
● Used to represent and reason with preferences. (The term utility is used here in the sense
of “the quality of being useful,” not in the sense of the electric company or water
works.)
● Utility theory says that every state has a degree of usefulness, or utility, to an agent and
that the agent will prefer states with higher utility.
● The utility of a state is relative to an agent.
Decision theory
● Preferences, as expressed by utilities, are combined with probabilities in the general
theory of rational decisions called decision theory:
Decision theory = probability theory + utility theory
● The fundamental idea of decision theory is that an agent is rational if and only if it
chooses the action that yields the highest expected utility, averaged over all the possible
outcomes of the action. This is called the principle of maximum expected utility
(MEU).
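The MEU principle can be sketched in a few lines of Python for the airport example. The success probabilities are the ones given above; the utility numbers (value of catching the flight, cost of missing it, disutility of waiting) are illustrative assumptions, not from the notes.

```python
# Sketch of the maximum-expected-utility principle for the airport example.
# p_catch values come from the notes; utility values are assumed for illustration.
plans = {
    "A90":   {"p_catch": 0.97, "u_catch": 100, "u_miss": -500, "u_wait": -10},
    "A180":  {"p_catch": 0.98, "u_catch": 100, "u_miss": -500, "u_wait": -50},
    "A1440": {"p_catch": 0.99, "u_catch": 100, "u_miss": -500, "u_wait": -300},
}

def expected_utility(plan):
    p = plan["p_catch"]
    # Utility of each outcome weighted by its probability, plus the
    # (certain) disutility of waiting at the airport.
    return p * plan["u_catch"] + (1 - p) * plan["u_miss"] + plan["u_wait"]

best = max(plans, key=lambda name: expected_utility(plans[name]))
print(best)  # with these utilities, A90 wins despite its lower success probability
```

Note how the ranking depends on the utilities, not just the probabilities: A1440 has the best chance of catching the flight but the worst expected utility.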
Basics of Probability
 In probability theory, the set of all possible worlds is called the sample space. The
possible worlds are mutually exclusive and exhaustive—two possible worlds cannot
both be the case, and one possible world must be the case.
 For example, if we are about to roll two (distinguishable) dice, there are 36 possible
worlds to consider: (1,1), (1,2), ..., (6,6). The Greek letter Ω (uppercase omega) is used
to refer to the sample space, and ω (lowercase omega) refers to elements of the space,
that is, particular possible worlds.
 A fully specified probability model associates a numerical probability P(ω) with each
possible world.
Basic Theorem
The probability of each event is between 0 and 1.
The probabilities of all possible worlds sum to 1.

1. Probability axioms: Probability theory is based on three axioms:


○ The probability of any event is a number between 0 and 1, inclusive. For
example, the probability of flipping a fair coin and getting heads is 0.5.
○ The probability of the entire sample space is 1. For example, the probability of
rolling a six-sided die and getting any number from 1 to 6 is 1.
○ If A and B are two mutually exclusive events, then the probability of either A
or B occurring is the sum of their individual probabilities. For example, the
probability of rolling a six-sided die and getting either a 1 or a 2 is 1/6 + 1/6 =
1/3.
2. Addition rule: The addition rule of probability states that the probability of the union
of two events A and B is equal to the probability of A plus the probability of B, minus
the probability of their intersection (i.e., the probability that both events occur).
P(A or B) = P(A) + P(B) - P(A and B)

○ For example, the probability of rolling a six-sided die and getting either an even
number or a multiple of 3 is 1/2 + 1/3 - 1/6 = 2/3.
3. Multiplication rule: The multiplication rule of probability states that the probability of
the intersection of two independent events A and B is equal to the probability of A times
the probability of B.
P(A and B) = P(A) x P(B)

○ For example, the probability of flipping a fair coin and then rolling a six-sided
die and getting heads and a 3 is 1/2 x 1/6 = 1/12.
4. Conditional probability: Conditional probability is the probability of an event A given
that another event B has occurred. The formula for conditional probability is:
P(A | B) = P(A and B) / P(B)

○ For example, the probability of rolling a six-sided die and getting an even
number given that the number is greater than 2 is 2/4 = 1/2, since there are two
even numbers greater than 2 (4 and 6) out of a total of four possible outcomes.
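The dice examples above can be verified by brute-force enumeration of the six equally likely outcomes, which is a useful sanity check on the addition and conditional-probability rules:

```python
from fractions import Fraction

# Verify the die examples above by enumerating the 6 equally likely outcomes.
outcomes = range(1, 7)

def prob(event):
    hits = sum(1 for w in outcomes if event(w))
    return Fraction(hits, 6)

even = lambda w: w % 2 == 0
mult3 = lambda w: w % 3 == 0

# Addition rule: P(even or mult-of-3) = P(even) + P(mult3) - P(both)
p_union = prob(lambda w: even(w) or mult3(w))
assert p_union == prob(even) + prob(mult3) - prob(lambda w: even(w) and mult3(w))
print(p_union)  # 2/3

# Conditional probability: P(even | > 2) = P(even and > 2) / P(> 2)
p_cond = prob(lambda w: even(w) and w > 2) / prob(lambda w: w > 2)
print(p_cond)  # 1/2
```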
Probability Examples
Throwing a single die: each of the six outcomes has probability 1/6.
Throwing two dice:

P(Sum = 12) = 1/36

P(Sum > 10) = P(Sum = 11) + P(Sum = 12) = 2/36 + 1/36 = 1/12
Deck of Cards (52 cards)
1. P(Black card) = 26/52 = 1/2

2. P(Red card) = 26/52 = 1/2

3. P(Number 10) = 4/52 = 1/13

4. P(King) = 4/52 = 1/13

5. P(Spade) = 13/52 = 1/4
Theorems of Probability
Types of Probability

1. Unconditional Probability

2. Conditional Probability
Conditional Probability eg
Example Given in Book

1. P(Cavity) = 0.2
2. P(Cavity | Toothache) = 0.6
a. Whenever Toothache is true and we have no other information, Cavity is
true with probability 0.6
3. P(Cavity | Toothache ∧ ¬Bleeding) = 0.4
Product Rule
Random Variables - variables in probability theory

P(Weather= sunny) =0.6

P(Weather= rain) =0.1

P(Weather= cloudy) =0.29

P(Weather= snow) =0.01

P(Weather) = <0.6,0.1,0.29,0.01>
Random Variables domain
Continuous Variables
Joint Probability Distribution

Cavity - True, False

Weather - Sunny, Rain,Cloudy,Snow


Axioms
Inference Using Full Joint Distribution
Independence
Independent Events
If the probability of occurrence of an event A is not affected by the
occurrence of another event B, then A and B are said to be independent
events.
BAYES RULE
IN TERMS OF CAUSE AND EFFECT
For example, a doctor knows that the disease meningitis causes the patient to have a
stiff neck, say, 70% of the time.
The doctor also knows some unconditional facts: the prior probability that a patient has
meningitis is 1/50,000, and the prior probability that any patient has a stiff neck is 1%.
Letting s be the proposition that the patient has a stiff neck and m be the proposition that
the patient has meningitis, we have
P(s | m) = 0.7, P(m) = 1/50000, P(s) = 0.01
P(m | s) = P(s | m) P(m) / P(s) = (0.7 × 1/50000) / 0.01 = 0.0014
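The numbers in the meningitis example plug directly into Bayes' rule:

```python
# Bayes' rule applied to the meningitis example from the notes.
p_s_given_m = 0.7        # P(s | m): stiff neck given meningitis
p_m = 1 / 50000          # prior P(m)
p_s = 0.01               # prior P(s)

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)  # ≈ 0.0014, i.e. roughly 1 in 700 stiff-neck patients
```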
NORMALIZED FORM

GENERAL FORM OF BAYES RULE


BAYES RULE MAY BE EFFICIENT
● If there is a sudden epidemic of meningitis, the unconditional probability of
meningitis, P(m), will go up.
● The doctor who derived the diagnostic probability P(m | s) directly from
statistical observation of patients before the epidemic will have no idea how
to update the value.
● But the doctor who computes P(m | s) from the other three values will see
that P(m | s) should go up proportionately with P(m).
● Most important, the causal information P(s | m) is unaffected by the epidemic,
because it simply reflects the way meningitis works.
COMBINING EVIDENCE
CONDITIONAL INDEPENDENCE
● A distribution in which a single cause directly influences a number of effects, all of which are conditionally independent given the cause, can be written as P(Cause, Effect1, ..., Effectn) = P(Cause) Πi P(Effecti | Cause). Such a probability distribution is called a naive Bayes model.
● Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. For example, if a fruit is identified on the basis of
color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence each
feature individually contributes to identifying it as an apple without depending on the others.
● Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
FEATURES OF NAIVE BAYES
● Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes theorem
and used for solving classification problems.
● It is mainly used in text classification that includes a high-dimensional training dataset.
● Naïve Bayes Classifier is one of the simplest and most effective classification algorithms,
which helps in building fast machine learning models that can make quick predictions.
● It is a probabilistic classifier, which means it predicts on the basis of the probability of an
object.
● Some popular examples of Naïve Bayes Algorithm are spam filtration, Sentimental analysis,
and classifying articles.
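A minimal naive Bayes classifier for the fruit example above can be sketched as follows. The tiny training set and the add-one smoothing constant are illustrative assumptions; the point is that each feature's likelihood is multiplied independently, given the class.

```python
from collections import Counter, defaultdict

# A minimal naive Bayes classifier over the fruit example from the notes.
# The training data below is invented for illustration.
data = [
    ({"color": "red",    "shape": "spherical", "taste": "sweet"}, "apple"),
    ({"color": "red",    "shape": "spherical", "taste": "sweet"}, "apple"),
    ({"color": "yellow", "shape": "long",      "taste": "sweet"}, "banana"),
    ({"color": "yellow", "shape": "long",      "taste": "sweet"}, "banana"),
    ({"color": "green",  "shape": "spherical", "taste": "sour"},  "apple"),
]

classes = Counter(label for _, label in data)
counts = defaultdict(Counter)  # counts[(class, feature)][value]
for features, label in data:
    for f, v in features.items():
        counts[(label, f)][v] += 1

def predict(features):
    def score(c):
        s = classes[c] / len(data)  # prior P(C)
        for f, v in features.items():
            # Likelihood P(feature=v | C), with add-one smoothing.
            s *= (counts[(c, f)][v] + 1) / (classes[c] + 2)
        return s
    return max(classes, key=score)

print(predict({"color": "red", "shape": "spherical", "taste": "sweet"}))  # apple
```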
BAYESIAN NETWORKS

1. Introduction
2. Semantics
a. Representing Full Joint Distribution
b. Conditional Independence
3. Efficient Representation
4. Exact Inference
5. Approximate Inference
1.INTRODUCTION
STORY
BAYESIAN NETWORKS
SEMANTICS OF BAYESIAN NETWORK
● A Bayesian network is a directed acyclic graph with some numeric parameters
attached to each node.
● A generic entry in the joint distribution is the probability of a conjunction of
particular assignments to each variable, such as P(X1 = x1 ∧ ... ∧ Xn = xn).
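This semantics, P(x1, ..., xn) = Π P(xi | parents(Xi)), can be sketched numerically. The CPT numbers below are the standard textbook values for the burglary/alarm network (assumed here; the network is referenced later in these notes).

```python
# Chain-rule semantics of a Bayesian network: the probability of a full
# assignment is the product of local conditional probabilities, one per node.
# CPT numbers are the standard burglary/alarm textbook values.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {  # P(Alarm=true | Burglary, Earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls=true | Alarm)

def joint(b, e, a, j, m):
    p = P_B[b] * P_E[e]
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e): both neighbors call, alarm on, no burglary/earthquake.
print(joint(False, False, True, True, True))  # ≈ 0.000628
```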
How to construct Bayesian Networks?
● Construct a Bayesian network in such a way that the resulting joint distribution
is a good representation of a given domain.
Comparing equation,
Rules

● Bayesian networks contain no redundant probability values. If there is no redundancy, then there is
no chance for inconsistency: it is impossible for the knowledge engineer or domain
expert to create a Bayesian network that violates the axioms of probability.
Compactness and Node Ordering
● The compactness of Bayesian networks is an example of a general property of
locally structured (also called sparse) systems.
● In a locally structured system, each subcomponent interacts directly with only a
bounded number of other components, regardless of the total number of
components.
● Local structure is usually associated with linear rather than exponential growth
in complexity
Ordering
Suppose we decide to add the nodes in the order MaryCalls, JohnCalls, Alarm,
Burglary, Earthquake.
Conditional independence relations in Bayesian networks

The topological semantics specifies that each variable is conditionally independent of its
non-descendants, given its parents.
A node is conditionally independent of all other nodes in the network, given its parents,
children, and children's parents; that is, given its Markov blanket.
BAYESIAN NETWORKS
Efficient Representation of Bayesian Networks
● Even if the maximum number of parents k is smallish, filling in the CPT for a
node requires up to O(2^k) numbers.
● Such relationships are describable by a canonical distribution that fits some
standard pattern.
● Deterministic Node - has its value specified exactly by the values of its parents,
with no uncertainty
Noisy OR
● Uncertain relationships can often be characterized by so-called noisy logical
relationships.
● Eg.
Assumptions of Noisy OR
1. all the possible causes are listed.
a. (If some are missing, we can always add a so-called leak node that covers “miscellaneous
causes.”)
2. it assumes that inhibition of each parent is independent of inhibition of any
other parents
a. for example, whatever inhibits Malaria from causing a fever is independent of whatever inhibits
Flu from causing a fever
Inhibition probability

With inhibition probabilities q_cold = 0.6, q_flu = 0.2, q_malaria = 0.1 (the textbook's
values), P(¬Fever) is the product of the q values of the parents that are true:

Cold  Flu  Malaria  P(Fever)  P(¬Fever)
F     F    F        0.0       1.0
F     F    T        0.9       0.1
F     T    F        0.8       0.2
F     T    T        0.98      0.02
T     F    F        0.4       0.6
T     F    T        0.94      0.06
T     T    F        0.88      0.12
T     T    T        0.988     0.012
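The noisy-OR rule is simple to compute: P(¬effect | causes) is the product of the inhibition probabilities of the causes that are present. The q values below are assumptions taken from the standard fever example.

```python
# Noisy-OR: P(¬effect) is the product of the inhibition probabilities q_i
# of the parents that are present. q values are the standard fever-example
# numbers, assumed here.
q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}

def p_fever(present_causes):
    p_not = 1.0
    for cause in present_causes:
        p_not *= q[cause]
    return 1 - p_not

print(p_fever(["flu", "malaria"]))           # 1 - 0.2*0.1  = 0.98
print(p_fever(["cold", "flu", "malaria"]))   # 1 - 0.012    = 0.988
```

This is why noisy-OR is compact: k inhibition numbers replace a CPT with 2^k independent entries.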
Bayesian Nets with Continuous Variables
1. It is impossible to specify a conditional probability for each of the infinitely
many values of a continuous variable.

Soln: Use discretization (intervals)

2. Use probability Density Function with set of parameters.


3. Discrete + Continuous = Hybrid Bayesian Network
For Cost
For Buys
BAYESIAN NETWORKS
EXACT INFERENCE
1. Inference by enumeration
2. Variable Elimination
3. Time and space Complexity
4. Clustering
What is Exact Inference?
Find posterior/conditional probability for any given query.

● X - query can be on single/multi variables


● E - set of evidence variables (E1, E2, ..., Em)
● Y - hidden variables that are neither evidence nor query (Y1, Y2, ..., Yl)

Thus, the complete set of variables is {X} ∪ E ∪ Y

A typical query asks for the posterior probability distribution P(X | e).

Eg. P(Burglary | JohnCalls = true, MaryCalls = true)


1. Inference by enumeration
● Any conditional probability can be computed by summing terms from the full
joint distribution.
● A query P(X | e) can be answered using P(X | e) = α P(X, e) = α Σy P(X, e, y).
● The query can be answered using a Bayesian network by computing sums of
products of conditional probabilities from the network.
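A self-contained sketch of inference by enumeration for the burglary query, assuming the standard textbook CPTs for the alarm network: sum the joint over the hidden variables (Earthquake, Alarm), then normalize.

```python
from itertools import product

# Inference by enumeration for P(Burglary | JohnCalls=true, MaryCalls=true),
# using the standard (assumed) alarm-network CPTs.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    p = P_B[b] * P_E[e]
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# Sum out hidden variables Earthquake and Alarm, then normalize over Burglary.
unnorm = {}
for b in (True, False):
    unnorm[b] = sum(joint(b, e, a, True, True)
                    for e, a in product((True, False), repeat=2))
total = sum(unnorm.values())
posterior = {b: unnorm[b] / total for b in unnorm}
print(round(posterior[True], 3))  # ≈ 0.284
```

Even on this tiny network the sum has 2^2 terms per query value; on large networks this exponential blow-up is exactly what variable elimination tries to avoid.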
Eg
How to construct Tree?
2. Variable Elimination Algorithm - Why?
Example
Eg for point wise multiplication
Variable Ordering
● Every choice of ordering yields a valid algorithm, but different orderings cause
different intermediate factors to be generated during the calculation.
● Which is Optimized?
● eliminate whichever variable minimizes the size of the next factor to be
constructed.
Another Example

❖ remove any leaf node that is not a query variable or an evidence variable.
❖ After its removal, there may be some more leaf nodes, and these too may be
irrelevant.
❖ Continuing this process, we eventually find that every variable that is not an
ancestor of a query variable or evidence variable is irrelevant to the query.
❖ A variable elimination algorithm can therefore remove all these variables
before evaluating the query.
3. Complexity of Exact Inference
● The burglary network of Figure 14.2 belongs to the family of networks in which
there is at most one undirected path between any two nodes in the network.
These are called singly connected networks or polytrees.
● The time and space complexity of exact inference in polytrees is linear in
the size of the network. ( size refers to number of entries in CPT)
● For multiply connected networks,variable elimination can have exponential
time and space complexity in the worst case, even when the number of
parents per node is bounded.
4. Clustering
Algorithm

Once the network is in polytree form, a special-purpose inference algorithm is required,


because ordinary inference methods cannot handle meganodes that share variables with
each other
APPROXIMATE INFERENCE IN
BAYESIAN NETWORKS
NEED

Exact inference is not feasible in large, multiply connected networks.
DIRECT SAMPLING
1. Prior Sampling
Equations

Suppose there are N total samples, and let N_PS(x1, ..., xn) be the number of times
the specific event x1, ..., xn occurs in the set of samples. Then the estimate
N_PS(x1, ..., xn)/N converges to the true probability P(x1, ..., xn) as N grows large.
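Prior sampling can be sketched for the sprinkler network used later in these notes. The CPT numbers are the standard textbook values, assumed here; each variable is sampled in topological order, conditioned on its already-sampled parents.

```python
import random

# Prior (direct) sampling sketch for the sprinkler network.
# CPT numbers are the standard textbook values (assumed).
random.seed(0)

def sample():
    cloudy = random.random() < 0.5
    sprinkler = random.random() < (0.1 if cloudy else 0.5)
    rain = random.random() < (0.8 if cloudy else 0.2)
    wet = random.random() < {(True, True): 0.99, (True, False): 0.90,
                             (False, True): 0.90, (False, False): 0.0}[(sprinkler, rain)]
    return cloudy, sprinkler, rain, wet

N = 100_000
est = sum(1 for _ in range(N) if sample()[2]) / N  # how often Rain=true
print(est)  # estimates P(Rain) = 0.5*0.8 + 0.5*0.2 = 0.5
```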
Rejection Sampling

1. It generates samples from the prior distribution specified by the network.
2. Then, it rejects all those that do not match the evidence.
Drawback of Rejection Sampling

● The biggest problem with rejection sampling is that it rejects so many samples!
● The fraction of samples consistent with the evidence e
drops exponentially as the number of evidence variables
grows, so the procedure is simply unusable for complex
problems.
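The waste is easy to see in a sketch: with evidence Sprinkler = true on the (assumed) standard sprinkler network, roughly 70% of prior samples are thrown away.

```python
import random

# Rejection sampling sketch: sample from the prior, discard samples that
# disagree with the evidence (Sprinkler = true). Standard CPTs assumed.
random.seed(1)

def sample():
    cloudy = random.random() < 0.5
    sprinkler = random.random() < (0.1 if cloudy else 0.5)
    rain = random.random() < (0.8 if cloudy else 0.2)
    return cloudy, sprinkler, rain

kept = rainy = 0
for _ in range(200_000):
    cloudy, sprinkler, rain = sample()
    if not sprinkler:      # evidence Sprinkler=true: reject mismatching samples
        continue
    kept += 1
    rainy += rain
est = rainy / kept
print(kept, round(est, 2))  # only ~30% of samples kept; est ≈ P(Rain | sprinkler)
```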
Likelihood Weighting

1. Fixes the values for the evidence variables E and samples only the non-evidence
variables.
2. Not all events are equal: each event is weighted by the likelihood that the event
accords with the evidence, as measured by the product of the conditional
probabilities for each evidence variable, given its parents.
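A sketch of likelihood weighting on the same (assumed) sprinkler network, with evidence Sprinkler = true: the evidence variable is never sampled; instead each sample carries a weight equal to P(Sprinkler = true | Cloudy).

```python
import random

# Likelihood weighting sketch: evidence variables are fixed; each sample is
# weighted by the probability of the evidence given its parents.
# Evidence: Sprinkler = true. Standard sprinkler-network CPTs assumed.
random.seed(2)

def weighted_sample():
    w = 1.0
    cloudy = random.random() < 0.5
    # Sprinkler is evidence: do not sample it, multiply in its likelihood.
    w *= 0.1 if cloudy else 0.5          # P(Sprinkler=true | Cloudy)
    rain = random.random() < (0.8 if cloudy else 0.2)
    return rain, w

num = den = 0.0
for _ in range(200_000):
    rain, w = weighted_sample()
    den += w
    if rain:
        num += w
est = num / den
print(round(est, 2))  # estimates P(Rain | sprinkler) ≈ 0.3, with no rejections
```

Unlike rejection sampling, every generated sample contributes; the weights do the work of conditioning on the evidence.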
Example
Equations
Inference by Markov Chain Simulation

● Instead of generating each sample from scratch, MCMC algorithms generate each
sample by making a random change to the preceding sample.
● The current state specifies a value for every variable; the next state is
generated by making random changes to the current state.
Gibbs Sampling

● The Gibbs sampling algorithm for Bayesian networks starts with an arbitrary
state (with the evidence variables fixed at their observed values).
● Generates a next state by randomly sampling a value for
one of the non-evidence variables Xi.
● The sampling for Xi is done conditioned on the current
values of the variables in the Markov blanket of Xi
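Gibbs sampling can be sketched on the (assumed) standard sprinkler network with evidence Sprinkler = true, WetGrass = true: the non-evidence variables Cloudy and Rain are repeatedly resampled from their Markov blankets.

```python
import random

# Gibbs sampling sketch for the sprinkler network. Evidence: Sprinkler=true,
# WetGrass=true. We repeatedly resample Cloudy and Rain conditioned on their
# Markov blankets. Standard textbook CPT values assumed.
random.seed(3)

P_S = {True: 0.1, False: 0.5}                      # P(Sprinkler=true | Cloudy)
P_R = {True: 0.8, False: 0.2}                      # P(Rain=true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,    # P(WetGrass=true | S, R)
       (False, True): 0.90, (False, False): 0.0}

def bernoulli(p_true, p_false):
    return random.random() < p_true / (p_true + p_false)

cloudy, rain = True, True   # arbitrary initial state
STEPS = 200_000
rain_count = 0
for _ in range(STEPS):
    # Resample Cloudy given its Markov blanket (children Sprinkler=true, Rain):
    pt = 0.5 * P_S[True] * (P_R[True] if rain else 1 - P_R[True])
    pf = 0.5 * P_S[False] * (P_R[False] if rain else 1 - P_R[False])
    cloudy = bernoulli(pt, pf)
    # Resample Rain given its Markov blanket (Cloudy, Sprinkler=true, WetGrass=true):
    pt = P_R[cloudy] * P_W[(True, True)]
    pf = (1 - P_R[cloudy]) * P_W[(True, False)]
    rain = bernoulli(pt, pf)
    rain_count += rain

est = rain_count / STEPS
print(round(est, 2))  # ≈ P(Rain | sprinkler, wetgrass) ≈ 0.32
```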
Example
Cloudy Sprinkler Rain Wetgrass

CAUSAL NETWORKS
Key points

Causal Bayesian networks, sometimes called causal diagrams, were devised to permit us to represent
causal asymmetries and to leverage those asymmetries when reasoning with causal information. The
idea is to decide on arrow directionality by considerations that go beyond probabilistic dependence and
invoke a totally different type of judgment.
Joint Probability

Structural Equations

The U-variables in these equations represent unmodeled variables, also called error
terms or disturbances, that perturb the functional relationship between each variable
and its parents. For example, U_WetGrass may represent another potential source of wetness, in
addition to Sprinkler and Rain: perhaps MorningDew or FirefightingHelicopter.
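The structural-equation view can be sketched in code: each variable is a deterministic function of its parents plus an exogenous U-variable. The functional forms and noise probabilities below are illustrative assumptions, including the U_WetGrass term standing in for unlisted sources of wetness.

```python
import random

# Structural-equation sketch of the sprinkler network: each variable is a
# function of its parents and an exogenous U-variable. All functional forms
# and probabilities here are illustrative assumptions.
random.seed(4)

def sample_world():
    u_c, u_s, u_r, u_w = (random.random() for _ in range(4))
    cloudy = u_c < 0.5
    sprinkler = u_s < (0.1 if cloudy else 0.5)
    rain = u_r < (0.8 if cloudy else 0.2)
    # U_WetGrass models unlisted causes of wetness (e.g. morning dew).
    wet = sprinkler or rain or (u_w < 0.01)
    return cloudy, sprinkler, rain, wet

N = 100_000
wet_freq = sum(sample_world()[3] for _ in range(N)) / N
print(round(wet_freq, 2))  # the P(WetGrass) induced by these assumed equations
```

Because the arrows carry causal meaning, an intervention such as do(Sprinkler = true) is modeled by replacing the sprinkler equation with the constant true while leaving the other equations untouched.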
Backdoor Criterion

● We might know that she checks the weather before deciding whether to turn on the
sprinkler, but we might not know how she makes her decision.
● The specific reason this is problematic in this instance is that we would like to predict
the effect of turning on the sprinkler on a downstream variable such as
GreenerGrass, but the adjustment formula must take into account not only the direct
route from Sprinkler, but also the “back door” route via Cloudy and Rain.
● If we knew the value of Rain, this back-door path would be blocked, which suggests
that there might be a way to write an adjustment formula that conditions on Rain
instead of Cloudy.
