IAI-Unit5
5th UNIT
Probability Reasoning
Topic-1: Acting Under Uncertainty
1. Complexity in Belief States: If an agent sees something blurry, it must consider all
possible things it could be, which makes reasoning too complicated.
Example: Seeing a shadow and considering it could be a cat, a dog, or a tree branch.
2. Drawbacks of Contingency Plans: Making a plan for every possible event, no matter
how unlikely, makes the plan too big and impractical.
Example: Planning for both a rainstorm and a meteorite hitting your car on the way to
the airport.
3. Uncertainty in Plan Success: Sometimes, no plan is perfect, so the agent must choose
the best option among uncertain ones.
Example: Deciding to leave for the airport early, knowing there's no way to be 100%
sure you’ll be on time because of potential traffic or car issues.
4. Making Rational Decisions Under Uncertainty: Choose the plan that’s most likely
to succeed based on what the agent knows, even if it’s not certain.
Example: Leaving 90 minutes before a flight to avoid missing it, even if there’s a small
chance of unexpected delays.
5. Weighing Goals and Likelihoods: Make the best choice by considering how important
each goal is and how likely it is to be achieved.
Example: Choosing between leaving very early to definitely make the flight but
waiting long at the airport, or leaving with just enough time, risking being late.
Decision Theory: Combines utility theory and probability theory to make rational choices.
An agent is rational if it chooses the action with the highest expected utility. This is called the
principle of Maximum Expected Utility (MEU).
Examples of Plans to the Airport
1. A90 Plan: Leaves 90 minutes before the flight with a 97% chance of catching the
flight.
o Utility: High chance of catching the flight but shorter wait at the airport.
2. A180 Plan: Leaves 180 minutes before the flight with a higher chance of catching the
flight.
o Utility: Higher chance of catching the flight but longer wait.
3. A1440 Plan: Leaves 24 hours before the flight.
o Utility: Almost certain to catch the flight but involves an intolerable wait.
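The MEU comparison among these plans can be sketched in a few lines of Python. Only A90's 97% success chance comes from the text; all other probabilities and utility values below are illustrative assumptions.

```python
# Hedged sketch: ranking the airport plans by expected utility.
# Utilities and the A180/A1440 probabilities are assumed, not given.

def expected_utility(p_catch, u_catch, u_miss):
    """Expected utility: outcome utilities weighted by their probabilities."""
    return p_catch * u_catch + (1 - p_catch) * u_miss

plans = {
    # (P(catch flight), utility if caught, utility if missed)
    "A90":   (0.97,   100, -500),   # short wait, small risk of missing
    "A180":  (0.999,   80, -500),   # longer wait lowers utility of success
    "A1440": (0.9999, -200, -500),  # intolerable 24-hour wait at the airport
}

best = max(plans, key=lambda name: expected_utility(*plans[name]))
for name, args in plans.items():
    print(name, expected_utility(*args))
print("MEU choice:", best)
```

With these assumed numbers the long wait makes A1440's expected utility strongly negative, so the agent rationally picks A90 despite its 3% risk.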
Key Concepts
Utility is Relative: Depends on the agent’s preferences.
o Example: A draw in chess might be high utility for an amateur against a world
champion, but low for the world champion.
Preferences are Subjective: Even quirky preferences can be rational if they align
with the agent's utility.
o Example: Preferring jalapeño bubble-gum ice cream over chocolate chip is a
valid preference.
Principle of Maximum Expected Utility (MEU)
Expected Utility: The average utility of an action’s outcomes, weighted by their
probabilities.
o Decision: Choose the action with the highest expected utility.
Decision-Theoretic Agent
Belief State: Represents possible world states and their probabilities.
Action Selection: Chooses actions based on expected utility from probabilistic
predictions of outcomes.
Algorithm: A decision-theoretic agent that selects rational actions.
function DT-AGENT(percept) returns an action
persistent: belief_state, probabilistic beliefs about the current state of the world
action, the agent's most recent action
update belief_state based on action and percept
calculate outcome probabilities for actions, given action descriptions and current belief_state
select the action with highest expected utility, given probabilities of outcomes and utility information
return action
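The loop above can be sketched in Python. This is a minimal toy version, not the book's implementation: the belief state is a dict of state probabilities, the percept is a filter function, and the umbrella domain and its utility numbers are hypothetical.

```python
# Minimal sketch of one DT-AGENT step (toy domain, assumed utilities).

def dt_agent_step(belief_state, percept, actions, outcome_model, utility):
    """Return the action with maximum expected utility under the beliefs.

    outcome_model(action, state) -> {outcome_state: probability}
    utility(outcome_state) -> number
    """
    # 1. Update beliefs: keep states consistent with the percept, renormalize.
    consistent = {s: p for s, p in belief_state.items() if percept(s)}
    total = sum(consistent.values())
    beliefs = {s: p / total for s, p in consistent.items()}

    # 2. Expected utility of an action over its predicted outcomes.
    def eu(action):
        return sum(p_s * p_o * utility(o)
                   for s, p_s in beliefs.items()
                   for o, p_o in outcome_model(action, s).items())

    # 3. Select the MEU action.
    return max(actions, key=eu)

# Tiny usage: take an umbrella under a 50/50 rain belief?
beliefs = {"rain": 0.5, "dry": 0.5}
outcome_model = lambda action, state: {(action, state): 1.0}  # deterministic
utilities = {("umbrella", "rain"): 1.0, ("umbrella", "dry"): 0.8,
             ("none", "rain"): -1.0, ("none", "dry"): 1.0}
chosen = dt_agent_step(beliefs, lambda s: True, ["umbrella", "none"],
                       outcome_model, utilities.get)
print(chosen)  # "umbrella": EU 0.9 beats EU 0.0 for going without
```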
5. Naïve Bayes Classifier: Applies Bayes' Rule assuming independence of features given the
class, simplifying probabilistic calculations.
6. Bayesian Networks: Graphical models depicting variable dependencies, aiding in
probabilistic inference.
7. Markov Chain Monte Carlo (MCMC): Algorithm for approximating complex
probability distributions in Bayesian networks.
8. Relational Probability Models: Extend Bayesian networks to handle relational data and
complex relationships.
9. Open Universe Probability Models: Address uncertainties about the existence and
properties of objects in varying scenarios.
10. Handling Uncertain Knowledge: Steps include gathering information, evaluating
reliability, using probabilistic reasoning, making decisions based on expected outcomes,
planning contingencies, and learning from experience.
TOPIC-1 END
1. Probability Notation:
o P(A): Probability of event A occurring.
o P(A'): Probability of event A not occurring.
o P(A ∩ B): Probability of both A and B occurring simultaneously.
o P(A ∪ B): Probability of either A or B occurring.
o P(A ∩ B'): Probability of A occurring but not B.
o P(A' ∪ B): Probability of either A not occurring or B occurring.
2. Conditional Probability:
o P(A | B): Probability of A given that B has occurred.
o Bayes’ Theorem: Allows updating probabilities based on new evidence.
3. Joint Probability:
o Probability of both A and B occurring simultaneously.
4. Marginal Probability:
o Probability of event A occurring regardless of other events.
5. Applications in AI:
o Bayesian Networks: Represent probabilistic relationships using DAGs.
o Hidden Markov Models (HMMs): Model systems with hidden states
influencing observable events.
o Markov Decision Processes (MDPs): Model decision-making under
uncertainty.
o Gaussian Processes (GPs): Used for regression and classification tasks,
incorporating uncertainty in predictions.
o Probabilistic Graphical Models (PGMs): Encode conditional independence
structures between random variables.
6. Importance:
o Handling Uncertainty: Enables AI to make robust decisions despite
incomplete or noisy data.
o Learning from Data: Bayesian methods and probabilistic models learn from
data to update beliefs.
o Inference: Tools for deducing new information from existing knowledge.
o Decision Making: Support for decision-making under uncertainty in various
applications.
6. Mathematical Foundation: Bayes' Theorem is derived from the definition of conditional
probability and the theorem of total probability, allowing for effective probabilistic inference.
7. Practical Use: By incorporating prior knowledge and updating it with observed data, Bayes'
Theorem facilitates more accurate predictions, enhances decision-making under uncertainty,
and supports evidence-based reasoning.
8. Historical Context: Developed in 1763, Bayes' Theorem has become a cornerstone in
statistical inference and Bayesian reasoning, influencing fields ranging from philosophy to
engineering.
9. Critical Considerations: It requires events to have non-zero probabilities and assumes
independence or conditional independence where applicable.
10. Educational Value: Understanding Bayes' Theorem equips individuals to analyze and
interpret probabilistic relationships, fostering a deeper understanding of uncertainties in real-
world scenarios.
11. Continued Relevance: Ongoing advancements in data analytics and machine learning
underscore the enduring significance of Bayes' Theorem in extracting meaningful insights from
data.
12. Bayes' Theorem provides a robust framework for updating beliefs and making informed
decisions in the face of uncertain and evolving information.
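The update rule can be made concrete with a short worked example. The disease-testing numbers below (1% prior, 90% sensitivity, 5% false-positive rate) are illustrative assumptions, not values from the text.

```python
# Worked example of Bayes' Theorem with assumed numbers:
# a disease with 1% prior, a test with 90% sensitivity and 5% false positives.

def bayes(prior, likelihood, false_alarm):
    """P(H | e) = P(e|H)P(H) / [P(e|H)P(H) + P(e|~H)P(~H)]"""
    numerator = likelihood * prior
    evidence = numerator + false_alarm * (1 - prior)
    return numerator / evidence

posterior = bayes(prior=0.01, likelihood=0.90, false_alarm=0.05)
print(round(posterior, 3))  # a positive test raises belief from 1% to ~15.4%
```

The striking result, that a positive test still leaves the disease unlikely, is exactly the kind of belief update Bayes' Theorem formalizes.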
Extra Information
What is Probabilistic Notation?
Probabilistic notation uses symbols and rules to talk about chances and predictions. It helps
AI make decisions when things are uncertain, like guessing weather or predicting stocks.
Basic Probabilistic Notations:
1. Probability Notation:
o P(A): Probability that event A happens.
o P(A'): Probability that event A doesn’t happen.
o P(A ∩ B): Probability that both A and B happen.
o P(A ∪ B): Probability that either A or B (or both) happen.
Example: If you toss a coin, P(Heads) = 0.5 means there's a 50% chance of getting
heads.
2. Conditional Probability:
o P(A | B): Probability of A happening given that B has already happened.
Example: P(Rain | Cloudy) tells us the chance of rain when it's cloudy.
3. Joint Probability:
o P(A ∩ B): Probability of both A and B happening together.
Example: P(Heads ∩ Tails) = 0 because you can't get both heads and tails at the same
time in one coin toss.
4. Marginal Probability:
o P(A): Probability of A happening, regardless of other events.
Example: P(Sunny) = P(Sunny ∩ Cloudy) + P(Sunny ∩ Cloudy') gives the chance of
sunny weather by summing over whether it is cloudy or not.
Advanced Probabilistic Notations:
1. Random Variables:
o X: A random variable representing an uncertain outcome; the set of all
possible outcomes is the sample space Ω.
o Example: Rolling two dice. The sample space Ω includes outcomes like (1,1),
(1,2), ..., (6,6), totaling 36 outcomes.
2. Probability Model:
o Definition: Assigns a probability to each possible outcome in the sample
space, denoted by P(ω).
o Example: If both dice are fair, each outcome in Ω has a probability of 1/36.
3. Event:
o Definition: An event is a set of outcomes from the sample space.
o Example: "The total is 11" is an event. It includes outcomes (5,6) and (6,5).
P(Total = 11) = P((5,6)) + P((6,5)) = 1/36 + 1/36 = 1/18.
4. Unconditional (Prior) Probability:
o Definition: Probability assigned to an event without any additional
information.
o Example: P(doubles) when rolling fair dice is 1/6.
5. Conditional (Posterior) Probability:
Definition: Probability of an event given that another event has occurred.
Example: P(doubles | first die is 5) is the probability of rolling doubles given that
the first die shows 5. It adjusts the probability based on new information.
6. Product Rule:
Definition: Relates conditional probability to unconditional probability.
Example: P(A and B) = P(A | B) * P(B). For example, P(doubles and first die is 5)
= P(doubles | first die is 5) * P(first die is 5).
Probability theory helps us quantify uncertainty and make decisions based on
observed evidence. It provides a formal framework to understand how likely events
are and how they relate to each other in terms of probability.
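The dice probabilities quoted above can be verified by enumerating the 36-outcome sample space directly:

```python
# Enumerate the two-dice sample space and check the probabilities above
# exactly, using rational arithmetic.
from fractions import Fraction

omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
p = Fraction(1, 36)  # fair dice: every outcome equally likely

def prob(event):
    """Probability of an event = sum of probabilities of its outcomes."""
    return sum(p for outcome in omega if event(outcome))

p_total_11 = prob(lambda o: o[0] + o[1] == 11)        # (5,6) and (6,5): 1/18
p_doubles = prob(lambda o: o[0] == o[1])              # six outcomes: 1/6
p_first_5 = prob(lambda o: o[0] == 5)                 # 1/6
p_doubles_and_first_5 = prob(lambda o: o == (5, 5))   # 1/36

# Conditional probability, and the product rule holding exactly:
p_doubles_given_first_5 = p_doubles_and_first_5 / p_first_5   # 1/6
assert p_doubles_and_first_5 == p_doubles_given_first_5 * p_first_5
print(p_total_11, p_doubles, p_doubles_given_first_5)
```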
d) Conditional Probability Distribution: Each node has associated probabilities that depend
on its parents in the network. This distribution captures how each variable is affected by its
direct influences.
Example: For "Rain" as a node, its conditional probability distribution might specify how
likely rain is given the previous day's weather and current atmospheric conditions.
Bayesian networks provide a structured approach to probabilistic reasoning by organizing
variables, their relationships, and their uncertainties, making complex systems more
manageable and allowing for efficient decision-making and inference.
Extra Information:
1. Definition and Importance of Probabilistic Reasoning:
Probabilistic reasoning in AI uses probability theory to manage uncertainty in decision-
making.
It enables AI systems to function effectively in complex, real-world scenarios where
information is incomplete or noisy.
2. Key Techniques:
Bayesian Networks: Graphical models showing relationships between variables and
their probabilities, used in medical diagnosis and causal reasoning.
Markov Models: Predict future states based on current and past states, widely used in
weather forecasting and speech recognition.
Hidden Markov Models (HMMs): Extend Markov models to include hidden states,
crucial in applications like stock market prediction.
Probabilistic Graphical Models: Framework encompassing Bayesian networks and
HMMs, suitable for complex relationships.
3. Applications:
Robotics: Enables robots to navigate uncertain environments (e.g., SLAM algorithms).
Healthcare: Aids in medical diagnosis by assessing disease likelihood from symptoms.
Natural Language Processing (NLP): Used for tasks like part-of-speech tagging and
machine translation.
Finance: Models market behavior and assesses risks in investment decisions.
4. Advantages:
Flexibility: Adaptable to various domains and types of uncertainty.
Robustness: Handles noise and incomplete data well, making it reliable in practical
applications.
8. Markov Chain Monte Carlo (MCMC): An algorithm used for sampling from probability
distributions, particularly useful in Bayesian inference and approximate inference in
Bayesian networks.
9. First-Order Models: Challenges arise in applying probabilistic reasoning to complex
domains where relationships and rules involve first-order logic, requiring specialized
approaches.
10. Open Universe Probability Models: Address the issue of unknown or changing sets of
entities in probabilistic reasoning, allowing for more flexible and adaptive models.
11. These points highlight how probabilistic reasoning and Bayesian methods are applied in AI
systems to handle uncertainty and make informed decisions based on probabilistic
inference.
Knowledge helps in reasoning by providing the facts and rules needed to make logical
deductions, predictions, or decisions. In uncertain domains, probabilistic reasoning helps in
making decisions based on incomplete or uncertain information.
Example: Given symptoms like fever and cough, reasoning with a Bayesian network
can help diagnose the probability of having a flu.
d) Issues in Knowledge Representation
There are several challenges when it comes to representing knowledge, especially in uncertain
domains:
1. Complexity:
o Representing real-world situations can be very complex.
Example: Accurately modeling the weather involves many variables
and relationships.
2. Uncertainty:
o Not all information is certain; there can be unknowns or probabilities.
Example: Predicting if it will rain tomorrow based on current weather
data.
3. Ambiguity:
o The same information can be interpreted in different ways.
Example: The word "bank" can mean the side of a river or a financial
institution.
4. Incompleteness:
o Sometimes, not all information is available.
Example: Diagnosing a disease without knowing all symptoms.
5. Scalability:
o Handling large amounts of knowledge efficiently.
Example: A search engine indexing billions of web pages.
Knowledge representation in uncertain domains involves storing information in
ways that allow computers to understand and use it for reasoning.
Techniques like logic-based representation, probabilistic models, and fuzzy logic
help manage uncertainty. Knowledge plays a critical role in reasoning by providing
the necessary facts and rules.
However, issues like complexity, uncertainty, ambiguity, incompleteness, and
scalability pose challenges to effective knowledge representation.
Chain Rule
The chain rule lets us calculate the probability of any combination of events by multiplying the
conditional probabilities together.
For example, to find the probability of it raining and the grass being wet, we multiply the
probability of it raining by the probability of the grass being wet given that it’s raining.
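Numerically, the rain and wet-grass example looks like this; the 0.3 and 0.9 probabilities are assumed for illustration.

```python
# Chain rule on the rain / wet-grass example (assumed CPT values).
p_rain = 0.3               # P(Rain)
p_wet_given_rain = 0.9     # P(WetGrass | Rain)

# Chain rule: P(Rain, WetGrass) = P(WetGrass | Rain) * P(Rain)
p_rain_and_wet = p_wet_given_rain * p_rain
print(round(p_rain_and_wet, 2))  # 0.27
```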
Why Bayesian Networks Are Useful
No Redundancy: Bayesian Networks don’t repeat any probability values, so there’s no
chance of conflicting information.
Accurate Representation: It’s impossible to create a Bayesian Network that doesn’t
follow the rules of probability, ensuring accuracy.
In Simple Terms
Nodes: Think of them as questions (Is it raining? Is the sprinkler on? Is the grass wet?).
Links: These are the connections showing how one question affects another (Rain
affects Wet Grass).
CPTs: These are small tables that tell us the answer to one question based on the
answers to others.
Chain Rule: This is like a recipe for calculating the probability of any combination of
answers.
With a Bayesian Network, you can figure out the likelihood of any event based on the known
relationships and probabilities, and it’s always consistent with the rules of probability.
If we choose the wrong order when creating a Bayesian network, we can end up with a more
complex network that requires more probabilities to be specified and includes difficult and
unnatural relationships.
Example: Burglary Scenario
Consider a scenario involving an alarm system that detects both burglaries and earthquakes,
and two people (John and Mary) who call when they hear the alarm.
Correct Node Order: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
In this correct order:
1. Burglary and Earthquake are independent events.
2. Alarm depends on both Burglary and Earthquake.
3. JohnCalls and MaryCalls depend on the Alarm.
This order results in a simple and compact network.
Incorrect Node Order: MaryCalls, JohnCalls, Alarm, Burglary, Earthquake
In this incorrect order:
1. MaryCalls has no parents.
2. JohnCalls depends on MaryCalls.
3. Alarm depends on both MaryCalls and JohnCalls.
4. Burglary depends on the Alarm.
5. Earthquake depends on both Alarm and Burglary.
This creates a more complex network with unnecessary relationships, making it harder to
specify and understand the probabilities. For example, you would need to assess the
probability of an Earthquake given both an Alarm and a Burglary, which is unnatural.
Very Bad Node Order: MaryCalls, JohnCalls, Earthquake, Burglary, Alarm
In this very bad order:
1. MaryCalls and JohnCalls have no parents.
2. Earthquake depends on both JohnCalls and MaryCalls.
3. Burglary depends on Earthquake.
4. Alarm depends on both Burglary and Earthquake.
This results in a network as complex as specifying the full joint distribution, requiring 31
distinct probabilities.
Key Point
Using a causal model (causes to effects) is simpler and requires fewer and more natural
probabilities than using a diagnostic model (symptoms to causes). For example, in medicine,
doctors prefer giving probabilities for causes leading to symptoms rather than the other way
around.
Conclusion:
Correct order = simpler network, fewer probabilities.
Incorrect order = more complex network, more probabilities, unnatural relationships.
Very bad order = maximum complexity, as complex as the full joint distribution.
Conditional independence in Bayesian networks using simple examples:
1. Numerical Semantics: This means defining how variables interact through specific
probabilities in the network.
Example: In a Bayesian network about home security, if Alarm going off influences
whether John Calls and Mary Calls, we might say:
P(John Calls | Alarm) = 0.8 (John is likely to call if the alarm goes off).
P(Mary Calls | Alarm) = 0.6 (Mary is somewhat likely to call if the alarm goes
off).
2. Topological Semantics:
This focuses on the structure of the network to determine independence relationships.
Example: In the same security network, if Alarm directly affects both John Calls and
Mary Calls, they would be independent of each other given Alarm.
If Alarm is on, whether John calls or not doesn’t change the probability of Mary
calling, and vice versa.
3. Equivalence of Semantics:
Both numerical and topological views are equivalent—they describe the same
relationships.
Example: Knowing that in our network, Burglary is independent of John Calls and
Mary Calls given Alarm and Earthquake allows us to precisely calculate how these
variables influence each other numerically.
4. Markov Blanket:
It includes a node's parents, children, and children's parents, making it a shield against
influences from other parts of the network.
Example: If Burglary’s Markov blanket includes Alarm and Earthquake, then
Burglary is independent of other variables like John Calls and Mary Calls given these
factors.
Creating Conditional Probability Tables (CPTs) for Bayesian networks can be complex,
especially as the number of parent nodes (k) grows: in the worst case, filling a table requires
O(2^k) entries. However, many parent-child relationships follow standard patterns that can be
described by simpler canonical distributions. For instance, in a network predicting Rain (R)
with Cloudy (C) and Windy (W) as parents, instead of listing all four combinations you might
note that rain is likelier when it is cloudy:
P(R | C, W) = high if C = true, else low.
This simplifies the CPT by using patterns rather than every scenario, making it easier to
manage.
Deterministic Nodes:
In a Bayesian network, a deterministic node's value is exact and depends solely on its parent
nodes, without uncertainty:
Logical Example: A child node "North American" with parent nodes "Canadian,"
"US," and "Mexican" simply reflects whether any parent is true.
Numerical Example 1: If parent nodes are car prices at different dealers, the child
node "bargain hunter's price" would be the minimum price among the parents.
Numerical Example 2: For a lake's parent nodes representing inflows and outflows,
the child node "change in water level" would be the difference between total inflows
and outflows.
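The two numerical examples can be written directly as deterministic functions of the parent values; the prices and flow rates below are made-up illustrations.

```python
# Deterministic child nodes: their value is an exact function of the parents.
# All numbers here are illustrative assumptions.

dealer_prices = [19000, 18500, 19900]        # parents: prices at each dealer
bargain_hunter_price = min(dealer_prices)    # child: the minimum price

inflows = [120.0, 45.0]                      # parents: inflows to the lake
outflows = [100.0, 30.0]                     # parents: outflows from the lake
change_in_water_level = sum(inflows) - sum(outflows)  # child: net change

print(bargain_hunter_price, change_in_water_level)
```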
Noisy-OR
Noisy logical relationships, like noisy-OR, handle uncertainty in connections. For example, in
propositional logic, we might say having a Fever could be due to Cold, Flu, or Malaria. But
unlike a strict OR, where having any one of these conditions guarantees a Fever, noisy-OR
allows for cases where having Cold alone might not result in a Fever.
Noisy-OR Assumption
Cold | Flu | Malaria | P(Fever) | P(¬Fever)
F    | F   | F       | 0.0      | 1.0
F    | F   | T       | 0.9      | 0.1
F    | T   | F       | 0.8      | 0.2
F    | T   | T       | 0.98     | 0.02
T    | F   | F       | 0.4      | 0.6
T    | F   | T       | 0.94     | 0.06
T    | T   | F       | 0.88     | 0.12
T    | T   | T       | 0.988    | 0.012
This approach, used in medical diagnosis and other fields, helps model probabilistic
relationships efficiently while accounting for uncertainty and independence among
causes.
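Every row of the noisy-OR table can be generated from just three per-cause "noise" parameters, the probability that each cause acting alone fails to produce a fever: q_cold = 0.6, q_flu = 0.2, q_malaria = 0.1. P(¬Fever) is the product of the q-values of the causes that are present.

```python
# Reconstructing the noisy-OR fever table from the three noise parameters.
from itertools import product

q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}

def p_fever(cold, flu, malaria):
    """P(Fever | causes) = 1 - product of q-values of the present causes."""
    p_no_fever = 1.0
    for present, name in [(cold, "cold"), (flu, "flu"), (malaria, "malaria")]:
        if present:
            p_no_fever *= q[name]
    return 1.0 - p_no_fever

for cold, flu, malaria in product([False, True], repeat=3):
    print(cold, flu, malaria, round(p_fever(cold, flu, malaria), 3))
```

So a full table of 2^3 rows is captured by only 3 numbers; this is exactly how noisy-OR keeps CPTs manageable.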
Bayesian nets with continuous variables
DISCRETIZATION
Handling continuous variables in Bayesian networks is tricky because they can take on
countless values. To handle this, we often use discretization. For instance, instead of
considering every possible temperature, we might group them into categories like "cold,"
"warm," and "hot."
Discretization simplifies things by reducing the number of specific values we deal with.
However, it can make our estimates less precise and require larger tables of probabilities.
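A minimal discretization looks like the sketch below; the cutoff values of 10 and 25 degrees are arbitrary assumptions for illustration.

```python
# Discretization sketch: mapping continuous temperatures (Celsius) into the
# "cold" / "warm" / "hot" categories mentioned above. Cutoffs are assumed.

def discretize_temperature(celsius):
    if celsius < 10:
        return "cold"
    elif celsius < 25:
        return "warm"
    return "hot"

print([discretize_temperature(t) for t in [-3, 15, 31]])
```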
Alternatively, we can use standard math functions like the Gaussian (normal) distribution,
which uses parameters like mean (μ) and variance (σ²). This approach helps us model
values more accurately without discretizing them explicitly.
In more complex situations, we might use nonparametric methods. Here, distributions are
represented indirectly through specific examples with known values of related variables.
These methods make Bayesian networks useful for real-world problems involving
continuous data, balancing accuracy with practicality in calculations.
Handling Discrete Parents: For discrete parents like Subsidy, we specify different
scenarios (Subsidy is true or false) by providing separate distributions (P(Cost |
Harvest, Subsidy) and P(Cost | Harvest, ¬Subsidy)).
Handling Continuous Parent (Harvest): We use a linear Gaussian distribution,
where the cost's distribution (like its mean and variability) changes linearly with the
value of Harvest.
Example: If Harvest (value h) affects Cost, we might say Cost follows a Gaussian
distribution with mean μ = a_t h + b_t and standard deviation σ_t. Here, a_t and b_t are
parameters that define how Harvest influences Cost.
2. Discrete Variable (Buys):
Given Continuous Parent (Cost): For Buys (a discrete variable), its distribution
depends on the continuous variable Cost.
Example: Buys could be influenced by Cost such that the probability of buying
fruit might increase as Cost decreases. This relationship is specified in the Bayesian
network to show how Cost affects Buys.
In a hybrid Bayesian network:
Continuous variables like Cost are described using linear Gaussian distributions, with
parameters that show how they depend on continuous and discrete parent variables.
Discrete variables like Buys are described by how they depend on continuous parent
variables, showing the likelihood of different outcomes based on the value of the
continuous parent.
This structure helps model complex relationships in systems where both types of variables
interact.
For this example, then, the conditional distribution for Cost is specified by naming the linear Gaussian
distribution and providing the parameters a_t, b_t, σ_t, a_f, b_f, and σ_f.
Figures 14.6(a) and (b) show these two relationships. Notice that in each case the slope is
negative, because cost decreases as supply increases. (Of course, the assumption of linearity
implies that the cost becomes negative at some point; the linear model is reasonable only if the
harvest size is limited to a narrow range.)
Figure 14.6(c) shows the distribution P(c|h), averaging over the two possible values of Subsidy
and assuming that each has prior probability 0.5. This shows that even with very simple models,
quite interesting distributions can be represented.
CONDITIONAL GAUSSIAN
In networks with linear Gaussian distributions:
All continuous variables connected in a certain way form a joint distribution that looks
like a bell curve, known as a multivariate Gaussian.
When discrete variables influence continuous ones, the conditional distribution of the
continuous variables given specific values of the discrete ones is called a conditional
Gaussian distribution.
For example, imagine predicting if a customer buys a product based on its cost. Let's say cost
is continuous, and the buying decision (Buys) is discrete (yes or no). If we think customers are
more likely to buy when the cost is low and less likely as it increases, we can model this with
a soft threshold.
To create soft thresholds, we use the cumulative standard normal distribution function, Φ(x).
For instance, the probability that a customer buys given a cost c is
P(buys | Cost = c) = Φ((−c + μ)/σ), where:
μ is the cost at which the probability of buying is 50%,
σ controls how quickly the probability changes around μ.
As cost moves away from μ, the probability of buying decreases smoothly according to the
standard normal curve. This method helps us model complex decision-making in a clear,
probabilistic way.
This probit distribution (pronounced “pro-bit” and short for “probability unit”) is illustrated in
Figure 14.7(a). The form can be justified by proposing that the underlying decision process has a
hard threshold, but that the precise location of the threshold is subject to random Gaussian noise.
An alternative to the probit model is the logit distribution (pronounced “low-jit”). It uses the
logistic function 1/(1 + e^(−x)) to produce a soft threshold:
P(buys | Cost = c) = 1 / (1 + exp(−2(−c + μ)/σ)).
This is illustrated in Figure 14.7(b). The two distributions look similar, but the logit actually has
much longer “tails.” The probit is often a better fit to real situations, but the logit is sometimes
easier to deal with mathematically. It is used widely in neural networks. Both probit and logit can
be generalized to handle multiple continuous parents by taking a linear combination of the parent
values.
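Both soft thresholds are easy to compute with the standard library, since Φ can be built from the error function. The μ = 5 and σ = 1 values below are assumptions for illustration.

```python
# Probit and logit "soft threshold" models for P(buys | Cost = c),
# following the formulas above. mu and sigma are assumed values.
import math

def phi(x):
    """Cumulative standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_buys_probit(c, mu=5.0, sigma=1.0):
    return phi((-c + mu) / sigma)

def p_buys_logit(c, mu=5.0, sigma=1.0):
    return 1.0 / (1.0 + math.exp(-2.0 * (-c + mu) / sigma))

# At c = mu both models give exactly 0.5; the probability of buying falls
# smoothly as cost rises above mu.
print(p_buys_probit(5.0), p_buys_logit(5.0))
print(p_buys_probit(8.0), p_buys_logit(8.0))
```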
End of Topic -7
Big and complex networks make exact answers hard to find. So, we turn to approximate
methods like Monte Carlo algorithms, such as simulated annealing. These methods
randomly sample data to give results close to the right ones. They're widely used in science
to estimate tricky values.
For example, imagine predicting an election outcome in a large country with many factors
affecting voters. Monte Carlo methods help us estimate different outcomes by randomly
sampling voter opinions and behaviors.
When calculating posterior probabilities, we look at two main types of algorithms: direct
sampling and Markov chain sampling. Each tackles complex network data differently to
give useful estimates.
Other methods like variational approaches and loopy propagation are also mentioned as
ways to handle similar problems at the end of the chapter.
Direct sampling methods
1. Direct Sampling Overview: Sampling methods generate random outcomes from known
probability distributions. For example, flipping an unbiased coin with a 50% chance for
heads and tails can be simulated by generating a random number between 0 and 1. If it's ≤
0.5, it's heads; otherwise, it's tails.
2. Simple Example: To simulate a coin flip, generate a random number between 0 and 1. ≤
0.5 = heads, > 0.5 = tails.
3. Sampling in Bayesian Networks: Bayesian networks depict probabilistic relationships
among variables. Sampling follows a topological order, starting with variables without
parents and using their distributions to sample values. Proceed to variables with known
parent values using conditional distributions until all are sampled.
4. Conclusion: In a Bayesian network where variables like Weather, Temperature, and Outfit
Choice are interconnected, sample in topological order: Weather first, then Temperature,
and finally Outfit Choice, respecting each variable's dependencies. This approach ensures
that sampled values reflect the network's probabilistic structure accurately.
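The coin-flip rule from points 1 and 2 translates directly into code: draw a uniform random number in [0, 1) and report heads when it is at most 0.5.

```python
# Direct sampling of a fair coin, exactly as described above.
import random

def flip(rng):
    return "heads" if rng.random() <= 0.5 else "tails"

rng = random.Random(42)  # fixed seed so the run is reproducible
flips = [flip(rng) for _ in range(10000)]
estimate = flips.count("heads") / len(flips)
print(estimate)  # close to the true probability 0.5
```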
Algorithm Explanation:
PRIOR-SAMPLE constructs a sample by iteratively sampling each variable in the network according to
its conditional probability distribution, given the values already sampled for its parent variables.
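A hedged sketch of that procedure on a tiny two-node network, Cloudy → Rain, with assumed CPT values:

```python
# PRIOR-SAMPLE sketch: sample variables in topological order, parents first.
# The network (Cloudy -> Rain) and its CPT values are assumptions.
import random

def prior_sample(rng):
    sample = {}
    sample["Cloudy"] = rng.random() < 0.5               # P(Cloudy) = 0.5
    p_rain = 0.8 if sample["Cloudy"] else 0.2           # P(Rain | Cloudy)
    sample["Rain"] = rng.random() < p_rain
    return sample

rng = random.Random(0)
samples = [prior_sample(rng) for _ in range(10000)]
p_rain_est = sum(s["Rain"] for s in samples) / len(samples)
print(p_rain_est)  # true value: 0.5*0.8 + 0.5*0.2 = 0.5
```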
6. Example: For instance, if we sample 1000 times and find that 511 of those samples
have Rain = true, then we estimate the probability of rain as 0.511.
7. These points highlight how sampling from a Bayesian network helps us understand
probabilities of different events, even when the exact probabilities are not initially
known.
We might observe how often it rains after seeing a red sky, ignoring days when the sky
wasn’t red.
If we observe 50 red sky nights and it rains 15 times the next day, then P(R |
RedSkyAtNight = true) ≈ 15/50 = 0.3.
Conclusion:
Rejection sampling provides a way to estimate probabilities in complex scenarios by generating
and filtering scenarios based on observed evidence. However, it becomes impractical for highly
complex problems due to the large number of samples it discards.
Rejection sampling is a Monte Carlo method used in Bayesian networks for generating
samples from joint probability distributions:
1. Basic Idea: It randomly assigns values to variables based on priors and accepts or
rejects samples based on observed evidence.
2. Procedure:
a. Randomly assign values to variables according to priors.
b. Check if the sample satisfies observed evidence; keep or reject it accordingly.
c. Repeat until enough valid samples are collected.
3. Sampling Efficiency: Can be inefficient with low prior probabilities of satisfying
evidence, leading to many rejections.
4. Consistency: Provides consistent estimates of the posterior distribution with sufficient
samples, but can be computationally expensive for large networks or strict evidence.
5. Suitability: Easy to implement but may struggle with scalability in complex networks
or dependencies.
6. Comparison: Less efficient than methods like likelihood weighting because of its
higher rejection rate, which wastes computational resources. Although rejection sampling
ensures correct posterior estimates with enough samples, its inefficiency in handling
evidence constraints and its scalability issues restrict its practical utility in many Bayesian
network scenarios.
This method relies on generating samples and selecting only those consistent with the observed
evidence, making it suitable for querying Bayesian networks where direct computation of
probabilities may be impractical.
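The generate-and-filter idea can be sketched on the red-sky example from earlier: keep only samples where the evidence (RedSkyAtNight = true) holds, then count rain among the survivors. The CPT numbers are assumptions chosen so the true answer matches the 0.3 quoted above.

```python
# Rejection sampling: generate full samples, discard those inconsistent
# with the evidence, estimate from the rest. CPT values are assumed.
import random

def sample_world(rng):
    red_sky = rng.random() < 0.2                  # P(RedSky) = 0.2
    p_rain = 0.3 if red_sky else 0.6              # P(Rain | RedSky) = 0.3
    rain = rng.random() < p_rain
    return red_sky, rain

rng = random.Random(1)
kept = rain_count = 0
for _ in range(20000):
    red_sky, rain = sample_world(rng)
    if not red_sky:       # evidence not satisfied: reject this sample
        continue
    kept += 1
    rain_count += rain
print(rain_count / kept)  # approaches P(Rain | RedSky) = 0.3
```

Note the inefficiency the text describes: with P(RedSky) = 0.2, roughly 80% of the generated samples are thrown away.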
1. Sampling Cloudy:
Suppose we know the sprinkler is on and it's not raining. Based on our model, if we
find that the probability of Cloudy being false is higher, we update the current state
accordingly.
2. Sampling Rain:
Next, let's determine if it's raining. This depends on factors like whether it's cloudy, if
the sprinkler is on, and if the grass is wet.
For instance, if we know it's not cloudy, the sprinkler is on, and the grass is wet, we
might find that rain is likely to be true.
3. Counting States:
As we go through these steps, each combination of conditions we consider forms a
state. Some states will have rain as true, others as false.
Let's say we count 20 states where rain is true and 60 states where it's false.
4. Calculating the Probability:
To answer how likely it is to rain, we normalize the counts we found. This means
dividing the number of true rain states by the total number of states considered.
In our example, with 20 true rain states and 60 false ones, the normalized probability is
0.25 (or 25%) for rain being true and 0.75 (or 75%) for rain being false.
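The normalization step above is a one-liner:

```python
# Normalizing the state counts from the Gibbs-sampling example above.
counts = {"rain_true": 20, "rain_false": 60}
total = sum(counts.values())
p_rain = counts["rain_true"] / total
print(p_rain, 1 - p_rain)  # 0.25 and 0.75
```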
In summary, Bayesian networks help us compute probabilities by sampling different
scenarios based on known conditions, ultimately giving us a clearer picture of the
likelihood of events like rain occurring.
The complete algorithm is shown in Figure 14.16.
Gibbs sampling is a method used in Bayesian networks to generate samples from the
posterior distribution of variables given evidence:
SIBYL addresses uncertainties regarding the existence of objects behind observed symbols and
whether different symbols refer to the same object.
In a Relational Probability Model (RPM), we deal with uncertainty about how symbols
correspond to real-world objects. Unlike traditional databases that assume everything not
known is false, RPMs acknowledge uncertainties like whether different symbols refer to the
same object or how many objects exist.
For example, imagine a book retailer using ISBNs to identify books. A single book like "Gone
With the Wind" might have multiple ISBNs, which complicates tracking recommendations or
sales across different identifiers. Similarly, customers using multiple login IDs can confuse
systems meant to track reputation, like in online reviews or security systems.
These uncertainties are critical because:
1. Existence Uncertainty: We're unsure about which objects truly exist behind the
symbols we observe. For instance, how many distinct copies of "Gone With the Wind"
are there?
2. Identity Uncertainty: We may not be certain which symbols actually refer to the same
underlying object. For example, are two different ISBNs really identifying the same
physical book?
RPMs help us model scenarios where what we observe (like ISBNs or customer IDs) might not
perfectly reflect the underlying reality (actual books or unique customers). This flexibility in
handling uncertainty makes RPMs useful in areas like online retail and security where clear
identifications can be tricky due to various factors.
Relational Probability Models
Relational Probability Models (RPMs) are similar to first-order logic but use symbols for
constants, functions, and predicates. Predicates act like functions that determine true or false
statements.
In the context of recommending books to customers, here are the key elements simplified:
1. Symbols and Types:
o Constants: Names of customers (like C1, C2) and books (like B1, B2).
o Functions (returning values):
▪ Honest: determines whether a customer is honest (Honest(C1)).
▪ Kindness: rates the kindness of a customer (Kindness(C1)).
2. Random Variables:
o RPM defines random variables by instantiating each function with specific
combinations of customers and books. For instance:
▪ Honest(C1), Honest(C2), etc.
3. Dependencies:
o Each function has dependencies that govern its behavior. These are statements
about the likelihood of certain values based on other factors:
▪ Honest(c) might have probabilities like 0.99 for true and 0.01 for false.
▪ Kindness(c) might have probabilities like 0.1, 0.1, 0.2, 0.3, 0.3 for its
ratings.
▪ Quality(b) might have probabilities like 0.05, 0.2, 0.4, 0.2, 0.15 for its
ratings.
▪ Recommendation(c, b)'s probability might be derived from a conditional
distribution that depends on the customer's honesty and kindness and the
book's quality.
RPMs help model relationships and probabilities between entities like customers and books,
enabling systems to make informed predictions and recommendations based on data and logical
dependencies defined for each entity.
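A hypothetical Python sketch of how an RPM instantiates one random variable per customer and one per (customer, book) pair. The priors follow the numbers listed above; the rule for Recommendation is an invented stand-in for the actual conditional table.

```python
import random

customers = ["C1", "C2"]
books = ["B1", "B2"]

def sample_world():
    """Instantiate every RPM random variable for these customers and books."""
    honest = {c: random.random() < 0.99 for c in customers}
    kindness = {c: random.choices([1, 2, 3, 4, 5],
                                  [0.1, 0.1, 0.2, 0.3, 0.3])[0]
                for c in customers}
    quality = {b: random.choices([1, 2, 3, 4, 5],
                                 [0.05, 0.2, 0.4, 0.2, 0.15])[0]
               for b in books}
    rec = {}
    for c in customers:
        for b in books:
            if honest[c]:
                # assumed rule: honest ratings track quality, nudged by kindness
                r = round((quality[b] + kindness[c]) / 2)
            else:
                r = random.randint(1, 5)  # dishonest ratings are arbitrary
            rec[(c, b)] = max(1, min(5, r))
    return honest, kindness, quality, rec

random.seed(0)
honest, kindness, quality, rec = sample_world()
print(rec[("C1", "B1")])  # a rating between 1 and 5
```

Note how the number of random variables grows with the number of constants: two customers and two books already yield four Recommendation variables, which is exactly the instantiation idea described in point 2 above.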
Context Specific Independence
A variable can be independent of some of its parents given particular values
("contexts") of its other parents. For example:
An honest customer might always give books by their favorite author a perfect rating
of 5.
If we don't know who wrote a certain book, we have to consider all possible authors
and how much customers like each author to predict their ratings accurately.
This system deals with uncertainty about book authors and customer preferences to make better
predictions about book ratings.
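To make the idea concrete, here is a small Python sketch in which Recommendation depends on Quality only in the context where the customer is honest; the specific numbers are illustrative assumptions.

```python
def recommendation_dist(honest, quality):
    """P(Recommendation = 1..5 | Honest, Quality), with context-specific
    independence: Quality matters only when Honest is true."""
    if not honest:
        return [0.2] * 5              # dishonest: uniform, ignores quality
    probs = [0.04] * 5
    probs[quality - 1] = 0.84         # honest: peaked at the book's quality
    return probs

# In the context Honest = false, the distribution is identical for all qualities:
print(recommendation_dist(False, 1) == recommendation_dist(False, 5))  # True
# In the context Honest = true, quality changes the distribution:
print(recommendation_dist(True, 1) == recommendation_dist(True, 5))    # False
```

This branching structure is what lets inference skip over irrelevant parents: once Honest(c) is known to be false, the author and quality of the book need not be considered at all.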
Open-universe probability models (Managing Ambiguity in Real-World Object Identification)
1. Unlike in databases with unique IDs, real-life systems like cameras, text processors, and
intelligence analysis deal with ambiguity.
2. Cameras may not recognize if the object around a corner is the same as seen earlier.
3. Text processors must determine if different mentions (e.g., "Mary," "Dr. Smith," "she")
refer to the same entity.
4. Intelligence analysts track spies without knowing how many there are or if various
identifiers belong to the same spy.
5. Human understanding involves learning what objects exist and linking observations
without clear, unique identifiers.
6. Dealing with uncertainty is a crucial challenge in these contexts.
OPEN UNIVERSE