CS50’s Introduction to Artificial Intelligence with Python

OpenCourseWare

Brian Yu
David J. Malan

Lecture 2

Uncertainty
Last lecture, we discussed how AI can represent and derive new knowledge. However, often, in
reality, the AI has only partial knowledge of the world, leaving space for uncertainty. Still, we would
like our AI to make the best possible decision in these situations. For example, when predicting
weather, the AI has information about the weather today, but there is no way to predict with 100%
accuracy the weather tomorrow. Still, we can do better than chance, and today’s lecture is about
how we can create AI that makes optimal decisions given limited information and uncertainty.

Probability
Uncertainty can be represented as a number of events and the likelihood, or probability, of each of
them happening.

Possible Worlds
Every possible situation can be thought of as a world, represented by the lowercase Greek letter
omega ω. For example, rolling a die can result in six possible worlds: a world where the die yields
a 1, a world where the die yields a 2, and so on. To represent the probability of a certain world, we
write P(ω).

Axioms in Probability

 0 ≤ P(ω) ≤ 1: every value representing probability must range between 0 and 1.
 Zero is an impossible event, like rolling a standard die and getting a 7.
 One is an event that is certain to happen, like rolling a standard die and getting a value
less than 10.
 In general, the higher the value, the more likely the event is to happen.
 The probabilities of every possible world, when summed together, are equal to 1:
Σ P(ω) = 1, where the sum is taken over all possible worlds ω.

The probability of rolling a number R with a standard die can be represented as P(R). In our case,
P(R) = 1/6, because there are six possible worlds (rolling any number from 1 through 6) and each is
equally likely to happen. Now, consider the event of rolling two dice. This time, there are 36 possible
worlds, which are, again, equally likely.
However, what happens if we try to predict the sum of the two dice? In this case, we have only 11
possible values (the sum has to range from 2 to 12), and they do not occur equally often.

To get the probability of an event, we divide the number of worlds in which it occurs by the
number of total possible worlds. For example, there are 36 possible worlds when rolling two dice.
Only in one of these worlds, when both dice yield a 6, do we get the sum of 12. Thus, P(12) = 1/36,
or, in words, the probability of rolling two dice and getting two numbers whose sum is 12 is 1/36.
What is P(7)? We count and see that the sum 7 occurs in 6 worlds. Thus, P(7) = 6/36 = 1/6.
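
As a quick check from outside lecture, we can enumerate all 36 worlds in Python and count the ones where each sum occurs (a minimal sketch; the names worlds and p_sum are ours, not from the lecture):

from fractions import Fraction

# All 36 equally likely worlds for rolling two dice
worlds = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def p_sum(target):
    # Probability = worlds where the sum occurs / total number of worlds
    favorable = sum(1 for a, b in worlds if a + b == target)
    return Fraction(favorable, len(worlds))

print(p_sum(12))  # 1/36
print(p_sum(7))   # 1/6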

Unconditional Probability

Unconditional probability is the degree of belief in a proposition in the absence of any other
evidence. All the questions that we have asked so far were questions of unconditional probability,
because the result of rolling a die is not dependent on previous events.
Conditional Probability
Conditional probability is the degree of belief in a proposition given some evidence that has
already been revealed. As discussed in the introduction, AI can use partial information to make
educated guesses about the future. To use this information, which affects the probability that the
event occurs in the future, we rely on conditional probability.

Conditional probability is expressed using the following notation: P(a | b), meaning “the probability
of event a occurring given that we know event b to have occurred,” or, more succinctly, “the
probability of a given b.” Now we can ask questions like what is the probability of rain today given
that it rained yesterday, P(rain today | rain yesterday), or what is the probability of the patient having
the disease given their test results, P(disease | test results).

Mathematically, to compute the conditional probability of a given b, we use the following formula:

P(a | b) = P(a ∧ b) / P(b)

To put it in words, the probability that a given b is true is equal to the probability of a and b being
true, divided by the probability of b. An intuitive way of reasoning about this is the thought “we are
interested in the events where both a and b are true (the numerator), but only from the worlds
where we know b to be true (the denominator).” Dividing by P(b) restricts the possible worlds to the
ones where b is true. The following are algebraically equivalent forms of the formula above:

P(a ∧ b) = P(b) P(a | b)
P(a ∧ b) = P(a) P(b | a)

For example, consider P(sum 12 | roll six on one die), or the probability of rolling two dice and
getting a sum of twelve, given that we have already rolled one die and got a six. To calculate this,
we first restrict our worlds to the ones where the value of the first die is six:

Now we ask how many times the event a (the sum being 12) occurs in the worlds that we
restricted the question to (this restriction is the effect of dividing by P(b), the probability of the
first die yielding 6).
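
The same restriction can be carried out in code (again a sketch from outside lecture): filter the worlds down to those where the evidence holds, then count within them.

from fractions import Fraction

worlds = [(a, b) for a in range(1, 7) for b in range(1, 7)]

# Restrict to the worlds where the first die is a six (the evidence)
restricted = [(first, second) for first, second in worlds if first == 6]

# Among those worlds, count the ones where the sum is 12
favorable = sum(1 for first, second in restricted if first + second == 12)
print(Fraction(favorable, len(restricted)))  # 1/6
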
Random Variables
A random variable is a variable in probability theory with a domain of possible values that it can
take on. For example, to represent possible outcomes when rolling a die, we can define a random
variable Roll, that can take on the values {1, 2, 3, 4, 5, 6}. To represent the status of a flight, we can
define a variable Flight that takes on the values {on time, delayed, canceled}.

Often, we are interested in the probability with which each value occurs. We represent this using a
probability distribution. For example,

 P(Flight = on time) = 0.6
 P(Flight = delayed) = 0.3
 P(Flight = canceled) = 0.1

To interpret the probability distribution with words, this means that there is a 60% chance that the
flight is on time, 30% chance that it is delayed, and 10% chance that it is canceled. Note that, as
shown previously, the sum of the probabilities of all possible outcomes is 1.

A probability distribution can be represented more succinctly as a vector. For example, P(Flight) =
<0.6, 0.3, 0.1>. For this notation to be interpretable, the values have a set order (in our case, on time,
delayed, canceled).

Independence
Independence is the knowledge that the occurrence of one event does not affect the probability of
the other event. For example, when rolling two dice, the result of each die is independent from the
other. Rolling a 4 with the first die does not influence the value of the second die that we roll. This
is opposed to dependent events, like clouds in the morning and rain in the afternoon. If it is cloudy
in the morning, it is more likely that it will rain in the afternoon, so these events are dependent.

Independence can be defined mathematically: events a and b are independent if and only if the
probability of a and b is equal to the probability of a times the probability of b: P(a ∧ b) = P(a)P(b).
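
We can verify this definition for two dice with a short sketch (not from lecture): the probability of rolling a 4 on the first die and a 2 on the second equals the product of the individual probabilities.

from fractions import Fraction

worlds = [(a, b) for a in range(1, 7) for b in range(1, 7)]
total = len(worlds)

p_a = Fraction(sum(1 for a, b in worlds if a == 4), total)              # P(first = 4)
p_b = Fraction(sum(1 for a, b in worlds if b == 2), total)              # P(second = 2)
p_ab = Fraction(sum(1 for a, b in worlds if a == 4 and b == 2), total)  # P(first = 4 ∧ second = 2)

print(p_ab == p_a * p_b)  # True: the rolls are independent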

Bayes’ Rule
Bayes’ rule is commonly used in probability theory to compute conditional probability. In words,
Bayes’ rule says that the probability of b given a is equal to the probability of a given b, times the
probability of b, divided by the probability of a:

P(b | a) = P(a | b) P(b) / P(a)

For example, we would like to compute the probability of it raining in the afternoon if there are
clouds in the morning, or P(rain | clouds). We start with the following information:

 80% of rainy afternoons start with cloudy mornings, or P(clouds | rain) = 0.8.
 40% of days have cloudy mornings, or P(clouds) = 0.4.
 10% of days have rainy afternoons, or P(rain) = 0.1.

Applying Bayes’ rule, we compute (0.1)(0.8)/(0.4) = 0.2. That is, the probability that it rains in the
afternoon given that it was cloudy in the morning is 20%.
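
The same computation, written out in code (a one-off sketch applying Bayes’ rule to the numbers above):

# P(rain | clouds) = P(clouds | rain) P(rain) / P(clouds)
p_clouds_given_rain = 0.8
p_rain = 0.1
p_clouds = 0.4

print(round(p_clouds_given_rain * p_rain / p_clouds, 2))  # 0.2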

Knowing P(a | b), in addition to P(a) and P(b), allows us to calculate P(b | a). This is helpful, because
knowing the conditional probability of a visible effect given an unknown cause, P(visible effect |
unknown cause), allows us to calculate the probability of the unknown cause given the visible
effect, P(unknown cause | visible effect). For example, we can learn P(medical test results | disease)
through medical trials, where we test people with the disease and see how often the test picks up
on that. Knowing this, we can calculate P(disease | medical test results), which is valuable diagnostic
information.

Joint Probability
Joint probability is the likelihood of multiple events all occurring.

Let us consider the following example, concerning the probabilities of clouds in the morning and
rain in the afternoon.

C = cloud    C = ¬cloud
0.4          0.6

R = rain     R = ¬rain
0.1          0.9

Looking at these data, we can’t say whether clouds in the morning are related to the likelihood of
rain in the afternoon. To be able to do so, we need to look at the joint probabilities of all the
possible outcomes of the two variables. We can represent this in a table as follows:

             R = rain    R = ¬rain
C = cloud    0.08        0.32
C = ¬cloud   0.02        0.58

Now we are able to know information about the co-occurrence of the events. For example, we
know that the probability of a certain day having clouds in the morning and rain in the afternoon is
0.08. The probability of no clouds in the morning and no rain in the afternoon is 0.58.

Using joint probabilities, we can deduce conditional probability. For example, suppose we are
interested in the probability distribution of clouds in the morning given rain in the afternoon:
P(C | rain) = P(C, rain)/P(rain) (a side note: in probability, commas and ∧ are used interchangeably.
Thus, P(C, rain) = P(C ∧ rain)). In words, we divide the joint probability of rain and clouds by the
probability of rain.

In the last equation, it is possible to view P(rain) as some constant by which P(C, rain) is multiplied.
Thus, we can rewrite P(C, rain)/P(rain) = αP(C, rain), or α<0.08, 0.02>. Factoring out α leaves us with
the proportions of the probabilities of the possible values of C given that there is rain in the
afternoon. Namely, if there is rain in the afternoon, the proportion of the probabilities of clouds in
the morning and no clouds in the morning is 0.08:0.02. Note that 0.08 and 0.02 don’t sum up to 1;
however, since this is the probability distribution for the random variable C, we know that they
should sum up to 1. Therefore, we need to normalize the values by computing α such that α(0.08) +
α(0.02) = 1, which gives α = 10. Finally, we can say that P(C | rain) = <0.8, 0.2>.
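
In code, this normalization is just dividing by the sum (a sketch from outside lecture):

# Joint probabilities from the table: P(cloud, rain) and P(¬cloud, rain)
joint = [0.08, 0.02]

# α = 1 / (0.08 + 0.02) = 10
alpha = 1 / sum(joint)
print([round(alpha * p, 2) for p in joint])  # [0.8, 0.2]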

Probability Rules
 Negation: P(¬a) = 1 - P(a). This stems from the fact that the sum of the probabilities of all the
possible worlds is 1, and the complementary literals a and ¬a include all the possible worlds.
 Inclusion-Exclusion: P(a ∨ b) = P(a) + P(b) - P(a ∧ b). This can be interpreted in the following way:
the worlds in which a or b are true are equal to all the worlds where a is true, plus the worlds
where b is true. However, in this case, some worlds are counted twice (the worlds where both
a and b are true). To get rid of this overlap, we subtract once the worlds where both a and b
are true (since they were counted twice).
Here is an example from outside lecture that can elucidate this. Suppose I eat ice cream
80% of days and cookies 70% of days. If we’re calculating the probability that today I eat
ice cream or cookies P(ice cream ∨ cookies) without subtracting P(ice cream ∧ cookies), we
erroneously end up with 0.7 + 0.8 = 1.5. This contradicts the axiom that probability ranges
between 0 and 1. To correct for counting twice the days when I ate both ice cream and
cookies, we need to subtract P(ice cream ∧ cookies) once.
 Marginalization: P(a) = P(a, b) + P(a, ¬b). The idea here is that b and ¬b are disjoint
events. That is, the probability of b and ¬b occurring at the same time is 0. We also
know that their probabilities sum up to 1. Thus, when a happens, b can either happen or not.
When we take the probability of both a and b happening in addition to the probability of a
and ¬b, we end up with simply the probability of a.

Marginalization can be expressed for random variables the following way:

P(X = xᵢ) = Σⱼ P(X = xᵢ, Y = yⱼ)

The left side of the equation means “The probability of random variable X having the value xᵢ.” For
example, for the variable C we mentioned earlier, the two possible values are clouds in the morning
and no clouds in the morning. The right part of the equation is the idea of marginalization. P(X = xᵢ)
is equal to the sum of all the joint probabilities of xᵢ and every single value of the random variable
Y. For example, P(C = cloud) = P(C = cloud, R = rain) + P(C = cloud, R = ¬rain) = 0.08 + 0.32 = 0.4.

 Conditioning: P(a) = P(a | b)P(b) + P(a | ¬b)P(¬b). This is a similar idea to marginalization. The
probability of event a occurring is equal to the probability of a given b times the probability
of b, plus the probability of a given ¬b times the probability of ¬b.

For random variables, conditioning can be expressed the following way:

P(X = xᵢ) = Σⱼ P(X = xᵢ | Y = yⱼ) P(Y = yⱼ)

In this formula, the random variable X takes the value xᵢ with probability that is equal to the sum
of the probabilities of xᵢ given each value of the random variable Y multiplied by the probability of
variable Y taking that value. This makes sense if we remember that P(a | b) = P(a, b)/P(b). If we
multiply this expression by P(b), we end up with P(a, b), and from here we do the same as we did
with marginalization.
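
Both rules can be checked against the cloud/rain joint table from earlier (a sketch from outside lecture; the dictionary layout is ours):

# Joint distribution from the table above
joint = {
    ("cloud", "rain"): 0.08, ("cloud", "no rain"): 0.32,
    ("no cloud", "rain"): 0.02, ("no cloud", "no rain"): 0.58,
}

# Marginalization: P(rain) = P(cloud, rain) + P(¬cloud, rain)
p_rain = joint[("cloud", "rain")] + joint[("no cloud", "rain")]

# Conditioning: P(rain) = P(rain | cloud)P(cloud) + P(rain | ¬cloud)P(¬cloud)
p_cloud = 0.4
p_rain_cond = (joint[("cloud", "rain")] / p_cloud) * p_cloud \
    + (joint[("no cloud", "rain")] / (1 - p_cloud)) * (1 - p_cloud)

print(round(p_rain, 2), round(p_rain_cond, 2))  # 0.1 0.1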

Bayesian Networks
A Bayesian network is a data structure that represents the dependencies among random variables.
Bayesian networks have the following properties:

 They are directed graphs.
 Each node on the graph represents a random variable.
 An arrow from X to Y represents that X is a parent of Y. That is, the probability distribution of
Y depends on the value of X.
 Each node X has probability distribution P(X | Parents(X)).

Let’s consider an example of a Bayesian network that involves variables that affect whether we get
to our appointment on time.
Let’s describe this Bayesian network from the top down:

 Rain is the root node in this network. This means that its probability distribution is not
reliant on any prior event. In our example, Rain is a random variable that can take the values
{none, light, heavy} with the following probability distribution:

none    light    heavy
0.7     0.2      0.1

 Maintenance, in our example, encodes whether there is train track maintenance, taking the
values {yes, no}. Rain is a parent node of Maintenance, which means that the probability
distribution of Maintenance is affected by Rain.

R        yes    no
none     0.4    0.6
light    0.2    0.8
heavy    0.1    0.9

 Train is the variable that encodes whether the train is on time or delayed, taking the values
{on time, delayed}. Note that Train has arrows pointing to it from both Maintenance and Rain.
This means that both are parents of Train, and their values affect the probability distribution
of Train.

R        M      on time    delayed
none     yes    0.8        0.2
none     no     0.9        0.1
light    yes    0.6        0.4
light    no     0.7        0.3
heavy    yes    0.4        0.6
heavy    no     0.5        0.5

 Appointment is a random variable that represents whether we attend our appointment,
taking the values {attend, miss}. Note that its only parent is Train. This point about Bayesian
networks is noteworthy: parents include only direct relations. It is true that maintenance
affects whether the train is on time, and whether the train is on time affects whether we
attend the appointment. However, in the end, what directly affects our chances of attending
the appointment is whether the train came on time, and this is what is represented in the
Bayesian network. For example, if the train came on time, there could be heavy rain and track
maintenance, but that has no effect on whether we made it to our appointment.

T          attend    miss
on time    0.9       0.1
delayed    0.6       0.4


For example, if we want to find the probability of missing the meeting when the train was delayed
on a day with no maintenance and light rain, or P(light, no, delayed, miss), we will compute the
following: P(light)P(no | light)P(delayed | light, no)P(miss | delayed). The value of each of the
individual probabilities can be found in the probability distributions above, and then these values
are multiplied to produce P(light, no, delayed, miss).
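
Carrying out this multiplication with the numbers from the tables above (a short sketch from outside lecture):

# P(light) · P(no | light) · P(delayed | light, no) · P(miss | delayed)
p_light = 0.2
p_no_given_light = 0.8
p_delayed_given_light_no = 0.3
p_miss_given_delayed = 0.4

print(round(p_light * p_no_given_light * p_delayed_given_light_no
            * p_miss_given_delayed, 4))  # 0.0192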

Inference

In the last lecture, we looked at inference through entailment. This means that we could
definitively conclude new information based on the information that we already had. We can also
infer new information based on probabilities. While this does not allow us to know new
information for certain, it allows us to figure out the probability distributions for some values.
Inference has multiple properties.

 Query X: the variable for which we want to compute the probability distribution.
 Evidence variables E: one or more variables that have been observed for event e. For
example, we might have observed that there is light rain, and this observation helps us
compute the probability that the train is delayed.
 Hidden variables Y: variables that aren’t the query and also haven’t been observed. For
example, standing at the train station, we can observe whether there is rain, but we can’t
know if there is maintenance on the track further down the road. Thus, Maintenance would
be a hidden variable in this situation.
 The goal: calculate P(X | e). For example, compute the probability distribution of the Train
variable (the query) based on the evidence e that we know there is light rain.

Let’s take an example. We want to compute the probability distribution of the Appointment variable
given the evidence that there is light rain and no track maintenance. That is, we know that there is
light rain and no track maintenance, and we want to figure out what the probabilities are that we
attend the appointment and that we miss the appointment, P(Appointment | light, no). From the joint
probability section, we know that we can express the possible values of the Appointment random
variable as a proportion, rewriting P(Appointment | light, no) as αP(Appointment, light, no). How can
we calculate the probability distribution of Appointment if its parent is only the Train variable, and
not Rain or Maintenance? Here, we will use marginalization: P(Appointment | light, no) is
equal to α[P(Appointment, light, no, delayed) + P(Appointment, light, no, on time)].

Inference by Enumeration

Inference by enumeration is a process of finding the probability distribution of variable X given
observed evidence e and some hidden variables Y:

P(X | e) = α P(X, e) = α Σ P(X, e, y), where the sum is taken over every value y of the hidden variables Y

In this equation, X stands for the query variable, e for the observed evidence, y for all the values of
the hidden variables, and α normalizes the result such that we end up with probabilities that add
up to 1. To explain the equation in words, it is saying that the probability distribution of X given e
is equal to a normalized probability distribution of X and e. To get to this distribution, we sum the
probabilities of X, e, and y, where y takes each possible value of the hidden variables Y, and then
normalize the result.

Multiple libraries exist in Python to ease the process of probabilistic inference. We will take a look
at the library pomegranate to see how the above data can be represented in code.

First, we create the nodes and provide a probability distribution for each one.

from pomegranate import *

# Rain node has no parents
rain = Node(DiscreteDistribution({
    "none": 0.7,
    "light": 0.2,
    "heavy": 0.1
}), name="rain")

# Track maintenance node is conditional on rain
maintenance = Node(ConditionalProbabilityTable([
    ["none", "yes", 0.4],
    ["none", "no", 0.6],
    ["light", "yes", 0.2],
    ["light", "no", 0.8],
    ["heavy", "yes", 0.1],
    ["heavy", "no", 0.9]
], [rain.distribution]), name="maintenance")

# Train node is conditional on rain and maintenance
train = Node(ConditionalProbabilityTable([
    ["none", "yes", "on time", 0.8],
    ["none", "yes", "delayed", 0.2],
    ["none", "no", "on time", 0.9],
    ["none", "no", "delayed", 0.1],
    ["light", "yes", "on time", 0.6],
    ["light", "yes", "delayed", 0.4],
    ["light", "no", "on time", 0.7],
    ["light", "no", "delayed", 0.3],
    ["heavy", "yes", "on time", 0.4],
    ["heavy", "yes", "delayed", 0.6],
    ["heavy", "no", "on time", 0.5],
    ["heavy", "no", "delayed", 0.5],
], [rain.distribution, maintenance.distribution]), name="train")

# Appointment node is conditional on train
appointment = Node(ConditionalProbabilityTable([
    ["on time", "attend", 0.9],
    ["on time", "miss", 0.1],
    ["delayed", "attend", 0.6],
    ["delayed", "miss", 0.4]
], [train.distribution]), name="appointment")

Second, we create the model by adding all the nodes and then describing which node is the parent
of which other node by adding edges between them (recall that a Bayesian network is a directed
graph, consisting of nodes with arrows between them).

# Create a Bayesian network and add states
model = BayesianNetwork()
model.add_states(rain, maintenance, train, appointment)

# Add edges connecting nodes
model.add_edge(rain, maintenance)
model.add_edge(rain, train)
model.add_edge(maintenance, train)
model.add_edge(train, appointment)

# Finalize model
model.bake()

Now, to ask how probable a certain event is, we run the model with the values we are interested in.
In this example, we want to ask what is the probability that there is no rain, no track maintenance,
the train is on time, and we attend the meeting.

# Calculate probability for a given observation
probability = model.probability([["none", "no", "on time", "attend"]])

print(probability)

Alternatively, we could use the program to provide probability distributions for all variables given
some observed evidence. In the following case, we know that the train was delayed. Given this
information, we compute and print the probability distributions of the variables Rain, Maintenance,
and Appointment.

# Calculate predictions based on the evidence that the train was delayed
predictions = model.predict_proba({
    "train": "delayed"
})

# Print predictions for each node
for node, prediction in zip(model.states, predictions):
    if isinstance(prediction, str):
        print(f"{node.name}: {prediction}")
    else:
        print(f"{node.name}")
        for value, probability in prediction.parameters[0].items():
            print(f"    {value}: {probability:.4f}")

The code above used inference by enumeration. However, this way of computing probability is
inefficient, especially when there are many variables in the model. A different way to go about this
would be abandoning exact inference in favor of approximate inference. Doing this, we lose some
precision in the generated probabilities, but often this imprecision is negligible. Instead, we gain a
scalable method of calculating probabilities.

Sampling
Sampling is one technique of approximate inference. In sampling, each variable is sampled for a
value according to its probability distribution. We will start with an example from outside lecture,
and then cover the example from lecture.

To generate a distribution using sampling with a die, we can roll the die multiple times and
record what value we got each time. Suppose we rolled the die 600 times. We count how many
times we got 1, which is supposed to be roughly 100, and then repeat for the rest of the values,
2-6. Then, we divide each count by the total number of rolls. This will generate an approximate
distribution of the values of rolling a die: on one hand, it is unlikely that we get the result that
each value has a probability of 1/6 of occurring (which is the exact probability), but we will get
a value that’s close to it.
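
This experiment can be simulated in a few lines of Python (a sketch from outside lecture, using only the standard library):

import random
from collections import Counter

# Roll a fair die 600 times
rolls = [random.randint(1, 6) for _ in range(600)]
counts = Counter(rolls)

# Divide each count by the total number of rolls to approximate the distribution
for value in range(1, 7):
    print(value, counts[value] / 600)  # each should be close to 1/6 ≈ 0.167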

Here is an example from lecture: if we start with sampling the Rain variable, the value none will be
generated with probability of 0.7, the value light will be generated with probability of 0.2, and the
value heavy will be generated with probability of 0.1. Suppose that the sampled value we get is
none. When we get to the Maintenance variable, we sample it, too, but only from the probability
distribution where Rain is equal to none, because this is an already sampled result. We will
continue to do so through all the nodes. Now we have one sample, and repeating this process
multiple times generates a distribution. Now, if we want to answer a question, such as what is
P(Train = on time), we can count the number of samples where the variable Train has the value on
time, and divide the result by the total number of samples. This way, we have just generated an
approximate probability for P(Train = on time).

We can also answer questions that involve conditional probability, such as P(Rain = light | Train = on
time). In this case, we ignore all samples where the value of Train is not on time, and then proceed
as before. We count how many samples have the variable Rain = light among those samples that
have Train = on time, and then divide by the total number of samples where Train = on time.
In code, a sampling function can look like generate_sample:

import pomegranate

from collections import Counter

from model import model

def generate_sample():

    # Mapping of random variable name to sample generated
    sample = {}

    # Mapping of distribution to sample generated
    parents = {}

    # Loop over all states, assuming topological order
    for state in model.states:

        # If we have a non-root node, sample conditional on parents
        if isinstance(state.distribution, pomegranate.ConditionalProbabilityTable):
            sample[state.name] = state.distribution.sample(parent_values=parents)

        # Otherwise, just sample from the distribution alone
        else:
            sample[state.name] = state.distribution.sample()

        # Keep track of the sampled value in the parents mapping
        parents[state.distribution] = sample[state.name]

    # Return generated sample
    return sample

Now, to compute P(Appointment | Train = delayed), which is the probability distribution of the
Appointment variable given that the train is delayed, we do the following:

# Rejection sampling
# Compute distribution of Appointment given that train is delayed
N = 10000
data = []

# Repeat sampling 10,000 times
for i in range(N):

    # Generate a sample based on the function that we defined earlier
    sample = generate_sample()

    # If, in this sample, the variable of Train has the value delayed, save the sample
    if sample["train"] == "delayed":
        data.append(sample["appointment"])

# Count how many times each value of the variable appeared. We can later normalize by
# dividing each count by the total number of saved samples to get approximate probabilities.
print(Counter(data))

Likelihood Weighting

In the sampling example above, we discarded the samples that did not match the evidence that we
had. This is inefficient. One way to get around this is with likelihood weighting, using the following
steps:

 Start by fixing the values for evidence variables.
 Sample the non-evidence variables using conditional probabilities in the Bayesian network.
 Weight each sample by its likelihood: the probability of all the evidence occurring.

For example, if we have the observation that the train was on time, we will start sampling as
before. We sample a value of Rain given its probability distribution, then Maintenance, but when
we get to Train, we always give it the observed value, in our case, on time. Then we proceed and
sample Appointment based on its probability distribution given Train = on time. Now that this
sample exists, we weight it by the conditional probability of the observed variable given its
sampled parents. That is, if we sampled Rain and got light, and then we sampled Maintenance and
got yes, then we will weight this sample by P(Train = on time | light, yes).
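
Here is a minimal sketch of likelihood weighting from outside lecture. To keep it self-contained, it hard-codes the distributions of the network above in plain Python dictionaries instead of using pomegranate; sample_from and the table names are ours.

import random
from collections import defaultdict

# Distributions copied from the Bayesian network above
P_RAIN = {"none": 0.7, "light": 0.2, "heavy": 0.1}
P_MAINTENANCE = {"none": {"yes": 0.4, "no": 0.6},
                 "light": {"yes": 0.2, "no": 0.8},
                 "heavy": {"yes": 0.1, "no": 0.9}}
P_TRAIN = {("none", "yes"): {"on time": 0.8, "delayed": 0.2},
           ("none", "no"): {"on time": 0.9, "delayed": 0.1},
           ("light", "yes"): {"on time": 0.6, "delayed": 0.4},
           ("light", "no"): {"on time": 0.7, "delayed": 0.3},
           ("heavy", "yes"): {"on time": 0.4, "delayed": 0.6},
           ("heavy", "no"): {"on time": 0.5, "delayed": 0.5}}
P_APPOINTMENT = {"on time": {"attend": 0.9, "miss": 0.1},
                 "delayed": {"attend": 0.6, "miss": 0.4}}

def sample_from(distribution):
    # Draw one value from a {value: probability} mapping
    values = list(distribution)
    weights = [distribution[value] for value in values]
    return random.choices(values, weights=weights)[0]

# Estimate P(Appointment | Train = on time) by likelihood weighting
totals = defaultdict(float)
for _ in range(10000):
    rain = sample_from(P_RAIN)
    maintenance = sample_from(P_MAINTENANCE[rain])
    train = "on time"                               # evidence: fixed, not sampled
    weight = P_TRAIN[(rain, maintenance)][train]    # P(evidence | sampled parents)
    appointment = sample_from(P_APPOINTMENT[train])
    totals[appointment] += weight

normalizer = sum(totals.values())
print({value: total / normalizer for value, total in totals.items()})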

Markov Models
So far, we have looked at questions of probability given some information that we observed. In this
kind of paradigm, the dimension of time is not represented in any way. However, many tasks do rely
on the dimension of time, such as prediction. To represent the variable of time we will create a new
variable, X, and change it based on the event of interest, such that Xₜ is the current event, Xₜ₊₁ is the
next event, and so on. To be able to predict events in the future, we will use Markov Models.

The Markov Assumption

The Markov assumption is an assumption that the current state depends on only a finite fixed
number of previous states. This is important to us. Think of the task of predicting weather. In theory,
we could use all the data from the past year to predict tomorrow’s weather. However, it is
infeasible, both because of the computational power this would require and because there is
probably no information about the conditional probability of tomorrow’s weather based on the
weather 365 days ago. Using the Markov assumption, we restrict our previous states (e.g. how many
previous days we are going to consider when predicting tomorrow’s weather), thereby making the
task manageable. This means that we might get a rougher approximation of the probabilities of
interest, but this is often good enough for our needs. Moreover, we can use a Markov model based
on the information of just the last event (e.g. predicting tomorrow’s weather based on today’s
weather).

Markov Chain

A Markov chain is a sequence of random variables where the distribution of each variable follows
the Markov assumption. That is, each event in the chain occurs based on the probability of the
event before it.

To start constructing a Markov chain, we need a transition model that will specify the
probability distributions of the next event based on the possible values of the current event:

                Tomorrow: sun    Tomorrow: rain
Today: sun      0.8              0.2
Today: rain     0.3              0.7

In this example, the probability of tomorrow being sunny given that today is sunny is 0.8. This is
reasonable, because it is more likely than not that a sunny day will follow a sunny day. However, if
it is rainy today, the probability of rain tomorrow is 0.7, since rainy days are more likely to follow
each other. Using this transition model, it is possible to sample a Markov chain. Start with a day
being either rainy or sunny, and then sample the next day based on the probability of it being
sunny or rainy given the weather today. Then, condition the probability of the day after tomorrow
based on tomorrow, and so on, resulting in a Markov chain:
Given this Markov chain, we can now answer questions such as “what is the probability of having
four rainy days in a row?” Here is an example of how a Markov chain can be implemented in code:

from pomegranate import *

# Define starting probabilities
start = DiscreteDistribution({
    "sun": 0.5,
    "rain": 0.5
})

# Define transition model
transitions = ConditionalProbabilityTable([
    ["sun", "sun", 0.8],
    ["sun", "rain", 0.2],
    ["rain", "sun", 0.3],
    ["rain", "rain", 0.7]
], [start])

# Create Markov chain
model = MarkovChain([start, transitions])

# Sample 50 states from chain
print(model.sample(50))

Hidden Markov Models

A hidden Markov model is a type of a Markov model for a system with hidden states that generate
some observed event. This means that sometimes, the AI has some measurement of the world but
no access to the precise state of the world. In these cases, the state of the world is called the
hidden state and whatever data the AI has access to are the observations. Here are a few examples
of this:

 For a robot exploring uncharted territory, the hidden state is its position, and the observation
is the data recorded by the robot’s sensors.
 In speech recognition, the hidden state is the words that were spoken, and the observation is
the audio waveforms.
 When measuring user engagement on websites, the hidden state is how engaged the user is,
and the observation is the website or app analytics.

For our discussion, we will use the following example. Our AI wants to infer the weather (the
hidden state), but it only has access to an indoor camera that records how many people brought
umbrellas with them. Here is our sensor model (also called emission model) that represents these
probabilities:

        umbrella    no umbrella
sun     0.2         0.8
rain    0.9         0.1

In this model, if it is sunny, it is most probable that people will not bring umbrellas to the building.
If it is rainy, then it is very likely that people bring umbrellas to the building. By using the
observation of whether people brought an umbrella or not, we can predict with reasonable
likelihood what the weather is outside.

Sensor Markov Assumption

The sensor Markov assumption is the assumption that the evidence variable depends only on the
corresponding state. For example,
for our models, we assume that whether people bring umbrellas to the office depends only on the
weather. This is not necessarily reflective of the complete truth, because, for example, more
conscientious, rain-averse people might take an umbrella with them everywhere even when it is
sunny, and if we knew everyone’s personalities it would add more data to the model. However, the
sensor Markov assumption ignores these data, assuming that only the hidden state affects the
observation.

A hidden Markov model can be represented in a Markov chain with two layers. The top layer,
variable X, stands for the hidden state. The bottom layer, variable E, stands for the evidence, the
observations that we have.
Based on hidden Markov models, multiple tasks can be achieved:

 Filtering: given observations from start until now, calculate the probability distribution for
the current state. For example, given information on when people bring umbrellas from the
start of time until today, we generate a probability distribution for whether it is raining today
or not (see the sketch after this list).
 Prediction: given observations from start until now, calculate the probability distribution for
a future state.
 Smoothing: given observations from start until now, calculate the probability distribution for
a past state. For example, calculating the probability of rain yesterday given that people
brought umbrellas today.
 Most likely explanation: given observations from start until now, calculate the most likely
sequence of events.
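
As an illustration of the filtering task, here is a minimal forward-filtering sketch from outside lecture. It reuses the transition model from the Markov chain section and the sensor model above; the function forward_filter and the dictionary layout are ours.

# Parameters of the umbrella example
TRANSITION = {"sun": {"sun": 0.8, "rain": 0.2},
              "rain": {"sun": 0.3, "rain": 0.7}}
EMISSION = {"sun": {"umbrella": 0.2, "no umbrella": 0.8},
            "rain": {"umbrella": 0.9, "no umbrella": 0.1}}
START = {"sun": 0.5, "rain": 0.5}

def forward_filter(observations):
    # Begin with the prior over the hidden state
    belief = dict(START)
    for observation in observations:
        # Predict: push the current belief through the transition model
        predicted = {state: sum(belief[prev] * TRANSITION[prev][state] for prev in belief)
                     for state in belief}
        # Update: weight each state by the probability of the observation, then normalize
        updated = {state: predicted[state] * EMISSION[state][observation] for state in predicted}
        total = sum(updated.values())
        belief = {state: p / total for state, p in updated.items()}
    return belief

# Probability distribution over today's weather given two days of umbrellas
print(forward_filter(["umbrella", "umbrella"]))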

The most likely explanation task can be used in processes such as voice recognition, where, based
on multiple waveforms, the AI infers the most likely sequence of words or syllables that produced
these waveforms. Next is a Python implementation of a hidden Markov model that we will use for
a most likely explanation task:

import numpy

from pomegranate import *

# Observation model for each state
sun = DiscreteDistribution({
    "umbrella": 0.2,
    "no umbrella": 0.8
})

rain = DiscreteDistribution({
    "umbrella": 0.9,
    "no umbrella": 0.1
})

states = [sun, rain]

# Transition model
transitions = numpy.array(
    [[0.8, 0.2],  # Tomorrow's predictions if today = sun
     [0.3, 0.7]]  # Tomorrow's predictions if today = rain
)

# Starting probabilities
starts = numpy.array([0.5, 0.5])

# Create the model
model = HiddenMarkovModel.from_matrix(
    transitions, states, starts,
    state_names=["sun", "rain"]
)
model.bake()

Note that our model has both the sensor model and the transition model. We need both for the
hidden Markov model. In the following code snippet, we see a sequence of observations of whether
people brought umbrellas to the building or not, and based on this sequence we will run the
model, which will generate and print the most likely explanation (i.e. the weather sequence that
most likely produced this pattern of observations):

from model import model

# Observed data
observations = [
    "umbrella",
    "umbrella",
    "no umbrella",
    "umbrella",
    "umbrella",
    "umbrella",
    "umbrella",
    "no umbrella",
    "no umbrella"
]

# Predict underlying states
predictions = model.predict(observations)
for prediction in predictions:
    print(model.states[prediction].name)
In this case, the output of the program will be rain, rain, sun, rain, rain, rain, rain, sun, sun. This
output represents the most likely pattern of weather given our observations of people
bringing or not bringing umbrellas to the building.
