A Beginner's Guide to Probability and Bayesian Reasoning in Python
Background
Probability theory is the mathematical framework for quantifying uncertainty and making decisions under uncertainty. It forms the foundation of many modern AI techniques, including Bayesian inference and machine learning classifiers.
In this article, we’ll explore fundamental concepts of probability, key axioms, and theorems like Bayes’ theorem. Then, we’ll discuss how these principles underpin the Naive Bayes classifier and Bayesian belief networks. Finally, you’ll see practical Python code implementing these ideas with a classic dataset, predicting whether to play tennis given weather conditions.
Some highlights we will discuss:
The foundational axioms of probability
Key probability formulas
Reasoning with uncertainty using Bayes' Theorem, Naive Bayes Classifier, and Bayesian Belief Networks (BBNs)
Python code to experiment with PlayTennis data
Core Probability Concepts and Axioms
At its essence, probability measures the likelihood of an event occurring and ranges between 0 and 1.
The three axioms of probability are:
Non-negativity: For any event A, P(A) ≥ 0
Normalisation: The probability of the sample space S is P(S) = 1
Additivity: For mutually exclusive events A and B, P(A ∪ B) = P(A) + P(B)
These are the core assumptions any probability function must satisfy.
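These axioms are easy to verify mechanically. Below is a minimal sketch, assuming a fair six-sided die as the sample space (an illustration of our choosing), that checks all three:

```python
from fractions import Fraction

# Probability table for a fair six-sided die; the sample space S is {1, ..., 6}.
P = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

# Axiom 1 (non-negativity): every outcome has probability >= 0.
assert all(p >= 0 for p in P.values())

# Axiom 2 (normalisation): the sample space as a whole has probability 1.
assert sum(P.values()) == 1

# Axiom 3 (additivity): for mutually exclusive events A and B,
# P(A ∪ B) = P(A) + P(B). Here A = {1, 2} and B = {5, 6} share no outcomes.
A, B = {1, 2}, {5, 6}
assert sum(P[o] for o in A | B) == sum(P[o] for o in A) + sum(P[o] for o in B)
print("All three axioms hold for this distribution.")
```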
Important Formulas
Conditional probability:
Sometimes, we want to find out the chance (probability) of something happening only if we know that something else has already happened. For example, imagine you want to know the chance it will rain if you see dark clouds in the sky.
P(A | B) = P(A ∩ B) / P(B)
Where:
P(A | B) means: the probability of A happening, given that B has happened.
P(A ∩ B) means: the probability that both A and B happen together.
P(B) means: the probability that B happens.
What does it mean?
To find the chance of A happening when B is true, you look at how often both A and B happen together.
Then, you divide that by how often B happens at all.
This works only if P(B) > 0, meaning B has a chance to happen, so it makes sense to talk about “given B.”
Example:
A = It rains.
B = There are dark clouds.
If the chance of both rain and dark clouds is 0.3, and the chance of dark clouds is 0.5, then:
P(rain | dark clouds) = 0.3 / 0.5 = 0.6
So, when you see dark clouds, there is a 60% chance of rain.
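That calculation is a one-liner in Python; the 0.3 and 0.5 below are the made-up figures from the example above:

```python
p_rain_and_clouds = 0.3  # P(rain ∩ dark clouds)
p_clouds = 0.5           # P(dark clouds); must be > 0 for the ratio to make sense

# P(rain | dark clouds) = P(rain ∩ dark clouds) / P(dark clouds)
p_rain_given_clouds = p_rain_and_clouds / p_clouds
print(p_rain_given_clouds)  # 0.6, i.e. a 60% chance of rain given dark clouds
```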
Bayes’ theorem:
Bayes’ theorem allows us to update our beliefs about an event based on new evidence. It’s expressed as:
P(A | B) = P(B | A) × P(A) / P(B)
Where:
P(A | B) is the probability of event A given event B has occurred.
P(B | A) is the probability of event B given event A is true.
P(A) is the initial probability of event A.
P(B) is the probability of event B.
Bayes’ theorem is widely used in fields like data science, machine learning, and decision-making to make better predictions with updated information.
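To make this concrete, here is the rain-and-clouds example again, now run through Bayes' theorem. The prior P(rain) = 0.4 is an extra made-up figure, chosen so the numbers stay consistent with the earlier example:

```python
p_rain = 0.4                # prior: P(rain)
p_clouds_given_rain = 0.75  # likelihood: P(dark clouds | rain)
p_clouds = 0.5              # evidence: P(dark clouds)

# Bayes' theorem: P(rain | clouds) = P(clouds | rain) * P(rain) / P(clouds)
p_rain_given_clouds = p_clouds_given_rain * p_rain / p_clouds
print(p_rain_given_clouds)  # 0.6 — the same 60% as before, reached from the other direction
```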
Naive Bayes: Handling Multiple Features
Naive Bayes is a powerful and simple algorithm used in classification problems, especially when dealing with multiple features.
The core idea is captured by this formula:
P(C | X₁, …, Xₙ) ∝ P(C) × Π P(Xᵢ | C)
Where:
C is the class or category we want to predict.
X₁, …, Xₙ are the features or attributes.
P(C) is the prior probability of the class.
P(Xᵢ | C) is the probability of each feature given the class.
The symbol Π means multiplying the probabilities for all features together.
This “naive” assumption — that features are independent — makes the math simple and efficient, yet it often performs remarkably well in practice.
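In code, the Π is just a running product. A minimal sketch with placeholder probabilities (not taken from any dataset):

```python
import math

prior = 0.6                    # P(C): prior probability of the class
likelihoods = [0.2, 0.3, 0.5]  # P(X1|C), P(X2|C), P(X3|C): one term per feature

# Naive Bayes score: P(C) * Π P(Xi | C)
score = prior * math.prod(likelihoods)
print(score)  # 0.018
```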
Bayesian Reasoning and Naive Bayes Classifier
Bayesian reasoning uses Bayes’ theorem to update our beliefs based on new evidence. Building on this, the Naive Bayes classifier assumes all features are conditionally independent given the class label, which simplifies calculations significantly.
The classifier works by finding the class C that maximises:
P(C) × Π P(Xᵢ | C)
Where:
C is the class label (e.g., PlayTennis = Yes/No)
Xᵢ are the features (e.g., Outlook, Temperature)
Despite the strong independence assumption between features, Naive Bayes often delivers surprisingly accurate results in many real-world applications, from spam detection to medical diagnosis.
Bayesian Belief Network (BBN)
A Bayesian network is a graphical model representing variables and their conditional dependencies via a directed acyclic graph (DAG). It generalises naive Bayes by modelling more complex dependencies among variables. Each node represents a variable (e.g., Outlook, Temperature), and edges denote conditional dependencies.
Tennis Dataset Bayesian Network Diagram
Nodes: Variables (Outlook, Temperature, Humidity, Wind, PlayTennis)
Edges: Directed connections showing causal or conditional dependency (e.g., Outlook influences Temperature, Humidity, and Wind, and all influence PlayTennis)
This structure models how the weather conditions probabilistically influence the decision to play tennis.
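Libraries such as pgmpy can build and query networks like this, but the DAG itself can be sketched with a plain dictionary mapping each node to its parents. The sketch below encodes only the structure described above, not the conditional probability tables each node would also need:

```python
# Each node maps to the set of its parent nodes (edges point parent -> child).
tennis_network = {
    "Outlook": set(),  # root node: no parents
    "Temperature": {"Outlook"},
    "Humidity": {"Outlook"},
    "Wind": {"Outlook"},
    "PlayTennis": {"Outlook", "Temperature", "Humidity", "Wind"},
}

for node, parents in tennis_network.items():
    print(f"{node} <- {sorted(parents) if parents else '(root)'}")
```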
Data Source
Our example uses the classic Play Tennis dataset, which records weather conditions along with a decision to play tennis or not. The standard version (from Tom Mitchell's Machine Learning) has 14 rows:

Day | Outlook  | Temperature | Humidity | Wind   | PlayTennis
D1  | Sunny    | Hot         | High     | Weak   | No
D2  | Sunny    | Hot         | High     | Strong | No
D3  | Overcast | Hot         | High     | Weak   | Yes
D4  | Rain     | Mild        | High     | Weak   | Yes
D5  | Rain     | Cool        | Normal   | Weak   | Yes
D6  | Rain     | Cool        | Normal   | Strong | No
D7  | Overcast | Cool        | Normal   | Strong | Yes
D8  | Sunny    | Mild        | High     | Weak   | No
D9  | Sunny    | Cool        | Normal   | Weak   | Yes
D10 | Rain     | Mild        | Normal   | Weak   | Yes
D11 | Sunny    | Mild        | Normal   | Strong | Yes
D12 | Overcast | Mild        | High     | Strong | Yes
D13 | Overcast | Hot         | Normal   | Weak   | Yes
D14 | Rain     | Mild        | High     | Strong | No
Python Implementation: Tennis Probability Predictor
Here’s a practical Python class that implements:
Data preparation with prior and conditional probabilities
Naive Bayes classification
Queries for most likely conditions given the PlayTennis value
Normalised posterior probability distribution for PlayTennis given conditions
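The article's original listing isn't reproduced here, so the following is a minimal sketch of such a class; the name TennisProbabilityPredictor and its method names are illustrative choices, but all four capabilities above are covered, using the 14-row dataset from the table:

```python
from collections import Counter

# The classic 14-row Play Tennis dataset, one tuple per day:
# (Outlook, Temperature, Humidity, Wind, PlayTennis).
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
FEATURES = ("Outlook", "Temperature", "Humidity", "Wind")


class TennisProbabilityPredictor:
    """Naive Bayes over the Play Tennis data, plus the query helpers listed above."""

    def __init__(self, rows):
        label_counts = Counter(row[-1] for row in rows)
        # Prior probabilities P(PlayTennis = label).
        self.priors = {lbl: n / len(rows) for lbl, n in label_counts.items()}
        # Conditional probabilities P(feature = value | PlayTennis = label).
        self.conditionals = {}
        for i, feature in enumerate(FEATURES):
            for label, label_total in label_counts.items():
                value_counts = Counter(r[i] for r in rows if r[-1] == label)
                self.conditionals[(feature, label)] = {
                    v: n / label_total for v, n in value_counts.items()
                }

    def naive_bayes_score(self, conditions, label):
        """Unnormalised P(label) * Π P(value | label) for the given conditions."""
        score = self.priors[label]
        for feature, value in conditions.items():
            score *= self.conditionals[(feature, label)].get(value, 0.0)
        return score

    def classify(self, conditions):
        """Naive Bayes classification: the label with the highest score."""
        return max(self.priors, key=lambda lbl: self.naive_bayes_score(conditions, lbl))

    def posterior_distribution(self, conditions):
        """Normalised posterior distribution over PlayTennis given the conditions."""
        scores = {lbl: self.naive_bayes_score(conditions, lbl) for lbl in self.priors}
        total = sum(scores.values())
        return {lbl: s / total for lbl, s in scores.items()}

    def most_likely_conditions(self, label):
        """For each feature, the value with the highest P(value | label)."""
        return {
            feature: max(self.conditionals[(feature, label)],
                         key=self.conditionals[(feature, label)].get)
            for feature in FEATURES
        }
```

Note there is no Laplace smoothing here: a feature value never seen with a class contributes a factor of zero, matching the plain textbook calculation.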
Worked Examples
Q1: Find the probability of playing tennis under the given conditions.
Output: applying the Naive Bayes calculation gives 0.0051.
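The exact conditions behind Q1 aren't shown above, so the query below uses an illustrative set; note that with these particular conditions the sketch class yields roughly 0.0053 rather than the article's 0.0051:

```python
predictor = TennisProbabilityPredictor(DATA)

# Illustrative conditions (the article's exact Q1 inputs are not shown).
conditions = {"Outlook": "Sunny", "Temperature": "Cool",
              "Humidity": "High", "Wind": "Strong"}
print(predictor.naive_bayes_score(conditions, "Yes"))  # ~0.0053 (unnormalised)
print(predictor.classify(conditions))                  # "No" wins for these conditions
```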
Q2: What is the distribution over PlayTennis if Wind=Strong?
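Continuing with the same predictor: on the standard 14-row dataset, strong-wind days split evenly between Yes and No, so the normalised posterior comes out at 50/50:

```python
print(predictor.posterior_distribution({"Wind": "Strong"}))
# {'No': 0.5, 'Yes': 0.5} — both P(Yes)·P(Strong|Yes) and P(No)·P(Strong|No) equal 3/14
```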
Q3: Most Likely Conditions Given PlayTennis=No?
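And the per-feature query for PlayTennis=No. Sunny, High, and Strong are clear winners; Temperature is a tie between Hot and Mild (2/5 each), and this sketch simply reports the first value it encounters:

```python
print(predictor.most_likely_conditions("No"))
# {'Outlook': 'Sunny', 'Temperature': 'Hot', 'Humidity': 'High', 'Wind': 'Strong'}
```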
Conclusion
Probability theory provides a rigorous foundation for reasoning under uncertainty. Bayes’ theorem enables us to update beliefs given new data. Naive Bayes classifiers, despite their simplicity, are powerful tools for classification tasks.
This article walked through the theory, formulas, and practical implementation on a familiar dataset predicting tennis playability. Bayesian networks provide a more flexible framework to capture dependencies among variables beyond naive assumptions.
If you want to deepen your knowledge or develop probabilistic models, understanding these concepts is essential.