A Beginner's Guide to Probability and Bayesian Reasoning in Python
Background
Probability theory is the mathematical framework for quantifying uncertainty and making decisions under uncertainty. It forms the foundation of many modern AI techniques, including Bayesian inference and machine learning classifiers.
In this article, we’ll explore fundamental concepts of probability, key axioms, and theorems like Bayes’ theorem. Then, we’ll discuss how these principles underpin the Naive Bayes classifier and Bayesian belief networks. Finally, you’ll see practical Python code implementing these ideas with a classic dataset, predicting whether to play tennis given weather conditions.
Some highlights we will discuss:
The foundational axioms of probability
Key probability formulas
Reasoning with uncertainty using Bayes' Theorem, Naive Bayes Classifier, and Bayesian Belief Networks (BBNs)
Python code to experiment with PlayTennis data
Core Probability Concepts and Axioms
At its essence, probability measures the likelihood of an event occurring and ranges between 0 and 1.
The three axioms of probability are:
Non-negativity: For any event A, P(A) ≥ 0
Normalisation: The probability of the sample space S is P(S) = 1
Additivity: For mutually exclusive events A and B, P(A ∪ B) = P(A) + P(B)
These are the core assumptions any probability function must satisfy.
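These axioms are easy to verify mechanically. Below is a minimal sketch, assuming a fair six-sided die as the sample space (an illustration of our choosing), that checks all three:

```python
from fractions import Fraction

# Probability table for a fair six-sided die; the sample space S is {1, ..., 6}.
P = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

# Axiom 1 (non-negativity): every outcome has probability >= 0.
assert all(p >= 0 for p in P.values())

# Axiom 2 (normalisation): the sample space as a whole has probability 1.
assert sum(P.values()) == 1

# Axiom 3 (additivity): for mutually exclusive events A and B,
# P(A ∪ B) = P(A) + P(B). Here A = {1, 2} and B = {5, 6} share no outcomes.
A, B = {1, 2}, {5, 6}
assert sum(P[o] for o in A | B) == sum(P[o] for o in A) + sum(P[o] for o in B)
print("All three axioms hold for this distribution.")
```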
Important Formulas
Conditional probability:
Sometimes, we want to find out the chance (probability) of something happening only if we know that something else has already happened. For example, imagine you want to know the chance it will rain if you see dark clouds in the sky.
P(A | B) = P(A ∩ B) / P(B)
Where:
P(A | B) means: the probability of A happening, given that B has happened.
P(A ∩ B) means: the probability that both A and B happen together.
P(B) means: the probability that B happens.
What does it mean?
To find the chance of A happening when B is true, you look at how often both A and B happen together.
Then, you divide that by how often B happens at all.
This works only if P(B) > 0, meaning B has a chance to happen, so it makes sense to talk about “given B.”
Example:
A = It rains.
B = There are dark clouds.
If the chance of both rain and dark clouds is 0.3, and the chance of dark clouds is 0.5, then:
P(rain | dark clouds) = 0.3 / 0.5 = 0.6
So, when you see dark clouds, there is a 60% chance of rain.
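That calculation is a one-liner in Python; the 0.3 and 0.5 below are the made-up figures from the example above:

```python
p_rain_and_clouds = 0.3  # P(rain ∩ dark clouds)
p_clouds = 0.5           # P(dark clouds); must be > 0 for the ratio to make sense

# P(rain | dark clouds) = P(rain ∩ dark clouds) / P(dark clouds)
p_rain_given_clouds = p_rain_and_clouds / p_clouds
print(p_rain_given_clouds)  # 0.6, i.e. a 60% chance of rain given dark clouds
```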
Bayes’ theorem:
Bayes’ theorem allows us to update our beliefs about an event based on new evidence. It’s expressed as:
P(A | B) = P(B | A) × P(A) / P(B)
Where:
P(A | B) is the probability of event A given event B has occurred.
P(B | A) is the probability of event B given event A is true.
P(A) is the initial probability of event A.
P(B) is the probability of event B.
Bayes’ theorem is widely used in fields like data science, machine learning, and decision-making to make better predictions with updated information.
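To make this concrete, here is the rain-and-clouds example again, now run through Bayes' theorem. The prior P(rain) = 0.4 is an extra made-up figure, chosen so the numbers stay consistent with the earlier example:

```python
p_rain = 0.4                # prior: P(rain)
p_clouds_given_rain = 0.75  # likelihood: P(dark clouds | rain)
p_clouds = 0.5              # evidence: P(dark clouds)

# Bayes' theorem: P(rain | clouds) = P(clouds | rain) * P(rain) / P(clouds)
p_rain_given_clouds = p_clouds_given_rain * p_rain / p_clouds
print(p_rain_given_clouds)  # 0.6 — the same 60% as before, reached from the other direction
```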
Naive Bayes: Handling Multiple Features
Naive Bayes is a powerful and simple algorithm used in classification problems, especially when dealing with multiple features.
The core idea is captured by this formula:
P(C | X₁, …, Xₙ) ∝ P(C) × Π P(Xᵢ | C)
Where:
C is the class or category we want to predict.
X₁, …, Xₙ are the features or attributes.
P(C) is the prior probability of the class.
P(Xᵢ | C) is the probability of each feature given the class.
The symbol Π means multiplying the probabilities for all features together.
This “naive” assumption — that features are independent — makes the math simple and efficient, yet it often performs remarkably well in practice.
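In code, the Π is just a running product. A minimal sketch with placeholder probabilities (not taken from any dataset):

```python
import math

prior = 0.6                    # P(C): prior probability of the class
likelihoods = [0.2, 0.3, 0.5]  # P(X1|C), P(X2|C), P(X3|C): one term per feature

# Naive Bayes score: P(C) * Π P(Xi | C)
score = prior * math.prod(likelihoods)
print(score)  # 0.018
```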
Bayesian Reasoning and Naive Bayes Classifier
Bayesian reasoning uses Bayes’ theorem to update our beliefs based on new evidence. Building on this, the Naive Bayes classifier assumes all features are conditionally independent given the class label, which simplifies calculations significantly.
The classifier works by finding the class C that maximises:
P(C) × Π P(Xᵢ | C)
Where:
C is the class label (e.g., PlayTennis = Yes/No)
Xᵢ are the features (e.g., Outlook, Temperature)
Despite the strong independence assumption between features, Naive Bayes often delivers surprisingly accurate results in many real-world applications, from spam detection to medical diagnosis.
Bayesian Belief Network (BBN)
A Bayesian network is a graphical model representing variables and their conditional dependencies via a directed acyclic graph (DAG). It generalises naive Bayes by modelling more complex dependencies among variables. Each node represents a variable (e.g., Outlook, Temperature), and edges denote conditional dependencies.
Tennis Dataset Bayesian Network Diagram
Nodes: Variables (Outlook, Temperature, Humidity, Wind, PlayTennis)
Edges: Directed connections showing causal or conditional dependency (e.g., Outlook influences Temperature, Humidity, and Wind, and all influence PlayTennis)
This structure models how the weather conditions probabilistically influence the decision to play tennis.
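Libraries such as pgmpy can build and query networks like this, but the DAG itself can be sketched with a plain dictionary mapping each node to its parents. The sketch below encodes only the structure described above, not the conditional probability tables each node would also need:

```python
# Each node maps to the set of its parent nodes (edges point parent -> child).
tennis_network = {
    "Outlook": set(),  # root node: no parents
    "Temperature": {"Outlook"},
    "Humidity": {"Outlook"},
    "Wind": {"Outlook"},
    "PlayTennis": {"Outlook", "Temperature", "Humidity", "Wind"},
}

for node, parents in tennis_network.items():
    print(f"{node} <- {sorted(parents) if parents else '(root)'}")
```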
Data Source
Our example uses the classic Play Tennis dataset, which records weather conditions along with a decision to play tennis or not. The standard version (from Tom Mitchell's Machine Learning) has 14 rows:

Day | Outlook  | Temperature | Humidity | Wind   | PlayTennis
D1  | Sunny    | Hot         | High     | Weak   | No
D2  | Sunny    | Hot         | High     | Strong | No
D3  | Overcast | Hot         | High     | Weak   | Yes
D4  | Rain     | Mild        | High     | Weak   | Yes
D5  | Rain     | Cool        | Normal   | Weak   | Yes
D6  | Rain     | Cool        | Normal   | Strong | No
D7  | Overcast | Cool        | Normal   | Strong | Yes
D8  | Sunny    | Mild        | High     | Weak   | No
D9  | Sunny    | Cool        | Normal   | Weak   | Yes
D10 | Rain     | Mild        | Normal   | Weak   | Yes
D11 | Sunny    | Mild        | Normal   | Strong | Yes
D12 | Overcast | Mild        | High     | Strong | Yes
D13 | Overcast | Hot         | Normal   | Weak   | Yes
D14 | Rain     | Mild        | High     | Strong | No
Python Implementation: Tennis Probability Predictor
Here’s a practical Python class that implements:
Data preparation with prior and conditional probabilities
Naive Bayes classification
Queries for most likely conditions given the PlayTennis value
Normalised posterior probability distribution for PlayTennis given conditions
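The article's original listing isn't reproduced here, so the following is a minimal sketch of such a class; the name TennisProbabilityPredictor and its method names are illustrative choices, but all four capabilities above are covered, using the 14-row dataset from the table:

```python
from collections import Counter

# The classic 14-row Play Tennis dataset, one tuple per day:
# (Outlook, Temperature, Humidity, Wind, PlayTennis).
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
FEATURES = ("Outlook", "Temperature", "Humidity", "Wind")


class TennisProbabilityPredictor:
    """Naive Bayes over the Play Tennis data, plus the query helpers listed above."""

    def __init__(self, rows):
        label_counts = Counter(row[-1] for row in rows)
        # Prior probabilities P(PlayTennis = label).
        self.priors = {lbl: n / len(rows) for lbl, n in label_counts.items()}
        # Conditional probabilities P(feature = value | PlayTennis = label).
        self.conditionals = {}
        for i, feature in enumerate(FEATURES):
            for label, label_total in label_counts.items():
                value_counts = Counter(r[i] for r in rows if r[-1] == label)
                self.conditionals[(feature, label)] = {
                    v: n / label_total for v, n in value_counts.items()
                }

    def naive_bayes_score(self, conditions, label):
        """Unnormalised P(label) * Π P(value | label) for the given conditions."""
        score = self.priors[label]
        for feature, value in conditions.items():
            score *= self.conditionals[(feature, label)].get(value, 0.0)
        return score

    def classify(self, conditions):
        """Naive Bayes classification: the label with the highest score."""
        return max(self.priors, key=lambda lbl: self.naive_bayes_score(conditions, lbl))

    def posterior_distribution(self, conditions):
        """Normalised posterior distribution over PlayTennis given the conditions."""
        scores = {lbl: self.naive_bayes_score(conditions, lbl) for lbl in self.priors}
        total = sum(scores.values())
        return {lbl: s / total for lbl, s in scores.items()}

    def most_likely_conditions(self, label):
        """For each feature, the value with the highest P(value | label)."""
        return {
            feature: max(self.conditionals[(feature, label)],
                         key=self.conditionals[(feature, label)].get)
            for feature in FEATURES
        }
```

Note there is no Laplace smoothing here: a feature value never seen with a class contributes a factor of zero, matching the plain textbook calculation.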
Worked Examples
Q1: Find the probability of playing tennis under the given conditions.
Output: applying the Naive Bayes calculation gives 0.0051.
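The exact conditions behind Q1 aren't shown above, so the query below uses an illustrative set; note that with these particular conditions the sketch class yields roughly 0.0053 rather than the article's 0.0051:

```python
predictor = TennisProbabilityPredictor(DATA)

# Illustrative conditions (the article's exact Q1 inputs are not shown).
conditions = {"Outlook": "Sunny", "Temperature": "Cool",
              "Humidity": "High", "Wind": "Strong"}
print(predictor.naive_bayes_score(conditions, "Yes"))  # ~0.0053 (unnormalised)
print(predictor.classify(conditions))                  # "No" wins for these conditions
```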
Q2: What is the distribution over PlayTennis if Wind=Strong?
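Continuing with the same predictor: on the standard 14-row dataset, strong-wind days split evenly between Yes and No, so the normalised posterior comes out at 50/50:

```python
print(predictor.posterior_distribution({"Wind": "Strong"}))
# {'No': 0.5, 'Yes': 0.5} — both P(Yes)·P(Strong|Yes) and P(No)·P(Strong|No) equal 3/14
```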
Q3: Most Likely Conditions Given PlayTennis=No?
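And the per-feature query for PlayTennis=No. Sunny, High, and Strong are clear winners; Temperature is a tie between Hot and Mild (2/5 each), and this sketch simply reports the first value it encounters:

```python
print(predictor.most_likely_conditions("No"))
# {'Outlook': 'Sunny', 'Temperature': 'Hot', 'Humidity': 'High', 'Wind': 'Strong'}
```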
Conclusion
Probability theory provides a rigorous foundation for reasoning under uncertainty. Bayes’ theorem enables us to update beliefs given new data. Naive Bayes classifiers, despite their simplicity, are powerful tools for classification tasks.
This article walked through the theory, formulas, and practical implementation on a familiar dataset predicting tennis playability. Bayesian networks provide a more flexible framework to capture dependencies among variables beyond naive assumptions.
If you want to deepen your knowledge or develop probabilistic models, understanding these concepts is essential.