Mathematics for Machine Learning
with Dr. Naveed R. Butt @ GIKI - FES
Recall…
ES691 - Mathematics for Machine Learning / Dr. Naveed R. Butt @ GIKI - FES 2
Our next module is all about that…
Probability
- Modeling Uncertainties
- Collecting Probabilities (distributions)
- Extracting Key Indicators (moments)
- Reviewing Common Distributions
- Estimating Parameters
What is Probability?
Why Do We Sometimes Lack Knowledge?
Another Example
- To perform facial recognition, we cannot ask a person to provide thousands of their photos
(different moods, lighting conditions, times of day, grooming levels)
Quantum Randomness
- Where’s the electron?
- According to current consensus, processes and
properties at quantum level are probabilistic by their
very nature.
Statistics: Our way of making sense of the uncertain world through whatever data we have…
What’s the height of the next student who enters the room?
Using Probability & Statistics …to make design decisions based on educated guesses…
Examples
How Do We Assign Probabilities?
A Tale of Two Philosophies
Frequentist
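A minimal sketch of the frequentist reading, in which probability is taken as the long-run relative frequency of an event over repeated trials. The simulated fair coin, the fixed seed, and the helper name `relative_frequency` are illustrative assumptions, not part of the slides.

```python
import random

def relative_frequency(num_trials, seed=0):
    """Estimate P[heads] as (#heads) / (#trials) for a simulated fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(num_trials))
    return heads / num_trials

# The estimate should settle near 0.5 as the number of trials grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

With few trials the estimate can be far from 0.5; the frequentist claim is only about what happens as the number of trials becomes large.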
Assigning Probabilities – Four Steps
1. Clearly define your experiment…
Axioms of Probability (Kolmogorov)
For the case of an infinite number of possible events, Axiom 3 is replaced by Axiom 4.
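A small sketch of how a finite probability assignment can be checked against the axioms, assuming the standard Kolmogorov formulation (non-negativity, 𝑃[Ω] = 1, and additivity for mutually exclusive events). The helper names `prob` and `is_valid_assignment`, and the sample-point labels, are hypothetical.

```python
from fractions import Fraction

def prob(event, p):
    """P[event] as the sum of the probabilities of its sample points."""
    return sum((p[w] for w in event), Fraction(0))

def is_valid_assignment(p):
    """Check non-negativity and that the whole sample space has probability 1."""
    nonneg = all(v >= 0 for v in p.values())
    total_one = prob(p.keys(), p) == 1
    return nonneg and total_one

# Hypothetical assignment over four sample points.
p = {"w1": Fraction(1, 5), "w2": Fraction(1, 5),
     "w3": Fraction(1, 5), "w4": Fraction(2, 5)}

A, B = {"w1", "w2"}, {"w3"}  # mutually exclusive events
print(is_valid_assignment(p))
print(prob(A | B, p) == prob(A, p) + prob(B, p))  # additivity for disjoint A, B
```

Because `prob` sums over sample points, additivity for disjoint events holds by construction here; the check on the last line simply makes that visible.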
Formally speaking…
Sample Space: Ω = {𝜔1, 𝜔2, 𝜔3, 𝜔4, 𝜔5, …}
Probability Measure
Events - Some Terminology and Results
Sample Space
- Set of all possible distinct outcomes (simple events)
- Ω = {𝜔1, 𝜔2, 𝜔3, 𝜔4, 𝜔5, …}
- 𝜔𝑖 = Sample Point
Simple Event
- An event containing a single sample point, e.g., {𝜔1}
Null Event
- An event containing no sample points
- E.g., 𝐶 = { } is a null event
Sure Event
- An event consisting of all the sample points
- E.g., 𝐶 = Ω is a sure event
Equally Likely Events
- Events having the same probability of occurring
- E.g., if 𝑃[𝐴] = 𝑃[𝐵] then events 𝐴 and 𝐵 are equally likely
Mutually Exclusive Events
- Events that cannot occur at the same time
- E.g., in the depicted case, 𝐴 = {𝜔1, 𝜔2} and 𝐵 = {𝜔3, 𝜔4}; then clearly 𝐴 and 𝐵 cannot occur at the same time.
de Morgan’s Laws
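For reference, de Morgan’s laws state that the complement of a union is the intersection of the complements, and the complement of an intersection is the union of the complements. A quick check on a small, hypothetical sample space (the sets `omega`, `A`, and `B` are chosen only for illustration):

```python
omega = {1, 2, 3, 4, 5, 6}   # hypothetical sample space
A = {1, 2, 3}
B = {3, 4}

def complement(event):
    """Complement of an event relative to the sample space."""
    return omega - event

# (A u B)^c == A^c n B^c
law1 = complement(A | B) == complement(A) & complement(B)
# (A n B)^c == A^c u B^c
law2 = complement(A & B) == complement(A) | complement(B)
print(law1, law2)
```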
Joint Probability
We are often interested in finding the probability of two events occurring at the same time. This is called “Joint Probability”.
𝑃[𝐴 𝑎𝑛𝑑 𝐵] = 𝑃[𝐴 ∩ 𝐵] = 𝑃[𝐴, 𝐵]
Conditional Probability
Sometimes knowledge of one random event can help us assign probability to another random event. For this, the concept of “conditional probability” comes in handy.
𝐴 = {𝐴𝑙𝑖 𝑖𝑠 𝑖𝑛 𝑙𝑒𝑐𝑡𝑢𝑟𝑒 𝑎𝑡 8 𝑎𝑚}
𝐵 = {𝐴𝑙𝑖 𝑖𝑠 𝑠𝑙𝑒𝑒𝑝𝑖𝑛𝑔 𝑎𝑡 8 𝑎𝑚}
Suppose I tell you that 𝐴 has occurred (i.e., Ali is in lecture at 8 am), now what is the probability that he is sleeping at 8 am?
Understanding Conditional Probability
- Suppose I tell you that I’ve written an integer from 1 to 4 on a piece of paper, but do not tell you the number.
- Under the assumption (“belief”) that I could have picked any of the four numbers with equal chances, what is the probability that I wrote 3?
𝐴 = {3}, 𝑃[𝐴] = ?
𝑃[𝐴] = 1/4
𝑃[𝐴 𝑔𝑖𝑣𝑒𝑛 𝐵] = 𝑃[𝐴|𝐵] = ?
𝑃[𝐴|𝐵] = 1/2
What just happened? Let’s look at the sample space.
Sample space before additional information: Ω = {1, 2, 3, 4}
Q. What if the additional information I give you is “I wrote a positive integer” (call it 𝐶), or “I had cake for breakfast” (call it 𝐷)?
Clearly…
𝑃[𝐴|𝐶] = 𝑃[𝐴]
𝑃[𝐴|𝐷] = 𝑃[𝐴]
Sample space after additional information: Ωnew = {1, 2, 3, 4}
Why? Note that the two new pieces of information are quite useless/irrelevant in the sense that they fail to shrink the sample space!
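The “shrinking sample space” view can be sketched in a few lines. The event `B = {3, 4}` below is an assumption made only for illustration (any two-point event containing 3 gives 𝑃[𝐴|𝐵] = 1/2); the helper names are hypothetical.

```python
from fractions import Fraction

omega = {1, 2, 3, 4}  # integers I could have written, all equally likely

def prob(event, space):
    """Equally likely outcomes: P[event] = |event n space| / |space|."""
    return Fraction(len(event & space), len(space))

def cond_prob(event, given):
    """P[event | given]: treat `given` as the new (shrunken) sample space."""
    return prob(event, given & omega)

A = {3}
B = {3, 4}        # assumed informative event: shrinks omega to two points
C = {1, 2, 3, 4}  # "I wrote a positive integer": does not shrink omega

print(prob(A, omega))   # probability before any extra information
print(cond_prob(A, B))  # informative condition changes the probability
print(cond_prob(A, C))  # irrelevant condition leaves it unchanged
```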
Linking Conditional and Joint Probabilities
Statistical Independence (Independent Events)
𝑃[𝐴|𝐶] = 𝑃[𝐴]
In fact, this is one of the primary ways of checking independence
Recall 𝑃[𝐴|𝐵] = 𝑃[𝐴 ∩ 𝐵]/𝑃[𝐵]
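Combining 𝑃[𝐴|𝐵] = 𝑃[𝐴] with 𝑃[𝐴|𝐵] = 𝑃[𝐴 ∩ 𝐵]/𝑃[𝐵] gives the product rule 𝑃[𝐴 ∩ 𝐵] = 𝑃[𝐴]𝑃[𝐵], which is a common way to test independence. A minimal sketch on a hypothetical four-point, equally likely sample space (the events are illustrative assumptions):

```python
from fractions import Fraction

omega = {1, 2, 3, 4}  # hypothetical equally likely outcomes

def prob(event):
    return Fraction(len(event & omega), len(omega))

def independent(a, b):
    """Product-rule check: P[A n B] == P[A] * P[B]."""
    return prob(a & b) == prob(a) * prob(b)

print(independent({1, 2}, {1, 3}))  # overlap has probability 1/4 = 1/2 * 1/2
print(independent({1, 2}, {3, 4}))  # disjoint events with nonzero probability
```

The product-rule form has the advantage that it does not require dividing by 𝑃[𝐵], so it also works when 𝑃[𝐵] = 0.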
Some Examples and Consequences of the Axioms
Given the sample space Ω = {𝜔1, 𝜔2, 𝜔3, 𝜔4, 𝜔5, 𝜔6} containing six simple events, can you find 𝑃[𝜔1]?
Is 𝑃[𝜔1] = 1/6?
No! In fact, we cannot claim this unless we have the information (or are ready to assume) that all the simple events here are equally likely!
𝑃[𝜔1] = 𝑃[𝜔2] = ⋯ = 𝑃[𝜔6]
In this case we can say that 𝑃[𝜔1] = 1/6
Another Example
Let’s say we have a sample space Ω = {𝜔1, 𝜔2, 𝜔3, 𝜔4} containing four simple events, and I give you the information that
𝑃[𝜔1] = 𝑃[𝜔2] = 𝑃[𝜔3] = 1/5
- Find 𝑃[𝜔4]
- Are the events in Ω equally likely?
𝑃[𝜔1] + 𝑃[𝜔2] + 𝑃[𝜔3] + 𝑃[𝜔4] = 𝑃[Ω] = 𝑃[𝑠𝑢𝑟𝑒 𝑒𝑣𝑒𝑛𝑡] = 1
This leads to
𝑃[𝜔4] = 1 − 1/5 − 1/5 − 1/5 = 2/5
Establishing whether events are equally likely or not…
- In practice, we either assume events to be equally likely (“belief/experience”).
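The arithmetic of the four-event example can be checked with exact fractions. The dictionary keys below are just labels for the sample points.

```python
from fractions import Fraction

# Given: P[w1] = P[w2] = P[w3] = 1/5
known = {"w1": Fraction(1, 5), "w2": Fraction(1, 5), "w3": Fraction(1, 5)}

# The probabilities of all simple events must sum to P[Omega] = 1.
p_w4 = 1 - sum(known.values())
print(p_w4)

# The events are equally likely only if every probability is the same.
equally_likely = len(set(known.values()) | {p_w4}) == 1
print(equally_likely)
```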
Probabilities of Mutually Exclusive and Overlapping Events
𝒂𝒏𝒅 𝑣𝑠. 𝒐𝒓
(Figure: sample space Ω with sample points 𝜔1, …, 𝜔6 and two non-overlapping events 𝐴 and 𝐵.)
- 𝐴 and 𝐵 are compound events
- Clearly, the two cannot occur at the same time (since 𝐴 ∩ 𝐵 = ∅)
- 𝑃[𝐴 𝑎𝑛𝑑 𝐵] = 𝑃[𝐴 ∩ 𝐵] = 0
- Such events, we’ve already defined as being “Mutually Exclusive”.
But what about 𝑃[𝐴 ∪ 𝐵]?
- From the third axiom we know that for mutually exclusive events we must have:
- 𝑃[𝐴 𝑜𝑟 𝐵] = 𝑃[𝐴 ∪ 𝐵] = 𝑃[𝐴] + 𝑃[𝐵]
Interesting question: can mutually exclusive events be independent?
No (provided both have nonzero probability)! Since knowledge of one occurring immediately changes the probability of the other to zero.
In the example shown
𝑃[𝐴] = 𝑃[𝜔1] + 𝑃[𝜔1] + 𝑃[𝜔5]
But
𝑃[𝐴|𝐵] = 0
What if 𝐴 and 𝐵 are overlapping? i.e., 𝐴 ∩ 𝐵 ≠ ∅
What about 𝑃[𝐴 𝑜𝑟 𝐵] now?
𝑃[𝐴 ∪ 𝐵] = 𝑃[𝐴] + 𝑃[𝐵] − 𝑃[𝐴 ∩ 𝐵]
(Figure: 𝐴 ∪ 𝐵 viewed as the disjoint pieces 𝐴 − 𝐴 ∩ 𝐵 and 𝐵, or as 𝐴 and 𝐵 − 𝐴 ∩ 𝐵.)
Another look at 𝑃[𝐴 𝑜𝑟 𝐵]
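A quick numerical check of the overlapping-events formula. The six-point sample space with equally likely points and the particular choice of `A` and `B` are assumptions made for illustration; they are not read off the slide figure.

```python
from fractions import Fraction

omega = {"w1", "w2", "w3", "w4", "w5", "w6"}  # assumed equally likely

def prob(event):
    return Fraction(len(event), len(omega))

A = {"w1", "w3"}        # hypothetical event
B = {"w3", "w4", "w6"}  # hypothetical event overlapping A in w3

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)  # subtract the overlap counted twice
print(lhs, rhs)
```

Adding 𝑃[𝐴] and 𝑃[𝐵] counts the overlap twice, which is why 𝑃[𝐴 ∩ 𝐵] is subtracted once. When the events are mutually exclusive the overlap is empty and the formula reduces to 𝑃[𝐴] + 𝑃[𝐵].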
Some other interesting consequences of the axioms… (for any events 𝐴, 𝐵)
A Smart Use of Conditional Probabilities … Bayes’ Theorem
- Suppose that a certain system would fail if two of its distinct components 𝛼 and 𝛽 both fail.
- If the probability that 𝛼 fails is 0.01, the probability that 𝛽 fails is 0.005, and the probability that 𝛽 fails if 𝛼 has failed is 0.015, find the probability that 𝛼 will fail if 𝛽 has failed.
Always good to define events first…
𝐴 = {𝛼 𝑓𝑎𝑖𝑙𝑠}
𝐵 = {𝛽 𝑓𝑎𝑖𝑙𝑠}
In terms of these events, what we are given is:
𝑃[𝐴] = 0.01, 𝑃[𝐵] = 0.005, 𝑃[𝐵|𝐴] = 0.015
𝑃[𝐴|𝐵] = ?
We’ll use the previously discussed link between conditional and joint probabilities
𝑃[𝐴|𝐵] = 𝑃[𝐴 ∩ 𝐵] / 𝑃[𝐵] … (⋆)
Though we do not have 𝑃[𝐴 ∩ 𝐵], we can derive it by using the conditional in reverse
𝑃[𝐵|𝐴] = 𝑃[𝐴 ∩ 𝐵] / 𝑃[𝐴] ⇒ 𝑃[𝐴 ∩ 𝐵] = 𝑃[𝐵|𝐴] 𝑃[𝐴]
Plugging this in (⋆) gives
𝑃[𝐴|𝐵] = 𝑃[𝐵|𝐴] 𝑃[𝐴] / 𝑃[𝐵]
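Plugging the given numbers into 𝑃[𝐴|𝐵] = 𝑃[𝐵|𝐴] 𝑃[𝐴] / 𝑃[𝐵] can be sketched as follows. The function name `bayes` is hypothetical, and exact fractions are used only to avoid floating-point noise.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """P[A|B] = P[B|A] * P[A] / P[B], defined only when P[B] > 0."""
    if p_b <= 0:
        raise ValueError("P[B] must be positive")
    return p_b_given_a * p_a / p_b

p_a = Fraction("0.01")           # P[A]: alpha fails
p_b = Fraction("0.005")          # P[B]: beta fails
p_b_given_a = Fraction("0.015")  # P[B|A]: beta fails given alpha has failed

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(p_a_given_b, float(p_a_given_b))
```

Under these numbers, learning that 𝛽 has failed raises the probability that 𝛼 fails from 0.01 to 0.03.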
“Learning” Probabilities
Bayes’ rule read as “learning”: 𝑃[𝐴|𝐵] = 𝑃[𝐵|𝐴] 𝑃[𝐴] / 𝑃[𝐵]
- 𝑃[𝐴|𝐵]: Learned/updated probability of event given new data/information.
- 𝑃[𝐴]: Initial probability of an event (assumed or based on past data).
- 𝑃[𝐵|𝐴] / 𝑃[𝐵]: Statistics of new data and its statistical relevance to event of interest.
- Iterative “learning” as new data arrives.
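A minimal sketch of iterative “learning”, in which each updated probability becomes the initial probability for the next piece of data. Everything in the scenario is a hypothetical illustration: the event of interest is “the coin is biased with 𝑃[heads] = 0.8”, the alternative is a fair coin, and the observed flips are made up.

```python
def update(prior, likelihood_if_true, likelihood_if_false):
    """One Bayes step: posterior = likelihood * prior / evidence."""
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence

belief = 0.5          # initial probability that the coin is biased (assumed)
history = [belief]

for flip in "HHTH":   # made-up sequence of observed flips
    if flip == "H":
        belief = update(belief, 0.8, 0.5)  # P[H | biased], P[H | fair]
    else:
        belief = update(belief, 0.2, 0.5)  # P[T | biased], P[T | fair]
    history.append(belief)

print([round(b, 3) for b in history])
```

Heads push the belief in “biased” up and tails push it down, which is the iterative updating pattern described above.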
Bayes’ Theorem – Formal Definition
This means that {𝐴𝑗} are mutually exclusive and their union spans Ω
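For reference, the standard textbook statement of Bayes’ theorem for such a collection {𝐴𝑗}, written out in LaTeX. The index names and the condition 𝑃[𝐵] > 0 follow the usual convention and are assumed to match the slide.

```latex
% Assumptions: A_j \cap A_k = \emptyset for j \neq k, \bigcup_j A_j = \Omega, and P[B] > 0.
P[A_i \mid B] = \frac{P[B \mid A_i]\, P[A_i]}{\sum_{j} P[B \mid A_j]\, P[A_j]}
```

The denominator is 𝑃[𝐵] expanded over the 𝐴𝑗, so for a single event 𝐴 this reduces to 𝑃[𝐴|𝐵] = 𝑃[𝐵|𝐴] 𝑃[𝐴] / 𝑃[𝐵].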
Another Smart Use of Conditional Probabilities … Total Probability
ES691 - Mathematics for Machine Learning / Dr. Naveed R. Butt @ GIKI - FES 131
Another Smart Use of Conditional Probabilities … Total Probability
- You reckon your chances of winning from Zalmi are 0.05, from
Gladiators 0.04, and from Qalandars 0.1.
ES691 - Mathematics for Machine Learning / Dr. Naveed R. Butt @ GIKI - FES 132
Another Smart Use of Conditional Probabilities … Total Probability
- You reckon your chances of winning from Zalmi are 0.05, from
Gladiators 0.04, and from Qalandars 0.1.
Let’s call our team Topi Drama, and define some events
ES691 - Mathematics for Machine Learning / Dr. Naveed R. Butt @ GIKI - FES 133
Another Smart Use of Conditional Probabilities … Total Probability
- You reckon your chances of winning from Zalmi are 0.05, from
Gladiators 0.04, and from Qalandars 0.1.
Let’s call our team Topi Drama, and define some events
𝑊 = 𝐷𝑟𝑎𝑚𝑎 𝑤𝑖𝑛𝑠
𝑍 = 𝑍𝑎𝑙𝑚𝑖 𝑐ℎ𝑜𝑠𝑒𝑛
𝐺 = 𝐺𝑙𝑎𝑑𝑖𝑎𝑡𝑜𝑟𝑠 𝑐ℎ𝑜𝑠𝑒𝑛
𝑄 = {𝑄𝑎𝑙𝑎𝑛𝑑𝑎𝑟𝑠 𝑐ℎ𝑜𝑠𝑒𝑛}
ES691 - Mathematics for Machine Learning / Dr. Naveed R. Butt @ GIKI - FES 134
Another Smart Use of Conditional Probabilities … Total Probability
In terms of these events, we are given:
- Suppose your team is to play a PSL match against one of three
teams: Quetta Gladiators, Peshawar Zalmi, or Lahore Qalandars. 𝑃 𝑊 𝑍 = 0.05
𝑃 𝑊 𝐺 = 0.04
- You reckon your chances of winning from Zalmi are 0.05, from 𝑃 𝑊 𝑄 = 0.1
Gladiators 0.04, and from Qalandars 0.1.
Further
- What is your probability of winning if your opponent is chosen
randomly with equal probabilities? 1
𝑃 𝑍 =𝑃 𝐺 =𝑃 𝑄 =
3
Let’s call our team Topi Drama, and define some events
𝑊 = 𝐷𝑟𝑎𝑚𝑎 𝑤𝑖𝑛𝑠
𝑍 = 𝑍𝑎𝑙𝑚𝑖 𝑐ℎ𝑜𝑠𝑒𝑛
𝐺 = 𝐺𝑙𝑎𝑑𝑖𝑎𝑡𝑜𝑟𝑠 𝑐ℎ𝑜𝑠𝑒𝑛
𝑄 = {𝑄𝑎𝑙𝑎𝑛𝑑𝑎𝑟𝑠 𝑐ℎ𝑜𝑠𝑒𝑛}
ES691 - Mathematics for Machine Learning / Dr. Naveed R. Butt @ GIKI - FES 135
Another Smart Use of Conditional Probabilities … Total Probability
In terms of these events, we are given:
- Suppose your team is to play a PSL match against one of three
teams: Quetta Gladiators, Peshawar Zalmi, or Lahore Qalandars. 𝑃 𝑊 𝑍 = 0.05
𝑃 𝑊 𝐺 = 0.04
- You reckon your chances of winning from Zalmi are 0.05, from 𝑃 𝑊 𝑄 = 0.1
Gladiators 0.04, and from Qalandars 0.1.
Further
- What is your probability of winning if your opponent is chosen
randomly with equal probabilities? 1
𝑃 𝑍 =𝑃 𝐺 =𝑃 𝑄 =
3
Let’s call our team Topi Drama, and define some events
And we have to find
𝑊 = 𝐷𝑟𝑎𝑚𝑎 𝑤𝑖𝑛𝑠 𝑃 𝑊 =?
𝑍 = 𝑍𝑎𝑙𝑚𝑖 𝑐ℎ𝑜𝑠𝑒𝑛
𝐺 = 𝐺𝑙𝑎𝑑𝑖𝑎𝑡𝑜𝑟𝑠 𝑐ℎ𝑜𝑠𝑒𝑛
𝑄 = {𝑄𝑎𝑙𝑎𝑛𝑑𝑎𝑟𝑠 𝑐ℎ𝑜𝑠𝑒𝑛} How do we solve this?
ES691 - Mathematics for Machine Learning / Dr. Naveed R. Butt @ GIKI - FES 136
$P[W] = P[W \mid Z]\,P[Z] + P[W \mid G]\,P[G] + P[W \mid Q]\,P[Q]$
$P[W] = 0.05 \times \frac{1}{3} + 0.04 \times \frac{1}{3} + 0.1 \times \frac{1}{3} = \frac{0.19}{3} \approx 0.063$
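The calculation above can be checked with a few lines of Python. This is only a minimal sketch: the dictionary names and the helper `total_probability` are illustrative choices, not part of the lecture, and exact fractions are used so that the result is not blurred by floating-point rounding.

```python
from fractions import Fraction

# Conditional win probabilities P[W | opponent], taken from the example.
p_win_given = {
    "Zalmi": Fraction(5, 100),
    "Gladiators": Fraction(4, 100),
    "Qalandars": Fraction(10, 100),
}

# Opponent chosen uniformly at random: P[Z] = P[G] = P[Q] = 1/3.
p_opponent = {team: Fraction(1, 3) for team in p_win_given}


def total_probability(conditional, prior):
    """Return sum_i P[B | A_i] * P[A_i] over a partition {A_i} (hypothetical helper)."""
    # The A_i must be exhaustive, so their probabilities have to sum to 1.
    assert sum(prior.values()) == 1
    return sum(conditional[a] * prior[a] for a in prior)


p_win = total_probability(p_win_given, p_opponent)
print(p_win, float(p_win))
```

Changing `p_opponent` to a non-uniform choice of opponent reuses the same function unchanged, which is the practical appeal of splitting a probability over a partition.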
Total Probability – Formal Definition
Logic: since the $A_i$’s are mutually exclusive and exhaustive, any event $B \subseteq \Omega$ must necessarily be made up of disjoint overlaps with some (or all) of the $A_i$’s.
Total probability is therefore split into joint or conditional probabilities:
$P[B] = \sum_i P[B \cap A_i] = \sum_i P[B \mid A_i]\,P[A_i]$
We have previously defined several spaces (vector space, metric space, inner product space…).
Let us now formally define the Probability Space.
[Figure: diagram of the spaces seen so far (topological, vector, metric, inner product), with the probability space added]
Probability Space: A Set, A Field, and A Measure
$(\Omega, F, P)$
- A Set of All Possibilities (Sample Space $\Omega$),
- A collection ($F$, a $\sigma$-field) of subsets (events) of $\Omega$ that allows closure under several set operations,
- A Probability Measure ($P$) that assigns a probability to each event in $F$.
- The conditions on $F$ ensure closure (under union, intersection, etc.) so that probability measure assignments make sense.
- E.g., the probability axioms require $P[A \cup A^c] = P[A] + P[A^c] = 1$, but if $A \in F$ while $A^c$ is not included in $F$, then $P[A^c]$ is not even defined and things become problematic.
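For a finite sample space, the closure requirement can be tested mechanically. The sketch below is an illustration under stated assumptions: the function name `is_field` and the four-element sample space are made up for this example, and only complement and pairwise union are checked (on a finite $\Omega$ this is enough, and closure under intersection then follows from De Morgan’s laws).

```python
from itertools import combinations


def is_field(omega, collection):
    """Check closure under complement and pairwise union on a finite sample space."""
    omega = frozenset(omega)
    events = {frozenset(e) for e in collection}
    if omega not in events:
        return False
    for a in events:
        if omega - a not in events:      # complement of a is missing
            return False
    for a, b in combinations(events, 2):
        if a | b not in events:          # union of a and b is missing
            return False
    return True


omega = {1, 2, 3, 4}                     # hypothetical sample space
A = {1, 2}
good = [set(), A, omega - A, omega]      # contains A and its complement
bad = [set(), A, omega]                  # A is included but its complement is not

print(is_field(omega, good))
print(is_field(omega, bad))
```

The second collection fails exactly for the reason given above: $A$ is an event but $A^c$ is not, so no probability could be assigned to it.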
With the basic notions covered, we are
now ready to embrace uncertainty by
modeling it precisely…
Embracing Uncertainty/Variation… Prefix: Random
Deterministic → Random
Event → Random Event
Variable → Random Variable
Process → Random Process
- Random Event: An outcome of an uncertain happening/experiment.
Recall… (we previously saw a random variable as a function/mapping)
A random variable is a mapping from the sample space $\Omega$ of a probability space to the real number line, $X : \Omega \to \mathbb{R}$.
Back to the radius example… $r(\omega)$ = radius of a circle decided through the outcome $\omega$ of a random experiment coming from a probability space.
Mapping 1: $r(\omega) = \omega$
$r(\omega) = \begin{cases} 1, & \omega = 1 \\ 2, & \omega = 2 \\ 3, & \omega = 3 \\ 4, & \omega = 4 \\ 5, & \omega = 5 \\ 6, & \omega = 6 \end{cases}$
Random Variable As a Mapping
$Z$ = Is the next student who enters the room from the Swabi region? Not a random variable yet.
Possible outcomes (descriptive) → Possible outcomes (numeric)
Yes → 1
No → 0
$Z = 1$ if the next student entering the room is from Swabi
$Z = 0$ if the next student entering the room is not from Swabi
Now $Z$ is a random variable!!
Realization: Once the outcome of a random experiment is available (or assumed), the value of the random variable becomes fixed, and is called a realization.
$r(\omega) = \omega$
$r \in \{1, 2, 3, 4, 5, 6\}$
$a \in \{\pi, 4\pi, 9\pi, 16\pi, 25\pi, 36\pi\}$ (the area $a = \pi r^2$ of the circle)
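The distinction between a random variable (a mapping) and its realization (a fixed number) is easy to see in code. The sketch below assumes the random experiment is a single roll of a six-sided die, which matches the outcomes 1 to 6 in the radius example but is an assumption of this illustration; the function names `r` and `a` simply mirror the symbols used above.

```python
import math
import random

OMEGA = [1, 2, 3, 4, 5, 6]        # sample space (assumed: one roll of a die)


def r(omega):
    """Random variable (Mapping 1): the radius equals the outcome."""
    return omega


def a(omega):
    """Derived random variable: area of the circle with radius r(omega)."""
    return math.pi * r(omega) ** 2


rng = random.Random(0)            # fixed seed so the run is repeatable
outcome = rng.choice(OMEGA)       # the experiment is performed once

# After the outcome is known, both values are ordinary fixed numbers.
radius_realization = r(outcome)
area_realization = a(outcome)
print(outcome, radius_realization, area_realization)
```

Before `outcome` is drawn, `r` and `a` are just functions on `OMEGA`; only after the draw do they produce realizations.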
Random Process: A collection of random variables.
$\{X_1, X_2, X_3, X_4\}$ = heights of the next four students entering the room → Random Process
In general, a random process can be represented as…
$\{X(\omega, t)\}_{t \in T}$
Depending on the index set $T$, the process is discrete-time or continuous-time; depending on the values $X$ takes, it is discrete-state or continuous-state.
Examples
- Discrete-State, Discrete-Time: Number of siblings each of the next four students entering the room has.
- Continuous-State, Discrete-Time: Heights of the next four students entering the room.
- Continuous-State, Continuous-Time: Temperature of the room over a day.
- Discrete-State, Continuous-Time: Number of students in a classroom over a day.
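As a rough illustration of the two discrete-time examples, the sketch below draws one realization of each process. The distributions are purely illustrative assumptions (sibling counts uniform on 0 to 6, heights Gaussian with mean 170 cm and standard deviation 8 cm); the lecture does not specify them.

```python
import random

rng = random.Random(42)           # fixed seed so the run is repeatable

# Discrete-state, discrete-time: sibling counts of the next four students.
# Uniform on 0..6 is an assumption made only for this sketch.
siblings = [rng.randint(0, 6) for _ in range(4)]

# Continuous-state, discrete-time: heights (cm) of the next four students.
# Gaussian(170, 8) is an assumption made only for this sketch.
heights = [rng.gauss(170.0, 8.0) for _ in range(4)]

print(siblings)
print(heights)
```

Each list is one realization of the whole process: four random variables indexed by $t = 1, \dots, 4$, all fixed once the experiment has been run.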
Having defined probabilities and
randomness, we now define functions
that collect probabilities…
Discrete vs. Continuous Case
For collecting probabilities, we need to split the problem into two cases depending on whether the random variable $X$ is discrete-state or continuous-state.
The number of values the random variable $X$ takes can be
- Finite → Discrete random variable
- Countably Infinite → Discrete random variable
- Uncountably Infinite → Continuous random variable
Examples
- $X \in \{1, 2, 3, 4, 5, 6\}$ : Finite
- $X \in \{1, 2, 3, 4, \dots\}$ : Countably Infinite
- $X \in \mathbb{R}$ : Uncountably Infinite
- $X$ is a real number in the interval $[0, 1]$ : Uncountably Infinite
We look at the discrete case first…
Being defined on probabilities, the Distribution Function $p_X(x) = P[X = x]$ must satisfy the three axioms of probability, leading to the conditions
1. It is always non-negative: $p_X(x) \ge 0$
2. It sums to 1: $\sum_x p_X(x) = 1$
3. It assigns a probability to each value of $X$
Let’s play …
• Let’s assume there is a bag with three balls in it with numbers 1 – 3 written on them. You draw one ball at random (with each equally likely to be picked).
• $X$ = number written on the ball (random variable)
• $x = 1, 2, 3$ (possible values of $X$)
• $p_X(x) = P[X = x]$

$x$    $p_X(x)$
1    1/3
2    1/3
3    1/3

Note that $p_X(x)$ satisfies the three conditions
1. It’s always non-negative
2. It sums up to 1
3. It assigns probabilities to each value of $X$
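The table for the three-balls example can be stored as a dictionary and the first two conditions checked directly. This is a minimal sketch; the helper name `is_valid_pmf` is a hypothetical choice for this illustration, and the third condition is satisfied by construction because every possible value of $X$ is a key of the dictionary.

```python
from fractions import Fraction

# p_X(x) for the three-balls example: each ball is equally likely.
p_X = {x: Fraction(1, 3) for x in (1, 2, 3)}


def is_valid_pmf(pmf):
    """Check non-negativity and that the probabilities sum to exactly 1."""
    non_negative = all(p >= 0 for p in pmf.values())
    sums_to_one = sum(pmf.values()) == 1
    return non_negative and sums_to_one


print(is_valid_pmf(p_X))
```

Using `Fraction` rather than floats keeps the "sums to 1" test exact, since three copies of 1/3 do not always add to exactly 1.0 in floating point.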
We often find it useful to discuss whether a random variable lies in a certain range…
E.g., what is the probability that the next student entering is shorter than me?
The Cumulative Distribution Function (CDF) gives the probability that $X$ takes a value less than or equal to $x$:
$F_X(x) = P[X \le x]$
Clearly, the CDF of a discrete random variable is just the sum of the distribution function values $p_X(x_i)$ for $x_i \le x$:
$F_X(x) = \sum_{x_i \le x} p_X(x_i)$
Properties of the CDF
Example
$P[X \le x_2]$ = sum of all these possibilities $= \frac{1}{3} + \frac{1}{2} = \frac{5}{6}$
Let’s play …
• Let’s write $F_X(x)$ for the three balls example

$x$    $p_X(x)$    $F_X(x) = P[X \le x]$
1    1/3    $F_X(1) = p_X(1) = \frac{1}{3}$
2    1/3    $F_X(2) = p_X(1) + p_X(2) = \frac{2}{3}$
3    1/3    $F_X(3) = p_X(1) + p_X(2) + p_X(3) = 1$

Note that $F_X(x)$ satisfies the conditions
$F_X(0.5) = P[X \le 0.5] = 0$
$F_X(-\infty) = P[X \le -\infty] = 0$
$F_X(2.5) = P[X \le 2.5] = P[X \le 2] = F_X(2) = \frac{2}{3}$
$F_X(5) = P[X \le 5] = P[X \le 3] = F_X(3) = 1$
$F_X(\infty) = P[X \le \infty] = P[X \le 3] = F_X(3) = 1$
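The step-like behaviour of the discrete CDF between and beyond the values 1, 2, 3 can be reproduced with a short function. This is a sketch for the three-balls example only; the function name `cdf` is an illustrative choice.

```python
from fractions import Fraction

# p_X(x) for the three-balls example.
p_X = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}


def cdf(x, pmf=p_X):
    """F_X(x): sum of p_X(x_i) over all possible values x_i <= x."""
    return sum((p for xi, p in pmf.items() if xi <= x), Fraction(0))


# Evaluate at the points asked about above, including the two infinite limits.
for x in (float("-inf"), 0.5, 1, 2, 2.5, 3, 5, float("inf")):
    print(x, cdf(x))
```

Because the sum only changes when $x$ passes one of the possible values, the function is flat between them, which is why $F_X(2.5) = F_X(2)$.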
Now let’s repeat the above for Continuous Random Variables…
Recall… Continuous Random Variable → $X$ takes a continuum of values in $\mathbb{R}$
- $X \in \mathbb{R}$
- $X$ is a real number in the interval $[0, 1]$
Now, for a continuous random variable, it only makes sense to talk about the probability that it lies in some interval, say $(a, b]$, $b > a$
$P[a < X \le b] = F_X(b) - F_X(a)$
where $F_X(x) = P[X \le x]$ is the CDF of the continuous random variable $X$.
Why?
- If $X$ is continuous, it can take uncountably many values over a range.
- Clearly, the probability that $X$ takes any one particular value among these would have to be zero.
It’s kind of like: a line passes through a continuum of points and has a length, even though each point has no notion of length.
Distributions of Continuous Random Variables… Probability Density Function (PDF)
$P[a < X \le b] = F_X(b) - F_X(a)$
In fact, if the derivative of $F_X(x)$ exists, we could use it to define a Probability Density Function (PDF) $f_X(x)$ for the continuous random variable as
$f_X(x) = \frac{dF_X(x)}{dx}$
or
$P[X \le x] = F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$
And for the range $(a, b]$, $b > a$
$P[a < X \le b] = \int_{a}^{b} f_X(x)\,dx$
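The two ways of computing an interval probability (difference of CDF values, or integral of the PDF) can be compared numerically. The sketch below assumes an exponential random variable with rate 1, chosen only because its CDF and PDF have simple closed forms; it is not an example from the lecture, and the helper `integrate` is a plain midpoint rule written for this illustration.

```python
import math


def F(x):
    """CDF of an exponential random variable with rate 1 (assumed example)."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0


def f(x):
    """PDF of the same variable, i.e. the derivative of F."""
    return math.exp(-x) if x >= 0 else 0.0


def integrate(func, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


a, b = 0.5, 2.0
via_cdf = F(b) - F(a)             # P[a < X <= b] from the CDF
via_pdf = integrate(f, a, b)      # the same probability from the PDF
print(via_cdf, via_pdf)
```

The two printed numbers should agree to many decimal places, which is the numerical counterpart of the relation between $F_X$ and $f_X$ stated above.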
Some Properties of 𝑓𝑋 (𝑥)
- Q. If $P[X = x] = 0$ for a continuous random variable $X$, and $f_X(x) \ne P[X = x]$, then what does $f_X(x)$ really represent?
Mathematically, since $f_X(x)$ is the derivative of $F_X(x)$,
$f_X(x) = \lim_{\Delta \to 0} \frac{P[x < X \le x + \Delta]}{\Delta}$
Intuitively, $f_X(x)$ is the probability per unit length around $x$: for small $\Delta$, $P[x < X \le x + \Delta] \approx f_X(x)\,\Delta$.
All the Info We Need!
Probability Distribution Function $p_X(x)$ (discrete case) and Probability Density Function $f_X(x)$ (continuous case) are both very important as they contain all the probability information about a random variable.
$X \sim p_X(x)$    $X \sim f_X(x)$
From One Random Variable to Two: Bivariate Distributions
Bivariate CDF
$F_{XY}(x, y) = P[X \le x, Y \le y]$
Bivariate Distribution Function (Discrete Case)
$p_{XY}(x, y) = P[X = x, Y = y]$, with $p_{XY}(x, y) \ge 0$.
Bivariate Distributions Important Properties and Relations
Properties of Bivariate CDF
Marginal CDFs
Conditional Distributions
Just as for two events, we had conditional probability $P[A \mid B]$, for two random variables we have conditional distributions
$f_{X \mid Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)}, \quad f_Y(y) > 0$
Compare with:
$P[A \mid B] = \frac{P[A \cap B]}{P[B]}$
With properties…
Independence of Two Random Variables
Just as two events are independent when $P[A \cap B] = P[A]\,P[B]$, two random variables $X$ and $Y$ are independent when
$F_{XY}(x, y) = F_X(x)\,F_Y(y) \quad \forall x, y$
$f_{XY}(x, y) = f_X(x)\,f_Y(y) \quad \forall x, y$
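For discrete random variables the same ideas can be tried out on a small joint table. The sketch below uses the discrete analogue of the factorization above, $p_{XY}(x, y) = p_X(x)\,p_Y(y)$. The two joint tables (two fair coin flips, and a second coin that always copies the first) are assumptions made for illustration, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two fair coin flips (assumed example), X, Y in {0, 1}.
p_XY = {(x, y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

# A dependent pair for contrast: Y always equals X (assumed example).
p_copy = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}


def marginal(joint, axis):
    """Sum the joint table over the other variable (axis 0 gives p_X, axis 1 gives p_Y)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], Fraction(0)) + p
    return out


def conditional_x_given_y(joint, y):
    """p_{X|Y}(x | y) = p_XY(x, y) / p_Y(y), defined only when p_Y(y) > 0."""
    p_y = marginal(joint, 1)[y]
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}


def independent(joint):
    """True when p_XY(x, y) = p_X(x) * p_Y(y) for every pair (x, y)."""
    p_x, p_y = marginal(joint, 0), marginal(joint, 1)
    return all(
        joint.get((x, y), Fraction(0)) == p_x[x] * p_y[y]
        for x, y in product(p_x, p_y)
    )


print(independent(p_XY), independent(p_copy))
print(conditional_x_given_y(p_copy, 1))
```

In the dependent table, conditioning on $Y = 1$ puts all the probability on $X = 1$, whereas for the two fair coins the conditional distribution of $X$ is the same as its marginal.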
From Two to Many Random Variables: Multivariate Distributions
$\mathbf{X} = [X_1 \; X_2 \; \dots \; X_M]$
$\mathbf{x} = [x_1 \; x_2 \; \dots \; x_M]$
$F_{\mathbf{X}}(\mathbf{x}) = P[X_1 \le x_1, X_2 \le x_2, \dots, X_M \le x_M]$
Next time, we look at what key
information we can extract from
these distribution functions…