
ES 691
Mathematics for Machine Learning
with Dr. Naveed R. Butt @ GIKI - FES
Recall…
Our next module is all about that…

Probability
- Modeling Uncertainties
- Collecting Probabilities (distributions)
- Extracting Key Indicators (moments)
- Reviewing Common Distributions
- Estimating Parameters

What is Probability?

• Probability is a “lack of knowledge”!

• We know you are here today. We are sure.
• But will you be here in the next lecture? We are not sure anymore! There is
now a “lack of knowledge”: “perhaps”, “maybe”, “probably”.
• Another Example: You know your height. But what’s the height of the next
student who enters the room?
Why Do We Sometimes Lack Knowledge?

Future: A die you haven’t rolled yet!
(How can we know which number it will show?)
Too hard to collect all the information
- Which places did you visit today?
- It may be possible to have a drone camera follow you all the time.
- Then we will not have “lack of knowledge” about the places you go to.
- But this is too hard a thing to do.

Another Example
- To perform facial recognition, we cannot ask a person to provide thousands of
their photos (different moods, lighting conditions, times of day, grooming levels).
Quantum Randomness
- Where’s the electron?
- According to the current consensus, processes and properties at the quantum
level are probabilistic by their very nature.
Statistics: Our way of making sense of the uncertain world through whatever
data we have…

What’s the height of the next student who enters the room?

We don’t know. There is lack of knowledge!

But is our lack of knowledge “absolute”? (Do we have absolutely no idea what
their height could be?)

Well, we do know that the next student is most likely taller than the shortest
man on record and shorter than the tallest man on record (otherwise, let’s
call Guinness World Records!).
In fact, we also do know that human heights have a typical distribution (very
short and very tall less common, etc.). We can see these as graphs of
“relative likelihoods” of heights.

We could even make some “guesses” based on whatever information we have
(based on observation, experience, statistics).

Can you spot examples of such guesses around you?
Using Probability & Statistics …to make design decisions based on educated guesses…

Examples

Why don’t we make doors this size?
- The lecture hall doors are of a width and height that allow most humans to
pass through comfortably (of course, making them too big would be a waste
of resources).
- Your seats are designed with human dimensions in mind!

- Probability and Statistics help us make smart guesses about uncertain events.
- Based on these smart guesses we can plan, design, or take steps to better
control the situation.
How Do We Assign Probabilities?

- Through repeated experimentation/observation (i.e., via collected statistics).
- Through some belief!
A Tale of Two Philosophies

Frequentist
- Probabilities can be found (in principle) by a repeatable objective process
(and are thus ideally devoid of opinion).
- Inferences are based on data only.
- Hypotheses are tested and declared true or false.
- In practice: take as much data as you can, and use it to make an educated guess.

Bayesian
- Probability expresses a degree of belief in an event. The degree of belief may
be based on prior knowledge about the event, such as the results of previous
experiments, or on personal beliefs about the event.
- Inferences are based on both data and prior beliefs.
- Hypotheses are tested and assigned a probability of being true or false.
- In practice: make a first guess about what to expect, then update the guess
based on new data.

Will be clearer with formulae.
Bayesian Approach Naturally Allows “Learning Cycles”.
Assigning Probabilities – Four Steps

1. Clearly define your experiment…
- Rolling a six-sided die
- Tossing a coin
- Picking a student at random from a class

2. Define the Sample Space…
- Set of all distinct possibilities.
- For rolling a die, Ω = {1, 2, 3, 4, 5, 6}

3. Define Events (of interest)…
- Event = some outcome of interest.
- Event = a subset of Ω (could even be ∅ or Ω)
- 𝐴 = {𝜔 ∈ Ω ∶ 𝜔 satisfies some conditions}

4. Assign probabilities (real numbers) to events such that the assignment
scheme (the “probability measure”) makes sense (i.e., satisfies some axioms),
as sketched in the example below.
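The four steps can be mirrored in a few lines of Python. A minimal sketch for a
single coin toss, assuming (purely for illustration) a biased coin with
𝑃[heads] = 0.6; the helper P is ours, not from the slides:

# Step 1: the experiment is tossing a coin once.
# Step 2: the sample space.
omega = {"H", "T"}

# Step 3: events of interest are subsets of omega (including the empty set and omega).
heads = {"H"}

# Step 4: a probability measure on the simple events
# (non-negative, summing to 1 -- see the axioms below).
p = {"H": 0.6, "T": 0.4}   # assumed bias, for illustration only

def P(event):
    """Probability of an event = sum over its sample points."""
    return sum(p[w] for w in event)

print(P(heads))   # 0.6
print(P(set()))   # 0.0 (null event)
print(P(omega))   # 1.0 (sure event)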
Axioms of Probability (Kolmogorov)

- Assigned probabilities should not be negative.
- Assigned probabilities should add up to 1.
- Probabilities assigned to mutually exclusive events (events that cannot occur
together) should make sense: they should simply add up.

For the case of an infinite number of possible events, Axiom 3 is replaced by
Axiom 4 (countable additivity).
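The formal statements (shown as formulas on the original slides) are the
standard Kolmogorov axioms; a reconstruction in the slides’ P[·] notation:

Axiom 1: 𝑃[𝐴] ≥ 0 for every event 𝐴
Axiom 2: 𝑃[Ω] = 1
Axiom 3: 𝑃[𝐴 ∪ 𝐵] = 𝑃[𝐴] + 𝑃[𝐵] whenever 𝐴 ∩ 𝐵 = ∅
Axiom 4: 𝑃[𝐴1 ∪ 𝐴2 ∪ ⋯] = 𝑃[𝐴1] + 𝑃[𝐴2] + ⋯ for pairwise mutually exclusive 𝐴1, 𝐴2, …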
Formally speaking…

[Diagrams: a sample space Ω containing sample points 𝜔1, …, 𝜔5, and a
probability measure 𝑃 assigning numbers to its events.]
Events - Some Terminology and Results

Sample Space (Ω)
- Set of all possible distinct outcomes
- 𝜔𝑖 = Sample Point

Simple Event
- An event containing only one sample point
- E.g., 𝐴 = {𝜔3} is a simple event

Compound Event
- An event containing more than one sample point
- E.g., 𝐵 = {𝜔1, 𝜔3} is a compound event

Null Event
- An event containing no sample points
- E.g., 𝐶 = { } is a null event
Sure Event
- An event consisting of all the sample points
- E.g., 𝐶 = Ω is a sure event

Equally Likely Events
- Events having the same probability of occurring
- E.g., if 𝑃[𝐴] = 𝑃[𝐵] then events 𝐴 and 𝐵 are equally likely

Mutually Exclusive Events
- Events that cannot occur at the same time
- E.g., if 𝐴 = {𝜔1, 𝜔2} and 𝐵 = {𝜔3, 𝜔4}, then clearly 𝐴 and 𝐵 cannot occur
at the same time.
Events - Some Terminology and Results

de Morgan’s Laws
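The laws themselves (shown graphically on the slides) are the standard set
identities, stated here for reference:

(𝐴 ∪ 𝐵)ᶜ = 𝐴ᶜ ∩ 𝐵ᶜ
(𝐴 ∩ 𝐵)ᶜ = 𝐴ᶜ ∪ 𝐵ᶜ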
Joint Probability

We are often interested in finding the probability of two events occurring at
the same time. This is called “Joint Probability”.

𝑃[𝐴 𝑎𝑛𝑑 𝐵] = 𝑃[𝐴 ∩ 𝐵] = 𝑃[𝐴, 𝐵]

e.g., 𝐴 = {Ali is in lecture}, 𝐵 = {Ali is sleeping}
𝑃[𝐴 𝑎𝑛𝑑 𝐵] = probability that Ali is in lecture and sleeping

𝐶 = {die shows even number}, 𝐷 = {die shows greater than 3}
𝑃[𝐶 𝑎𝑛𝑑 𝐷] = probability of getting an even number above 3 (i.e., 4 or 6)
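For the die events, the joint probability is easy to verify by enumeration; a
minimal Python sketch, assuming a fair die (which the slide leaves implicit):

from fractions import Fraction

omega = set(range(1, 7))                    # fair six-sided die (assumed)
C = {w for w in omega if w % 2 == 0}        # die shows an even number
D = {w for w in omega if w > 3}             # die shows greater than 3

P = lambda E: Fraction(len(E), len(omega))  # equally likely outcomes
print(P(C & D))                             # P[C and D] = 2/6 = 1/3 (outcomes 4 and 6)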
Conditional Probability

Sometimes knowledge of one random event can help us assign probability to
another random event. For this, the concept of “conditional probability”
comes in handy.

𝐴 = {Ali is in lecture at 8 am}
𝐵 = {Ali is sleeping at 8 am}

Suppose I tell you that 𝐴 has occurred (i.e., Ali is in lecture at 8 am); now
what is the probability that he is sleeping at 8 am?

𝑃[𝐵 𝑔𝑖𝑣𝑒𝑛 𝐴] = 𝑃[𝐵|𝐴]

Not zero, but rather low (as students do sometimes fall asleep in my class).
Understanding Conditional Probability

- Suppose I tell you that I’ve written an integer from 1 to 4 on a piece of
paper, but do not tell you the number.

- Under the assumption (“belief”) that I could have picked any of the four
numbers with equal chances, what is the probability that I wrote 3?

𝐴 = {3},   𝑃[𝐴] = 1/4

- Suppose I tell you now (additional information) that I wrote an odd number.
Now what is the probability that I wrote 3?

New information: 𝐵 = {1, 3} has occurred!

𝑃[𝐴 𝑔𝑖𝑣𝑒𝑛 𝐵] = 𝑃[𝐴|𝐵] = 1/2

What just happened? Let’s look at the sample space.

Sample space before the additional information: Ω = {1, 2, 3, 4}
Sample space after the additional information: Ωnew = {1, 3}

Clearly, a relevant piece of information has shrunk the sample space, leading
to a more accurate assignment of probability in light of the new knowledge.
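The “shrinking sample space” view translates directly into code; a small sketch
(the uniform choice over {1, 2, 3, 4} is the slide’s stated assumption):

from fractions import Fraction

omega = {1, 2, 3, 4}                 # equally likely by assumption
A = {3}                              # "I wrote 3"
B = {1, 3}                           # new information: "the number is odd"

P = lambda E: Fraction(len(E), len(omega))
print(P(A))                          # P[A] = 1/4

# Conditioning = shrinking the sample space to B and renormalizing.
P_given_B = lambda E: Fraction(len(E & B), len(B))
print(P_given_B(A))                  # P[A|B] = 1/2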
Understanding Conditional Probability

Q. What if the additional information I give you is “I wrote a positive integer”,
or “I had cake for breakfast”?

𝐶 = {wrote a positive integer}
𝐷 = {had cake for breakfast}

Clearly…
𝑃[𝐴|𝐶] = 𝑃[𝐴]
𝑃[𝐴|𝐷] = 𝑃[𝐴]

Why? Note that the two new pieces of information are quite useless/irrelevant
in the sense that they fail to shrink the sample space (Ωnew = Ω)!
Linking Conditional and Joint Probabilities

𝑃[𝐵|𝐴] = 𝑃[𝐴 ∩ 𝐵] / 𝑃[𝐴]     (defined only when 𝑃[𝐴] ≠ 0)

Q. Apart from the mathematical reason (division by zero), can you think of a
logical reason for this restriction?

A. Note that if we say that event 𝐴 is impossible (𝑃[𝐴] = 0), then the question
“find the probability of 𝐵 given that 𝐴 has occurred” is illogical to begin with,
since 𝐴 could never have occurred!
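As a quick numeric check using the earlier die events (fair die assumed):

𝑃[𝐷|𝐶] = 𝑃[𝐶 ∩ 𝐷] / 𝑃[𝐶] = (1/3) / (1/2) = 2/3

i.e., given that the die shows an even number, two of the three equally likely
possibilities {2, 4, 6} are greater than 3.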
Statistical Independence (Independent Events)

When knowledge of one event does not change the probability of another
event, we say that the two are statistically independent.

We already saw an example of this…

𝐶 = {wrote a positive integer}

𝑃[𝐴|𝐶] = 𝑃[𝐴]

𝐴 and 𝐶 are independent, since knowledge of 𝐶 does not change the
probability of 𝐴: the sample space does not shrink (Ωnew = Ω).

So, in other words, two events are statistically independent when knowledge
of one adds no useful/new information towards the possibility of the other.
In fact, this is one of the primary ways of checking independence:

Test 1: 𝑃[𝐴|𝐵] = 𝑃[𝐴]          i.e., the conditional probability is the same as
the unconditional probability.

Test 2: 𝑃[𝐴 ∩ 𝐵] = 𝑃[𝐴] 𝑃[𝐵]    i.e., the joint probability is separable.

Where do we get Test 2 from?
Recall 𝑃[𝐴|𝐵] = 𝑃[𝐴 ∩ 𝐵] / 𝑃[𝐵].
Setting 𝑃[𝐴|𝐵] = 𝑃[𝐴] (from Test 1) leads to Test 2.
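Both tests are easy to run by enumeration. A sketch on the fair-die sample
space (the particular events are chosen here purely for illustration):

from fractions import Fraction

omega = set(range(1, 7))                    # fair die (assumed)
P = lambda E: Fraction(len(E), len(omega))

def independent(A, B):
    """Test 2: is the joint probability separable?"""
    return P(A & B) == P(A) * P(B)

A = {2, 4, 6}                               # die shows even
B = {1, 2, 3, 4}
D = {4, 5, 6}                               # die shows greater than 3

print(independent(A, B))   # True:  P[A and B] = 1/3 = (1/2)(2/3)
print(independent(A, D))   # False: P[A and D] = 1/3, not (1/2)(1/2)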
Some Examples and Consequences of the Axioms

Given a sample space Ω containing six simple events, can you find 𝑃[𝜔1]?

Is 𝑃[𝜔1] = 1/6?

No! In fact, we cannot claim this unless we have the information (or are ready
to assume) that all the simple events here are equally likely!

The equally likely assumption implies that
𝑃[𝜔1] = 𝑃[𝜔2] = ⋯ = 𝑃[𝜔6]

In this case we can say that 𝑃[𝜔1] = 1/6.
Another Example

Let’s say we have a sample space Ω containing four simple events, and I give
you the information that
𝑃[𝜔1] = 𝑃[𝜔2] = 𝑃[𝜔3] = 1/5

- Find 𝑃[𝜔4].
- Are the events in Ω equally likely?

From the axioms we do know that
𝑃[𝜔1] + 𝑃[𝜔2] + 𝑃[𝜔3] + 𝑃[𝜔4] = 𝑃[Ω] = 𝑃[sure event] = 1

This leads to
𝑃[𝜔4] = 1 − 1/5 − 1/5 − 1/5 = 2/5

Clearly, while 𝜔1, 𝜔2 and 𝜔3 are equally likely, 𝜔4 is not.

Establishing whether events are equally likely or not…
- In practice, we either assume events to be equally likely (“belief/experience”),
- Or we find out through other calculations that their probabilities are the
same (hence they are equally likely).
Probabilities of Mutually Exclusive and Overlapping Events

𝒂𝒏𝒅 vs. 𝒐𝒓

- 𝐴 = {𝜔1, 𝜔2} and 𝐵 = {𝜔3, 𝜔4} are compound events.
- Clearly, the two cannot occur at the same time (since 𝐴 ∩ 𝐵 = ∅).
- 𝑃[𝐴 𝑎𝑛𝑑 𝐵] = 𝑃[𝐴 ∩ 𝐵] = 0
- Such events, we’ve already defined as being “Mutually Exclusive”.

But what about 𝑃[𝐴 ∪ 𝐵]?
- From the third axiom we know that for mutually exclusive events we must have:
𝑃[𝐴 𝑜𝑟 𝐵] = 𝑃[𝐴 ∪ 𝐵] = 𝑃[𝐴] + 𝑃[𝐵]

Interesting question: can mutually exclusive events be independent?
No! Knowledge of one occurring immediately changes the probability of the
other to zero: in the example shown, 𝑃[𝐴] = 𝑃[𝜔1] + 𝑃[𝜔2] > 0, but 𝑃[𝐴|𝐵] = 0.
What if 𝐴 and 𝐵 are overlapping, i.e., 𝐴 ∩ 𝐵 ≠ ∅? What about 𝑃[𝐴 𝑜𝑟 𝐵] now?

𝑃[𝐴 ∪ 𝐵] = 𝑃[𝐴] + 𝑃[𝐵] − 𝑃[𝐴 ∩ 𝐵]

The last term removes the overlapping part (that was counted twice). In a
sense, it converts the two overlapping events into mutually exclusive ones:
𝐴 − (𝐴 ∩ 𝐵) and 𝐵, or equivalently 𝐴 and 𝐵 − (𝐴 ∩ 𝐵).

Note that for the mutually exclusive case, the formula reduces to the axiom
𝑃[𝐴 ∪ 𝐵] = 𝑃[𝐴] + 𝑃[𝐵] (since the last term 𝑃[𝐴 ∩ 𝐵] becomes zero).
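A quick enumeration check of the formula on the fair-die space (events chosen
for illustration):

from fractions import Fraction

omega = set(range(1, 7))
P = lambda E: Fraction(len(E), len(omega))

A = {2, 4, 6}                  # even
B = {4, 5, 6}                  # greater than 3

lhs = P(A | B)                 # P[A or B], computed directly
rhs = P(A) + P(B) - P(A & B)   # the formula above
print(lhs, rhs, lhs == rhs)    # 2/3 2/3 True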
Another look at 𝑃[𝐴 𝑜𝑟 𝐵]:

𝑃[𝐴 ∪ 𝐵] ≤ 𝑃[𝐴] + 𝑃[𝐵]

Equality holds for mutually exclusive events!
Some other interesting consequences of the axioms… (for any events 𝐴, 𝐵)
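The slide’s list here was graphical; standard consequences of the axioms
include, for instance:

𝑃[𝐴ᶜ] = 1 − 𝑃[𝐴]
𝑃[∅] = 0
𝑃[𝐴] ≤ 1
If 𝐴 ⊆ 𝐵, then 𝑃[𝐴] ≤ 𝑃[𝐵]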
A Smart Use of Conditional Probabilities … Bayes’ Theorem

- Suppose that a certain system would fail if two of its distinct components
𝛼 and 𝛽 both fail.

- If the probability that 𝛼 fails is 0.01, the probability that 𝛽 fails is 0.005, and
the probability that 𝛽 fails if 𝛼 has failed is 0.015, find the probability that 𝛼
will fail if 𝛽 has failed.

Always good to define events first…
𝐴 = {𝛼 fails}
𝐵 = {𝛽 fails}

In terms of these events, what we are given is:
𝑃[𝐴] = 0.01, 𝑃[𝐵] = 0.005, 𝑃[𝐵|𝐴] = 0.015

And we have to find: 𝑃[𝐴|𝐵] = ?

We’ll use the previously discussed link between conditional and joint probabilities:
𝑃[𝐴|𝐵] = 𝑃[𝐴 ∩ 𝐵] / 𝑃[𝐵]     … (⋆)

Though we do not have 𝑃[𝐴 ∩ 𝐵], we can derive it by using the conditional in reverse:
𝑃[𝐵|𝐴] = 𝑃[𝐴 ∩ 𝐵] / 𝑃[𝐴]  ⇒  𝑃[𝐴 ∩ 𝐵] = 𝑃[𝐵|𝐴] 𝑃[𝐴]

Plugging this into (⋆) gives
𝑃[𝐴|𝐵] = 𝑃[𝐵|𝐴] 𝑃[𝐴] / 𝑃[𝐵]

We can now plug in all the given values to get 𝑃[𝐴|𝐵] = 0.03.
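Spelled out with the given values:

𝑃[𝐴|𝐵] = 𝑃[𝐵|𝐴] 𝑃[𝐴] / 𝑃[𝐵] = (0.015 × 0.01) / 0.005 = 0.00015 / 0.005 = 0.03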
𝑃[𝐴|𝐵] = 𝑃[𝐵|𝐴] 𝑃[𝐴] / 𝑃[𝐵]

- The formulation we got in the previous example is in fact one of the most
important results in probability, called Bayes’ Theorem.
- It forms the foundation of Bayesian Statistics.
- Also, the idea is used heavily in learning systems, since the relation can be
seen as an “update” equation for the probability of event 𝐴.
“Learning” Probabilities

𝑃[𝐴|𝐵] = 𝑃[𝐵|𝐴] 𝑃[𝐴] / 𝑃[𝐵]

- 𝑃[𝐴] is the initial probability of the event (assumed or based on past data).
- 𝑃[𝐵|𝐴] and 𝑃[𝐵] capture the statistics of the new data and its statistical
relevance to the event of interest.
- 𝑃[𝐴|𝐵] is the learned/updated probability of the event given the new
data/information.
- This enables iterative “learning” as new data arrives.
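A small sketch of such a learning cycle (a hypothetical two-hypothesis example,
not from the slides): let 𝐴 = “the coin is biased with 𝑃[heads] = 0.8”, with the
alternative being a fair coin, and update 𝑃[𝐴] after each observed toss. The
denominator 𝑃[𝐵] is computed by total probability (covered shortly).

p_A = 0.5                                   # initial P[A] (assumed prior)

def update(p_A, toss):
    """One Bayes step: P[A|toss] = P[toss|A] P[A] / P[toss]."""
    like_A = 0.8 if toss == "H" else 0.2    # P[toss | biased coin]
    like_not = 0.5                          # P[toss | fair coin]
    p_toss = like_A * p_A + like_not * (1 - p_A)   # total probability
    return like_A * p_A / p_toss

for toss in ["H", "H", "T", "H"]:           # new data arriving one piece at a time
    p_A = update(p_A, toss)                 # today's posterior = tomorrow's prior
    print(round(p_A, 3))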
Bayes’ Theorem – Formal Definition

Let {𝐴𝑗} be a collection of events that are mutually exclusive and whose union
spans Ω (i.e., a partition of Ω).
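The statement itself appeared as a formula on the slide; the standard form, in
the slides’ notation, is:

𝑃[𝐴𝑗|𝐵] = 𝑃[𝐵|𝐴𝑗] 𝑃[𝐴𝑗] / 𝑃[𝐵],  where 𝑃[𝐵] = Σ𝑘 𝑃[𝐵|𝐴𝑘] 𝑃[𝐴𝑘]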
Another Smart Use of Conditional Probabilities … Total Probability

- Suppose your team is to play a PSL match against one of three teams:
Quetta Gladiators, Peshawar Zalmi, or Lahore Qalandars.

- You reckon your chances of winning against Zalmi are 0.05, against
Gladiators 0.04, and against Qalandars 0.1.

- What is your probability of winning if your opponent is chosen randomly
with equal probabilities?

Let’s call our team Topi Drama, and define some events:
𝑊 = {Drama wins}
𝑍 = {Zalmi chosen}
𝐺 = {Gladiators chosen}
𝑄 = {Qalandars chosen}

Clearly, 𝑍, 𝐺, and 𝑄 are mutually exclusive!

In terms of these events, we are given:
𝑃[𝑊|𝑍] = 0.05, 𝑃[𝑊|𝐺] = 0.04, 𝑃[𝑊|𝑄] = 0.1
and
𝑃[𝑍] = 𝑃[𝐺] = 𝑃[𝑄] = 1/3

And we have to find: 𝑃[𝑊] = ?   How do we solve this?
Let’s look at matters logically…

𝑊 = (You play Zalmi and win) or (You play Gladiators and win) or
(You play Qalandars and win)

Note that these three events are mutually exclusive! Recall that for mutually
exclusive events 𝑃[𝐴 𝑜𝑟 𝐵] = 𝑃[𝐴 ∪ 𝐵] = 𝑃[𝐴] + 𝑃[𝐵], so

𝑃[𝑊] = 𝑃[𝑍 ∩ 𝑊] + 𝑃[𝐺 ∩ 𝑊] + 𝑃[𝑄 ∩ 𝑊]

Or, using the link between joint and conditional probabilities,
i.e., 𝑃[𝐴 ∩ 𝐵] = 𝑃[𝐵|𝐴] 𝑃[𝐴]:

𝑃[𝑊] = 𝑃[𝑊|𝑍] 𝑃[𝑍] + 𝑃[𝑊|𝐺] 𝑃[𝐺] + 𝑃[𝑊|𝑄] 𝑃[𝑄]

𝑃[𝑊] = 0.05 × 1/3 + 0.04 × 1/3 + 0.1 × 1/3 = 0.19/3 ≈ 0.0633

This method of calculating the probability of an event by splitting it into all
possible conditional events is in fact a very useful result called “Total Probability”.
Total Probability – Formal Definition

The probability of an event is split into conditionals (or joint probabilities)
over a partition of Ω.

Logic: since the 𝐴𝑗’s are mutually exclusive and exhaustive, any 𝐵 ⊆ Ω must
necessarily be made up of disjoint overlaps with some (or all) of the 𝐴𝑗’s.
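In the slides’ notation, for a partition {𝐴𝑗} of Ω:

𝑃[𝐵] = Σ𝑗 𝑃[𝐵 ∩ 𝐴𝑗] = Σ𝑗 𝑃[𝐵|𝐴𝑗] 𝑃[𝐴𝑗]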
We have previously defined several spaces (Topological Space, Vector Space,
Metric Space, Inner Product Space…).

Let us now formally define the Probability Space.
Probability Space: A Set, A Field, and A Measure

(Ω, 𝐹, 𝑃)

- Ω: a set of all possibilities (the Sample Space),

- 𝐹: a collection of subsets (events) of Ω that allows closure under several set
operations (a 𝜎-field),

- 𝑃: a measure that assigns values to the members of 𝐹 following the axioms
of probability (the Probability Measure).
- These closure requirements (under union, intersection, etc.) ensure that
probability measure assignments make sense.
- E.g., the probability axioms require 𝑃[𝐴 ∪ 𝐴ᶜ] = 𝑃[𝐴] + 𝑃[𝐴ᶜ] = 1, but if
only 𝐴 ∈ 𝐹 while 𝐴ᶜ is not included in 𝐹, things will become problematic.
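For reference, the standard requirements on a 𝜎-field 𝐹 are:

1. Ω ∈ 𝐹
2. If 𝐴 ∈ 𝐹, then 𝐴ᶜ ∈ 𝐹 (closure under complement)
3. If 𝐴1, 𝐴2, … ∈ 𝐹, then 𝐴1 ∪ 𝐴2 ∪ ⋯ ∈ 𝐹 (closure under countable unions)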
With the basic notions covered, we are now ready to embrace uncertainty by
modeling it precisely…
Embracing Uncertainty/Variation… Prefix: Random

Deterministic → Random
- Event → Random Event
- Variable → Random Variable
- Process → Random Process

Events
- Moon passing between the earth and the sun (deterministic)
- Earthquake (random)
- The next student who enters being taller than me (random)

Random Event
- An outcome of an uncertain happening/experiment.
Embracing Uncertainty/Variation… Prefix: Random

- A placeholder that holds data or helps give


relations in algebraic form.
Variable
- 𝑎 = 𝜋𝑟 2
- 𝑋 = height of tallest person on record To emphasize that 𝑟 is based on a random
event, we sometimes write it as 𝑟(𝜔).

- A variable that takes numeric values based 𝜔 = result of a die roll


Random Variable on a random event, and has an associated Ω, 𝐹, 𝑃 = Probability space of 𝜔
probability space (Ω, 𝐹, 𝑃).
- 𝑋 = height of next student entering the
room
- Consider a circle whose radius is decided
through a random event (e.g., roll of a die),
then both the radius (𝑟) and area (𝑎) of the
circle are random variables.

ES691 - Mathematics for Machine Learning / Dr. Naveed R. Butt @ GIKI - FES 169
Recall… (we previously saw a random variable as a function/mapping)

A random variable is a mapping from the probability space to the real number line.
Back to the radius example… 𝑟(𝜔) = radius of a circle decided through the outcome of a random event 𝜔 coming from a probability space.

- We note that the random variable 𝑟(𝜔) could perform the mapping in many ways.
- E.g., let 𝜔 be the number that shows up upon rolling a die; then (see the sketch below):

Mapping 1: 𝑟(𝜔) = 𝜔
𝑟(𝜔) = 1, 2, 3, 4, 5, 6 for 𝜔 = 1, 2, 3, 4, 5, 6 respectively

Mapping 2: 𝑟(𝜔) = 𝜔²
𝑟(𝜔) = 1, 4, 9, 16, 25, 36 for 𝜔 = 1, 2, 3, 4, 5, 6 respectively

Mapping 3: with 𝐴 = {𝜔: 𝜔 even} = {2, 4, 6} and 𝐵 = {𝜔: 𝜔 odd} = {1, 3, 5},
𝑟(𝜔) = 1 if 𝜔 ∈ 𝐴, and 𝑟(𝜔) = 4 if 𝜔 ∈ 𝐵
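To make this concrete, here is a minimal Python sketch of the three mappings above, all built on the same die-roll experiment (the function names are ours, chosen for illustration):

```python
import random

# Sample space of the die roll: Ω = {1, 2, 3, 4, 5, 6}
OMEGA = [1, 2, 3, 4, 5, 6]

def r_identity(omega):
    """Mapping 1: r(ω) = ω."""
    return omega

def r_square(omega):
    """Mapping 2: r(ω) = ω²."""
    return omega ** 2

def r_parity(omega):
    """Mapping 3: r(ω) = 1 if ω is even, 4 if ω is odd."""
    return 1 if omega % 2 == 0 else 4

omega = random.choice(OMEGA)  # one roll of a fair die
print(omega, r_identity(omega), r_square(omega), r_parity(omega))
```

All three random variables share the same underlying probability space; only the mapping differs.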
Random Variable As a Mapping
- Thus, from the same random experiment, we can create infinitely many different random variables (mappings).
- Of course, in practice, we assign the mapping that is most meaningful/useful for us.

Random Variable vs. Random Event
- A random variable takes a numeric value based on a random event.
- A random event can be non-numeric (descriptive); a random variable cannot.
- A random event can always be converted into a random variable through a mapping.
𝑍 = Next student who enters the room, is he from the Swabi region? Not a random variable yet.

Possible outcomes (descriptive) → Possible outcomes (numeric)
Yes → 1
No → 0

𝑍 = 1 if the next student entering the room is from Swabi
𝑍 = 0 if the next student entering the room is not from Swabi

Now 𝑍 is a random variable!!
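A minimal sketch of this indicator-style mapping (assuming, purely for illustration, that we can draw the descriptive outcome at random):

```python
import random

# Descriptive outcomes of the random event
outcomes = ["Yes", "No"]

def Z(outcome):
    """Map the descriptive outcome to a number: Yes -> 1, No -> 0."""
    return 1 if outcome == "Yes" else 0

omega = random.choice(outcomes)  # placeholder draw; the true probabilities are unknown
print(omega, "->", Z(omega))
```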
Realization
Once the outcome of a random experiment is available (or assumed), the value of the random variable becomes fixed, and is called a realization.

Let 𝑟(𝜔) be the radius of a circle chosen based on 𝜔, the number that shows up upon rolling a die. Then the mapping

𝑟(𝜔) = 𝜔

gives the following possible realizations for the radius and the area of the circle:

𝑟 ∈ {1, 2, 3, 4, 5, 6}
𝑎 ∈ {𝜋, 4𝜋, 9𝜋, 16𝜋, 25𝜋, 36𝜋}
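A short sketch of drawing one realization of the radius and the corresponding area:

```python
import math
import random

omega = random.randint(1, 6)  # outcome of the die roll
r = omega                     # realization of the radius, r(ω) = ω
a = math.pi * r ** 2          # corresponding realization of the area

print(f"ω = {omega}, r = {r}, a = {a:.3f}")
```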
Random Process: a collection of random variables.

𝑋 = height of next student entering the room → Random Variable

{𝑋1, 𝑋2, 𝑋3, 𝑋4} = heights of next four students entering the room → Random Process

[Figure: one realization of the random process observed in Lecture 1, and a different realization observed in Lecture 2.]
In general, a random process can be represented as

𝑋(𝜔, 𝑡), 𝑡 ∈ 𝑇

where the index set 𝑇 can be discrete-time or continuous-time, and the state (the values 𝑋 takes) can be discrete or continuous.

Examples (a small simulation of the heights process is sketched below)
- Discrete-State, Discrete-Time: number of siblings the next four students entering the room have.
- Continuous-State, Discrete-Time: heights of the next four students entering the room.
- Continuous-State, Continuous-Time: temperature of the room over a day.
- Discrete-State, Continuous-Time: number of students in a classroom over a day.
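As a quick illustration, here is a sketch that simulates two realizations of the heights process (the Normal(170, 8) model for heights in cm is our assumption, used only to have something to draw from):

```python
import numpy as np

rng = np.random.default_rng(0)

def realize_process(n_students=4):
    """One realization of {X1, X2, X3, X4}: heights (cm) of the next four students."""
    return rng.normal(loc=170.0, scale=8.0, size=n_students)

lecture1 = realize_process()  # realization of the random process in Lecture 1
lecture2 = realize_process()  # a different realization in Lecture 2

print("Lecture 1:", np.round(lecture1, 1))
print("Lecture 2:", np.round(lecture2, 1))
```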
Having defined probabilities and randomness, we now define functions that collect probabilities…
Discrete vs. Continuous Case
For collecting probabilities, we need to split the problem into two cases depending on whether the random variable 𝑋 is discrete-state or continuous-state.

The number of values the random variable 𝑋 takes can be
- Finite → discrete random variable
- Countably Infinite → discrete random variable
- Uncountably Infinite → continuous random variable

Discrete Random Variable → 𝑋 takes isolated values in 𝑅
- 𝑋 ∈ {1, 2, 3, 4, 5, 6} : Finite
- 𝑋 ∈ {1, 2, 3, 4, …} : Countably Infinite

Continuous Random Variable → 𝑋 takes a continuum of values in 𝑅
- 𝑋 ∈ 𝑅
- 𝑋 is a real number in the interval [0, 1]
We look at the discrete case first…

We collect the probabilities for a discrete random variable in the Probability Distribution Function 𝑝𝑋(𝑥) (also commonly called the probability mass function).
Being defined on probabilities, the Distribution Function 𝑝𝑋(𝑥) must satisfy the three axioms of probability, leading to the conditions

𝑝𝑋(𝑥𝑖) ≥ 0 — assigned probabilities cannot be negative

Σ∀𝑖 𝑝𝑋(𝑥𝑖) = 1 — the probabilities of all possible values of 𝑋 must add up to 1 (neither more nor less than 1)

𝑝𝑋(𝑥𝑖) = 𝑃[𝑋 = 𝑥𝑖] — 𝑝𝑋(𝑥) must represent probabilities
Let’s play …
• Let’s assume there is a bag with three balls in it, with numbers 1 – 3 written on them. You draw one ball at random (each equally likely to be picked).
• 𝑋 = number written on the ball (random variable)
• 𝑥 = 1, 2, 3 (possible values of 𝑋)
• 𝑝𝑋(𝑥) = 𝑃(𝑋 = 𝑥)

𝒙    𝒑𝑿(𝒙)
1    1/3
2    1/3
3    1/3

Note that 𝑝𝑋(𝑥) satisfies the three conditions:
1. It is always non-negative
2. It sums up to 1
3. It assigns a probability to each value of 𝑋
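A tiny sketch that encodes this distribution and verifies the three conditions numerically:

```python
from fractions import Fraction

# PMF of the three-balls example: p_X(x) = 1/3 for x = 1, 2, 3
p_X = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

assert all(p >= 0 for p in p_X.values())    # 1. non-negativity
assert sum(p_X.values()) == 1               # 2. probabilities sum to exactly 1
print({x: str(p) for x, p in p_X.items()})  # 3. a probability for each value of X
```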
We often find it useful to discuss whether a random variable lies in a certain range…
E.g., what is the probability that the next student entering is shorter than me?

This is done with the help of the Cumulative Distribution Function (CDF)

𝐹𝑋(𝑥) = 𝑃[𝑋 ≤ 𝑥]    (the probability that 𝑋 takes a value less than or equal to 𝑥)

Clearly, the CDF of a discrete random variable is just the sum of the 𝑝𝑋 values for 𝑥𝑖 ≤ 𝑥:

𝐹𝑋(𝑥) = Σ_{𝑥𝑖 ≤ 𝑥} 𝑝𝑋(𝑥𝑖)
Properties of the CDF
- The CDF cannot be negative.
- The CDF is a non-decreasing function (as every new term added has to be non-negative).
Example (based on a PMF depicted on the slide, with 𝑝𝑋(𝑥1) = 1/3 and 𝑝𝑋(𝑥2) = 1/2):

𝑃[𝑋 ≤ 𝑥2] = sum of all these possibilities = 1/3 + 1/2 = 5/6
Let’s play …
• Let’s write 𝐹𝑋(𝑥) for the three-balls example

𝒙    𝒑𝑿(𝒙)    𝐹𝑋(𝑥) = 𝑃(𝑋 ≤ 𝑥)
1    1/3      𝐹𝑋(1) = 𝑝𝑋(1) = 1/3
2    1/3      𝐹𝑋(2) = 𝑝𝑋(1) + 𝑝𝑋(2) = 2/3
3    1/3      𝐹𝑋(3) = 𝑝𝑋(1) + 𝑝𝑋(2) + 𝑝𝑋(3) = 1

Note that 𝐹𝑋(𝑥) satisfies the conditions.

𝐹𝑋(0.5) = 𝑃[𝑋 ≤ 0.5] = 0
𝐹𝑋(−∞) = 𝑃[𝑋 ≤ −∞] = 0
𝐹𝑋(2.5) = 𝑃[𝑋 ≤ 2.5] = 𝑃[𝑋 ≤ 2] = 𝐹𝑋(2) = 2/3
𝐹𝑋(5) = 𝑃[𝑋 ≤ 5] = 𝑃[𝑋 ≤ 3] = 𝐹𝑋(3) = 1
𝐹𝑋(∞) = 𝑃[𝑋 ≤ ∞] = 𝑃[𝑋 ≤ 3] = 𝐹𝑋(3) = 1
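The same evaluations in a short sketch: the discrete CDF is just a running sum of the PMF, and it is a right-continuous step function, which is why 𝐹𝑋(2.5) = 𝐹𝑋(2):

```python
from fractions import Fraction

p_X = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

def F_X(x):
    """CDF of a discrete random variable: sum of p_X(x_i) over all x_i <= x."""
    return sum(p for xi, p in p_X.items() if xi <= x)

for x in (0.5, 1, 2, 2.5, 3, 5):
    print(f"F_X({x}) = {F_X(x)}")
# F_X(0.5) = 0, F_X(2.5) = F_X(2) = 2/3, F_X(5) = F_X(3) = 1
```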
Now let’s repeat the above for Continuous Random Variables…

Recall… Continuous Random Variable → 𝑋 takes a continuum of values in 𝑅
- 𝑋 ∈ 𝑅
- 𝑋 is a real number in the interval [0, 1]

Now, for a continuous random variable, it only makes sense to talk about the probability that it lies in some interval, say (𝑎, 𝑏], 𝑏 > 𝑎:

𝑃[𝑎 < 𝑋 ≤ 𝑏] = 𝐹𝑋(𝑏) − 𝐹𝑋(𝑎)

where 𝐹𝑋(𝑥) = 𝑃[𝑋 ≤ 𝑥] is the CDF of the continuous random variable 𝑋.

Why?
- If 𝑋 is continuous, it can take infinitely many values over a range.
- Clearly, the probability that 𝑋 takes any one value among these infinitely many values would have to be zero.
- i.e., for continuous random variables, 𝑃[𝑋 = 𝑥] = 0.

It’s kind of like: a line passes through a continuum of points and has a length, even though each point has no notion of length.
Distributions of Continuous Random Variables… Probability Density Function (PDF)

For a continuous random variable, we start here:

𝑃[𝑎 < 𝑋 ≤ 𝑏] = 𝐹𝑋(𝑏) − 𝐹𝑋(𝑎)

We could make the interval as small as desired. In fact, if the derivative of 𝐹𝑋(𝑥) exists, we could use it to define a Probability Density Function (PDF) 𝑓𝑋(𝑥) for the continuous random variable as

𝑓𝑋(𝑥) = d𝐹𝑋(𝑥)/d𝑥,  or equivalently  𝐹𝑋(𝑥) = ∫_{−∞}^{𝑥} 𝑓𝑋(𝑢) d𝑢

So, in the case of continuous random variables, we talk of the Probability Density Function (PDF), which can be used to find the probability that 𝑋 lies in a certain range.

e.g., for the range (−∞, 𝑥]:

𝑃[𝑋 ≤ 𝑥] = 𝐹𝑋(𝑥) = ∫_{−∞}^{𝑥} 𝑓𝑋(𝑢) d𝑢

And for the range (𝑎, 𝑏], 𝑏 > 𝑎:

𝑃[𝑎 < 𝑋 ≤ 𝑏] = 𝐹𝑋(𝑏) − 𝐹𝑋(𝑎) = ∫_{𝑎}^{𝑏} 𝑓𝑋(𝑢) d𝑢
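A numerical sketch of these relations for the uniform density on [0, 1] (chosen only because its integrals are easy to check by eye):

```python
import numpy as np

def f(x):
    """PDF of the uniform distribution on [0, 1]."""
    return np.where((x >= 0.0) & (x <= 1.0), 1.0, 0.0)

a, b = 0.2, 0.7
x = np.linspace(a, b, 100_001)
dx = x[1] - x[0]
prob = float(np.sum(f(x)) * dx)  # Riemann sum for ∫_a^b f_X(u) du

print(round(prob, 3))  # ≈ 0.5 = F_X(0.7) − F_X(0.2)
```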
Some Properties of 𝑓𝑋(𝑥)

𝑓𝑋(𝑥) ≥ 0 — since it is the derivative of a non-decreasing function (the CDF).

∫_{−∞}^{∞} 𝑓𝑋(𝑥) d𝑥 = 1 — since this represents the probability of the sure event.
- Q. If 𝑃[𝑋 = 𝑥] = 0 for a continuous random variable 𝑋, and 𝑓𝑋(𝑥) ≠ 𝑃[𝑋 = 𝑥], then what does 𝑓𝑋(𝑥) really represent?

Mathematically,

- It can be seen either as the derivative of the CDF,

- Or as the probability density (the spread of probability over a continuum) that gives the probability of 𝑋 lying in an interval when 𝑓𝑋(𝑥) is integrated over that interval.

Intuitively,

- It represents the “relative likelihood” of 𝑋 taking a certain value.

- E.g., consider the case of a continuous uniform distribution depicted here. The fact that 𝑓𝑋(0.5) = 𝑓𝑋(0.6) = 1 does not mean that the two points have 100% probability. Instead, it only means that they have the same “relative likelihood” of occurring.

- As another example, consider the Gaussian distribution depicted here. The values 𝑓𝑋(0.5) = 0.1 and 𝑓𝑋(0.7) = 0.2 only indicate that 𝑋 = 0.7 is twice as likely as 𝑋 = 0.5, in the relative, per-unit-length sense.
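The “density, not probability” point can be checked numerically: for a small interval of width δ, 𝑃[𝑥 < 𝑋 ≤ 𝑥 + δ] ≈ 𝑓𝑋(𝑥)·δ. A sketch for the standard normal (our choice, whose CDF we can write with the error function):

```python
import math

mu, sigma = 0.0, 1.0  # standard normal, purely for illustration

def f(x):
    """Gaussian PDF."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def F(x):
    """Gaussian CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

x, delta = 0.5, 1e-4
exact = F(x + delta) - F(x)  # P[x < X <= x + δ]
approx = f(x) * delta        # density × interval width
print(exact, approx)         # nearly identical: density is probability per unit length
```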
All the Info We Need!
The Probability Distribution Function 𝑝𝑋(𝑥) (discrete case) and the Probability Density Function 𝑓𝑋(𝑥) (continuous case) are both very important, as they contain all the probability information about a random variable.

𝑋 ~ 𝑝𝑋(𝑥): Discrete Random Variable — our knowledge of the probabilities 𝑃[𝑋 = 𝑥]
𝑋 ~ 𝑓𝑋(𝑥): Continuous Random Variable — our knowledge of the relative likelihoods
From One Random Variable to Two — Bivariate Distributions

Just as, for probabilities, we were interested in the probability of two random events occurring together, 𝑃[𝐴 ∩ 𝐵], we can be interested in the joint distribution of two random variables 𝑋 and 𝑌.

Bivariate CDF (probability that 𝑋 ≤ 𝑥 and 𝑌 ≤ 𝑦):

𝐹𝑋𝑌(𝑥, 𝑦) = 𝑃[𝑋 ≤ 𝑥, 𝑌 ≤ 𝑦]

Bivariate Distribution Function (Discrete Case):

𝑝𝑋𝑌(𝑥, 𝑦) = 𝑃[𝑋 = 𝑥, 𝑌 = 𝑦], with 𝑝𝑋𝑌(𝑥, 𝑦) ≥ 0

Bivariate Density Function (Continuous Case):

𝑓𝑋𝑌(𝑥, 𝑦) = ∂²𝐹𝑋𝑌(𝑥, 𝑦) / ∂𝑥 ∂𝑦
Bivariate Distributions — Important Properties and Relations

Marginal Distribution of 𝑌: the distribution of 𝑌 extracted from the joint distribution/density,

𝑝𝑌(𝑦) = Σ_{𝑥} 𝑝𝑋𝑌(𝑥, 𝑦)   (discrete case)
𝑓𝑌(𝑦) = ∫_{−∞}^{∞} 𝑓𝑋𝑌(𝑥, 𝑦) d𝑥   (continuous case)

Marginal CDFs:

𝐹𝑋(𝑥) = 𝐹𝑋𝑌(𝑥, ∞),  𝐹𝑌(𝑦) = 𝐹𝑋𝑌(∞, 𝑦)
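A sketch of marginalization for a discrete joint PMF stored as a table (the numbers are made up for illustration):

```python
import numpy as np

# Joint PMF p_XY over x ∈ {0, 1} (rows) and y ∈ {0, 1, 2} (columns)
p_XY = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])
assert np.isclose(p_XY.sum(), 1.0)

p_X = p_XY.sum(axis=1)  # marginal of X: sum the joint over y
p_Y = p_XY.sum(axis=0)  # marginal of Y: sum the joint over x
print("p_X =", p_X)     # [0.4 0.6]
print("p_Y =", p_Y)     # [0.35 0.35 0.3]
```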
Conditional Distributions

Just as for two events we had the conditional probability 𝑃[𝐴|𝐵], for two random variables we have conditional distributions:

𝑝𝑋|𝑌(𝑥|𝑦) = 𝑝𝑋𝑌(𝑥, 𝑦) / 𝑝𝑌(𝑦)   (discrete case)
𝑓𝑋|𝑌(𝑥|𝑦) = 𝑓𝑋𝑌(𝑥, 𝑦) / 𝑓𝑌(𝑦)   (continuous case)

Compare with:

𝑃[𝐴|𝐵] = 𝑃[𝐴 ∩ 𝐵] / 𝑃[𝐵]

with properties analogous to ordinary distributions (non-negative; summing/integrating to 1 over 𝑥 for each fixed 𝑦).
Independence of Two Random Variables

Just as for two events to be independent we had

𝑃[𝐴 ∩ 𝐵] = 𝑃[𝐴] 𝑃[𝐵],

for random variables we can check any of the following:

𝐹𝑋𝑌(𝑥, 𝑦) = 𝐹𝑋(𝑥) 𝐹𝑌(𝑦) ∀𝑥, 𝑦

𝑓𝑋𝑌(𝑥, 𝑦) = 𝑓𝑋(𝑥) 𝑓𝑌(𝑦) ∀𝑥, 𝑦
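A sketch of the independence check in the discrete case: independence holds exactly when the joint table equals the outer product of its marginals:

```python
import numpy as np

def is_independent(p_XY, tol=1e-12):
    """True iff p_XY(x, y) == p_X(x) * p_Y(y) for all x, y."""
    p_X = p_XY.sum(axis=1, keepdims=True)
    p_Y = p_XY.sum(axis=0, keepdims=True)
    return np.allclose(p_XY, p_X * p_Y, atol=tol)

p_ind = np.outer([0.4, 0.6], [0.5, 0.5])  # built as a product: independent
p_dep = np.array([[0.5, 0.0],
                  [0.0, 0.5]])            # mass on the diagonal: dependent

print(is_independent(p_ind))  # True
print(is_independent(p_dep))  # False
```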
From Two to Many Random Variables — Multivariate Distributions

Let 𝑋1, 𝑋2, …, 𝑋𝑀 be random variables defined on a probability space. Then we can define their multivariate Joint CDF as

𝐹𝑿(𝒙) = 𝑃[𝑋1 ≤ 𝑥1, 𝑋2 ≤ 𝑥2, …, 𝑋𝑀 ≤ 𝑥𝑀],  where 𝑿 = [𝑋1 𝑋2 … 𝑋𝑀] and 𝒙 = [𝑥1 𝑥2 … 𝑥𝑀]

𝑓𝑿(𝒙) and 𝑝𝑿(𝒙) can be defined analogously. HW!

Independence conditions may also be extended easily from two to many random variables.
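The multivariate CDF has a direct Monte-Carlo reading: 𝐹𝑿(𝒙) is the fraction of draws whose coordinates are all ≤ the corresponding entries of 𝒙. A sketch with M = 3 independent standard-normal coordinates (our choice, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(size=(100_000, 3))  # rows are draws of X = [X1 X2 X3]

def F_hat(x):
    """Monte-Carlo estimate of F_X(x) = P[X1 <= x1, X2 <= x2, X3 <= x3]."""
    return float(np.mean(np.all(samples <= x, axis=1)))

print(F_hat(np.array([0.0, 0.0, 0.0])))  # ≈ 0.5³ = 0.125 by independence
```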
Next time, we look at what key information we can extract from these distribution functions…
Questions?? Thoughts??
