Lecture 2
In some sense, the entire subject of probability and statistics is about distributions of random variables.
Random variables, as the very name suggests, are quantities that vary over time or from individual to
individual, and the reason for the variability is some underlying random process. Depending on exactly
how an underlying experiment ends, the random variable takes different values. In other words, the
value of the random variable is determined by the sample point that prevails, when the underlying
experiment is actually conducted. We cannot know a priori the value of the random variable, because
we do not know a priori which sample point will prevail when the experiment is conducted. We try to
understand the behaviour of a random variable by analysing the probability structure of that underlying
random experiment. Random variables, like probabilities, originated in gambling. Therefore, the random
variables that come to us most naturally are integer-valued random variables; for example, the sum of
the two rolls when a die is rolled twice. Integer-valued random variables are special cases of what are
known as discrete random variables. Discrete or not, a common mathematical definition of all random
variables is the following.
Definition: Let Ω be a sample space corresponding to some experiment and let X: Ω → ℝ be a function
from the sample space to the real line. Then X is called a random variable.
Discrete random variables are those that take a finite or a countably infinite number of possible values.
In particular, all integer-valued random variables are discrete. From the point of view of understanding
the behaviour of a random variable, the important thing is to know the probabilities with which X takes
its different possible values.
Definition: Let X: Ω → ℝ be a discrete random variable taking a finite or countably infinite number of
values x1, x2, x3, … The probability distribution or the probability mass function (PMF) of X is the function
p(x) = P(X = x), x = x1, x2, x3, … and p(x) = 0, otherwise. Note that it is common to not explicitly mention
the phrase “p(x) = 0, otherwise.”
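For a concrete case, the PMF of a small discrete random variable can be tabulated by brute force over the sample space; a minimal Python sketch for the number of heads in two tosses of a fair coin:

```python
from itertools import product
from collections import Counter

# Sample space for two tosses of a fair coin; each of the 4 sample
# points is equally likely.
sample_space = list(product("HT", repeat=2))

# X = number of heads; p(x) counts how many sample points map to each
# value x, divided by the total number of sample points.
counts = Counter(outcome.count("H") for outcome in sample_space)
pmf = {x: c / len(sample_space) for x, c in counts.items()}

print(pmf)                # {0: 0.25, 1: 0.5, 2: 0.25} (in some order)
print(sum(pmf.values()))  # 1.0 -- any PMF must sum to 1
```

Summing a PMF to 1 is a useful sanity check whenever a distribution is built by hand.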
Independence
Definition: Let X1, X2, …, Xk be k ≥ 2 discrete random variables defined on the same sample space Ω. We
say that X1, X2, …, Xk are independent if P(X1 = x1, X2 = x2, …, Xk = xk) = P(X1 = x1) P (X2 = x2) … P(Xk = xk) for all
x1, x2, …, xk.
Example: Consider the experiment of tossing a fair coin (or any coin) four times. Suppose X1 is the
number of heads in the first two tosses, and X2 is the number of heads in the last two tosses. Then, it is
intuitively clear that X1 and X2 are independent, because the last two tosses carry no information
regarding the first two tosses. The independence can be easily mathematically verified by using the
definition of independence.
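This verification can be carried out by enumerating all 16 equally likely outcomes; a Python sketch:

```python
from itertools import product
from fractions import Fraction

# All 16 equally likely outcomes of four tosses of a fair coin.
outcomes = list(product("HT", repeat=4))
n = len(outcomes)

def prob(event):
    """Probability of an event (a predicate on outcomes) as an exact fraction."""
    return Fraction(sum(1 for o in outcomes if event(o)), n)

# Verify P(X1 = a, X2 = b) = P(X1 = a) P(X2 = b) for all pairs (a, b),
# where X1 counts heads in tosses 1-2 and X2 counts heads in tosses 3-4.
for a in range(3):
    for b in range(3):
        joint = prob(lambda o: o[:2].count("H") == a and o[2:].count("H") == b)
        marg = prob(lambda o: o[:2].count("H") == a) * prob(lambda o: o[2:].count("H") == b)
        assert joint == marg
print("X1 and X2 are independent")
```

Using exact fractions avoids any floating-point doubt about whether the two sides are truly equal.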
Example: Consider the experiment of drawing 13 cards at random from a deck of 52 cards. Suppose X1 is
the number of aces and X2 is the number of clubs among the 13 cards. Then, X1 and X2 are not
independent. For example, P(X1 = 4, X2 = 0) = 0, but P(X1 = 4) and P(X2 = 0) are both > 0, and so
P(X1 = 4) P(X2 = 0) > 0. So, X1, X2 cannot be independent.
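The two marginal probabilities in this argument can be computed by counting hands; a sketch in Python using `math.comb`:

```python
from math import comb

total = comb(52, 13)  # number of equally likely 13-card hands

# P(X1 = 4): all four aces plus 9 of the remaining 48 cards.
p_four_aces = comb(4, 4) * comb(48, 9) / total

# P(X2 = 0): all 13 cards drawn from the 39 non-clubs.
p_no_clubs = comb(39, 13) / total

print(p_four_aces)  # ~0.0026, strictly positive
print(p_no_clubs)   # ~0.0128, strictly positive
# The joint event is impossible (the ace of clubs is both an ace and a
# club), so P(X1 = 4, X2 = 0) = 0 < p_four_aces * p_no_clubs.
```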
Expected Value
By definition, a random variable takes different values on different occasions. It is natural to want to
know what value it takes on average. Averaging is a very primitive concept. A simple average of just the
possible values of the random variable will be misleading because some values may have so little
probability that they are relatively inconsequential. The average or mean value, also called the
expected value, of a random variable X is a weighted average of its different possible values, each
weighted by its probability. Here is the definition.
Definition: Let X be a discrete random variable taking the values x1, x2, x3, … If the sum
∑i |xi| P(X = xi) is finite, we say that the expected value of X exists, and it is defined as

μ = E(X) = ∑i xi P(X = xi).
Example: Let X be the number of heads obtained in two tosses of a fair coin. Here X takes the values
0, 1, 2 with probabilities 1/4, 1/2, 1/4, so E(X) = 0 × 1/4 + 1 × 1/2 + 2 × 1/4 = 1. This agrees with
intuition: because the coin is fair, we expect it to show heads 50% of the number of times it is tossed,
which is 50% of 2, that is, 1.
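The same weighted average can be computed in one line of Python from the PMF of this X:

```python
# PMF of X = number of heads in two tosses of a fair coin.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

# Expected value: probability-weighted sum of the possible values.
mu = sum(x * p for x, p in pmf.items())
print(mu)  # 1.0
```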
Example: Let X be the sum of the two rolls when a fair die is rolled twice. Each roll has expected value
(1 + 2 + ⋯ + 6)/6 = 3.5, so by linearity of expectation, E(X) = 3.5 + 3.5 = 7.
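A brute-force check over the 36 equally likely outcomes of the two rolls:

```python
from itertools import product

# Enumerate the 36 equally likely outcomes of two rolls of a fair die.
rolls = list(product(range(1, 7), repeat=2))

# E(X) is the plain average of the sums, since all outcomes are equally likely.
mu = sum(a + b for a, b in rolls) / len(rolls)
print(mu)  # 7.0
```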
Definition: Let a random variable X have a finite mean μ. The variance of X is defined as
σ² = E[(X − μ)²] and the standard deviation of X is defined as σ = √σ².
Example: Consider the experiment of two tosses of a fair coin and let X be the number of heads
obtained. With μ = 1, σ² = E[(X − 1)²] = (0 − 1)² × 1/4 + (1 − 1)² × 1/2 + (2 − 1)² × 1/4 = 1/2, and
σ = √(1/2) ≈ 0.71.
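The variance of the same X can be computed directly from its PMF; a short Python sketch:

```python
# PMF of X = number of heads in two tosses of a fair coin.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
mu = sum(x * p for x, p in pmf.items())

# Variance: probability-weighted average of squared deviations from the mean.
var = sum((x - mu) ** 2 * p for x, p in pmf.items())
sd = var ** 0.5
print(var)  # 0.5
print(sd)   # ~0.707
```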
Bayesian Networks
A Bayesian network is a data structure that represents the dependencies among random variables.
Bayesian networks have the following properties:
- They are directed graphs.
- Each node on the graph represents a random variable.
- An arrow from X to Y means that X is a parent of Y; that is, the probability distribution of Y depends on the value of X.
- Each node X has a probability distribution P(X | Parents(X)).
Example: Consider an example of a Bayesian network that involves variables that affect whether we get
to our appointment on time.
Let’s describe this Bayesian network from the top down:
Rain (R) is the root node in this network. This means that its probability distribution is not reliant on any
prior event. In our example, Rain is a random variable that can take the values {none, light, heavy} with
the following probability distribution:
none   light  heavy
0.7    0.2    0.1
Maintenance (M), in our example, encodes whether there is train track maintenance, taking the values
{yes, no}. Rain is a parent node of Maintenance, which means that the probability distribution of
Maintenance is affected by Rain.
R      M = yes  M = no
none   0.4      0.6
light  0.2      0.8
heavy  0.1      0.9
Train (T) is the variable that encodes whether the train is on time or delayed, taking the values {on time,
delayed}. Note that Train has arrows pointing to it from both Maintenance and Rain. This means that
both are parents of Train, and their values affect the probability distribution of Train.
R      M    on time  delayed
none   yes  0.8      0.2
none   no   0.9      0.1
light  yes  0.6      0.4
light  no   0.7      0.3
heavy  yes  0.4      0.6
heavy  no   0.5      0.5
Appointment (A) is a random variable that represents whether we attend our appointment, taking the
values {attend, miss}. Note that its only parent is Train. This point about Bayesian networks is
noteworthy: parents include only direct relations. It is true that maintenance affects whether the train is
on time, and whether the train is on time affects whether we attend the appointment. However, in the
end, what directly affects our chances of attending the appointment is whether the train came on time,
and this is what is represented in the Bayesian network. For example, if the train came on time, there
could have been heavy rain and track maintenance, but that has no effect on whether we made it to our
appointment.
T        attend  miss
on time  0.9     0.1
delayed  0.6     0.4
Example: If, while working for the company, you are expected to attend 100 meetings, calculate the
expected number of meetings that will be missed.
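A minimal sketch of this calculation in Python; the Rain prior {none: 0.7, light: 0.2, heavy: 0.1} and the (heavy, yes) Train row (0.4, 0.6) are taken here as working assumptions. It marginalizes out Rain, Maintenance, and Train to get P(A = miss), then applies linearity of expectation:

```python
# CPTs of the network. The Rain prior and the (heavy, yes) Train row
# are working assumptions for this sketch.
p_rain = {"none": 0.7, "light": 0.2, "heavy": 0.1}
p_maint = {"none": 0.4, "light": 0.2, "heavy": 0.1}        # P(M = yes | R)
p_on_time = {("none", "yes"): 0.8, ("none", "no"): 0.9,
             ("light", "yes"): 0.6, ("light", "no"): 0.7,
             ("heavy", "yes"): 0.4, ("heavy", "no"): 0.5}  # P(T = on time | R, M)
p_miss = {"on time": 0.1, "delayed": 0.4}                  # P(A = miss | T)

# Marginalize over R and M to get P(T = on time), then over T for P(A = miss).
p_t = sum(p_rain[r] * (m_yes * p_on_time[(r, "yes")]
                       + (1 - m_yes) * p_on_time[(r, "no")])
          for r, m_yes in p_maint.items())
p_a_miss = p_t * p_miss["on time"] + (1 - p_t) * p_miss["delayed"]

# Expected number of missed meetings out of 100, by linearity of expectation.
print(round(100 * p_a_miss, 2))  # 16.39 under these assumptions
```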
Bayesian Inference
Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the
probability for a hypothesis as more evidence or information becomes available.
Example: We want to compute the probability distribution of the Appointment variable given the
evidence that there is light rain and no track maintenance. That is, we know that there is light rain and
no track maintenance, and we want to figure out what are the probabilities that we attend the
appointment and that we miss the appointment.
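Notice that this particular query does not need the Rain distribution at all: once we condition on R = light and M = no, only the Train and Appointment tables enter, and we simply marginalize over Train. A quick Python sketch:

```python
# Evidence fixes R = light and M = no, so read off the Train CPT directly.
p_on_time = 0.7                                # P(T = on time | light, no)
p_attend = {"on time": 0.9, "delayed": 0.6}    # P(A = attend | T)

# Marginalize over the two values of Train.
p_attend_given_evidence = (p_on_time * p_attend["on time"]
                           + (1 - p_on_time) * p_attend["delayed"])
print(round(p_attend_given_evidence, 2))       # 0.81 attend
print(round(1 - p_attend_given_evidence, 2))   # 0.19 miss
```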
Note: In this area of inference, there has been a debate for centuries as to whether the Bayesian
approach or the frequentist approach is better. The difference between the two approaches will be
outlined in the next two examples.
Example: We have a population of M&M’s, and in this population the percentage of yellow M&M’s is
either 10% or 20%. You have been hired as a statistical consultant to decide whether the true percentage
of yellow M&M’s is 10% or 20%. If you make the correct decision, your boss gives you a bonus. On the
other hand, if you make the wrong decision, you lose your job.
Let’s start with the frequentist inference. Using standard statistical notation, we can write the
hypotheses as H0: p = 0.10 versus HA: p > 0.10. Suppose we draw a random sample of n = 5 M&M’s and
observe k = 1 yellow. The p-value is then P(k ≥ 1 | p = 0.10) = 1 − 0.9⁵ ≈ 0.41, which is far above the
usual 5% significance level.
Note: The p-value is the probability of obtaining the observed or a more extreme outcome, given that
the null hypothesis is true.
Therefore, we fail to reject H0 and conclude that the data do not provide convincing evidence that the
proportion of yellow M&M’s is greater than 10%. This means that if we had to pick between 10% and
20% for the proportion of M&M’s, even though this hypothesis testing procedure does not actually
confirm the null hypothesis, we would likely stick with 10% since we couldn’t find evidence that the
proportion of yellow M&M’s is greater than 10%.
Now for the Bayesian inference. Take equal priors P(H1) = P(H2) = 0.5, where H1: p = 0.10 and
H2: p = 0.20. The likelihoods of the observed data (k = 1 yellow in n = 5 draws) are
P(k = 1 | H1) = C(5,1)(0.1)(0.9)⁴ ≈ 0.33 and P(k = 1 | H2) = C(5,1)(0.2)(0.8)⁴ ≈ 0.41. Then

Posterior: P(H1 | k = 1) = P(H1) P(k = 1 | H1) / P(k = 1) = (0.5 × 0.33) / ((0.5 × 0.33) + (0.5 × 0.41)) ≈ 0.45
P(H2 | k = 1) = 1 − P(H1 | k = 1) = 1 − 0.45 = 0.55
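Both the frequentist and the Bayesian calculations can be reproduced with a short script using exact binomial probabilities. (The exact posterior is ≈ 0.44; the ≈ 0.45 above comes from rounding the likelihoods to two decimals first.)

```python
from math import comb

n, k = 5, 1  # sample of 5 M&M's, 1 yellow observed

def binom_pmf(k, n, p):
    """P(exactly k successes in n trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Frequentist p-value: P(k or more yellows | p = 0.10).
p_value = sum(binom_pmf(j, n, 0.10) for j in range(k, n + 1))

# Bayesian posteriors with equal priors on H1: p = 0.10 and H2: p = 0.20.
like1, like2 = binom_pmf(k, n, 0.10), binom_pmf(k, n, 0.20)
post1 = 0.5 * like1 / (0.5 * like1 + 0.5 * like2)

print(round(p_value, 2))    # 0.41
print(round(post1, 2))      # 0.44 (~0.45 if likelihoods are rounded first)
print(round(1 - post1, 2))  # 0.56
```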
The posterior probabilities of whether H1 or H2 is correct are close to each other. As a result, with equal
priors and a low sample size, it is difficult to make a decision with a strong confidence, given the
observed data. However, H2 has a higher posterior probability than H1, so if we had to make a decision at
this point, we should pick H2, i.e., that the proportion of yellow M&M’s is 20%, which is the opposite of
the decision based on the frequentist approach.
The table below summarizes what the results would look like if we had chosen larger sample sizes.
Under each of these scenarios, the frequentist method yields a higher p-value than our significance
level, so we would fail to reject the null hypothesis with any of these samples. On the other hand, the
Bayesian method always yields a higher posterior for the second model where p is equal to 0.20. So, the
decisions that we would make are contradictory to each other.
Observed Data P(k or more | 10% yellow) P(10% yellow | n, k) P(20% yellow | n, k)
However, if we had set up our framework differently in the frequentist method and set our null
hypothesis to be p = 0.20 and our alternative to be p < 0.20, we would obtain different results. This
shows that the frequentist method is highly sensitive to the null hypothesis, while in the Bayesian
method, our results would be the same regardless of which order we evaluate our models.
Example: Suppose that in all the ICC tournament cricket matches between India and Pakistan, India has
won 15 times while Pakistan won 5 times.
So, if you were to bet on the winner of the next match, who would it be? You would likely say India, as
they have won 75% of the matches.
However, what if you are told that in rain-affected matches, India won 5 times and Pakistan won 5 times,
and that it will definitely rain during the next match? Intuitively, it is easy to see that the chances of
winning for Pakistan have increased. But by how much?
Suppose B is the event of Pakistan winning and A is the event of rain. Then
P(B) = 5/20 = 1/4, since Pakistan has won 5 of the 20 matches.
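The update can then be carried out with Bayes’ theorem; a short Python sketch using the counts above (10 of the 20 matches were rain-affected, since each side won 5 of them):

```python
# Match record: 20 matches in all, of which Pakistan won 5.
# Rain-affected matches: 10, of which Pakistan won 5.
total, pak_wins = 20, 5
rain_total, pak_rain_wins = 10, 5

p_b = pak_wins / total                  # P(B) = P(Pakistan wins) = 1/4
p_a = rain_total / total                # P(A) = P(rain) = 1/2
p_a_given_b = pak_rain_wins / pak_wins  # P(A | B) = 5/5 = 1

# Bayes' theorem: P(B | A) = P(A | B) P(B) / P(A).
p_b_given_a = p_a_given_b * p_b / p_a
print(p_b_given_a)  # 0.5
```

So, conditioned on rain, Pakistan’s chance of winning rises from 1/4 to 1/2, matching the intuition that rain favours Pakistan.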