Introduction To Probability 1
[partly based on slides by Sharon Goldwater & Frank Keller and John K. Kruschke]
Example
With respect to S1, describe the event B of rolling a total of 7
with the two dice.
B = {⟨1, 6⟩, ⟨2, 5⟩, ⟨3, 4⟩, ⟨4, 3⟩, ⟨5, 2⟩, ⟨6, 1⟩}
Events
[Figure: the 36 outcomes ⟨die 1, die 2⟩ plotted on a grid, die 1 on the
horizontal axis and die 2 on the vertical axis (values 1-6 each); the
anti-diagonal cells mark the outcomes in event B.]
Events
[Figure: Venn diagrams of two events A and B, illustrating the complement,
union, intersection, and difference of events.]
Axioms of Probability
1 The probability of an event is a nonnegative real number:
p(A) ≥ 0 for any A ⊆ S.
2 p(S) = 1.
3 If A1, A2, A3, . . . is a set of mutually exclusive events of S,
then:
p(A1 ∪ A2 ∪ A3 ∪ . . .) = p(A1) + p(A2) + p(A3) + . . .
Example
Assume all strings of three lowercase letters are equally
probable. Then what's the probability of a string of three
vowels?
There are 26 letters, of which 5 are vowels. So there are
N = 26^3 three-letter strings, and n = 5^3 consisting only of
vowels. Each outcome (string) is equally likely, with probability
1/N, so event A (a string of three vowels) has probability
p(A) = n/N = 5^3 / 26^3 ≈ 0.00711.
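This counting argument is easy to check by brute force; a minimal Python sketch (not part of the original slides):

from itertools import product
import string

# Enumerate all 26^3 three-letter strings and count the all-vowel ones.
vowels = set("aeiou")
strings = list(product(string.ascii_lowercase, repeat=3))   # the sample space
n_vowel = sum(all(c in vowels for c in s) for s in strings) # 5^3 outcomes in A

print(n_vowel / len(strings))   # ≈ 0.00711
print(5**3 / 26**3)             # same value, directly from the formula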
Rules of Probability
p(B|A) = p(A ∩ B) / p(A)
p(A ∩ B) is the joint probability of A and B, also written p(A, B).
Example
A manufacturer knows that the probability of an order being
ready on time is 0.80, and the probability of an order being
ready on time and being delivered on time is 0.72.
What is the probability of an order being delivered on time,
given that it is ready on time?
R: order is ready on time; D: order is delivered on time.
p(R) = 0.80, p(R, D) = 0.72. Therefore:
p(D|R) = p(R, D) / p(R) = 0.72 / 0.80 = 0.90
Conditional Probability
Example
Consider sampling an adjacent pair of words (bigram) from a
large text T. Let BI = the set of bigrams in T (this is our sample
space), A = "first word is run" = {⟨run, w2⟩ : w2 ∈ T} ⊆ BI, and
B = "second word is amok" = {⟨w1, amok⟩ : w1 ∈ T} ⊆ BI.
If p(A) = 10^-3.5, p(B) = 10^-5.6, and p(A, B) = 10^-6.5, what is
the probability of seeing amok following run, i.e., p(B|A)? How
about run preceding amok, i.e., p(A|B)?
p(run before amok) = p(A|B) = p(A, B) / p(B) = 10^-6.5 / 10^-5.6 = 10^-0.9 ≈ .126
p(amok after run) = p(B|A) = p(A, B) / p(A) = 10^-6.5 / 10^-3.5 = 10^-3 = .001
[How do we determine p(A), p(B), p(A, B) in the first place?]
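One common answer, not spelled out on the slide, is to estimate them as relative frequencies in the corpus. A toy Python sketch (the mini-corpus and variable names are ours):

# Estimate bigram probabilities as relative frequencies; in practice
# `tokens` would be the full tokenized text T.
tokens = "they run amok whenever they run loose".split()
bigrams = list(zip(tokens, tokens[1:]))      # the sample space BI
N = len(bigrams)

p_A  = sum(w1 == "run" for w1, _ in bigrams) / N    # first word is "run"
p_B  = sum(w2 == "amok" for _, w2 in bigrams) / N   # second word is "amok"
p_AB = bigrams.count(("run", "amok")) / N           # joint probability

print(p_AB / p_A)   # estimate of p(B|A): 0.5 for this toy corpus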
(Con)Joint Probability and the Multiplication Rule
p(A, B) = p(B) p(A|B) = p(A) p(B|A)
Marginal Probability and the Rule of Total Probability
B1, B2, . . . , Bk form a partition of S if they are pairwise
mutually exclusive and if B1 ∪ B2 ∪ . . . ∪ Bk = S.
[Figure: a sample space S divided into regions B1, . . . , B7 forming a partition.]
Then, for any event A (the rule of total probability):
p(A) = Σ_{i=1}^{k} p(Bi) p(A|Bi)
Marginalization
Example
In an experiment on human memory, participants have to
memorize a set of words (B1 ), numbers (B2 ), and pictures (B3 ).
These occur in the experiment with the probabilities
p(B1 ) = 0.5, p(B2 ) = 0.4, p(B3 ) = 0.1.
Then participants have to recall the items (where A is the recall
event). The results show that p(A|B1 ) = 0.4, p(A|B2 ) = 0.2,
p(A|B3 ) = 0.1. Compute p(A), the probability of recalling an
item.
By the theorem of total probability:
p(A) = Σ_{i=1}^{k} p(Bi) p(A|Bi)
     = p(B1) p(A|B1) + p(B2) p(A|B2) + p(B3) p(A|B3)
     = 0.5 · 0.4 + 0.4 · 0.2 + 0.1 · 0.1 = 0.29
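The same computation in Python (a direct transcription of the numbers above):

# Rule of total probability for the memory experiment.
p_B   = [0.5, 0.4, 0.1]      # p(B1), p(B2), p(B3)
p_A_B = [0.4, 0.2, 0.1]      # p(A|B1), p(A|B2), p(A|B3)

p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_B))
print(p_A)   # ≈ 0.29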
Joint, Marginal & Conditional Probability
Example
Proportions for a sample of University of Delaware students,
1974; N = 592. Data adapted from Snee (1974).

                         hairColor
eyeColor       black   brunette   blond   red
blue            .03      .14       .16    .03     .36
brown           .12      .20       .01    .04     .37
hazel/green     .03      .14       .04    .05     .27
                .18      .48       .21    .12

(The rightmost column and the bottom row give the marginal
probabilities of eyeColor and hairColor, respectively.)
Joint, Marginal & Conditional Probability
Example
To obtain the conditional probability p(eyeColor | hairColor = brunette),
we do two things:

i. reduction: we consider only the probabilities in the
brunette column;

eyeColor       brunette
blue             .14
brown            .20
hazel/green      .14
                 .48

ii. normalization: we divide by the marginal p(brunette),
since all the probability mass is now concentrated there.

eyeColor       brunette
blue            .14/.48
brown           .20/.48
hazel/green     .14/.48

E.g., p(eyeColor = brown | hairColor = brunette) = .20/.48 ≈ .42.
Joint, Marginal & Conditional Probability
Example
Moreover:
p(eyeColor = brown | hairColor = brunette) ≠
p(hairColor = brunette | eyeColor = brown)
Joint, Marginal & Conditional Probability
Example
To obtain p(hairColor | eyeColor = brown), we reduce,

eyeColor       black   brunette   blond   red
brown           .12      .20       .01    .04     .37

and we normalize.

eyeColor       black     brunette   blond     red
brown         .12/.37    .20/.37   .01/.37  .04/.37

So p(hairColor = brunette | eyeColor = brown) = .20/.37 ≈ .54,
whereas p(eyeColor = brown | hairColor = brunette) = .20/.48 ≈ .42.
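Both conditioning steps (reduction, then normalization) are mechanical; a minimal Python sketch of the procedure, with the table hard-coded (the function name is ours):

# Joint table p(eyeColor, hairColor), values from Snee (1974).
joint = {
    "black":    {"blue": .03, "brown": .12, "hazel/green": .03},
    "brunette": {"blue": .14, "brown": .20, "hazel/green": .14},
    "blond":    {"blue": .16, "brown": .01, "hazel/green": .04},
    "red":      {"blue": .03, "brown": .04, "hazel/green": .05},
}

def p_eye_given_hair(hair):
    column = joint[hair]                     # reduction: keep one column
    marginal = sum(column.values())          # p(hairColor = hair)
    return {e: v / marginal for e, v in column.items()}   # normalization

print(p_eye_given_hair("brunette")["brown"])   # .20/.48 ≈ 0.417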
Example
Consider the memory example again. What is the probability
that an item that is correctly recalled (A) is a picture (B3 )?
By Bayes' theorem:
p(B3|A) = p(B3) p(A|B3) / Σ_{i=1}^{k} p(Bi) p(A|Bi)
        = (0.1 · 0.1) / 0.29 ≈ 0.0345
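In Python (numbers as above):

# Bayes' theorem with the memory-experiment numbers.
p_B   = [0.5, 0.4, 0.1]      # p(B1), p(B2), p(B3)
p_A_B = [0.4, 0.2, 0.1]      # p(A|B1), p(A|B2), p(A|B3)

p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_B))   # total probability, 0.29
print(p_B[2] * p_A_B[2] / p_A)                     # p(B3|A) ≈ 0.0345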
Example
A fair coin is flipped three times. There are 8 possible
outcomes, and each of them is equally likely.
For each outcome, we can count the number of heads and the
number of switches (i.e., HT or TH subsequences):

outcome   #heads   #switches
HHH          3         0
HHT          2         1
HTH          2         2
HTT          1         1
THH          2         1
THT          1         2
TTH          1         1
TTT          0         0
Example
The joint probability p(#heads, #switches) is therefore:
                   #heads
#switches      0     1     2     3
    0         1/8    0     0    1/8    2/8
    1          0    2/8   2/8    0     4/8
    2          0    1/8   1/8    0     2/8
              1/8   3/8   3/8   1/8
Note that:
p(#switches = 1|#heads = 1) = 2/3
p(#heads = 1|#switches = 1) = 1/2
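The whole table, and both conditional probabilities, can be re-derived by enumerating the eight outcomes; a short Python sketch (not from the slides):

from itertools import product
from collections import Counter

# Build the joint distribution p(#heads, #switches) by enumeration.
counts = Counter()
for flips in product("HT", repeat=3):
    heads = flips.count("H")
    switches = sum(a != b for a, b in zip(flips, flips[1:]))
    counts[(heads, switches)] += 1

joint = {hs: c / 8 for hs, c in counts.items()}
print(joint[(1, 1)])                                         # 2/8
p_heads1 = sum(p for (h, s), p in joint.items() if h == 1)   # 3/8
print(joint[(1, 1)] / p_heads1)                              # p(#switches=1 | #heads=1) = 2/3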
Bayes Theorem
Example
The joint probability p(#switches = 1, #heads = 1) = 2/8 can
be expressed in two ways:
p(#switches = 1 | #heads = 1) · p(#heads = 1) = 2/3 · 3/8 = 2/8
p(#heads = 1 | #switches = 1) · p(#switches = 1) = 1/2 · 4/8 = 2/8
In general, p(A, B) = p(A|B) p(B) = p(B|A) p(A); dividing by p(A)
gives Bayes' theorem:
p(B|A) = p(B) p(A|B) / p(A)
Independence
Events A and B are independent iff:
p(A, B) = p(A) p(B)
Example
#heads and #switches above are not independent: e.g.,
p(#heads = 0, #switches = 0) = 1/8, whereas
p(#heads = 0) · p(#switches = 0) = 1/8 · 2/8 = 1/32.
Example
A coin is flipped three times. Each of the eight outcomes is equally
likely. A: heads occurs on each of the first two flips; B: tails occurs on
the third flip; C: exactly two tails occur in the three flips. Show that A
and B are independent, and that B and C are dependent.
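A quick way to verify the claim is to enumerate the eight outcomes; a Python sketch (the event encodings are ours):

from itertools import product

outcomes = list(product("HT", repeat=3))   # 8 equally likely outcomes

def p(event):
    return sum(map(event, outcomes)) / len(outcomes)

A = lambda o: o[0] == "H" and o[1] == "H"  # heads on the first two flips
B = lambda o: o[2] == "T"                  # tails on the third flip
C = lambda o: o.count("T") == 2            # exactly two tails

print(p(lambda o: A(o) and B(o)), p(A) * p(B))   # 0.125 = 0.125  -> independent
print(p(lambda o: B(o) and C(o)), p(B) * p(C))   # 0.25 != 0.1875 -> dependent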
Example
A simple example of two attributes that are independent: the
suit and value of cards in a standard deck. There are 4 suits
{♣, ♦, ♥, ♠} and 13 values for each suit {2, . . . , 10, J, Q, K, A},
for a total of 52 cards.
Example
We can verify independence by cross-multiplying marginal
probabilities too. For every suit s ∈ {♣, ♦, ♥, ♠} and value
v ∈ {2, . . . , 10, J, Q, K, A}:
p(suit = s, value = v) = 1/52 (in a well-shuffled deck)
p(suit = s) = 13/52 = 1/4
p(value = v) = 4/52 = 1/13
p(suit = s) · p(value = v) = 1/4 · 1/13 = 1/52
Example
In a noisy room, I whisper the same number n ∈ {1, . . . , 10} to
two people A and B on two separate occasions. A and B
imperfectly (and independently) draw a conclusion about what
number I whispered. Let the numbers A and B think they heard
be n_a and n_b, respectively.
Example
Given an experiment in which we roll a pair of 4-sided dice, let
the random variable X be the total number of points rolled with
the two dice.
E.g., X = 5 picks out the set {⟨1, 4⟩, ⟨2, 3⟩, ⟨3, 2⟩, ⟨4, 1⟩}.
Specify the full function denoted by X and determine the probabilities
associated with each value of X .
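A sketch of one way to answer this in Python, treating X as a map from outcomes to totals (the representation is ours):

from itertools import product
from collections import Counter

# The full function denoted by X, and the probabilities it induces.
outcomes = list(product(range(1, 5), repeat=2))   # 16 equally likely pairs
X = {o: sum(o) for o in outcomes}                 # X maps each outcome to a total

dist = Counter(X.values())
for total in sorted(dist):
    print(total, dist[total] / len(outcomes))
# 2 -> 1/16, 3 -> 2/16, 4 -> 3/16, 5 -> 4/16, 6 -> 3/16, 7 -> 2/16, 8 -> 1/16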
Random Variables
Example
Assume a balanced coin is flipped three times. Let X be the
random variable denoting the total number of heads obtained.
Example
For the probability function defined in the previous example:
x    f(x)
0    1/8
1    3/8
2    3/8
3    1/8
Probability Distributions
[Figure: bar chart of f(x) against x = 0, 1, 2, 3; bar heights 1/8, 3/8,
3/8, 1/8, with the vertical axis running from 0 to 0.5.]
Probability Distributions
Example
In a raffle, there are 10,000 tickets. The probability of winning is
therefore 1/10,000 for each ticket. The prize is worth $4,800.
Hence the expectation per ticket is $4,800/10,000 = $0.48.
Example
A balanced coin is flipped three times. Let X be the number of
heads. Then the probability distribution of X is:

f(x) = 1/8  for x = 0
       3/8  for x = 1
       3/8  for x = 2
       1/8  for x = 3
Example
Let X be the number of points rolled with a balanced (6-sided)
die. Find the expected value of X and of g(X) = 2X^2 + 1.
The probability distribution for X is f(x) = 1/6. Therefore:

E(X) = Σ_x x f(x) = Σ_{x=1}^{6} x · 1/6 = 21/6 = 3.5

E[g(X)] = Σ_x g(x) f(x) = Σ_{x=1}^{6} (2x^2 + 1) · 1/6 = 188/6 = 94/3
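The same expectations in exact arithmetic, as a Python check (not from the slides):

from fractions import Fraction

# E(X) and E[g(X)] for one balanced 6-sided die.
f = Fraction(1, 6)
E_X  = sum(x * f for x in range(1, 7))                # 21/6 = 7/2
E_gX = sum((2 * x**2 + 1) * f for x in range(1, 7))   # 188/6 = 94/3

print(E_X, E_gX)   # 7/2 94/3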
Summary
Sample space S contains all possible outcomes of an
experiment; events A and B are subsets of S.
rules of probability: p(Ā) = 1 − p(A).
if A ⊆ B, then p(A) ≤ p(B).
0 ≤ p(B) ≤ 1.
addition rule: p(A ∪ B) = p(A) + p(B) − p(A, B).
conditional probability: p(B|A) = p(A, B) / p(A).
independence: p(A, B) = p(A) p(B).
marginalization: p(A) = Σ_{Bi} p(Bi) p(A|Bi).
Bayes' theorem: p(B|A) = p(B) p(A|B) / p(A).
any value of an r.v. picks out a subset of the sample
space.
for any value of an r.v., a distribution returns a probability.
the expectation of an r.v. is its average value over a
distribution.
References
Snee, R. D. (1974). Graphical display of two-way contingency tables. The American Statistician, 28(1), 9-12.