STA2100 Probability

Uploaded by

kigsboni

STA 2100 Probability and Statistics I

Chapter 6
Introduction to Probability

Learning outcomes
Upon completing this topic, you should be able to:

• Define probability

• Calculate probabilities

• List and use the rules of probability

• Identify mutually exclusive events and independent events


1. Introduction
Probability is the language we use to model uncertainty. We all intuitively understand
that few things in life are certain. There is usually an element of uncertainty
or randomness around the outcomes of our choices. For instance, in business this
uncertainty can make all the difference between a good investment and a poor one.
Hence an understanding of probability, and of how we might incorporate it into
our decision-making processes, is important. In this lesson, we look at the logical
basis for how we might express a probability and some basic rules that probabilities
should follow. In subsequent lessons, we look at how we can use probabilities to
aid decision making. It is advisable that you revisit the set theory lesson to help
you understand this lesson better.

2. Definitions
The probability of a specific event is a mathematical statement about the likelihood
that it will occur. All probabilities are numbers between 0 and 1, inclusive; a
probability of 0 means that the event will never occur, and a probability of 1
means that the event will always occur. We often use the letter P to represent a
probability. For example, P(Rain) would be the probability that it rains. In other
cases Pr is used to represent a probability. It is important to understand some
terms used in probability. They include:

Probability Experiment
An experiment is an activity where we do not know for certain what will happen,
but we will observe what happens. For example:

• We may ask someone whether or not they have used our IT products.

• We may observe the temperature at midday tomorrow.

• We may toss a coin and observe whether it shows “heads” or “tails”.

• Rolling a die and observing the number that is rolled is a probability experi-
ment.

Alternatively, a probability experiment is an action through which specific results
(counts, measurements or responses) are obtained.


Outcome
An outcome, or elementary event, is one of the possible things that can happen.
For example, suppose that we are interested in the shoe size of the next customer
to come into a shoe shop. Possible outcomes include “eight”, “twelve”, “nine and
a half” and so on. In any experiment, one, and only one, outcome occurs.
The result of a single trial in a probability experiment is the outcome.

Sample space
The sample space is the set of all possible outcomes. For example, it could be the
set of all shoe sizes. The sample space when rolling a die has six outcomes:
{1, 2, 3, 4, 5, 6}.

Event
An event consists of one or more outcomes and is a subset of the sample space.
An event is usually denoted using a capital (uppercase) letter. For example “the
shoe size of the next customer is less than 9” is an event. It is made up of all of
the outcomes where the shoe size is less than 9. Of course an event might contain
just one outcome. We can use a letter, say E, to represent this event.
A simple event is an event that consists of a single outcome.
Example. A die is rolled. Event A is rolling an even number. This is not a simple
event because the outcomes of event A are {2, 4, 6}.
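
The definitions above map naturally onto sets. The following is a minimal sketch, assuming a fair six-sided die; the names sample_space, event_a and is_simple_event are illustrative and do not come from the notes.

```python
# Sample space for one roll of a six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

# Event A: rolling an even number. An event is a subset of the sample space.
event_a = {outcome for outcome in sample_space if outcome % 2 == 0}


def is_simple_event(event):
    """A simple event consists of exactly one outcome."""
    return len(event) == 1


print(sorted(event_a))           # [2, 4, 6]
print(event_a <= sample_space)   # True: A is a subset of the sample space
print(is_simple_event(event_a))  # False: A contains three outcomes
print(is_simple_event({4}))      # True
```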

3. Rules of probability
• Probabilities are usually expressed in terms of fractions or decimal numbers
or percentages.

Therefore we could express the probability of it raining today as

P(rain) = 1/20 = 0.05 = 5%

• All probabilities are measured on a scale ranging from zero to one. The
probabilities of most events lie strictly between zero and one. An event with
probability zero is an impossible event and an event with probability one is
said to be a certain event.


• The collection of all possible outcomes, that is the sample space, has a
probability of 1. For example, if an experiment consists of only two outcomes
– success or failure – then the probability of either a success or a failure is
1. That is P(success or failure) = 1.
• With respect to an event E, the complementary event, denoted as E^c, E′ or ∼E
(E′ is read as “E prime”), is the negation of the event E. For example, consider
the event that it will rain tomorrow. The complement of this event is the event
that it will not rain tomorrow. We should note that the probabilities of an event E
and its complement sum to 1, i.e.
P(E) + P(E^c) = 1

Example. There are 5 red chips, 4 blue chips, and 6 white chips in a basket. Find
the probability of randomly selecting a chip that is not blue.
Solution: P (selecting a blue chip) = 4/15 = 0.267
implying P (not selecting a blue chip) = 1 − 0.267 = 0.733
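
As a quick check of the complement rule on the chip example, the sketch below uses exact fractions so that no rounding is involved. The dictionary chips and the variable names are illustrative assumptions.

```python
from fractions import Fraction

# Basket contents from the example: 5 red, 4 blue and 6 white chips.
chips = {"red": 5, "blue": 4, "white": 6}
total = sum(chips.values())

p_blue = Fraction(chips["blue"], total)
# Complement rule: P(not blue) = 1 - P(blue).
p_not_blue = 1 - p_blue

print(p_blue, p_not_blue)            # 4/15 11/15
print(round(float(p_not_blue), 3))   # 0.733
```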

• Two or more events are said to be mutually exclusive if no two of them can occur
simultaneously. In the example above, the outcomes success and failure are
mutually exclusive because both cannot occur at the same time. Two events
A and B are mutually exclusive if A ∩ B = ∅, so that P(A ∩ B) = 0.

• A list of collectively exhaustive events contains all possible elementary events
for an experiment. For example, when rolling a die once, the possible outcomes
{1, 2, 3, 4, 5, 6} are said to be collectively exhaustive because they include
all possible outcomes. Thus, the outcomes in a sample space are always
collectively exhaustive.

Example. Let A = the event that it is Monday, B = the event that it is Tuesday,
and C = the event that it is the year 2014. A and B are mutually exclusive events,
since it cannot be both Monday and Tuesday at the same time. A and C are not
mutually exclusive events, since it can be a Monday in the year 2014.

• Two events are said to be independent if the occurrence of one does not affect
the probability of the second occurring. If two events are independent, then
the probability that both will occur is equal to the product of their individual
probabilities. In other words, if A and B are independent, then

P(A ∩ B) = P(A) × P(B)

Example. If you toss a coin and look out of the window, it would be reasonable
to suppose that the events “get heads” and “it is raining” would be independent.
However, not all events are independent.

4. How do we measure probability?


There are three main ways in which we can measure probability. All three obey
the basic rules described above. Different people argue in favour of the different
views of probability and some will argue that each kind has its uses depending on
the circumstances.

4.1. Classical (or theoretical) probability


If all possible outcomes are “equally likely” then we can adopt the classical approach
to measuring probability. For example, if we tossed a fair coin, there are only two
possible outcomes – a head or a tail – both of which are equally likely, and hence
P(head) = P(tail) = 1/2
The underlying idea behind this view of probability is symmetry. In this example,
there is no reason to think that the outcome Head and the outcome Tail
have different probabilities and so they should have the same probability. Since
there are two outcomes and one of them must occur, both outcomes must have
probability 1/2. Another commonly used example is rolling dice. There are six
possible outcomes {1, 2, 3, 4, 5, 6} when a die is rolled and each of them should
have an equal chance of occurring. Hence P(1) = 1/6, P(2) = 1/6, ..., P(6) = 1/6.
Other calculations can be made, such as P(Even number) = 3/6 = 1/2.
This follows from the formula

P(Event) = (Total number of outcomes in which the event occurs) / (Total number of possible outcomes)
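
The classical formula can be sketched as a small counting function. This assumes every outcome in the sample space is equally likely; the function name classical_probability is illustrative.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """Favourable outcomes divided by possible outcomes (equally likely case)."""
    favourable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favourable), len(sample_space))


die = {1, 2, 3, 4, 5, 6}
coin = {"H", "T"}

print(classical_probability({"H"}, coin))      # 1/2
print(classical_probability({1}, die))         # 1/6
print(classical_probability({2, 4, 6}, die))   # 1/2
```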

4.2. Frequentist or Empirical (or statistical) probability


When the outcomes of an experiment are not equally likely, we can conduct experiments
to give us some idea of how likely the different outcomes are. For example,
suppose we were interested in measuring the probability of producing a defective
item in a manufacturing process. This probability could be measured by monitoring
the process over a reasonably long period of time and calculating the proportion of
defective items. What constitutes a reasonably long period of time is, of course,
a difficult question to answer. In a simpler case, if we did not believe that a
coin was fair, we could toss the coin a large number of times and see how often we
obtained a head. In both cases we perform the same experiment a large number
of times and observe the outcome. This is the basis of the frequentist view. By
conducting experiments, the probability of an event can be estimated using
the following formula:

P(Event) = (Number of times the event occurs) / (Total number of times the experiment is performed)

The larger the experiment, the closer this probability is to the “true” probability.
The frequentist view of probability regards probability as the long run relative
frequency (or proportion). So, in the defects example, the “true” probability of
getting a defective item is the proportion obtained in a very large experiment
(strictly an infinitely long sequence of trials). In the frequentist view, probability is
a property of nature and, since, in practice, we cannot conduct infinite sequences
of trials, in many cases we never really know the “true” values of probabilities. We
also have to be able to imagine a long sequence of “identical” trials. This does not
seem to be appropriate for “one-off” experiments like the launch of a new product.
For these reasons (and others) some people prefer the subjective or Bayesian view
of probability.
Example. A travel agent determines that in every 50 reservations she makes, 12
will be for a cruise. What is the probability that the next reservation she makes
will be for a cruise?
Solution:

P(cruise) = 12/50 = 0.24

Note: As an experiment is repeated over and over, the empirical probability of an
event approaches the theoretical (actual) probability of the event.

For instance, Sally flips a coin 20 times and gets 3 heads. The empirical probability
is 3/20. This is not representative of the theoretical probability, which is 1/2.
As the number of times Sally tosses the coin increases, the empirical probability
will get closer and closer to the theoretical probability. This is referred to as
the Law of Large Numbers.
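
The behaviour described in the note can be explored with a small simulation. This is only a sketch: it assumes a fair coin simulated with Python's pseudo-random generator, and the function name and the seed are arbitrary choices. The printed estimates depend on the seed, so no expected values are shown.

```python
import random


def empirical_probability_of_heads(num_tosses, seed=0):
    """Estimate P(heads) as (number of heads) / (number of tosses)."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    heads = sum(1 for _ in range(num_tosses) if rng.random() < 0.5)
    return heads / num_tosses


# With more tosses the estimate tends to settle near the theoretical value 1/2.
for n in (20, 200, 2000, 200000):
    print(n, empirical_probability_of_heads(n))
```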

4.3. Subjective probability
We are probably all intuitively familiar with this method of assigning probabilities.
When we board an aeroplane, we judge the probability of it crashing to be sufficiently
small that we are happy to undertake the journey. Similarly, the odds given by
bookmakers on a football match reflect people’s beliefs about which team will win.
This probability does not fit within the frequentist definition as the match cannot
be played more than once.
One potential difficulty with using subjective probabilities is that they are
subjective, so the probabilities which two people assign to the same event can be
different. This becomes important if these probabilities are to be used in decision
making. For example, if you were deciding whether to launch a new product
and two people had very different ideas about how likely success or failure of this
product was, then the decision to go ahead could be controversial.
If both individuals assessed the probability of success to be 0.8 then the decision
to go ahead could easily be based on this belief. However, if one said 0.8 and the
other 0.3, then the decision is not straightforward. We would need a way to
reconcile these different positions.
Subjective probability is based on personal judgment, accumulated knowledge
and experience. For instance, medical doctors sometimes assign subjective
probabilities to the life expectancy of people with breast cancer.

5. Laws of probability
5.1. Multiplication law

The probability of two independent events E1 and E2 both occurring can be written
as
P(E1 ∩ E2) = P(E1) × P(E2),
and this is known as the multiplication law of probability.
For example, the probability of throwing a six followed by another six on two
rolls of a die is calculated as follows. The outcomes of the two rolls of the die are
independent. Let E1 denote a six on the first roll and E2 a six on the second roll.
Then
P(two sixes) = P(E1 and E2) = P(E1) × P(E2) = 1/6 × 1/6 = 1/36
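
One way to check the two-sixes calculation is to enumerate all 36 equally likely ordered pairs of rolls and count directly. This is a sketch under the assumption of two fair dice; the helper prob is an illustrative name.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first roll, second roll) pairs.
rolls = list(product(range(1, 7), repeat=2))


def prob(predicate):
    """Classical probability of the rolls satisfying the predicate."""
    return Fraction(sum(1 for r in rolls if predicate(r)), len(rolls))


p_e1 = prob(lambda r: r[0] == 6)                  # six on the first roll
p_e2 = prob(lambda r: r[1] == 6)                  # six on the second roll
p_both = prob(lambda r: r[0] == 6 and r[1] == 6)  # two sixes

print(p_e1, p_e2, p_both)      # 1/6 1/6 1/36
print(p_both == p_e1 * p_e2)   # True: the multiplication law holds
```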

5.2. Addition law


The multiplication law is concerned with the probability of two or more independent
events occurring. The addition law describes the probability of any of two or more
events occurring. The addition law for two events E1 and E2 is
P(E1 or E2) = P(E1) + P(E2) − P(E1 and E2).
Recall the analogous result n(A ∪ B) = n(A) + n(B) − n(A ∩ B) from set theory.
This describes the probability of either event E1 or event E2 happening.
A more basic version of the rule works where events are mutually exclusive: if
events E1 and E2 are mutually exclusive then
P (E1 or E2) = P (E1) + P (E2).
This simplification occurs because when two events are mutually exclusive they
cannot happen together and so P (E1 and E2) = 0.
Example. Consider the following information: 50 percent of families in a certain
city subscribe to the morning newspaper, 65 percent subscribe to the afternoon
newspaper, and 30 percent of the families subscribe to both newspapers. What
percentage of families subscribe to at least one newspaper?
We are told P(Morning) = 0.5, P(Afternoon) = 0.65 and P(Morning and
Afternoon) = 0.3.
Therefore using the addition law
P(at least one paper) = P(Morning or Afternoon) = P(Morning) + P(Afternoon)
− P(Morning and Afternoon)
= 0.5 + 0.65 − 0.3 = 0.85.

So 85% of families in the city subscribe to at least one of the newspapers.
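
The addition law can be written as a one-line function. The sketch below applies it to the newspaper example using exact fractions; the name p_union is illustrative, and its third argument defaults to 0 to cover the mutually exclusive case.

```python
from fractions import Fraction


def p_union(p_a, p_b, p_a_and_b=Fraction(0)):
    """Addition law: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


p_morning = Fraction(50, 100)
p_afternoon = Fraction(65, 100)
p_both = Fraction(30, 100)

p_at_least_one = p_union(p_morning, p_afternoon, p_both)
print(p_at_least_one, float(p_at_least_one))   # 17/20 0.85
```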


Below are some graphical illustrations on the laws, and use of the laws of
probability.
Mutually Exclusive Events
Two events, A and B, are mutually exclusive if they
cannot occur at the same time.

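[Figure: two Venn diagrams. Left: circles A and B do not overlap (A and B are
mutually exclusive). Right: circles A and B overlap in the region “A and B”
(A and B are not mutually exclusive).]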

Athiany H,K O 18


Mutually Exclusive Events


Example:
Decide if the two events are mutually exclusive.
Event A: Roll a number less than 3 on a die.
Event B: Roll a 4 on a die.

[Venn diagram: circle A contains 1 and 2, circle B contains 4; the circles do not
overlap.]

These events cannot happen at the same time, so the events are mutually exclusive.

Mutually Exclusive Events


Example:
Decide if the two events are mutually exclusive.
Event A: Select a Jack from a deck of cards.
Event B: Select a heart from a deck of cards.

A J 9 2 B
3 10
J J A 7
K 4
J 5
6Q8

Because the card can be a Jack and a heart at the


same time, the events are not mutually exclusive.

The Addition Rule


The probability that event A or B will occur is given by
P (A or B) = P (A) + P (B) – P (A and B ).
If events A and B are mutually exclusive, then the rule
can be simplified to P (A or B) = P (A) + P (B).
Example:
You roll a die. Find the probability that you roll a number
less than 3 or a 4.
The events are mutually exclusive.
P(roll a number less than 3 or roll a 4)
= P(number is less than 3) + P(4)
= 2/6 + 1/6 = 3/6 = 0.5

The Addition Rule


Example:
A card is randomly selected from a deck of cards. Find the
probability that the card is a Jack or the card is a heart.
The events are not mutually exclusive because the
Jack of hearts can occur in both events.

P(select a Jack or select a heart)
= P(Jack) + P(heart) − P(Jack of hearts)
= 4/52 + 13/52 − 1/52
= 16/52 ≈ 0.308
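
The Jack-or-heart result can be confirmed by building a standard 52-card deck and counting. This is a sketch only; the rank and suit labels and the helper p are illustrative choices.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))   # 52 (rank, suit) pairs

jacks = {card for card in deck if card[0] == "J"}
hearts = {card for card in deck if card[1] == "hearts"}


def p(event):
    """Classical probability of drawing a card that belongs to the event."""
    return Fraction(len(event), len(deck))


direct = p(jacks | hearts)                                 # count the union
by_addition_rule = p(jacks) + p(hearts) - p(jacks & hearts)

print(direct, by_addition_rule)    # 4/13 4/13
print(round(float(direct), 3))     # 0.308
```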


The Addition Rule


Example:
100 college students were surveyed and asked how many
hours a week they spent studying. The results are in the
table below. Find the probability that a student spends
between 5 and 10 hours or more than 10 hours studying.
            Less than 5    5 to 10    More than 10    Total
Male            11            22           16           49
Female          13            24           14           51
Total           24            46           30          100

The events are mutually exclusive.

P(5 to 10 hours or more than 10 hours) = P(5 to 10) + P(more than 10)
= 46/100 + 30/100 = 76/100 = 0.76

6. Conditional probability
So far we have only considered probabilities of single events or of several indepen-
dent events, like two rolls of a die. However, in reality, many events are related.
For example, the probability of it raining in 5 minutes time is dependent on whether
or not it is raining now. We need a mathematical notation to capture how the
probability of one event depends on other events taking place. We do this as
follows. Consider two events A and B. We write P (A|B) for the probability of
A given that B has already happened. We describe P (A|B) as the conditional
probability of A given B.
We can calculate these conditional probabilities using the formula

P(A|B) = P(A and B) / P(B), provided P(B) > 0,

that is, in terms of the probability of both events occurring, P(A and B), and the
probability of the event that has already taken place, P(B).
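
The formula can be sketched as a small function. The die example used here (A = an even number, B = a number greater than 3) is a hypothetical illustration that does not appear in the notes, and conditional_probability is an illustrative name.

```python
from fractions import Fraction


def conditional_probability(p_a_and_b, p_b):
    """P(A|B) = P(A and B) / P(B); undefined when P(B) = 0."""
    if p_b == 0:
        raise ValueError("P(A|B) is undefined when P(B) = 0")
    return p_a_and_b / p_b


# Hypothetical example: one roll of a fair die.
sample_space = {1, 2, 3, 4, 5, 6}
a = {2, 4, 6}   # A: the number is even
b = {4, 5, 6}   # B: the number is greater than 3

p_b = Fraction(len(b), len(sample_space))
p_a_and_b = Fraction(len(a & b), len(sample_space))

print(conditional_probability(p_a_and_b, p_b))   # 2/3
```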

6.1. Independence of two compound events


Suppose twenty identical cards, numbered 1-20 inclusive, are placed in a large box
and a card is then drawn at random. The events A and B are defined as follows:
A: the number is prime
B: the number is 14 or more
If the number is 14 or more, a player is given a prize. Suppose the person
drawing the card does not reveal the number on the card but only says that the
number is prime. How does this affect the player's chance of winning?

To fully understand this question, we need to use a contingency table as shown
below.

            A        A′      Total
B          2/20     5/20     7/20
B′         6/20     7/20    13/20
Total      8/20    12/20       1
This implies that
(i) P(B|A) = P(B ∩ A)/P(A) = (2/20)/(8/20) = 1/4
(ii) P(B|A′) = P(B ∩ A′)/P(A′) = (5/20)/(12/20) = 5/12
Is P(B|A) = P(B|A′)?
Consider now that B is redefined as “the number is 16 or more”. Is P(B|A) = P(B|A′)?
Is the player's winning of a prize dependent on the previous event?
Two events A and B are said to be independent whenever P(B|A) = P(B|A′) = P(B).
To test the independence of the two events A and B, it is sufficient to show
that any two of the probabilities P(B|A), P(B|A′) and P(B) are equal.
Example. With B taken as “the number is 16 or more”, show that
P(B′) = P(B′|A) = P(B′|A′) = 15/20 = 3/4 and
P(A) = P(A|B) = P(A|B′) = 8/20 = 2/5.
For independent events, we can multiply their probabilities.
Thus the statements “A and B are independent” and P(A ∩ B) = P(A)P(B)
are equivalent.
Proof
Suppose A and B are independent. Then P(A) = P(A|B), that is,
P(A) = P(A ∩ B)/P(B) =⇒ P(A) × P(B) = P(A ∩ B)
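
The card example can be checked by enumeration. The sketch below tests the product condition P(A ∩ B) = P(A)P(B) for both versions of B (14 or more, and 16 or more); the helper names p and is_independent are illustrative.

```python
from fractions import Fraction

cards = range(1, 21)                      # cards numbered 1 to 20
primes = {2, 3, 5, 7, 11, 13, 17, 19}     # event A


def p(event):
    """Classical probability for one card drawn at random from the 20."""
    return Fraction(len(event), 20)


def is_independent(a, b):
    """Independence holds exactly when P(A and B) = P(A) * P(B)."""
    return p(a & b) == p(a) * p(b)


b_14 = {n for n in cards if n >= 14}
b_16 = {n for n in cards if n >= 16}

print(p(primes & b_14) / p(primes), p(b_14))   # 1/4 7/20
print(is_independent(primes, b_14))            # False
print(p(primes & b_16) / p(primes), p(b_16))   # 1/4 1/4
print(is_independent(primes, b_16))            # True
```

Under these assumptions, learning that the number is prime lowers the chance of winning when the prize threshold is 14, but leaves it unchanged when the threshold is 16.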

Exercise 15. Given that the events A and B are independent, copy and complete the
following contingency table. The results can be obtained as follows:

            A        A′      Total
B          3/20      y        u
B′          x        z        v
Total      1/4       t        1

7. Tree Diagrams
In some cases, especially where there are three or more different events being
considered, tree diagrams are an alternative to the contingency tables.
Tree diagrams or probability trees are simple clear ways of presenting proba-
bilistic information. Let us first consider a simple example in which a fair coin is
tossed twice. Suppose we are interested in the probability that we get a head on
both tosses. This probability can be calculated as
P(Head and Head) = P(Head on 1st toss) × P(Head on 2nd toss|head on 1st
toss)
This example can be represented as a tree diagram in which experiments are
represented by circles (called nodes) and the outcomes of the experiments as
branches. The branches are annotated by the probability of the particular out-
come.
Example. In a large farm, 20% of a particular kind of flower is red and 80% is
white. The farmer decides to take samples of flowers from the production of this
particular kind. What is the probability that he obtains;
(a) One or two red flowers in a sample of two?
(b) At least two red flowers in a sample of three?
Solution:
This information can be represented in the tree diagram as follows.

[Tree diagram: three levels of branches from Start. At every node the branch to R
(red) has probability 1/5 and the branch to W (white) has probability 4/5.]

Resulting in…
RRR RRW RWR RWW WRR WRW WWR WWW

In this problem, we assume that the probabilities of these events remain the same
even after picking a small number of flowers from the production line.
(a) One or two red flowers in a sample of two is represented by
P(RR) + P(RW) + P(WR), where
P(RR) = 1/5 × 1/5 = 1/25
P(RW) = 1/5 × 4/5 = 4/25
P(WR) = 4/5 × 1/5 = 4/25
=⇒ P(RR) + P(RW) + P(WR) = 1/25 + 4/25 + 4/25 = 9/25
Alternatively,
P(one or two red flowers) = 1 − P(no red flower) = 1 − P(WW)
= 1 − (4/5 × 4/5) = 1 − 16/25 = 9/25
(b) P(RRR) + P(RRW) + P(RWR) + P(WRR)
= (1/5)^3 + (1/5 × 1/5 × 4/5) + (1/5 × 4/5 × 1/5) + (4/5 × 1/5 × 1/5) = 13/125
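
A probability tree can be walked programmatically by listing every path and multiplying the branch probabilities along it. The sketch below does this for the flower example, assuming each pick is red with probability 1/5 independently of the others; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Branch probabilities: R = red, W = white.
BRANCH = {"R": Fraction(1, 5), "W": Fraction(4, 5)}


def path_probability(path):
    """Multiply the branch probabilities along one path of the tree."""
    result = Fraction(1)
    for colour in path:
        result *= BRANCH[colour]
    return result


def prob_red_count(sample_size, condition):
    """Add up the paths whose number of red flowers satisfies the condition."""
    return sum(
        path_probability(path)
        for path in product("RW", repeat=sample_size)
        if condition(path.count("R"))
    )


print(prob_red_count(2, lambda reds: reds >= 1))   # 9/25   (part a)
print(prob_red_count(3, lambda reds: reds >= 2))   # 13/125 (part b)
```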

Example. A box has 6 blue beads and 4 red beads. Three beads are drawn at
random (without replacement). What is the probability that: (a) they are all blue
(b) there are exactly two blue beads (c) there is at least one blue bead
Solution:
When draws are made without replacement and the tree diagram would have many
branches, we can use combinations for quick computation of probabilities.
(a) The total number of ways of selecting 3 beads from 10 is 10C3 = 120.
The number of ways of selecting 3 blue beads from 6 is 6C3 = 20.
Therefore, P(all blue) = 6C3/10C3 = 20/120 = 1/6
(b) The number of ways of selecting 2 blue beads from 6 is 6C2 = 15.
The number of ways of selecting 1 red bead from 4 is 4C1 = 4.
Therefore, P(exactly 2 blue) = (15 × 4)/120 = 1/2
(c) P(at least one blue) = 1 − P(all red) = 1 − 4C3/10C3 = 1 − 4/120 = 29/30
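
The combination counts in the bead example can be reproduced with math.comb from the Python standard library (available from Python 3.8). This is a sketch; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

blue, red, drawn = 6, 4, 3
total_ways = comb(blue + red, drawn)   # 10C3 = 120

# (a) all three beads are blue
p_all_blue = Fraction(comb(blue, 3), total_ways)
# (b) exactly two blue beads and one red bead
p_two_blue = Fraction(comb(blue, 2) * comb(red, 1), total_ways)
# (c) at least one blue bead = 1 - P(all three red)
p_at_least_one_blue = 1 - Fraction(comb(red, 3), total_ways)

print(total_ways)                                    # 120
print(p_all_blue, p_two_blue, p_at_least_one_blue)   # 1/6 1/2 29/30
```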

8. Bayes Theorem
Suppose we know P(A), P(∼A) and also P(B|A) and P(B|∼A). We can then represent
A and ∼A on the first branches of a tree diagram and B and ∼B on the
second branches. Can we then determine P(A|B)?
This problem can be solved by using Bayes' theorem. Thomas Bayes was
an English mathematician and his theorem has given us a fundamental result of
statistical inference.
Mathematically, Bayes' theorem gives the relationship between the probabilities of
A and B, P(A) and P(B), and the conditional probabilities of A given B and
B given A, denoted by P(A|B) and P(B|A).
Commonly, Bayes' theorem is stated in two forms:
Simple: P(A|B) = P(B|A)P(A) / P(B), for P(B) ≠ 0
(The meaning depends on the interpretation of probability ascribed to the
terms.)
Extended: P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|A′)P(A′)], where A′ = ∼A

Example. Kamau has two gardeners, David and James. David comes on 1/3 of
the occasions and James 2/3 of the occasions. There is a probability of 1/10
that David will forget to water the flowers and a probability of 1/2 that James
will forget to water the flowers. One day, Kamau had to leave the house before
the gardener arrived. On his return, he found that the gardener had come and
gone, and also that the flowers were not watered. What is the probability that it
is James who came that day?


Solution:

Let
D: David comes
J: James comes
W: Flowers watered
The tree diagram will then look like this

[Tree diagram: first branches J (2/3) and D (1/3). From J: W′ (1/2) and W (1/2).
From D: W (9/10) and W′ (1/10).]

We need to find P(J|W′), where W′ is the event that the flowers are not watered.

P(J|W′) = P(J ∩ W′)/P(W′)
where P(W′) = P(W′|J)P(J) + P(W′|D)P(D)
From the diagram
P(W′|J) = 1/2 and P(W′|D) = 1/10
Then
P(J|W′) = (2/3 × 1/2) / [(1/3 × 1/10) + (2/3 × 1/2)]
= (1/3)/(11/30) = 10/11
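
The gardener calculation follows a general pattern: multiply each prior by its likelihood, then divide by the total. The sketch below implements that pattern with exact fractions; the function name bayes_posterior and the dictionary layout are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods, target):
    """P(target | evidence) via Bayes' theorem over the listed hypotheses."""
    # Total probability of the evidence, summed over every hypothesis.
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return priors[target] * likelihoods[target] / evidence


# P(gardener comes) and P(flowers not watered | gardener).
priors = {"James": Fraction(2, 3), "David": Fraction(1, 3)}
forgets = {"James": Fraction(1, 2), "David": Fraction(1, 10)}

print(bayes_posterior(priors, forgets, "James"))   # 10/11
print(bayes_posterior(priors, forgets, "David"))   # 1/11
```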


Exercise 16. A certain video store uses blank tapes bought from two sources,
say source A and source B. Suppose that the owner of the video store buys 30%
of the tapes from A, of which it is known that 5% are defective, and buys the
remaining 70% from source B, of which 20% are usually defective. On recording some
movies on the tapes, the owner discovers that a certain tape is defective. What is
the probability that the video tape was supplied by A?

9. Summary
An experiment is a process that, when performed, results in one and only one of
many observations. The observations are called the outcomes of an experiment.
The collection of all possible outcomes of an experiment is called a sample space.
A sample space is denoted by S. Therefore, the sample space for an experiment
of inspecting a computer fan is written as S = {good, defective}, and for the
number of heads obtained when tossing a coin twice it is S = {0, 1, 2}.
For three or more events, it is easier to construct a probability tree than a
contingency table, for contingency tables are only practicable for two events!


9.1. Revision questions or guidelines


1. A box contains three pieces of identical mouse pads of different colors: Red
(R), White (W) and Blue (B). Two pieces of pads are randomly picked from
the box.

(a) What is the sample space if the picked piece is not replaced?
(b) What is the sample space if the picked piece is replaced?

2. If 85% of people have a bowl of cereal for breakfast, 60% of people have
toast, and 50% of people have both cereal and toast for breakfast, what
percentage of people have neither cereal nor toast for breakfast?

3. Do you think the following pairs of events are independent or dependent?


Explain.

(a) E: An individual has a high IQ


F: An individual is accepted for a University place
(b) A: A student plays table tennis
B: A student is good at maths
(c) E1: An individual has a large outstanding credit card debt
E2: An individual is allowed to extend his bank overdraft

4. A company manufactures a device which contains three components A, B


and C. The device fails if any of these components fail and the company offers
to its customers a full money-back warranty if the product fails within one
year. The company has assessed the probabilities of each of the components
lasting at least a year as 0.98, 0.99 and 0.95 for A, B and C respectively. The
three components within a single device are considered to be independent.
Consider a single device chosen at random. Calculate the probability that

(a) all three components will last for at least a year;


(b) the device will be returned for a refund

5. Two events A and B are such that P (A) = 1/4, P (A|B) = 1/2 and
P (B|A) = 2/3. (a) Are A and B independent? (b) Are A and B Mutually
exclusive? (c) Find P (A ∩ B) (d) Find P (B).


6. A group of 50 BIT students were asked which of the three Computer Science
Journals, A, B or C they read. The results showed that 25 read A, 16 read
B, 14 read C. 5 read both A and B, 4 read both B and C, 6 read both C
and A and 2 read all three.

(a) Represent these data on a Venn diagram


(b) Find the probability that a person selected at random from this group
reads
i. At least one of the Journals
ii. Only one of the Journals


Learning Activities
• Two fair six-faced dice are rolled. Let T be the event that the sum is 10 and D
be the event that the score is a double. Construct a tree diagram with the first
branch being T, and also another tree diagram with the first branch being D.


Chapter 7
Discrete Probability Distribution

Learning outcomes
Upon completing this topic, you should be able to:

• Define probability distributions.

• Differentiate between discrete and continuous variables and probability
distributions.

• Explain the discrete probability distributions considered.

• Calculate the expected value from probability distribution.

• Evaluate variance from probability distribution.


1. Introduction
An important part of any analysis of decision making under stochastic conditions
is a probability distribution. Probability distributions state the relative frequency
of occurrence of a set of mutually exclusive events. Probability distributions can be
univariate or multivariate. They give the relative frequency of observing a particular
event. We saw that surveys can be used to get information on population quantities.
In most cases, it is not possible to measure the variables on every member of
the population and so some sampling scheme is used. This means that there is
uncertainty in our conclusions. Before we can make inferences about populations,
we need a language to describe the uncertainty we find when taking samples from
populations. This can be done using probability distributions.

2. Random Variable
In many experiments the outcomes of the experiment can be assigned numerical
values. For instance, if you roll a die, each outcome has a value from 1 through
6. If you ascertain the midterm test score of a student in your class, the outcome
is again a number.
A random variable is just a rule that assigns a number to each outcome of
an experiment. These numbers are called the values of the random variable. We
often use letters like X, Y and Z to denote a random variable. Here are some
examples

1. Experiment: Select a mutual fund; X = the number of companies in the
fund portfolio. The values of X are 2, 3, 4, ...

2. Experiment: Select a soccer player; Y = the number of goals the player
has scored during the season. The values of Y are 0, 1, 2, 3, ...

Random variables may be discrete or continuous.


A discrete random variable can take on only specific, isolated numerical
values, like the outcome of a roll of a die, or the number of shillings in a randomly
chosen bank account.

• Discrete random variables that can take on only finitely many values (like
the outcome of a roll of a die) are called finite random variables.

• Discrete random variables that can take on an unlimited number of values
(like the number of stars estimated to be in the universe) are infinite discrete
random variables.

A continuous random variable, on the other hand, can take on any value
within a continuous range or interval, like the temperature, the height of an
athlete in centimeters, the yield of maize from an acre of land, or the weight of a
laptop in a supplier’s store.

3. Discrete probability distribution


Given a random variable X, it is natural to look at certain events, for instance,
the event that X = 2. By this, we mean the event consisting of all outcomes that
have an assigned X value of 2. To illustrate this let’s look at an example: suppose
we throw a pair of fair dice, and take X to be the sum of the numbers facing up.
Then the event that X = 2 is {(1, 1)}. The event that X = 3 is {(2, 1), (1, 2)}.
The event that X = 4 is {(3, 1), (2, 2), (1, 3)}, and so on. Each of these
events has a certain probability associated with it. For instance, the
probability that X = 4 is 3/36 = 1/12 because the event in question consists of three
of the thirty-six possible (equally likely) outcomes.
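
The whole distribution of X can be tabulated by counting how many of the 36 outcomes give each sum. This is a sketch assuming two fair six-sided dice; the names outcomes, counts and pmf are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The 36 equally likely outcomes for a pair of dice.
outcomes = list(product(range(1, 7), repeat=2))

# X = sum of the two numbers facing up; count outcomes for each value of X.
counts = Counter(first + second for first, second in outcomes)
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

print(pmf[2], pmf[3], pmf[4])   # 1/36 1/18 1/12
print(sum(pmf.values()))        # 1
```

Fractions are printed in lowest terms, so 2/36 appears as 1/18 and 3/36 as 1/12.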
For the probability distribution of a discrete random variable, we need to know
both the set of values of the random variable and the probability with which it
takes each of the values.
The function that is responsible for allocating probabilities, P(X = x), is
known as the probability mass function of X, sometimes abbreviated as PMF of
X.
A PMF (in these notes also written pdf) can be given either as a list of the
individual probabilities or as a formula that summarises them.
Example. You toss a coin 4 times. There are 16 possible outcomes:
HHHH, HHHT, HHTH, HHTT, HTHH, HTHT, HTTH, HTTT, THHH,
THHT, THTH, THTT, TTHH, TTHT, TTTH, TTTT (H = heads; T = tails).
Now take X = number of heads. Here are some of the probabilities:
P (X = 0) = 1/16. Only one of the 16 possible outcomes has X = 0 (no heads), namely
TTTT.
P (X = 1) = 4/16 = 1/4. Four of the 16 possible outcomes have X = 1, namely
HTTT, THTT, TTHT and TTTH.
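The same listing can be produced mechanically. This is an illustrative Python sketch (the variable names are assumptions made here) that generates the 16 sequences and counts the heads in each one.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every sequence of 4 tosses, e.g. "HHTH"; there are 2**4 = 16 of them.
tosses = ["".join(t) for t in product("HT", repeat=4)]

# X = number of heads in a sequence; tally how often each value occurs.
heads = Counter(seq.count("H") for seq in tosses)

# Each sequence is equally likely, so P(X = x) = count / 16.
pmf_heads = {x: Fraction(heads[x], len(tosses)) for x in range(5)}

# The sequences that give exactly one head.
one_head = [seq for seq in tosses if seq.count("H") == 1]
print(one_head, pmf_heads[1])
```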


The distinction between the capital letter X and small letter x is important;
X stands for the random variable in question, whereas x stands for a specific value
or outcome.
Example. Two tetrahedral dice, with faces labeled 1, 2, 3, 4, are thrown and the
score noted, where the score is the sum of the two numbers on which the dice
land. Find the probability mass function (pmf) of X, where X is the random
variable ’the score when two dice are thrown’.
Solution:
x          2     3     4     5     6     7     8
P (X = x)  1/16  2/16  3/16  4/16  3/16  2/16  1/16
Check: Σ P (X = x) = (1/16)(1 + 2 + 3 + 4 + 3 + 2 + 1) = 16/16 = 1.
Since every probability lies between 0 and 1 and the probabilities sum to 1,
this is a valid probability distribution for X.

3.1. Finding probabilities

Example. The pmf of a discrete random variable Y is given by P (Y = y) = cy^2
for y = 0, 1, 2, 3, 4. Given that c is a constant, find the value of c.
Solution
y          0  1  2   3   4
P (Y = y)  0  c  4c  9c  16c
Since Y is a random variable, Σ P (Y = y) = 1,
so c + 4c + 9c + 16c = 30c = 1 =⇒ c = 1/30
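The normalizing step in this example (the probabilities must sum to 1) can be sketched in Python as follows. The names weights and pmf_y are illustrative, not from the notes.

```python
from fractions import Fraction

values = range(5)                  # y = 0, 1, 2, 3, 4
weights = [y**2 for y in values]   # P(Y = y) is proportional to y^2

# The probabilities must sum to 1, so the constant is 1 / (sum of the weights).
c = Fraction(1, sum(weights))

# The resulting probability for each value of y.
pmf_y = {y: c * w for y, w in zip(values, weights)}

print(c)       # 1/30
print(pmf_y)
```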
Example. The discrete random variable T has the probability distribution shown
below.
t          -3   -2    -1   0     1
P (T = t)  0.1  0.25  0.3  0.15  d
Find:
(a) the value of d
Σ P (T = t) = 1
0.1 + 0.25 + 0.3 + 0.15 + d = 1 =⇒ d = 0.2
(b) P (−3 ≤ T ≤ 0)
P (−3 ≤ T ≤ 0) = P (T = −3) + P (T = −2) + P (T = −1) + P (T = 0)
= 0.1 + 0.25 + 0.3 + 0.15 = 0.8


(c) P (T > −1)
P (T > −1) = P (T = 0) + P (T = 1) = 0.15 + 0.2 = 0.35
(d) P (−1 < T < 1)
P (−1 < T < 1) = P (T = 0) = 0.15
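Each part of this example amounts to adding up P (T = t) over the values of t that satisfy a condition. A small Python sketch of that idea is given below; the helper prob is a hypothetical name introduced here, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Known probabilities from the table; d is the missing entry for t = 1.
known = {
    -3: Fraction("0.1"),
    -2: Fraction("0.25"),
    -1: Fraction("0.3"),
    0: Fraction("0.15"),
}
d = 1 - sum(known.values())   # all probabilities must sum to 1
pmf_t = {**known, 1: d}


def prob(pmf, condition):
    """Sum P(T = t) over every value t for which condition(t) is true."""
    return sum(p for t, p in pmf.items() if condition(t))


p_b = prob(pmf_t, lambda t: -3 <= t <= 0)   # part (b)
p_c = prob(pmf_t, lambda t: t > -1)         # part (c)
p_d = prob(pmf_t, lambda t: -1 < t < 1)     # part (d)

print(float(d), float(p_b), float(p_c), float(p_d))
```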

3.2. Expectation
E(X), read as “E of X”, gives the average or typical value of the random
variable X, known as the expected value or expectation of X.
The mean of a discrete random variable is the mean of its probability distribution.
This mean is also called the expected value or population mean of the random
variable, and it indicates its average or central value.
This is the value we expect to observe per repetition, on average, if we repeat an
experiment many times. It is a useful summary of the variable’s distribution:
stating the expected value gives a general impression of a random variable
without giving full details of its probability distribution. The expected value of a
random variable X is symbolized by E(X) or µ and is defined as

E(X) = Σ x P (X = x),

where the sum is taken over all possible values x of X.

Example. A random variable X has the probability distribution shown below.
Find the expectation, E(X).
x          -2   -1   0     1    2
P (X = x)  0.3  0.1  0.15  0.4  0.05
Solution:
E(X) = Σ x P (X = x)
= (−2 ∗ 0.3) + (−1 ∗ 0.1) + (0 ∗ 0.15) + (1 ∗ 0.4) + (2 ∗ 0.05)
= −0.2
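The weighted sum in this solution can be written as a short Python function. This is a minimal sketch; the function name expectation and the dictionary layout are assumptions made for illustration.

```python
def expectation(pmf):
    """Return E(X) = sum of x * P(X = x) for a pmf given as {x: probability}."""
    return sum(x * p for x, p in pmf.items())


# The distribution from the example above.
pmf_x = {-2: 0.3, -1: 0.1, 0: 0.15, 1: 0.4, 2: 0.05}

mean_x = expectation(pmf_x)
print(round(mean_x, 10))
```

The result is rounded before printing only because decimal probabilities are stored as binary floating-point numbers.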


Exercise 17. A fruit machine consists of three windows which operate indepen-
dently. Each window shows pictures of fruits: Lemon, Apples, Cherries or Bananas.
The probability that a window shows a particular fruit is as follows:
P (Lemon) = 0.4
P (Cherries) = 0.2


P (Apple) = 0.1
P (Cherries) = 0.3
The rule for playing the game on the fruit machine is as follows: It costs Kshs
10 to play the game. A player will win Kshs 100 if he/she gets three Apples in a
row, Kshs 50 if he/she gets three Cherries in a row, Kshs 40 if he/she gets three
Lemons in a row and Kshs 80 if he/she gets two Apples and a Cherry in the game.
The order in which the fruits appear is not important. Based on this information,
would you expect to gain or lose if you play the game?

• Expectation of a function of a discrete random variable

The definition of expectation can be extended to any function of X, such as 10X,
X^2, X − 4, etc. In general, if g(X) is any function of the discrete random variable
X, then
E(g(X)) = Σ g(x) P (X = x)
For instance,
E(10X) = Σ 10x P (X = x)
E(X^2) = Σ x^2 P (X = x)
E(1/X) = Σ (1/x) P (X = x)
E(X − 4) = Σ (x − 4) P (X = x)
Example. The random variable X has the probability distribution shown below.
x          1    2    3
P (X = x)  0.1  0.6  0.3
Find:
i) E(X)
E(X) = Σ x P (X = x) = (1 ∗ 0.1) + (2 ∗ 0.6) + (3 ∗ 0.3) = 2.2
ii) E(3)
E(3) = Σ 3 P (X = x) = (3 ∗ 0.1) + (3 ∗ 0.6) + (3 ∗ 0.3) = 3
iii) E(5X)
E(5X) = Σ 5x P (X = x) = (5 ∗ 0.1) + (10 ∗ 0.6) + (15 ∗ 0.3) = 11
Notice that 5E(X) = 5 ∗ 2.2 = 11.
In general, for two constants a and b:
E(a) = a
E(aX) = aE(X)
E(aX + b) = aE(X) + b
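These rules can be checked numerically on the distribution from the example. The sketch below is illustrative only: the helper expect and the constants a = 2, b = 7 are arbitrary choices made here, not values from the notes.

```python
def expect(pmf, g=lambda x: x):
    """Return E(g(X)) = sum of g(x) * P(X = x); g defaults to the identity."""
    return sum(g(x) * p for x, p in pmf.items())


pmf_x = {1: 0.1, 2: 0.6, 3: 0.3}

e_x = expect(pmf_x)                     # E(X)
e_const = expect(pmf_x, lambda x: 3)    # E(3): the constant is weighted by every P(X = x)
e_5x = expect(pmf_x, lambda x: 5 * x)   # E(5X)

# Check E(aX + b) = a E(X) + b for arbitrary illustrative constants.
a, b = 2, 7
e_lin = expect(pmf_x, lambda x: a * x + b)

print(round(e_x, 10), round(e_const, 10), round(e_5x, 10))
print(round(e_lin, 10), round(a * e_x + b, 10))
```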


Exercise 18. X is the number of heads obtained when two coins are tossed.
Find (a) the expected number of heads (b) E(X^2) (c) E(X^2 − X)

3.3. Variance
The variance of a random variable is a non-negative number which gives an idea
of how widely spread the values of the random variable are likely to be; the larger
the variance, the more scattered the observations on average.
Stating the variance gives an impression of how closely the distribution is
concentrated around the expected value; it is a measure of the ’spread’ of a
distribution about its average value. Variance is symbolized by V (X), V ar(X) or
σ^2 and is defined as:

var(X) = E[(X − E[X])^2] = E[(X − µ)^2]

which is equivalent to

var(X) = E[X^2] − (E[X])^2 = E(X^2) − µ^2
Example. Find the variance of the following distribution.
x          1    2    3    4    5
P (X = x)  0.1  0.3  0.2  0.3  0.1
var(X) = Σ x^2 P (X = x) − [Σ x P (X = x)]^2
= (1^2 ∗ 0.1) + (2^2 ∗ 0.3) + ... + (5^2 ∗ 0.1) − [(1 ∗ 0.1) + ... + (5 ∗ 0.1)]^2
= 10.4 − 3^2 = 10.4 − 9 = 1.4
In general, if a and b are any two constants, then:
var(a) = 0
var(aX) = a^2 var(X)
var(aX + b) = a^2 var(X)
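The variance example and the rule for var(aX + b) can be checked with a short Python sketch. The helper names and the constants a = 3, b = 4 are assumptions chosen here for illustration.

```python
def expect(pmf, g=lambda x: x):
    """Return E(g(X)) for a pmf given as {x: probability}."""
    return sum(g(x) * p for x, p in pmf.items())


def variance(pmf):
    """Return var(X) = E(X^2) - (E(X))^2."""
    mu = expect(pmf)
    return expect(pmf, lambda x: x**2) - mu**2


# The distribution from the variance example.
pmf_x = {1: 0.1, 2: 0.3, 3: 0.2, 4: 0.3, 5: 0.1}
var_x = variance(pmf_x)

# Distribution of Y = aX + b: the values change, the probabilities do not.
a, b = 3, 4
pmf_y = {a * x + b: p for x, p in pmf_x.items()}
var_y = variance(pmf_y)

print(round(var_x, 10))                          # variance of X
print(round(var_y, 10), round(a**2 * var_x, 10)) # both sides of var(aX + b) = a^2 var(X)
```

Adding b shifts every value by the same amount, so the spread about the mean is unchanged; only the factor a affects the variance.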

3.4. The cumulative distribution function, F (X)

In a frequency distribution, the cumulative frequencies are obtained by summing all
the frequencies up to a particular value. In the same way, in a probability distribution,


the probabilities up to a certain value are summed to give a cumulative probability.
The cumulative distribution function is denoted by F (x).

Example. Consider the following probability distribution.
x          1     2    3    4     5
P (X = x)  0.05  0.4  0.3  0.15  0.1
From the above table,
F (1) = P (X ≤ 1) = 0.05
F (2) = P (X ≤ 2) = P (X = 1) + P (X = 2) = 0.05 + 0.4 = 0.45
.
.
F (5) = P (X ≤ 5) = 1.0
Thus, the cumulative distribution is given by
x      1     2     3     4    5
F (x)  0.05  0.45  0.75  0.9  1.0
Generally, for a discrete random variable X, the cumulative distribution
function is given by F (x), where F (x) = P (X ≤ x).
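A cumulative distribution is a running total of the probabilities, which Python's itertools.accumulate computes directly. The sketch below is illustrative; the variable names are assumptions.

```python
from itertools import accumulate

xs = [1, 2, 3, 4, 5]
probs = [0.05, 0.4, 0.3, 0.15, 0.1]

# F(x) = P(X <= x) is the running total of the probabilities up to x.
cdf = dict(zip(xs, accumulate(probs)))

for x, F in cdf.items():
    print(f"F({x}) = {F:.2f}")
```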

4. Special discrete probability distributions


The following are some of the well known examples of discrete distributions:
Bernoulli, Binomial, Uniform, Geometric, Poisson and Hypergeometric distributions,
amongst others. In this lesson, we focus on only two of these distributions, namely
the Bernoulli and Binomial distributions.

4.1. Bernoulli distribution


A random variable X has a Bernoulli distribution with parameter p if it can assume
the value 1 with probability p and the value 0 with probability (1 − p).
The random variable X is also known as a Bernoulli variable with parameter p and
has the following probability mass function:
P (X = x) = p^x (1 − p)^(1−x), for x = 0, 1
The mean of a random variable X that has a Bernoulli distribution with
parameter p is
E(X) = 1(p) + 0(1 − p) = p


The variance of X is var(X) = E(X^2) − (E(X))^2 = p − p^2 = p(1 − p) = pq,
where q = 1 − p.
Example. A random variable whose value represents the outcome of a coin toss
(1 for heads, 0 for tails, or vice-versa) is a Bernoulli variable with parameter p,
where p is the probability that the outcome corresponding to the value 1 occurs.
For an unbiased coin, where heads or tails are equally likely to occur, p = 0.5.
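As a quick check of the Bernoulli mean and variance, the sketch below builds the two-point distribution directly from the definition (value 1 with probability p, value 0 with probability 1 − p) for the unbiased coin. The function name bernoulli_pmf is a hypothetical helper, not a library call.

```python
def bernoulli_pmf(x, p):
    """P(X = x) for a Bernoulli variable: p if x is 1, 1 - p if x is 0."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli variable only takes the values 0 and 1")
    return p if x == 1 else 1 - p


p = 0.5  # unbiased coin

# Mean and variance from the general definitions for a discrete variable.
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
var = sum(x**2 * bernoulli_pmf(x, p) for x in (0, 1)) - mean**2

print(mean, var)   # compare with p and p * (1 - p)
```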

4.2. Binomial distribution


• The binomial distribution is one of the most popular distributions.

• The origin of binomial distribution lies in Bernoulli’s trials.

• A Bernoulli trial is an experiment having only two possible outcomes, i.e.
success or failure. For example, when one flips a coin, either a head or a tail
will show on the upper face; the sex of an expected baby will be either male
or female; a person will be either healthy or sick; a computer fan will be either
working or defective, etc.

• A binomial experiment is a probability experiment that satisfies the following
conditions.

– The experiment is repeated for a fixed number of trials, where each
trial is independent of the other trials.
– There are only two possible outcomes of interest for each trial. The
outcomes can be classified as a success (S) or as a failure (F).
– The probability of a success P (S) is the same for each trial.
– The random variable x counts the number of successful trials.

Example. Decide whether the experiment is a binomial experiment. If it is, specify
the values of n, p, and q, and list the possible values of the random variable x. If
it is not a binomial experiment, explain why.
Experiment: You randomly select a card from a deck of cards, and note if
the card is an Ace. You then put the card back and repeat this process 8 times.
Solution: This is a binomial experiment. Each of the 8 selections represents
an independent trial because the card is replaced before the next one is drawn.


There are only two possible outcomes: either the card is an Ace or not. Therefore,
n = 8, p = 4/52 = 1/13, q = 12/13 and x = 0, 1, 2, 3, 4, 5, 6, 7, 8.
In the next few sections of the lesson, we discuss the binomial distribution,
mainly showing how to solve a number of problems. We also define the binomial
probability function.
Binomial Probability Formula
In a binomial experiment, the probability of exactly x
successes in n trials is
P (x) = nCx p^x q^(n−x) = [n! / ((n − x)! x!)] p^x q^(n−x).
Example:
A bag contains 10 chips. 3 of the chips are red, 5 of the chips are
white, and 2 of the chips are blue. Three chips are selected, with
replacement. Find the probability that you select exactly one red chip.
p = the probability of selecting a red chip = 3/10 = 0.3
q = 1 – p = 0.7
n = 3
x = 1
P (1) = 3C1 (0.3)^1 (0.7)^2
= 3(0.3)(0.49)
= 0.441
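The binomial probability formula translates directly into Python using math.comb for nCx. This is a minimal sketch; the function name binomial_pmf is chosen here for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials with success probability p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)


# Exactly one red chip in three draws with replacement, p = 0.3.
p_one_red = binomial_pmf(1, 3, 0.3)
print(round(p_one_red, 3))
```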
Athiany, HKO 28


Binomial Probability Distribution

Example:
A bag contains 10 chips. 3 of the chips are red, 5 of the chips are
white, and 2 of the chips are blue. Four chips are selected, with
replacement. Create a probability distribution for the number of red
chips selected.
p = the probability of selecting a red chip = 3/10 = 0.3
q = 1 – p = 0.7
n = 4
x = 0, 1, 2, 3, 4
The binomial probability formula is used to find each probability.
x  P (x)
0  0.240
1  0.412
2  0.265
3  0.076
4  0.008
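The table in this example can be generated by applying the binomial formula to each value of x. The sketch below is illustrative (the name distribution is an assumption) and rounds to three decimal places, as the table does.

```python
from math import comb

n, p = 4, 0.3
q = 1 - p

# Apply P(x) = nCx * p^x * q^(n - x) to every possible number of red chips.
distribution = {x: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

print("x  P(x)")
for x, px in distribution.items():
    print(f"{x}  {px:.3f}")
```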


Finding Probabilities
Example:
The following probability distribution represents the probability of
selecting 0, 1, 2, 3, or 4 red chips when 4 chips are selected.
x  P (x)
0  0.24
1  0.412
2  0.265
3  0.076
4  0.008
a.) Find the probability of selecting no more than 3 red chips.
P (no more than 3) = P (x ≤ 3) = P (0) + P (1) + P (2) + P (3)
= 0.24 + 0.412 + 0.265 + 0.076 = 0.993
b.) Find the probability of selecting at least 1 red chip.
P (at least 1) = P (x ≥ 1) = 1 – P (0) = 1 – 0.24 = 0.76 (using the complement rule)
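Both parts of this example can be reproduced from the table with a few lines of Python. The sketch below uses the rounded table values exactly as given, so its results carry the same rounding as the table; the variable names are illustrative.

```python
# Rounded probabilities from the table for 0 to 4 red chips.
table = {0: 0.24, 1: 0.412, 2: 0.265, 3: 0.076, 4: 0.008}

# a.) "No more than 3": add the probabilities for x = 0, 1, 2, 3.
p_at_most_3 = sum(p for x, p in table.items() if x <= 3)

# b.) "At least 1": use the complement, 1 - P(0), instead of adding four terms.
p_at_least_1 = 1 - table[0]

print(round(p_at_most_3, 3), round(p_at_least_1, 3))
```

The complement is often the shorter route whenever an event is described as "at least one".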

Graphing Binomial Probabilities

Example:
The following probability distribution represents the probability of
selecting 0, 1, 2, 3, or 4 red chips when 4 chips are selected. Graph
the distribution using a histogram.
x  P (x)
0  0.24
1  0.412
2  0.265
3  0.076
4  0.008
[Figure: histogram titled "Selecting Red Chips"; horizontal axis: number of red
chips (x = 0 to 4); vertical axis: probability P (x), from 0 to 0.5.]

Mean, Variance and Standard Deviation

Population Parameters of a Binomial Distribution
Mean: μ = np
Variance: σ^2 = npq
Standard deviation: σ = √(npq)
Example:
One out of 5 students at a local college say that they skip breakfast in
the morning. Find the mean, variance and standard deviation if 10
students are randomly selected.
n = 10, p = 1/5 = 0.2, q = 0.8
μ = np = 10(0.2) = 2
σ^2 = npq = (10)(0.2)(0.8) = 1.6
σ = √(npq) = √1.6 ≈ 1.3
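The shortcut formulas can be compared with the general definitions of mean and variance applied to the full binomial distribution. The following Python sketch does both for this example; the variable names are assumptions made here.

```python
from math import comb, sqrt

n, p = 10, 0.2
q = 1 - p

# Shortcut formulas for a binomial distribution.
mean = n * p
variance = n * p * q
std_dev = sqrt(variance)

# Cross-check using E(X) = sum of x * P(X = x) and var(X) = E(X^2) - (E(X))^2.
pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
mean_from_pmf = sum(x * px for x, px in enumerate(pmf))
var_from_pmf = sum(x**2 * px for x, px in enumerate(pmf)) - mean_from_pmf**2

print(round(mean, 3), round(variance, 3), round(std_dev, 1))
print(round(mean_from_pmf, 3), round(var_from_pmf, 3))
```

Both routes agree up to floating-point rounding, which is why the parameters n and p are enough to summarize a binomial distribution.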


5. Summary
A discrete probability distribution lists each possible value the random variable can
assume, together with its probability. A probability distribution must satisfy the
following conditions.

• The probability of each value of the discrete random variable is between 0
and 1, inclusive.

• The sum of all the probabilities is 1.

The mean of a discrete random variable is the mean of its probability distribution.
This mean is also called the expected value or population mean of a random
variable and it indicates its average or central value. This is the value we expect
to observe per repetition, if we repeat an experiment several times. This value is
a useful summary of the variable’s distribution. Stating the expected value gives
a general impression of some random variable without giving full details of its
probability distribution.
The variance of a random variable is a non-negative number which gives an
idea of how widely spread the values of the random variable are likely to be; the
larger the variance, the more scattered the observations on average. Stating the
variance gives an impression of how closely the distribution is concentrated
around the expected value; it is a measure of the ’spread’ of a distribution
about its average value.
Guidelines for Constructing a Discrete Probability Distribution

• Let x be a discrete random variable with possible outcomes x1 , x2 , . . . , xn .


• Make a frequency distribution for the possible outcomes.

• Find the sum of the frequencies.

• Find the probability of each possible outcome by dividing its frequency by
the sum of the frequencies.

• Check that each probability is between 0 and 1 and that the sum is 1.

As indicated above, in a binomial experiment, the probability of x successes in n
trials can be obtained by the following formula:
P (X = x) = nCx p^x (1 − p)^(n−x), for x = 0, 1, 2, 3, ..., n
From any one of the discrete distributions, we can create a probability distribution
of the random variable. Using the same distribution, we can then find the
expected value, variance, standard deviation and other measures.
We have seen that the expected value (mean) of a random variable can
be obtained from the probability distribution by using the formula E(X) =
Σ x P (X = x). This implies that after creating the probability distribution table
for a Bernoulli/binomial experiment, we can then obtain these measures. However,
we have also seen that for the Bernoulli/binomial distribution, the mean and
variance can also be obtained using the parameters. That is:
Bernoulli distribution: mean = p, variance = pq
Binomial distribution: mean = np, variance = npq

6. Revision questions or guidelines


1. A computer firm claims that only 5% of all new computers delivered to its
customers have been infected by a virus. If the firm has 15 new computers
to deliver to its customers, find the following probabilities:

a) None of them has been infected by the virus.
b) One or more have been infected by the virus.
c) Two or more have been infected by the virus.

2. State whether each of the following is a continuous or a discrete variable:

(a) Number of misspelled words


(b) Amount of water flowing through the Hoover Dam in a day
(c) How late a student is for class
(d) Number of bacteria in a water sample
(e) Amount of carbon monoxide produced from burning a gallon of unleaded
gas
(f) The number of workers in the institution
(g) Number of checkout lanes at a grocery store
(h) Amount of time spent waiting in line at a grocery store

3. Suppose a fair six sided die is tossed 5 times. What is the probability of
getting exactly 2 fours?

4. Explain the meaning of the probability distribution of a discrete random
variable. Give an example of such a probability distribution. What are the
ways to present the probability distribution of a discrete random variable?

5. Approximately 25% of students at a local high school participate in
after-school sports all four years of high school. A group of four seniors is randomly
chosen. Let X be a random variable that represents the number of seniors
in the sample who have participated in sports all four years.

(a) Use the binomial probability formula to complete the probability
distribution.

x          0      1      2  3      4
P (X = x)  0.316  0.422  ?  0.047  ?

(b) Which value of X is most likely? Which value of X is least likely?
(c) On average, how many seniors in the sample would you expect to have
played sports all four years? In other words, what is the mean of X?
(d) What is the standard deviation of X?
(e) What is the probability that all four seniors in the sample played sports
all four years?
(f) What is the probability that two or fewer seniors in the sample played
sports all four years?
(g) If a new random variable is defined as Y = X^2 + 2X, use the above table
to obtain E(Y ) and sd(Y ).

6. According to a study carried out by a computer company in Uganda, the
probability that a randomly selected laptop fan will last longer than 1.2 years
is 0.15. What is the probability that out of six randomly selected fans: (a)
Exactly two last longer than 1.2 years? (b) None lasts longer than 1.2 years?
