STA2100 Probability
Chapter 6
Introduction to Probability
Learning outcomes
Upon completing this topic, you should be able to:
• Define probability
• Calculate probabilities
STA 2100 Probability and Statistics I
1. Introduction
Probability is the language we use to model uncertainty. We all intuitively under-
stand that few things in life are certain. There is usually an element of uncertainty
or randomness around outcomes of our choices. For instance in business this un-
certainty can make all the difference between a good investment and a poor one.
Hence an understanding of probability and how we might incorporate this into
our decision making processes is important. In this lesson, we look at the logical
basis for how we might express a probability and some basic rules that probabilities
should follow. In subsequent lessons, we look at how we can use probabilities to
aid decision making. It is advisable that you revisit the set theory lesson to help
you understand this lesson better.
2. Definitions
The probability of a specific event is a mathematical statement about the likelihood
that it will occur. All probabilities are numbers between 0 and 1, inclusive; a
probability of 0 means that the event will never occur, and a probability of 1
means that the event will always occur. We often use the letter P to represent a
probability. For example, P(Rain) would be the probability that it rains. In other
cases Pr is used to represent a probability. It is important to understand some
terms used in probability. They include:
Probability Experiment
An experiment is an activity where we do not know for certain what will happen,
but we will observe what happens. For example:
• We may ask someone whether or not they have used our IT products.
• Rolling a die and observing the number that is rolled is a probability experi-
ment.
Outcome
An outcome, or elementary event, is one of the possible things that can happen.
For example, suppose that we are interested in the shoe size of the next customer
to come into a shoe shop. Possible outcomes include “eight”, “twelve”, “nine and
a half” and so on. In any experiment, one, and only one, outcome occurs.
The result of a single trial in a probability experiment is the outcome.
Sample space
The sample space is the set of all possible outcomes. For example, it could be the
set of all shoe sizes. The sample space when rolling a die has six outcomes:
{1, 2, 3, 4, 5, 6}.
Event
An event consists of one or more outcomes and is a subset of the sample space.
An event is usually denoted using a capital (uppercase) letter. For example “the
shoe size of the next customer is less than 9” is an event. It is made up of all of
the outcomes where the shoe size is less than 9. Of course an event might contain
just one outcome. We can use a letter, say E, to represent this event.
For instance, a die is rolled. Event A is rolling an even number.
A simple event is an event that consists of a single outcome.
Example. A die is rolled. Event A is rolling an even number. This is not a simple
event because the outcomes of event A are {2, 4, 6}.
3. Rules of probability
• Probabilities are usually expressed in terms of fractions or decimal numbers
or percentages.
• All probabilities are measured on a scale ranging from zero to one. The
probabilities of most events lie strictly between zero and one. An event with
probability zero is an impossible event and an event with probability one is
said to be a certain event.
• The collection of all possible outcomes, that is the sample space, has a
probability of 1. For example, if an experiment consists of only two outcomes
– success or failure – then the probability of either a success or a failure is
1. That is P(success or failure) = 1.
• With respect to an event E, the complementary event, denoted as E^c, E′
or ∼E (E′ is read as “E prime”), is the negation of the event E. For example,
consider the event that it will rain tomorrow. The complement of this
event is the event that it will not rain tomorrow. We should note that the
probabilities of an event E and its complement sum to 1, i.e.
P(E) + P(E^c) = 1
Example. There are 5 red chips, 4 blue chips, and 6 white chips in a basket. Find
the probability of randomly selecting a chip that is not blue.
Solution: P (selecting a blue chip) = 4/15 = 0.267
implying P (not selecting a blue chip) = 1 − 0.267 = 0.733
• Two or more events are said to be mutually exclusive if they cannot occur
simultaneously. In the example above, the outcomes success and failure are
mutually exclusive because both cannot occur at the same time. Two events
A and B are mutually exclusive if A ∩ B = ∅, so that P(A ∩ B) = 0.
Example. Let A = the event that it is Monday, B = the event that it is Tuesday,
and C = the event that it is the year 2014. A and B are mutually exclusive events,
since it cannot be both Monday and Tuesday at the same time. A and C are not
mutually exclusive events, since it can be a Monday in the year 2014.
• Two events are said to be independent if the occurrence of one does not affect
the probability of the second occurring. If two events are independent, then
the probability that both will occur is equal to the product of their individual
probabilities. In other words, if A and B are independent, then
P (A ∩ B) = P (A) × P (B)
Example. If you toss a coin and look out of the window, it would be reasonable
to suppose that the events “get heads” and “it is raining” would be independent.
However, not all events are independent.
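The product rule can be checked by listing outcomes. The following is a minimal sketch in Python, assuming two fair dice; the events A, B and C and the helper prob are illustrative choices, not part of these notes. It compares P(A ∩ B) with P(A) × P(B) for a pair of events that are independent and for a pair that are not.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (first roll, second roll) of two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a true/false test on outcomes), all outcomes equally likely."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))

def is_a(o):
    return o[0] % 2 == 0        # A: first roll is even

def is_b(o):
    return o[1] == 6            # B: second roll is a six

def is_c(o):
    return o[0] + o[1] >= 10    # C: total score is at least 10

p_a, p_b, p_c = prob(is_a), prob(is_b), prob(is_c)
p_ab = prob(lambda o: is_a(o) and is_b(o))
p_bc = prob(lambda o: is_b(o) and is_c(o))

print(p_ab, p_a * p_b)   # equal, so A and B are independent
print(p_bc, p_b * p_c)   # different, so B and C are not independent
```

Here A and B concern different dice, so the product rule holds, whereas knowing that the second roll is a six makes a large total more likely, so B and C fail the product rule.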
The larger the experiment, the closer this probability is to the “true” probability.
The frequentist view of probability regards probability as the long run relative
frequency (or proportion). So, in the defects example, the “true” probability of
getting a defective item is the proportion obtained in a very large experiment
(strictly an infinitely long sequence of trials). In the frequentist view, probability is
a property of nature and, since, in practice, we cannot conduct infinite sequences
of trials, in many cases we never really know the “true” values of probabilities. We
also have to be able to imagine a long sequence of “identical” trials. This does not
seem to be appropriate for “one-off” experiments like the launch of a new product.
For these reasons (and others) some people prefer the subjective or Bayesian view
of probability.
Example. A travel agent determines that in every 50 reservations she makes, 12
will be for a cruise. What is the probability that the next reservation she makes
will be for a cruise?
Solution:
P(cruise) = 12/50 = 0.24
For instance, Sally flips a coin 20 times and gets 3 heads. The empirical probability
is 3/20. This is not representative of the theoretical probability, which is 1/2.
As the number of times Sally tosses the coin increases, the empirical probability
will get closer and closer to the theoretical probability. This is referred to as
the Law of Large Numbers.
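The effect can be seen in a simulation. This is a minimal sketch in Python, assuming a fair coin simulated with the standard random module; the function name empirical_probability and the seed value are illustrative assumptions.

```python
import random

def empirical_probability(n_tosses, rng):
    """Proportion of heads in n_tosses simulated tosses of a fair coin."""
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses

rng = random.Random(2100)  # fixed (arbitrary) seed so that runs are repeatable
for n in (20, 200, 2000, 200000):
    # With few tosses the proportion can be far from 0.5;
    # with many tosses it is typically very close to 0.5.
    print(n, empirical_probability(n, rng))
```

The exact proportions depend on the seed, but the proportion for the largest number of tosses should lie very close to the theoretical value 1/2.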
4.3. Subjective
We are probably all intuitively familiar with this method of assigning probabilities.
When we board an airplane, we judge the probability of it crashing to be sufficiently
small that we are happy to undertake the journey. Similarly, the odds given by
bookmakers on a football match reflect people’s beliefs about which team will win.
This probability does not fit within the frequentist definition as the match cannot
be played more than once.
One potential difficulty with using subjective probabilities is that they are
subjective. So the probabilities which two people assign to the same event can be
different. This becomes important if these probabilities are to be used in deci-
sion making. For example, if you were deciding whether to launch a new product
and two people had very different ideas about how likely success or failure of this
product was, then the decision to go ahead could be controversial.
If both individuals assessed the probability of success to be 0.8 then the decision
to go ahead could easily be based on this belief. However, if one said 0.8 and the
other 0.3, then the decision is not straightforward. We would need a way to
reconcile these different positions.
Subjective probability is based on personal judgment, accumulated knowledge
and experience. For instance, medical doctors sometimes assign subjective
probabilities to the life expectancy of people with breast cancer.
5. Laws of probability
5.1. Multiplication law
The probability of two independent events E1 and E2 both occurring can be written
as
P(E1 ∩ E2) = P(E1) × P(E2),
and this is known as the multiplication law of probability.
For example, the probability of throwing a six followed by another six on two
rolls of a die is calculated as follows. The outcomes of the two rolls of the die are
independent. Let E1 denote a six on the first roll and E2 a six on the second roll.
Then
P(two sixes) = P(E1 and E2)
= P(E1) × P(E2) = 1/6 × 1/6 = 1/36
[Figures omitted: Venn diagrams of events A and B, including a playing-card example (slides by Athiany, H.K.O.).]
6. Conditional probability
So far we have only considered probabilities of single events or of several indepen-
dent events, like two rolls of a die. However, in reality, many events are related.
For example, the probability of it raining in 5 minutes time is dependent on whether
or not it is raining now. We need a mathematical notation to capture how the
probability of one event depends on other events taking place. We do this as
follows. Consider two events A and B. We write P (A|B) for the probability of
A given that B has already happened. We describe P (A|B) as the conditional
probability of A given B.
We can calculate these conditional probabilities using the formula
P(A|B) = P(A and B) / P(B)
that is, in terms of the probability of both events occurring, P(A and B), and the
probability of the event that has already taken place, P(B).
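The formula can be applied directly when the outcomes can be listed. The sketch below is a minimal Python illustration, assuming two fair dice; the events chosen (a total of at least 10, and a six on the first roll) and the helper names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def prob(event):
    """P(event) for equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def conditional(event_a, event_b):
    """P(A|B) = P(A and B) / P(B); only defined when P(B) > 0."""
    p_b = prob(event_b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(lambda o: event_a(o) and event_b(o)) / p_b

def total_at_least_10(o):
    return o[0] + o[1] >= 10

def first_is_six(o):
    return o[0] == 6

print(prob(total_at_least_10))                       # 1/6
print(conditional(total_at_least_10, first_is_six))  # 1/2
```

Knowing that the first roll was a six raises the probability of a total of at least 10 from 1/6 to 1/2, which shows that the two events are not independent.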
Exercise 15. Given that the events A and B are independent, copy and complete the
following contingency table. The results can be obtained as follows:
         A      A′     Total
B        3/20   y      u
B′       x      z      v
Total    1/4    t      1
7. Tree Diagrams
In some cases, especially where there are three or more different events being
considered, tree diagrams are an alternative to the contingency tables.
Tree diagrams or probability trees are simple clear ways of presenting proba-
bilistic information. Let us first consider a simple example in which a fair coin is
tossed twice. Suppose we are interested in the probability that we get a head on
both tosses. This probability can be calculated as
P(Head and Head) = P(Head on 1st toss) × P(Head on 2nd toss | Head on 1st toss)
This example can be represented as a tree diagram in which experiments are
represented by circles (called nodes) and the outcomes of the experiments as
branches. The branches are annotated by the probability of the particular out-
come.
Example. In a large farm, 20% of a particular kind of flower is red and 80% is
white. The farmer decides to take samples of flowers from the production of this
particular kind. What is the probability that he obtains;
(a) One or two red flowers in a sample of two?
(b) At least two red flowers in a sample of three?
Solution:
This information can be represented in the tree diagram as follows.
[Tree diagram: three successive picks from "Start"; at each pick the branch R (red)
has probability 1/5 and the branch W (white) has probability 4/5.]
Resulting in the eight outcomes
RRR RRW RWR RWW WRR WRW WWR WWW
In this problem, we assume that the probabilities of these events remain the same
even after picking a small number of flowers from the production line.
(a) P(RR) + P(RW) + P(WR) represents one or two red flowers in a sample of two.
But
P(RR) = 1/5 × 1/5 = 1/25
P(RW) = 1/5 × 4/5 = 4/25
P(WR) = 4/5 × 1/5 = 4/25
⟹ P(RR) + P(RW) + P(WR) = 1/25 + 4/25 + 4/25 = 9/25
Alternatively,
P(one or two red flowers) = 1 − P(no red flower) = 1 − P(WW)
= 1 − (4/5 × 4/5) = 1 − 16/25 = 9/25
(b) P(RRR) + P(RRW) + P(RWR) + P(WRR)
= (1/5)³ + (1/5 × 1/5 × 4/5) + (1/5 × 4/5 × 1/5) + (4/5 × 1/5 × 1/5) = 13/125
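The tree diagram can also be walked through programmatically. The following is a minimal sketch in Python, assuming (as in the solution above) that each flower is red with probability 1/5 independently of the others; the helper names path_probability and prob_red_count are illustrative.

```python
from fractions import Fraction
from itertools import product

# Branch probabilities, assumed the same at every pick.
P = {"R": Fraction(1, 5), "W": Fraction(4, 5)}

def path_probability(path):
    """Multiply the branch probabilities along one path of the tree."""
    result = Fraction(1)
    for colour in path:
        result *= P[colour]
    return result

def prob_red_count(n_flowers, accept):
    """Add the probabilities of all paths whose number of reds satisfies accept."""
    return sum(
        path_probability(path)
        for path in product("RW", repeat=n_flowers)
        if accept(path.count("R"))
    )

part_a = prob_red_count(2, lambda reds: reds >= 1)  # one or two reds in two picks
part_b = prob_red_count(3, lambda reds: reds >= 2)  # at least two reds in three picks
print(part_a, part_b)  # 9/25 13/125
```

Each path corresponds to one route through the tree, so the code multiplies along branches and adds across the qualifying paths, exactly as in the hand calculation.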
Example. A box has 6 blue beads and 4 red beads. Three beads are drawn at
random (without replacement). What is the probability that: (a) they are all blue
(b) there are exactly two blue beads (c) there is at least one blue bead?
Solution:
When draws are made without replacement and the tree diagram would be complex,
with many branches, we can use combinations for quick computation of
probabilities.
(a) For this case, the total number of ways of selecting 3 beads from 10 is
10C3 = 120
The number of ways of selecting 3 blue beads from 6 is 6C3 = 20
Therefore, P(all blue) = 6C3 / 10C3 = 20/120 = 1/6
(b) The number of ways of selecting 2 blue beads from 6 is 6C2 = 15
The number of ways of selecting 1 red bead from 4 is 4C1 = 4
Therefore, P(exactly 2 blue) = (15 × 4)/120 = 1/2
(c) P(at least one blue) = 1 − P(all red) = 1 − 4C3 / 10C3 = 1 − 4/120 = 29/30
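These counts can be reproduced with the standard library function math.comb. This is a minimal sketch in Python for the bead example; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

blue, red, drawn = 6, 4, 3
total = comb(blue + red, drawn)  # 10C3 = 120 ways to choose 3 beads from 10

p_all_blue = Fraction(comb(blue, 3), total)                  # (a)
p_two_blue = Fraction(comb(blue, 2) * comb(red, 1), total)   # (b)
p_at_least_one_blue = 1 - Fraction(comb(red, 3), total)      # (c), via the complement

print(p_all_blue, p_two_blue, p_at_least_one_blue)  # 1/6 1/2 29/30
```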
8. Bayes Theorem
Suppose we know P(A), P(∼A) and also P(B|A) and P(B|∼A); then we can
represent A and ∼A on the first branches of a tree diagram and B and ∼B on the
second branches. Can we then determine P(A|B)?
This problem can be solved by using Bayes' theorem, named after Thomas Bayes.
Bayes was an English mathematician and his theorem has given us a fundamental
result of statistical inference.
Mathematically, Bayes' theorem gives the relationship between the probabilities of
A and B, P(A) and P(B), and the conditional probabilities of A given B and
B given A, denoted by P(A|B) and P(B|A).
Commonly, Bayes' theorem is stated as:
Simple: P(A|B) = P(B|A) P(A) / P(B), for P(B) ≠ 0
(The meaning depends on the interpretation of probability ascribed to the
terms.)
Extended: P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|A′) P(A′)]
Example. Kamau has two gardeners, David and James. David comes on 1/3 of
the occasions and James 2/3 of the occasions. There is a probability of 1/10
that David will forget to water the flowers and a probability of 1/2 that James
will forget to water the flowers. One day, Kamau had to leave the house before
the gardener arrived. On his return, he found that the gardener had come and
gone, and also that the flowers were not watered. What is the probability that it
is James who came that day?
Solution:
Let
D: David comes
J: James comes
W: Flowers watered
The tree diagram will then look like this
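[Tree diagram: first branches J (probability 2/3) and D (probability 1/3).
From J, second branches W′ (probability 1/2) and W (probability 1/2).
From D, second branches W′ (probability 1/10) and W (probability 9/10).]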
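The extended form of Bayes' theorem can be applied to the gardener example as follows. This is a minimal sketch in Python using the probabilities stated in the example; the function name posterior and its parameter names are illustrative assumptions.

```python
from fractions import Fraction

def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Extended Bayes: P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|A')P(A')]."""
    prior_not_a = 1 - prior_a
    numerator = p_b_given_a * prior_a
    return numerator / (numerator + p_b_given_not_a * prior_not_a)

p_james = Fraction(2, 3)                # P(J); David comes otherwise, P(D) = 1/3
p_forget_given_james = Fraction(1, 2)   # P(W'|J)
p_forget_given_david = Fraction(1, 10)  # P(W'|D)

p_james_given_not_watered = posterior(
    p_james, p_forget_given_james, p_forget_given_david
)
print(p_james_given_not_watered)  # 10/11
```

Here A is the event J (James comes) and B is the event W′ (flowers not watered), so P(J|W′) = (1/2 × 2/3) / (1/2 × 2/3 + 1/10 × 1/3) = 10/11.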
Exercise 16. A certain video store uses blank tapes bought from two sources,
say source A and source B. Suppose that the owner of the video store buys 30%
of the tapes from A, of which 5% are known to be defective, and 70% from
source B, of which 20% are usually defective. On recording some movies on the
tapes, the owner discovers that a certain tape is defective. What is
9. Summary
An experiment is a process that, when performed, results in one and only one of
many observations. The observations are called the outcomes of an experiment.
The collection of all possible outcomes of an experiment is called a sample space.
A sample space is denoted by S. Therefore, the sample space for an experiment
of inspecting a computer fan is written as S = {good, defective}, and for tossing
a coin twice it is S = {0, 1, 2} for the number of heads obtained.
For three or more events, it is easier to construct a probability tree than a
contingency table, because contingency tables are only practicable for two events.
(a) What is the sample space if the picked piece is not replaced?
(b) What is the sample space if the picked piece is replaced?
2. If 85% of people have a bowl of cereal for breakfast, 60% of people have
toast, and 50% of people have both cereal and toast for breakfast, what
percentage of people have neither cereal nor toast for breakfast?
5. Two events A and B are such that P(A) = 1/4, P(A|B) = 1/2 and
P(B|A) = 2/3. (a) Are A and B independent? (b) Are A and B mutually
exclusive? (c) Find P(A ∩ B) (d) Find P(B).
6. A group of 50 BIT students were asked which of the three Computer Science
Journals, A, B or C they read. The results showed that 25 read A, 16 read
B, 14 read C. 5 read both A and B, 4 read both B and C, 6 read both C
and A and 2 read all three.
Learning Activities
• Two fair six-faced dice are rolled. Let T be the event that the sum is 10 and D
be the event that the score is a double. Construct a tree diagram with the first
branch being T, and also another tree diagram with the first branch being D.
Chapter 7
Discrete Probability Distribution
Learning outcomes
Upon completing this topic, you should be able to:
1. Introduction
An important part of any analysis of decision making under stochastic conditions
is a probability distribution. Probability distributions state the relative frequency of
occurrence of a set of mutually exclusive events. Probability distributions can be
univariate or multivariate. They give the relative frequency of observing a particular
event. We saw that surveys can be used to get information on population quantities.
In most cases, it is not possible to measure the variables on every member of
the population and so some sampling scheme is used. This means that there is
uncertainty in our conclusions. Before we can make inferences about populations,
we need a language to describe the uncertainty we find when taking samples from
populations. This can be done using probability distributions.
2. Random Variable
In many experiments the outcomes of the experiment can be assigned numerical
values. For instance, if you roll a die, each outcome has a value from 1 through
6. If you ascertain the midterm test score of a student in your class, the outcome
is again a number.
A random variable is just a rule that assigns a number to each outcome of
an experiment. These numbers are called the values of the random variable. We
often use letters like X, Y and Z to denote a random variable. Here are some
examples
• Discrete random variables that can take on only finitely many values (like
the outcome of a roll of a die) are called finite random variables.
A continuous random variable, on the other hand, can take on any value
within a continuous range or an interval, like the temperature, the height of an
athlete in centimeters, the yield of maize from an acre of land, or the weight of a
laptop in a supplier’s store.
The distinction between the capital letter X and small letter x is important;
X stands for the random variable in question, whereas x stands for a specific value
or outcome.
Or
Example. Two tetrahedral dice, with faces labeled 1, 2, 3, 4, are thrown and the
score noted, where the score is the sum of the two numbers on which the dice
land. Find the probability density function (pdf) of X, where X is the random
variable 'the score when two dice are thrown'.
Solution:
x          2     3     4     5     6     7     8
P(X = x)   1/16  2/16  3/16  4/16  3/16  2/16  1/16
Since Σ P(X = x) = (1/16)(1 + 2 + 3 + 4 + 3 + 2 + 1) = 1,
Thus X is a random variable.
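The table can be rebuilt by listing the 16 equally likely pairs of faces. The following is a minimal sketch in Python, assuming both tetrahedral dice are fair; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 5)  # faces labeled 1, 2, 3, 4

# Count how many of the 16 equally likely (first, second) pairs give each score.
counts = Counter(a + b for a, b in product(faces, repeat=2))

# Probability distribution of the score X.
pmf = {score: Fraction(n, 16) for score, n in sorted(counts.items())}

for score, p in pmf.items():
    print(score, p)

print(sum(pmf.values()))  # 1
```

Fraction reduces each value to lowest terms, so 2/16 is printed as 1/8 and 4/16 as 1/4; the probabilities are the same as in the table.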
3.2. Expectation
E(X), read as 'E of X', gives the average or typical value of X, known as the
expected value or expectation of X. X represents the random variable.
The mean of a discrete random variable is the mean of its probability distribu-
tion. This mean is also called the expected value or population mean of a random
variable and it indicates its average or central value.
This is the value we expect to observe per repetition, if we repeat an experiment
several times. This value is a useful summary of the variable’s distribution.
Stating the expected value gives a general impression of some random variable
without giving full details of its probability distribution. The expected value of a
random variable X is symbolized by E(X) or µ, read as “E of X”, and is defined
as:
E[X] = Σ x P(X = x)
Exercise 17. A fruit machine consists of three windows which operate indepen-
dently. Each window shows pictures of fruits: Lemon, Apples, Cherries or Bananas.
The probability that a window shows a particular fruit is as follows:
P (Lemon) = 0.4
P (Cherries) = 0.2
P (Apple) = 0.1
P (Bananas) = 0.3
The rule for playing the game on the fruit machine is as follows: It costs Kshs
10 to play the game. A player will win Kshs 100 if he/she gets three Apples in a
row, Kshs 50 if he/she gets three Cherries in a row, Kshs 40 if he/she gets three
Lemons in a row and Kshs 80 if he/she gets two Apples and a Cherry in the game.
The order in which the fruits appear is not important. Based on this information,
would you expect to gain or lose if you play the game?
For instance,
E(10X) = Σ 10x P(X = x)
E(X²) = Σ x² P(X = x)
E(1/X) = Σ (1/x) P(X = x)
E(X − 4) = Σ (x − 4) P(X = x)
Example. The random variable X has a distribution function shown below.
x          1    2    3
P(X = x)   0.1  0.6  0.3
Find:
i) E(X)
E(X) = Σ x P(X = x) = (1 × 0.1) + (2 × 0.6) + (3 × 0.3) = 2.2
ii) E(3)
E(3) = Σ 3 P(X = x) = (3 × 0.1) + (3 × 0.6) + (3 × 0.3) = 3
iii) E(5X)
E(5X) = Σ 5x P(X = x) = (5 × 0.1) + (10 × 0.6) + (15 × 0.3) = 11.
Notice that 5E(X) = 5 × 2.2 = 11.
In general, for two constants a and b;
E(a) = a
E(aX) = aE(X)
E(aX + b) = aE(X) + b
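These rules can be checked numerically on the distribution from the example. This is a minimal sketch in Python; the function expectation and the choice a = 2, b = 1 are illustrative assumptions.

```python
from fractions import Fraction

# Distribution from the example: P(X=1) = 0.1, P(X=2) = 0.6, P(X=3) = 0.3
pmf = {1: Fraction(1, 10), 2: Fraction(6, 10), 3: Fraction(3, 10)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * P(X = x) over all values x."""
    return sum(g(x) * p for x, p in pmf.items())

e_x = expectation(pmf)                        # E(X)  = 11/5 = 2.2
e_3 = expectation(pmf, lambda x: 3)           # E(3)  = 3
e_5x = expectation(pmf, lambda x: 5 * x)      # E(5X) = 11

a, b = 2, 1  # arbitrary constants chosen for the check
e_ax_b = expectation(pmf, lambda x: a * x + b)

print(e_x, e_3, e_5x)            # 11/5 3 11
print(e_ax_b == a * e_x + b)     # True
```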
Exercise 18. X is the number of heads obtained when two coins are tossed.
Find (a) the expected number of heads (b) E(X²) (c) E(X² − X)
3.3. Variance
The variance of a random variable is a non-negative number which gives an idea
of how widely spread the values of the random variable are likely to be; the larger
the variance, the more scattered the observations on average.
Stating the variance gives an impression of how closely concentrated round the
expected value the distribution is; it is a measure of the ’spread’ of a distribution
about its average value. Variance is symbolized by V(X), Var(X) or σ² and is
defined as:
var(X) = E[(X − µ)²]
or var(X) = E(X²) − µ²
Example. Find the variance of the following distribution
x          1    2    3    4    5
P(X = x)   0.1  0.3  0.2  0.3  0.1
var(X) = Σ x² P(X = x) − [Σ x P(X = x)]²
= (1² × 0.1) + (2² × 0.3) + ... + (5² × 0.1) − [(1 × 0.1) + ... + (5 × 0.1)]²
= 10.4 − 9 = 1.4
In general, if a and b are any two constants, then;
var(a) = 0
var(aX) = a² var(X)
var(aX + b) = a² var(X)
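The worked example and the rule var(aX + b) = a² var(X) can be verified with a short computation. This is a minimal sketch in Python; the function names and the constants a = 3, b = 2 are illustrative assumptions.

```python
from fractions import Fraction

# Distribution from the variance example above.
pmf = {
    1: Fraction(1, 10),
    2: Fraction(3, 10),
    3: Fraction(2, 10),
    4: Fraction(3, 10),
    5: Fraction(1, 10),
}

def mean(pmf):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """var(X) = E(X^2) - [E(X)]^2."""
    e_x2 = sum(x * x * p for x, p in pmf.items())
    return e_x2 - mean(pmf) ** 2

a, b = 3, 2  # arbitrary constants chosen for the check
# Distribution of Y = aX + b: same probabilities, transformed values.
pmf_y = {a * x + b: p for x, p in pmf.items()}

print(mean(pmf), variance(pmf))               # 3 7/5
print(variance(pmf_y) == a**2 * variance(pmf))  # True
```

Adding the constant b shifts every value by the same amount and so leaves the spread unchanged, which is why b does not appear on the right-hand side.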
The variance of X is pq
Example. A random variable whose value represents the outcome of a coin toss
(1 for heads, 0 for tails, or vice-versa) is a Bernoulli variable with parameter p,
where p is the probability that the outcome corresponding to the value 1 occurs.
For an unbiased coin, where heads or tails are equally likely to occur, p = 0.5.
There are only two possible outcomes: either the card is an Ace or not. Therefore,
n = 8, p = 4/52 = 1/13, q = 12/13 and x = 0, 1, 2, 3, 4, 5, 6, 7, 8
In the next few sections of the lesson, we discuss the binomial distribution,
mainly showing how to solve a number of problems. We also define the binomial
probability function.
Binomial Probability Formula
In a binomial experiment, the probability of exactly x
successes in n trials is
P(x) = nCx p^x q^(n−x) = [n! / ((n − x)! x!)] p^x q^(n−x).
Example:
A bag contains 10 chips. 3 of the chips are red, 5 of the chips are
white, and 2 of the chips are blue. Three chips are selected, with
replacement. Find the probability that you select exactly one red chip.
p = the probability of selecting a red chip = 3/10 = 0.3
q = 1 − p = 0.7
n = 3
x = 1
P(1) = 3C1 (0.3)^1 (0.7)^2 = 3(0.3)(0.49) = 0.441
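The binomial probability formula translates directly into code. This is a minimal sketch in Python using math.comb for nCx; the function name binomial_pmf is an illustrative choice.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials), success probability p."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)

# Chip example: n = 3 draws with replacement, p = 0.3, exactly one red chip.
print(round(binomial_pmf(1, 3, 0.3), 3))  # 0.441

# The probabilities over x = 0, 1, ..., n add up to 1 (up to rounding error).
print(round(sum(binomial_pmf(x, 3, 0.3) for x in range(4)), 10))  # 1.0
```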
Finding Probabilities
Example:
The following probability distribution represents the probability of
selecting 0, 1, 2, 3, or 4 red chips when 4 chips are selected.
x    P(x)
0    0.24
1    0.412
2    0.265
3    0.076
4    0.008
a.) Find the probability of selecting no more than 3 red chips.
b.) Find the probability of selecting at least 1 red chip.
Solution:
a.) P(no more than 3) = P(x ≤ 3) = P(0) + P(1) + P(2) + P(3)
= 0.24 + 0.412 + 0.265 + 0.076 = 0.993
b.) P(at least 1) = P(x ≥ 1) = 1 − P(0) = 1 − 0.24 = 0.76 (using the complement)
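Both parts can be computed from the table with two small helper functions. This is a minimal sketch in Python; the names prob_at_most and prob_at_least are illustrative, and the second uses the complement as in part b.

```python
# Tabulated distribution of the number of red chips (values as given in the example).
dist = {0: 0.24, 1: 0.412, 2: 0.265, 3: 0.076, 4: 0.008}

def prob_at_most(dist, k):
    """P(X <= k): add the probabilities of all values up to and including k."""
    return sum(p for x, p in dist.items() if x <= k)

def prob_at_least(dist, k):
    """P(X >= k), computed through the complement 1 - P(X <= k - 1)."""
    return 1 - prob_at_most(dist, k - 1)

print(round(prob_at_most(dist, 3), 3))   # 0.993
print(round(prob_at_least(dist, 1), 3))  # 0.76
```

The tabulated probabilities are rounded and add up to 1.001, so adding P(1) to P(4) directly gives 0.761 rather than 0.76; the complement method matches the answer given in part b.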
[Figure: probability histogram of the distribution above. Horizontal axis: number of
red chips, x = 0 to 4. Vertical axis: probability.]
5. Summary
A discrete probability distribution lists each possible value the random variable can
assume, together with its probability. A probability distribution must satisfy the
following conditions.
The mean of a discrete random variable is the mean of its probability distribution.
This mean is also called the expected value or population mean of a random
variable and it indicates its average or central value. This is the value we expect
to observe per repetition, if we repeat an experiment several times. This value is
a useful summary of the variable’s distribution. Stating the expected value gives
a general impression of some random variable without giving full details of its
probability distribution.
The variance of a random variable is a non-negative number which gives an
idea of how widely spread the values of the random variable are likely to be; the
larger the variance, the more scattered the observations on average. Stating the
variance gives an impression of how closely concentrated round the expected value
the distribution is; it is a measure of the 'spread' of a distribution about its
average value.
Guidelines for Constructing a Discrete Probability Distribution
• Check that each probability is between 0 and 1 and that the sum is 1.
3. Suppose a fair six sided die is tossed 5 times. What is the probability of
getting exactly 2 fours?
(a) Use the binomial probability formula to complete the probability
distribution.
X          0      1      2    3      4
P(X = x)   0.316  0.422  ?    0.047  ?
(e) What is the probability that two or fewer seniors in the sample played
sports all four years?
(f) If a new random variable Y = X² + 2X, use the above table to obtain
E(Y) and sd(Y)