APPENDIX A

Probability and Statistics
This appendix is intended to serve as a brief review of the probability and statistics
concepts used in this text. Students requiring more review than is available in this ap-
pendix should consult one of the texts listed in the bibliography.
A.1 PROBABILITY

There are three fundamentally different ways of defining the probability of an event:
• Subjective probability
• Logical probability
• Experimental probability
Subjective Probability
Subjective probability is based on individual information and belief. Different indi-
viduals will assess the chances of a particular event in different ways, and the same
individual may assess different probabilities for the same event at different points in
time. For example, one need only watch the blackjack players in Las Vegas to see that
different people assess probabilities in different ways. Also, daily trading in the stock
market is the result of different probability assessments by those trading. The sellers
sell because it is their belief that the probability of appreciation is low, and the buyers
buy because they believe that the probability of appreciation is high. Clearly, these
different probability assessments are about the same events.
Logical Probability
Logical probability is based on physical phenomena and on symmetry of events. For
example, the probability of drawing a three of hearts from a standard 52-card playing
deck is 1/52. Each card has an equal likelihood of being drawn. In flipping a coin, the
chance of “heads’’ is 0.50. That is, since there are only two possible outcomes from
one flip of a coin, each event has one-half the total probability, or 0.50. A final example is the roll of a single die. Since each of the six sides is identical, the chance of any one event occurring (i.e., a 6, a 3, etc.) is 1/6.
Experimental Probability
Experimental probability is based on frequency of occurrence of events in trial situa-
tions. For example, in determining the appropriate inventory level to maintain in the
raw material inventory, we might measure and record the demand each day from that
inventory. If, in 100 days, demand was 20 units on 16 days, the probability of demand
equaling 20 units is said to be 0.16 (i.e., 16/100). In general, experimental probability
of an event is given by
$$\text{probability of event} = \frac{\text{number of times event occurred}}{\text{total number of trials}}$$
Both logical and experimental probability are referred to as objective probability
in contrast to the individually assessed subjective probability. Each of these is based
on, and directly computed from, facts.
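To make the computation concrete, here is a minimal Python sketch of experimental probability; the 100-day demand history is hypothetical, constructed to match the 16-days-of-20-units example above.

```python
from collections import Counter

# Hypothetical 100-day demand history: 20 units on 16 days, with the
# remaining 84 days split among other demand levels (illustrative only).
demand_history = [20] * 16 + [18] * 30 + [22] * 24 + [25] * 30

counts = Counter(demand_history)
trials = len(demand_history)

# Experimental probability = number of times the event occurred / total trials.
p_demand_20 = counts[20] / trials
print(p_demand_20)  # 0.16, i.e., 16/100
```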
Events are classified in a number of ways that allow us to state rules for probability
computations. Some of these classifications and definitions follow.
1. Independent events: events are independent if the occurrence of one does not af-
fect the probability of occurrence of the others.
2. Dependent events: events are termed dependent if the occurrence of one does af-
fect the probability of occurrence of others.
3. Mutually exclusive events: two events are termed mutually exclusive if the occur-
rence of one precludes the occurrence of the other. For example, in the birth of a
child, the events “It’s a boy!’’ and “It’s a girl!’’ are mutually exclusive.
4. Collectively exhaustive events: a set of events is termed collectively exhaustive if
on any one trial at least one of them must occur. For example, in rolling a die, one
of the events 1, 2, 3, 4, 5, or 6 must occur; therefore, these six events are collec-
tively exhaustive.
We can also define the union and intersection of two events. Consider two events
A and B. The union of A and B includes all outcomes in A or B or in both A and B. For
example, in a card game you will win if you draw a diamond or a jack. The union of
these two events includes all diamonds (including the jack of diamonds) and the re-
maining three jacks (hearts, clubs, spades). The or in the union is the inclusive or.
That is, in our example you will win with a jack or a diamond or a jack of diamonds
(i.e., both events).
The intersection of two events includes all outcomes that are members of both
events. Thus, in our previous example of jacks and diamonds, the jack of diamonds is
the only outcome contained in both events and is therefore the only member of the in-
tersection of the two events.
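As an illustrative check (not part of the original text), a short Python sketch can enumerate the deck and count the outcomes in the union and intersection of the two events:

```python
from itertools import product

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))  # all 52 (rank, suit) pairs

jacks = {card for card in deck if card[0] == "J"}
diamonds = {card for card in deck if card[1] == "diamonds"}

union = jacks | diamonds          # a jack OR a diamond (inclusive or)
intersection = jacks & diamonds   # the jack of diamonds only

print(len(union))         # 16 winning outcomes, so P(win) = 16/52
print(len(intersection))  # 1 outcome, so P(jack and diamond) = 1/52
```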
Let us now consider the relevant probability laws based on our understanding of
the above definitions and concepts. For ease of exposition let us define the following
notation:
P(A) = probability that event A will occur
P(B) = probability that event B will occur
If two events are mutually exclusive, then their joint occurrence is impossible.
Hence, P(A and B) = 0 for mutually exclusive events. If the events are not mutually
exclusive, P(A and B) can be computed (as we will see in the next section); this prob-
ability is termed the joint probability of A and B. Also, if A and B are not mutually ex-
clusive, then we can also define the conditional probability of A given that B has al-
ready occurred or the conditional probability of B given that A has already occurred.
These probabilities are written as P(A|B) and P(B|A), respectively.
[Figure: Venn diagram showing two overlapping circles labeled A and B, with their intersection shaded]
The two circles represent the probabilities of the events A and B, respectively. The shaded area represents the overlap in the events; that is, the intersection of A and B. If we add the area of A and the area of B, we have included the shaded area twice. Therefore, to get the total area of A or B, we must subtract one of the areas of the intersection that we have added. That is,

$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$
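Applying this to the earlier jack-or-diamond example gives (a worked instance added here for illustration):

$$P(\text{jack or diamond}) = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52}$$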
A.3 STATISTICS

The mean of a population is a measure of the central tendency of its values. For a population described by a discrete random variable, the mean is computed as

$$\mu = \sum_{j=1}^{k} X_j \, P(X_j)$$

where

k = the number of discrete values that the random variable Xj may assume
Xj = the value of the random variable
P(Xj) = the probability (or relative frequency) of Xj in the population

Also, the mean can be computed as

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
where
N = the size of the population (the number of items in the population)
Xi = the value of the ith item in the population
The mean is also termed the expected value of the population and is written as E(X).
AppA.qxd 12/6/02 2:54 PM Page 677
The variance of the items in the population measures the dispersion of the items
about their mean. It is computed in one of the following two ways:
$$\sigma^2 = \sum_{j=1}^{k} (X_j - \mu)^2 \, P(X_j)$$

or

$$\sigma^2 = \sum_{i=1}^{N} \frac{(X_i - \mu)^2}{N}$$

The standard deviation, another measure of dispersion, is simply the square root of the variance, or

$$\sigma = \sqrt{\sigma^2}$$
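The following Python sketch (with a small hypothetical population) verifies that the frequency-weighted formulas and the direct item-by-item formulas give the same mean and variance:

```python
import math
from collections import Counter

population = [10, 10, 20, 20, 20, 30]  # hypothetical population of N = 6 items
N = len(population)

# Direct formulas: sum over the N individual items.
mu = sum(population) / N
var_direct = sum((x - mu) ** 2 for x in population) / N

# Frequency-weighted formulas: sum over the k distinct values X_j,
# each weighted by its relative frequency P(X_j).
p = {x: count / N for x, count in Counter(population).items()}
mu_weighted = sum(x * p_x for x, p_x in p.items())
var_weighted = sum((x - mu) ** 2 * p_x for x, p_x in p.items())

print(mu, mu_weighted)           # the two means agree
print(var_direct, var_weighted)  # the two variances agree
print(math.sqrt(var_direct))     # the standard deviation
```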
Before considering the logic behind inferential statistics, let us define the primary
measures of central tendency and dispersion used in both descriptive and inferential
statistics.
Measures of Central Tendency

The mean of a population, as given earlier, is

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

where N and Xi are as defined above. The mean of a sample is computed in the same way:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where

X̄ = the sample mean (pronounced “X bar’’)
Xi = the value of the ith data item in the sample
n = the number of data items selected in the sample
The median is the middle value of a population of data (or sample) where the
data are ordered by value. That is, in the following data set
3, 2, 9, 6, 1, 5, 7, 3, 4
4 is the median since (as you can see when we order the data)
1, 2, 3, 3, 4, 5, 6, 7, 9
50 percent of the data values are above 4 and 50 percent below 4. If there is an even number of data items, then the median is the mean of the middle two. For example, if there had also been an 8 in the above data set, the median would be (4 + 5)/2 = 4.5.
The mode of a population (or sample) of data items is the value that most fre-
quently occurs. In the above data set, 3 is the mode of the set. A distribution can have
more than one mode if there are two or more values that appear with equal frequency.
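For reference, Python's standard library reproduces the median and mode of the example data set directly (a minimal sketch):

```python
import statistics

data = [3, 2, 9, 6, 1, 5, 7, 3, 4]

print(statistics.median(data))  # 4   (middle value of the ordered data)
print(statistics.mode(data))    # 3   (most frequently occurring value)

# With an even number of items the median is the mean of the middle two:
print(statistics.median(data + [8]))  # 4.5, i.e., (4 + 5) / 2
```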
Measures of Dispersion
Dispersion refers to the scatter around the mean of a distribution of values. Three
measures of dispersion are the range, the variance, and the standard deviation.
The range is the difference between the highest and the lowest value of the data
set, that is, Xhigh − Xlow.
The variance of a population of items is given by
$$\sigma^2 = \sum_{i=1}^{N} \frac{(X_i - \mu)^2}{N}$$

where

σ² = the population variance (pronounced “sigma squared”)
The variance of a sample of items is given by
$$S^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n}$$

where

S² = the sample variance
The standard deviation is simply the square root of the variance. That is,
$$\sigma = \sqrt{\sum_{i=1}^{N} \frac{(X_i - \mu)^2}{N}}$$

and

$$S = \sqrt{\sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n}}$$
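The sketch below computes the population and sample measures exactly as defined above; note that the sample variance here divides by n, as in the text, whereas Python's statistics.variance (and many other texts) divides by n − 1:

```python
import math

def variance(values, center, divisor):
    """Average squared deviation from a given center, per the text's formulas."""
    return sum((x - center) ** 2 for x in values) / divisor

population = [3, 2, 9, 6, 1, 5, 7, 3, 4]     # hypothetical population
mu = sum(population) / len(population)

sigma_squared = variance(population, mu, len(population))   # divide by N
sigma = math.sqrt(sigma_squared)

sample = population[:4]                      # a hypothetical sample of n = 4
x_bar = sum(sample) / len(sample)

s_squared = variance(sample, x_bar, len(sample))            # divide by n
s = math.sqrt(s_squared)

print(sigma_squared, sigma, s_squared, s)
```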
Inferential Statistics
A basis of inferential statistics is the interval estimate. Whenever we infer from par-
tial data to an entire population, we are doing so with some uncertainty in our infer-
ence. Specifying an interval estimate (e.g., the average weight is between 10 and 12 pounds) rather than a point estimate (e.g., the average weight is 11.3 pounds) simply helps to convey that uncertainty. The interval estimate is less precise than the point estimate.
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

so that, for the example shown, with σ = 20 and n = 100,

$$\sigma_{\bar{x}} = \frac{20}{\sqrt{100}} = 2$$

[Figure: the population distribution (σ = 20) and the sampling distribution of the mean (σx̄ = 2), both centered on µ = µx̄ = 50]
2. From our knowledge of the normal distribution, we know that there is a number (see normal probability table, back cover) associated with each probability value of a normal distribution (e.g., the probability that an item will fall within ±2 standard deviations of the mean of a normal distribution is 95.45 percent; Z = 2 in this case).
3. The value of the number Z is simply the number of standard deviations away from
the mean that a given point lies. That is,
$$Z = \frac{X - \mu}{\sigma}$$
Thus, for a sample of 100 items with a sample mean of 56, a population standard deviation of 20, and a 90 percent confidence level (Z = 1.645), the interval estimate of the population mean is

$$56 \pm 1.645\left(\frac{20}{\sqrt{100}}\right)$$

or

56 ± 3.29, that is, 52.71 to 59.29
This interval estimate of the population mean is based solely on information de-
rived from a sample and states that the estimator is 90 percent confident that the true
mean is between 52.71 and 59.29. There are numerous other sampling methods and
other parameters that can be estimated; the student is referred to one of the references
in the bibliography for further discussion.
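A minimal sketch of the interval computation above, using the same sample mean, population standard deviation, sample size, and Z value:

```python
import math

x_bar = 56    # sample mean
sigma = 20    # population standard deviation
n = 100       # sample size
z = 1.645     # Z value for a 90 percent confidence level

half_width = z * sigma / math.sqrt(n)          # 1.645 * 20 / 10 = 3.29
print(x_bar - half_width, x_bar + half_width)  # 52.71 59.29
```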
The beta distribution's two parameters, alpha and beta, determine the distribution's shape. Its mean, µ, and variance, σ², are given by

$$\mu = \frac{\alpha}{\alpha + \beta}$$

$$\sigma^2 = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$
In practice, the mean is often approximated by

$$\mu = \frac{a + 4m + b}{6}$$

and the standard deviation by

$$\sigma = \frac{b - a}{6}$$
where
a is the optimistic value that might occur once in a hundred times,
m is the most likely (modal) value, and
b is the pessimistic value that might occur once in a hundred times.
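As an illustration, here is a minimal Python sketch of these classic three-point approximations; the activity-time estimates a = 2, m = 5, and b = 14 are hypothetical:

```python
def pert_estimates(a, m, b):
    """Classic approximations to the mean and standard deviation of a
    beta-distributed activity time: a = optimistic, m = most likely,
    b = pessimistic."""
    mu = (a + 4 * m + b) / 6
    sigma = (b - a) / 6
    return mu, sigma

print(pert_estimates(2, 5, 14))  # (6.0, 2.0)
```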
Recent research (Keefer and Verdini, 1993) has indicated that a much better approxi-
mation is given by
$$\mu = 0.630\,d + 0.185\,(c + e)$$

$$\sigma^2 = 0.630\,(d - \mu)^2 + 0.185\left[(c - \mu)^2 + (e - \mu)^2\right]$$
where
c is an optimistic value at one in 20 times,
d is the median, and
e is a pessimistic value at one in 20 times.
See Chapter 8 for another method for approximating µ and σ².
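A companion sketch of the Keefer and Verdini (1993) approximation; the fractile estimates c = 3, d = 5, and e = 12 are hypothetical:

```python
import math

def keefer_verdini_estimates(c, d, e):
    """Three-fractile approximation: c = 0.05 fractile (optimistic),
    d = median, e = 0.95 fractile (pessimistic)."""
    mu = 0.630 * d + 0.185 * (c + e)
    var = 0.630 * (d - mu) ** 2 + 0.185 * ((c - mu) ** 2 + (e - mu) ** 2)
    return mu, math.sqrt(var)

print(keefer_verdini_estimates(3, 5, 12))  # approximately (5.93, 2.99)
```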
BIBLIOGRAPHY
ANDERSON, D., D. SWEENEY, and T. WILLIAMS. Statistics for Business and Economics, 7th ed. Cincinnati, OH: South-Western, 1998.

BHATTACHARYYA, G., and R. A. JOHNSON. Mathematical Statistics. Paramus, NJ: Prentice-Hall, 1999.

KEEFER, D. L., and W. A. VERDINI. “Better Estimation of PERT Activity Time Parameters.’’ Management Science, September 1993.

MENDENHALL, W., R. L. SCHAEFFER, and D. WACKERLY. Mathematical Statistics with Applications, 3rd ed. Boston: PWS-Kent, 1986.

NETER, J., W. WASSERMAN, and G. A. WHITMORE. Applied Statistics, 3rd ed. Boston: Allyn and Bacon, 1987.