Data Science
Source: Anderson, D. R., Sweeney, D. J., Williams, T. A., Camm, J. D., and Cochran, J. J. (2018). Statistics for Business & Economics. Cengage Learning.
Uncertainties
• Managers often base their decisions on an analysis of
uncertainties such as the following:
• What are the chances that sales will decrease if we increase
prices?
• What is the likelihood a new assembly method will increase
productivity?
• What are the odds that a new investment will be profitable?
Probability
• Probability is a numerical measure of the likelihood that an event will occur.
• Probability values are always assigned on a scale from 0 to 1.
• A probability near zero indicates an event is quite unlikely to occur.
• A probability near one indicates an event is almost certain to occur.
Statistical Experiments
• In statistics, the notion of an experiment differs somewhat
from that of an experiment in the physical sciences.
• In statistical experiments, probability determines outcomes.
• Even though the experiment is repeated exactly the same
way, an entirely different outcome may occur.
• For this reason, statistical experiments are sometimes called
random experiments.
Random Experiment and Its Sample Space
• A random experiment is a process that generates well-defined experimental
outcomes.
• The sample space for an experiment is the set of all experimental outcomes.
• An experimental outcome is also called a sample point.
• On any single repetition or trial, the outcome that occurs is determined completely
by chance.
Experiment             Experimental Outcomes
Toss a coin            Head, tail
Inspect a part         Defective, non-defective
Conduct a sales call   Purchase, no purchase
Roll a die             1, 2, 3, 4, 5, 6
Play a football game   Win, lose, tie
Probability
The sample space of an experiment is denoted S.
The sample space contains all possible outcomes of the experiment.
For example, letter grades in a course: S = {A, B, C, D, F}; passing a
course or not: S = {P, F}.
Mutually Exclusive Events
• Events that cannot occur simultaneously.
• If one event happens, the other cannot.
Examples:
1. Flipping a coin:
   Event A: Getting heads.
   Event B: Getting tails.
Venn Diagram
The sample space S is represented by a rectangle.
Two circles represent the events A and B.
Complement of an event A
Denoted A^c
All outcomes in the sample space S that are not in A
The portion in the Venn diagram that is everything in S that is not included in A
Practice Problem
You roll a die with the sample space S = {1, 2, 3, 4, 5, 6}. You define A as
{1, 2, 3}, B as {1, 2, 3, 5, 6}, C as {4, 6}, and D as {4, 5, 6}. Determine
which of the following events are exhaustive and/or mutually exclusive.
a. A and B
b. A and C
c. A and D
d. B and C
Solution
a. A ∪ B = {1, 2, 3, 5, 6} ≠ {1, 2, 3, 4, 5, 6} = S; the events A and B are not
exhaustive.
A ∩ B = {1, 2, 3}; the events A and B are not mutually exclusive.
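The same check can be run for all four pairs with Python sets. This is a minimal sketch: the helper name `classify` is illustrative and not part of the source; the sets are those given in the problem statement.

```python
# Sample space and events from the practice problem.
S = {1, 2, 3, 4, 5, 6}
events = {"A": {1, 2, 3}, "B": {1, 2, 3, 5, 6}, "C": {4, 6}, "D": {4, 5, 6}}


def classify(x, y, sample_space):
    """Return (exhaustive, mutually_exclusive) for two events given as sets."""
    exhaustive = (x | y) == sample_space   # the union covers the whole sample space
    mutually_exclusive = not (x & y)       # the intersection is empty
    return exhaustive, mutually_exclusive


pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
results = {pair: classify(events[pair[0]], events[pair[1]], S) for pair in pairs}

for (first, second), (exh, mx) in results.items():
    print(f"{first} and {second}: exhaustive={exh}, mutually exclusive={mx}")
```

Under these definitions, only A and D are both exhaustive and mutually exclusive; A and C are mutually exclusive but not exhaustive, and B and C are exhaustive but not mutually exclusive.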
Practice Problem
A survey of magazine subscribers showed that 45.8% rented a car
during the past 12 months for business reasons, 54% rented a car
during the past 12 months for personal reasons, and 30% rented a car
during the past 12 months for both business and personal reasons.
a. What is the probability that a subscriber rented a car during the past
12 months for business or personal reasons?
b. What is the probability that a subscriber did not rent a car during
the past 12 months for either business or personal reasons?
Solution
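One way to work this problem is with the addition rule for part (a) and the complement rule for part (b). The sketch below assumes part (b) asks for the probability of renting for neither reason; the variable names are illustrative.

```python
# Survey figures from the problem statement.
p_business = 0.458   # rented for business reasons
p_personal = 0.54    # rented for personal reasons
p_both = 0.30        # rented for both reasons

# a. Addition rule: P(B or P) = P(B) + P(P) - P(B and P)
p_either = p_business + p_personal - p_both

# b. Complement rule: P(neither) = 1 - P(B or P)
p_neither = 1 - p_either

print(f"a. P(business or personal) = {p_either:.3f}")
print(f"b. P(neither)              = {p_neither:.3f}")
```

With these inputs, part (a) comes to 0.698 and part (b) to 0.302.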
Probability
Properties of probability
Because empirical and classical probabilities do not vary from person to person, they are
often grouped as objective probabilities.
Rules of Probability
1. Complement rule
Follows from one of the defining properties of probability: P(A) + P(A^c) = 1
Rearrange: P(A^c) = 1 − P(A)
2. Addition rule
Used to find the probability of the union of two events: the probability that A or B occurs, or that at least
one of these events occurs
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
If A and B are mutually exclusive, then P(A ∩ B) = 0 and the rule reduces to
P(A ∪ B) = P(A) + P(B)
Conditional probability
In business applications, the probability of interest is often a
conditional probability.
Examples include :
Joint probabilities
The values in the margins of the joint probability table provide the probabilities of each event
separately. That is, P(M) = .80, P(W) = .20, P(A) = .27, and P(A^c) = .73. These probabilities are
referred to as marginal probabilities because of their location in the margins of the joint
probability table.
Conditional Probability
Conditional probability
The conditional probability that A occurs given that B has occurred is
derived as P(A | B) = P(A ∩ B) / P(B)
Similarly, P(B | A) = P(A ∩ B) / P(A)
Independent Event
Multiplication rule of Probability
P(A ∩ B) = P(A | B) P(B)
Probability
Two events are independent if the occurrence of one event
does not affect the probability of the occurrence of the other
event.
Events are considered dependent if the occurrence of one is
related to the probability of the occurrence of the other event.
For independent events, P(A ∩ B) = P(A) P(B).
Practice Problem
Suppose that for a given year there is a 2% chance that
your desktop computer will crash and a 6% chance that
your laptop computer will crash. Moreover, there is a
0.12% chance that both computers will crash. Is the
reliability of the two computers independent of each
other?
Solution
Let D represent the outcome that your desktop crashes, P(D) = 0.02.
Let L represent the outcome that your laptop crashes, P(L) = 0.06.
The joint probability is P(D ∩ L) = 0.0012.
Calculate P(D | L) = P(D ∩ L) / P(L) = 0.0012 / 0.06 = 0.02.
So, P(D | L) = P(D). If your laptop crashes, it does not alter
the probability that your desktop also crashes.
The reliability of the two computers is independent.
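The independence check can also be sketched in code by comparing the joint probability with the product of the marginals. The function name `are_independent` and the tolerance are assumptions made for illustration; a tolerance is needed because floating-point products are rarely exact.

```python
import math


def are_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Two events are independent when P(A and B) equals P(A) * P(B)."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)


# Figures from the practice problem.
p_desktop = 0.02
p_laptop = 0.06
p_both = 0.0012

# Conditional probability of a desktop crash given a laptop crash.
p_desktop_given_laptop = p_both / p_laptop

independent = are_independent(p_desktop, p_laptop, p_both)
print(f"P(D | L) = {p_desktop_given_laptop:.2f}, independent: {independent}")
```

Both tests agree here: P(D | L) equals P(D), and the joint probability equals the product 0.02 × 0.06.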
Practice Problem
Let P(A) = 0.65, P(B) = 0.30, and P(A | B) = 0.45.
a. Calculate P(A ∩ B).
b. Calculate P(A ∪ B).
c. Calculate P(B | A).
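Solution
a. P(A ∩ B) = P(A | B) P(B) = 0.45 × 0.30 = 0.135
b. P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.65 + 0.30 − 0.135 = 0.815
c. P(B | A) = P(A ∩ B) / P(A) = 0.135 / 0.65 ≈ 0.21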
Practice Problem
An analyst estimates that the probability of default on a seven-year AA-rated
bond is 0.06, while that on a seven-year A-rated bond is 0.13. The probability
that they will both default is 0.04.
        B     B^c
A       26    34
A^c     14    26
• What is the probability that a randomly selected attendee enrolls in the fitness center?
• What is the probability that a randomly selected attendee is over 50 years old?
• What is the probability that a randomly selected attendee enrolls in the fitness center
and is over 50 years old?
• What is the probability that an attendee enrolls in the fitness center, given the attendee
is over 50 years old?
Solution
c. What is the probability that a randomly selected attendee enrolls in the fitness center and is over
50 years old?
P(E ∩ O) = 44 / 400 = 0.11
d. What is the probability that an attendee enrolls in the fitness center, given the attendee is over 50
years old?
P(E | O) = 44 / 132 = 0.33
Equivalently, P(E | O) = P(E ∩ O) / P(O) = 0.11 / 0.33 = 0.33
Practice Problem
Apple products have become a household name in America. Suppose that the likelihood of
owning an Apple product is 61% for households with kids and 48% for households without
kids. Suppose there are 1,200 households in a representative community, of which 820 have
kids and the rest do not.
a. Are the events “household with kids” and “household without kids” mutually exclusive and
exhaustive? Explain.
c. What is the probability that a household is with kids and owns an Apple product?
d. What is the probability that a household is without kids and does not own an Apple product?
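Parts (c) and (d) can be worked with the multiplication rule, P(A ∩ B) = P(A | B) P(B). The following is a sketch using only the figures in the problem statement; the variable names are illustrative.

```python
# Figures stated in the practice problem.
households = 1200
with_kids = 820
p_apple_given_kids = 0.61
p_apple_given_no_kids = 0.48

# Marginal probabilities of the two household types.
p_kids = with_kids / households
p_no_kids = 1 - p_kids   # the two types are mutually exclusive and exhaustive

# c. P(kids and Apple) = P(Apple | kids) * P(kids)
p_kids_and_apple = p_apple_given_kids * p_kids

# d. P(no kids and no Apple) = P(no Apple | no kids) * P(no kids)
p_no_kids_and_no_apple = (1 - p_apple_given_no_kids) * p_no_kids

print(f"c. {p_kids_and_apple:.4f}")
print(f"d. {p_no_kids_and_no_apple:.4f}")
```

Rounded to four decimals, part (c) is about 0.4168 and part (d) about 0.1647.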
Practice Problem
Joint Probability
Let X and Y be two events in a sample space. Then the joint
probability of the two events, written as P(X ∩ Y), is given by
P(X ∩ Y) = (Number of observations in X ∩ Y) / (Total number of observations)
Total probability rule and Bayes’ Theorem
The total probability rule expresses the probability of an event, A, in terms of
probabilities of the intersection of A with any mutually exclusive and exhaustive
events.
The total probability rule based on two events, B and B^c, is given by
P(A) = P(A ∩ B) + P(A ∩ B^c).
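Combining this rule with the multiplication rule gives P(A) = P(A | B) P(B) + P(A | B^c) P(B^c). A minimal sketch follows, applied to the figures from the Apple practice problem earlier in these notes to find the overall probability that a household owns an Apple product. The function name `total_probability` is an assumption made for illustration.

```python
def total_probability(p_a_given_b, p_a_given_not_b, p_b):
    """P(A) = P(A|B) P(B) + P(A|B^c) P(B^c), where P(B^c) = 1 - P(B)."""
    if not 0.0 <= p_b <= 1.0:
        raise ValueError("p_b must be a probability between 0 and 1")
    return p_a_given_b * p_b + p_a_given_not_b * (1.0 - p_b)


# B = household with kids; A = household owns an Apple product.
p_apple = total_probability(
    p_a_given_b=0.61,       # P(Apple | kids)
    p_a_given_not_b=0.48,   # P(Apple | no kids)
    p_b=820 / 1200,         # P(kids)
)
print(f"P(owns Apple) = {p_apple:.4f}")
```

With these inputs the overall ownership probability is roughly 0.5688, which lies between the two conditional probabilities as expected for a weighted average.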
Total probability rule and Bayes’ Theorem
Bayes’ theorem is a procedure for updating probabilities based on new
information; it uses the total probability rule.
Probability Revision Using Bayes’ Theorem
Terms for Bayes Theorem components
• The prior probability, denoted P(B), is the estimate of the probability of B without any further information.
• The posterior probability, denoted P(B | A), is the revised probability that event B will occur given the
new information (or additional evidence) that event A has occurred.
Prior Probabilities → New Information → Application of Bayes’ Theorem → Posterior Probabilities
Total probability rule and Bayes’ Theorem
The posterior probability P(B | A) can be found using the information
on the prior probability P(B) along with conditional probabilities as
P(B | A) = P(A | B) P(B) / [P(A | B) P(B) + P(A | B^c) P(B^c)]
Total probability rule and Bayes’ Theorem
The analysis can be extended to include n mutually exclusive and exhaustive events
B1, B2, ..., Bn.
For the extended case, Bayes’ theorem, for any i = 1, 2, ..., n, is
P(Bi | A) = P(A | Bi) P(Bi) / [P(A | B1) P(B1) + P(A | B2) P(B2) + ... + P(A | Bn) P(Bn)]
Revising Probabilities with New Information
Initial Phase of Analysis:
• Begin with initial or prior probability estimates for events of interest.
Obtaining Additional Information:
• Sources: samples, special reports, product tests.
Updating Probabilities:
• Calculate revised probabilities (posterior probabilities) using Bayes’ theorem.
Example Scenario:
• Manufacturing Firm:
• Receives parts from two suppliers (Supplier 1 and Supplier 2).
• 65% of parts from Supplier 1 (P(A1) = 0.65).
• 35% of parts from Supplier 2 (P(A2) = 0.35).
Quality Variance:
• Historical data indicates varying quality ratings between suppliers.
• Conditional probability values provided (Table 4.6).
Bayes’ Theorem:
• Tool for updating prior probabilities with new information.
Two-Supplier Example
Tree Diagram for Two-Supplier Example
Historical Quality Levels of Two Suppliers
Probability Tree for Two-Supplier Example
Probability Calculation Using Bayes’ Theorem
Scenario:
• Machine breaks down due to a bad part.
Objective:
• Determine the probability that the bad part came from Supplier 1.
• Determine the probability that the bad part came from Supplier 2.
Given:
• P(A1): Probability part is from Supplier 1.
• P(A2): Probability part is from Supplier 2.
Conditional Probabilities:
• P(B|A1): Probability of a bad part from Supplier 1.
• P(B|A2): Probability of a bad part from Supplier 2.
Using Bayes’ Theorem:
• Calculate P(A1|B): Probability part is from Supplier 1 given it is bad.
• Calculate P(A2|B): Probability part is from Supplier 2 given it is bad.
Probability Tree:
• Visual aid to understand and calculate revised probabilities.
Bayes’ Theorem
Tabular Approach to Bayes’ Theorem Calculations
Step 1. Prepare the following three columns:
Column 1— The mutually exclusive events Ai for which posterior probabilities are desired
Column 2— The prior probabilities P(Ai) for the events
Column 3— The conditional probabilities P(B ∣ Ai) of the new information B given each event
Step 2. In column 4, compute the joint probabilities P(Ai ∩ B) for each event and the new information B by using the multiplication law. These joint
probabilities are found by multiplying the prior probabilities in column 2 by the corresponding conditional probabilities in column 3; that is, P(Ai ∩ B) =
P(Ai)P(B ∣ Ai).
Step 3. Sum the joint probabilities in column 4. The sum is the probability of the new information, P(B). Thus, we see in the Table below that there is a
.0130 probability that the part came from supplier 1 and is bad and a .0175 probability that the part came from supplier 2 and is bad. Because these are
the only two ways in which a bad part can be obtained, the sum .0130 + .0175 shows an overall probability of .0305 of finding a bad part from the
combined shipments of the two suppliers.
Step 4. In column 5, compute the posterior probabilities using the basic relationship of conditional probability.
• P(Ai | B) = P(Ai ∩ B)/P(B)
Note that the joint probabilities P(Ai ∩ B) are in column 4 and the probability P(B) is the sum of column 4.
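The four steps can be sketched as a small function that builds the table for any number of mutually exclusive events. The function name `bayes_table` is illustrative. The conditional probabilities 0.02 and 0.05 are not quoted directly in these notes; they are inferred from the stated joint probabilities (.0130 / .65 and .0175 / .35) and should be read as an assumption.

```python
def bayes_table(events, priors, conditionals):
    """Build the tabular Bayes calculation.

    Returns (rows, p_b), where each row is
    (event, prior, conditional, joint, posterior).
    """
    # Step 2: joint probabilities P(Ai and B) = P(Ai) * P(B | Ai)
    joints = [prior * cond for prior, cond in zip(priors, conditionals)]
    # Step 3: P(B) is the sum of the joint probabilities
    p_b = sum(joints)
    # Step 4: posterior probabilities P(Ai | B) = P(Ai and B) / P(B)
    rows = [
        (event, prior, cond, joint, joint / p_b)
        for event, prior, cond, joint in zip(events, priors, conditionals, joints)
    ]
    return rows, p_b


# Step 1: events, priors, and conditionals for the two-supplier example.
rows, p_b = bayes_table(["A1", "A2"], [0.65, 0.35], [0.02, 0.05])

print(f"{'Event':<6}{'Prior':>8}{'P(B|Ai)':>10}{'Joint':>10}{'Posterior':>11}")
for event, prior, cond, joint, posterior in rows:
    print(f"{event:<6}{prior:>8.2f}{cond:>10.2f}{joint:>10.4f}{posterior:>11.4f}")
print(f"P(B) = {p_b:.4f}")
```

Under these inputs the joint column reproduces .0130 and .0175, their sum is .0305, and the posterior probabilities come to roughly .4262 for Supplier 1 and .5738 for Supplier 2.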
Practice Problem
Christine has always been weak in mathematics. Based on her performance prior
to the final exam in Calculus, there is a 40% chance that she will fail the course if
she does not have a tutor. With a tutor, her probability of failing decreases to 10%.
There is only a 50% chance that she will find a tutor at such short notice.
Solution
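The questions for this problem are not reproduced in these notes. As an assumption, the sketch below computes two natural quantities: the overall probability that Christine fails (total probability rule) and the probability that she had a tutor given that she failed (Bayes’ theorem). The variable names are illustrative.

```python
# Figures from the problem statement.
p_tutor = 0.50                 # P(T): she finds a tutor
p_fail_given_tutor = 0.10      # P(F | T)
p_fail_given_no_tutor = 0.40   # P(F | T^c)

# Total probability rule: P(F) = P(F|T) P(T) + P(F|T^c) P(T^c)
p_fail = (p_fail_given_tutor * p_tutor
          + p_fail_given_no_tutor * (1 - p_tutor))

# Bayes' theorem: P(T | F) = P(F|T) P(T) / P(F)
p_tutor_given_fail = p_fail_given_tutor * p_tutor / p_fail

print(f"P(fail)         = {p_fail:.2f}")
print(f"P(tutor | fail) = {p_tutor_given_fail:.2f}")
```

Under these assumptions, the probability of failing is 0.25, and the probability that she had a tutor given that she failed is 0.20.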