Operation Analytics
Predictive Analytics (Risk)
Introduction to Business Analytics
Khurram Shakeel
Risk and Evaluation of Alternatives: Navigating Uncertainty
This lecture explores decision-making under varying levels of
uncertainty, providing practical frameworks and tools to
evaluate alternatives effectively. We'll delve into examples like
wireless data plans and inventory management to illustrate key
concepts.
Understanding Decision Environments
1. Low-Uncertainty Settings: Decisions yield predictable, non-random outcomes. Perfect for optimizing specific metrics like profit or resource use.
2. High-Uncertainty Settings: Outcomes are influenced by unknown factors, requiring probabilistic approaches like simulation.
3. Reward and Risk: Assessing potential gains against potential losses is crucial in all decision contexts, especially with uncertainty.
4. Connecting Inputs to Outputs: Understanding how random inputs propagate through a system to create random outputs is fundamental to robust analysis.
Decisions in Low-Uncertainty Environments
In settings with low uncertainty, each decision reliably produces
a specific, non-random outcome. This predictability applies to
both:
• Objective Function Value: Quantifiable metrics such as profit
or total shipping cost.
• Key Performance Indicators (KPIs): Measurable aspects like
resource consumption or distribution quantities.
Consider the Zooter example: a decision to produce 500 Razor
and 500 Navajo scooters yields an exact profit of $155,000 and
precisely 4,500 frame manufacturing hours used.
Navigating High Uncertainty: The Newsvendor Problem
Newsvendor Example: The Wodget Dilemma
A "Wodget" sells for 12 talers, costs 3 talers, and unsold items
have no salvage value. The critical decision is how many to
stock.
In high-uncertainty scenarios, such as the Newsvendor example,
key decisions (e.g., inventory quantity Q) must be made before all
influential factors (e.g., product demand D) are known.
At the moment of decision, demand D is best modeled as a
random variable, directly impacting the final outcome like profit.
Modeling Random Variables using Scenarios
Random variables can be modeled using a “scenario” approach
– Each scenario is a value that a random variable can take
– Each scenario has a probability of being realized
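As a minimal Python sketch of the scenario approach, each scenario below is stored as a (value, probability) pair and the expected value is the probability-weighted average. It borrows the three past Wodget demand values used later in this section (63, 41, 29); the equal probabilities are an assumption made only for illustration.

```python
# A discrete random variable described by scenarios: each scenario is a
# (value, probability) pair, and the probabilities must sum to 1.
demand_scenarios = [
    (63, 1 / 3),  # ASSUMPTION: equal probabilities for the three
    (41, 1 / 3),  # past demand values; the text does not give
    (29, 1 / 3),  # probabilities for them.
]

def expected_value(scenarios):
    """Probability-weighted average of the scenario values."""
    total_prob = sum(p for _, p in scenarios)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(value * p for value, p in scenarios)

print(f"expected demand: {expected_value(demand_scenarios):.2f}")
```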
Modeling Uncertainty: Continuous Probability Distributions
When facing uncertain outcomes, a powerful approach is to "fit" a probability distribution to historical data. This distribution
can then model future demand or other key variables.
For instance, historical sales data can be used to generate a normal distribution curve, providing insights into the likelihood
of various demand levels.
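A minimal sketch of fitting a normal distribution in Python, using the standard library's statistics.NormalDist.from_samples (which estimates the mean and the n-1 sample standard deviation). The sales figures are hypothetical placeholders, not data from this case, and the 50-unit threshold is an arbitrary example.

```python
from statistics import NormalDist

# HYPOTHETICAL historical sales; replace with real observations.
historical_sales = [38, 45, 52, 41, 47, 55, 36, 44]

# Fit a normal distribution: mean and standard deviation are estimated
# from the sample.
demand_dist = NormalDist.from_samples(historical_sales)
print(f"fitted mean = {demand_dist.mean:.2f}, fitted st. dev. = {demand_dist.stdev:.2f}")

# Use the fitted distribution to estimate the likelihood of a demand level.
threshold = 50  # arbitrary example threshold
prob_above = 1 - demand_dist.cdf(threshold)
print(f"P(demand > {threshold}) = {prob_above:.3f}")
```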
Random Demand leads to Random Profit
If the demand is modeled as a random variable, profit may also become a random variable
Consider three demand values observed in the past: 63, 41, and 29.
Random Demand Yields Random Profit
If demand is treated as a random variable, the resulting profit
also becomes a random variable. This means a single decision
doesn't guarantee a fixed profit, but rather a spectrum of
possible profits.
For example, if we decide to order Q = 50 units of a product,
the actual profit will vary depending on the actual demand
that materializes.
This highlights the shift from deterministic outcomes to
probabilistic distributions in high-uncertainty environments.
The Impact of Demand on Profit
Inventory Decision: Order Q = 50 units.
Actual Demand Realized (D = 29): Profit = 12 * 29 - 3 * 50 = 198 talers
Alternative Demand (D = 63): only the 50 stocked units can be sold, so Profit = 12 * 50 - 3 * 50 = 450 talers
Another Demand Scenario (D = 41): Profit = 12 * 41 - 3 * 50 = 342 talers
In every case, Profit = 12 * min(Q, D) - 3 * Q.
As shown, a single ordering decision leads to a distribution of potential profits, not a single fixed value.
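The Wodget calculations follow Profit = 12 * min(Q, D) - 3 * Q. A short Python sketch (the function name newsvendor_profit is hypothetical) reproduces the three scenarios; the expected profit at the end assumes the three demand values are equally likely, which the text does not state.

```python
PRICE = 12  # selling price per Wodget, talers
COST = 3    # purchase cost per Wodget, talers; unsold units have no salvage value

def newsvendor_profit(q, d):
    """Profit when q units are stocked and demand turns out to be d."""
    units_sold = min(q, d)  # sales are capped by the stock on hand
    return PRICE * units_sold - COST * q

Q = 50
profits = {d: newsvendor_profit(Q, d) for d in (29, 63, 41)}
print(profits)  # {29: 198, 63: 450, 41: 342}

# ASSUMPTION: each of the three demand values is equally likely.
expected_profit = sum(profits.values()) / len(profits)
print(expected_profit)  # 330.0
```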
Selecting Optimal Decisions in Low-Uncertainty Settings
Low-Uncertainty Decision Process
For each potential decision, we must:
• Calculate the exact objective function value (e.g., profit).
• Verify the feasibility of the decision (e.g., resource constraints).
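In the Zooter example, a decision is a pair (R, N) of Razor and Navajo production quantities, and its objective function value is the profit 150*R + 160*N.
Among all feasible options, the decision that yields the best
objective function value is selected:
max 150*R + 160*N over all feasible decisions (R, N)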
This systematic approach ensures that resources are allocated
efficiently and outcomes are maximized within known parameters.
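The low-uncertainty process (evaluate each decision, check feasibility, keep the best) can be sketched in Python as follows. The objective 150*R + 160*N comes from the Zooter example; the total-output cap in is_feasible and the 100-unit grid of candidates are hypothetical stand-ins, since the actual Zooter resource constraints are not given here.

```python
from itertools import product

def zooter_profit(r, n):
    """Objective function: 150 per Razor (R) and 160 per Navajo (N)."""
    return 150 * r + 160 * n

def is_feasible(r, n):
    # HYPOTHETICAL constraint for illustration only: at most 1,000 scooters
    # in total. The real Zooter resource constraints are not reproduced here.
    return r >= 0 and n >= 0 and r + n <= 1000

def best_decision(candidates):
    """Return the feasible (R, N) pair with the largest profit."""
    feasible = [(r, n) for r, n in candidates if is_feasible(r, n)]
    return max(feasible, key=lambda d: zooter_profit(*d))

# Candidate decisions on a coarse grid: 0, 100, ..., 1000 for each product
candidates = list(product(range(0, 1001, 100), repeat=2))

print(zooter_profit(500, 500))  # 155000, matching the text
best = best_decision(candidates)
print(best, zooter_profit(*best))
```

Under this made-up constraint the best grid decision is (0, 1000), because each Navajo earns more than each Razor; with the real constraints the answer would generally differ.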
Analyzing Monthly Payment Risk Under New Data Plans
Understanding the true cost of a new data plan requires a deep
dive into potential monthly payments, especially when usage
varies. This section outlines a structured approach to assess risk and
make informed decisions.
Case Study: Evaluating a Wireless Data Plan
A Philadelphia-based business analytics consultant is
optimizing her family's wireless data plan due to increased
video streaming.
• Current Plan: "Family Share"
  – Cost: $10 per GB
• Proposed Plan: "Superior Share"
  – Flat fee: $160 for up to 20GB/month
  – Allowance is shareable
This scenario provides a practical application of evaluating
alternatives under different cost structures and usage
patterns.
Superior Share: Understanding the Cost Structure
Exceeding 20GB Data Threshold: If monthly usage surpasses 20GB, an additional charge of $15 per GB applies to the excess data. Example: 22GB usage = $160 + (22-20) * $15 = $190.
Usage Below 20GB Threshold: The full $160 flat fee is still paid, regardless of whether the entire 20GB allowance is used. Example: 17GB usage = $160 (unused data does not roll over).
This tiered pricing model introduces a decision point where projected usage needs to be carefully balanced
against potential overage charges or unused allowances.
Key "Output" Measure: Monthly Payment
Uncertainty in Usage
The primary concern is the actual monthly payment
incurred, which directly depends on the family's data
usage—an unknown at the time of plan selection.
Predictive Analytics
To address this, we leverage predictive analytics,
combining historical data with expert judgment to
forecast a probability distribution for future data usage.
Monthly Payment Under Old Plan: Probability Distribution
Based on past trends, monthly data usage is modeled as a
normal random variable with a mean of 23GB and a standard
deviation of 5GB.
Because the current plan charges $10 per GB, the monthly payment is 10 times the usage.
Consequently, the consultant's current plan results in a monthly
payment that is a normal random variable, averaging $230 (= $10 * 23) with
a standard deviation of $50 (= $10 * 5).
[Figure: probability distribution of the monthly payment under the old plan; x-axis: Monthly payment, $; y-axis: Probability]
This distribution allows us to visualize the likelihood of different payment amounts.
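Because the old plan's payment is a linear function of usage (P = 10 * U), its mean and standard deviation are both scaled by 10. A minimal sketch using Python's statistics.NormalDist, where multiplying a distribution by a constant scales both parameters:

```python
from statistics import NormalDist

RATE_OLD = 10  # Family Share plan: $10 per GB

usage = NormalDist(mu=23, sigma=5)  # monthly data usage U in GB
payment_old = usage * RATE_OLD      # P = 10 * U: scales mean and st. dev. by 10

print(f"expected payment: ${payment_old.mean:.0f}")
print(f"st. dev. of payment: ${payment_old.stdev:.0f}")
```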
Reward and Risk
In dealing with uncertain outcomes, it may be important to be
able to calculate performance measures that can be used to
compare decisions, such as the decision to choose a new data plan
versus staying with the old one.
When comparing decisions under uncertainty, we can then use
such performance measures as an objective function and as
constraints.
One such performance measure is "reward". The expected value of cost or profit is often used as an indication
of the "attractiveness" of a particular decision.
The expected value of the monthly payment is what the consultant
would pay, on average, if she stayed with her old data plan
for an infinite number of months.
All other things being equal, a lower expected monthly
payment is more attractive.
However, in any given month, the actual monthly payment is
uncertain and can be quite far from the expected value
of $230.
An Example of a Risk Measure: Standard Deviation of Monthly Payments Under the Old Data Plan
The standard deviation expresses how far, typically, the consultant
should expect her actual monthly payment to be from the
expected value of $230.
Under the old data plan, the standard deviation of monthly
payments is $50.
All other things being equal, a smaller standard deviation may
be more attractive.
Defining "risk" is subjective: what constitutes risk may differ
from one decision maker to another.
Some focus on the standard deviation of
payments, aiming for consistency and worrying
when it is too large.
Others prioritize the likelihood of
exceeding a specific threshold, like
$300, to avoid unexpected budget
strains.
Understanding these perspectives helps
tailor risk assessments to individual
needs.
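Both views of risk can be computed for the old plan in a few lines of Python, assuming (as the text does) that the payment is normal with mean $230 and standard deviation $50. The $300 threshold is the example value from the text.

```python
from statistics import NormalDist

payment_old = NormalDist(mu=230, sigma=50)  # old-plan monthly payment, $

# Risk view 1: spread of payments around the expected value
print(f"st. dev. of payment: ${payment_old.stdev:.0f}")

# Risk view 2: probability that a month's payment exceeds a budget threshold
threshold = 300
prob_exceed = 1 - payment_old.cdf(threshold)
print(f"P(payment > ${threshold}) = {prob_exceed:.3f}")
```

Under these assumptions, about 8% of months would exceed $300.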
Complete Description of the Random Future Monthly Payments Under the Old Plan
The consultant estimates that her monthly data usage is distributed
as a normal random variable with a mean of 23 GB and a
standard deviation of 5 GB.
So, the expected value of monthly payments under the old
plan is $230, and the standard deviation of monthly payments under the old
plan is $50.
Because the payment is $10 per GB, it is also normally distributed, and a normal distribution is completely described by its mean and standard deviation.
Distribution of Monthly Payments Under the New Data Plan
1. What is the expected monthly payment under the new data plan?
2. What is the standard deviation of the monthly payments under the new data plan?
An Algebraic Formula: Monthly Payment for Any Value of Data Usage
To precisely calculate the monthly payment (P) based on data usage (U), we use a conditional formula:
1. Base Cost: If U is 20GB or less, the monthly payment is a flat $160.
2. Tiered Cost: If U exceeds 20GB, the payment becomes $160 + $15 per GB over the limit.
This can be expressed with Excel's IF function:
P = 160 + IF(U>20, 15*(U-20), 0)
The general syntax is =IF(Condition, Choice1, Choice2): the IF function evaluates Condition, returning Choice1 if it is true and Choice2 if it is false.
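The Excel formula translates directly into a Python function. In this minimal sketch the constant names are hypothetical; it also checks the two worked examples from the Superior Share cost structure (22GB gives $190, 17GB gives $160).

```python
FLAT_FEE = 160     # Superior Share monthly fee, $
ALLOWANCE = 20     # GB included in the flat fee
OVERAGE_RATE = 15  # $ per GB above the allowance

def superior_share_payment(u):
    """Monthly payment P for usage u (GB); mirrors P = 160 + IF(U>20, 15*(U-20), 0)."""
    overage = OVERAGE_RATE * (u - ALLOWANCE) if u > ALLOWANCE else 0
    return FLAT_FEE + overage

print(superior_share_payment(22))  # 190
print(superior_share_payment(17))  # 160
```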
Revisiting the Algebraic Formula
Given the formula P = 160 + IF(U>20, 15*(U-20), 0), and knowing that U is normally distributed with a mean of 23
and a standard deviation of 5, key questions arise:
1. What is the distribution of P?
2. What is the expected value of P?
3. What is the standard deviation of P?
These questions highlight the need for a more robust method of analysis than simple substitution.
Expected Value of Monthly Payment Under the New Plan?
A common misconception is to simply plug the expected value of U into the payment formula. Let's see why this is incorrect:
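Direct Substitution: If U's expected value is 23, then P might seem to be 160 + 15*(23-20) = $205.
The Flaw: This method is generally inaccurate because the payment formula is conditional. It is not a single straight line in U but bends at 20GB, so the payment at the average usage differs from the average payment.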
Consider a simplified example: if U is 18 (50% probability) or 28 (50% probability), its expected value is still 23.
• If U = 18, then P = 160
• If U = 28, then P = 160 + 15*(28 - 20) = 280
The true expected value of P is 0.5*160 + 0.5*280 = $220 — a significant difference from the $205 derived by direct substitution.
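The two-scenario example can be checked in a few lines of Python. This sketch contrasts plugging E[U] into the payment formula with averaging the payment over the scenarios.

```python
def payment(u):
    # Superior Share payment formula from the text
    return 160 + (15 * (u - 20) if u > 20 else 0)

# Simplified two-scenario usage distribution from the text: (value, probability)
scenarios = [(18, 0.5), (28, 0.5)]

expected_u = sum(u * p for u, p in scenarios)               # E[U] = 23
plug_in = payment(expected_u)                               # P evaluated at E[U]
true_expected_p = sum(payment(u) * p for u, p in scenarios)  # E[P]

print(plug_in, true_expected_p)  # 205.0 220.0
```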
Simulation as an Analytics Tool
Simulation is a tool that uses a probability distribution of the “input” random variable (such as data usage U) to create
a distribution of the “output” random variable (such as monthly payment P).
[Diagram: Probability Distribution of U → Simulation → Probability Distribution of P]
When direct analytical solutions are complex or impossible, simulation
provides a powerful alternative. This involves repeatedly generating
scenarios to build a comprehensive picture of outcomes.
1. Simulation Runs: Execute numerous "simulation runs" to mimic potential monthly data usages.
2. Generate Sample Distribution: Each run produces an "output" value (e.g., monthly payment), forming a "sample distribution."
3. Analyze Outcomes: Analyze this distribution to estimate expected values, standard deviations, and other risk measures.
Microsoft Excel can effectively facilitate both the simulation process and the
subsequent data analysis, making it an accessible tool for this type of
modeling.
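A minimal Python sketch of the three steps above, assuming U ~ Normal(23, 5) and the Superior Share formula. Python's random module uses a different generator than Excel, so seed = 123 here will not reproduce the Excel values in the tables that follow; the function and parameter names are hypothetical.

```python
import random
import statistics

def superior_share_payment(u):
    """Superior Share monthly payment: P = 160 + IF(U>20, 15*(U-20), 0)."""
    return 160 + (15 * (u - 20) if u > 20 else 0)

def simulate_payments(n_runs, seed, mean_u=23, sd_u=5):
    """Step 1: draw n_runs usages U ~ Normal(mean_u, sd_u).
    Step 2: map each simulated U to an output payment P."""
    rng = random.Random(seed)  # Python's Mersenne Twister, not Excel's generator
    # A normal draw can in principle be negative; the formula then gives $160.
    usages = [rng.gauss(mean_u, sd_u) for _ in range(n_runs)]
    payments = [superior_share_payment(u) for u in usages]
    return usages, payments

usages, payments = simulate_payments(n_runs=1000, seed=123)

# Step 3: analyze the sample distribution of the output
print(f"sample mean of P:     ${statistics.mean(payments):.2f}")
print(f"sample st. dev. of P: ${statistics.stdev(payments):.2f}")
```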
Simulated Data Usage Values and Corresponding Monthly Payment Values: Excel Implementation
(n = 10 simulation runs, seed = 123)
Simulation Run   Data Usage, U (GB)   Payment, P ($)
1                11.9319952           160
2                24.0282690           220.4240354
3                25.6828047           245.242071
4                21.7321587           185.9823805
5                34.2335329           373.5029929
6                16.5820597           160
7                30.7079676           320.619514
8                36.9010808           413.5162123
9                20.3471859           165.2077878
10               28.3229996           284.8449946
Sample Mean      25.0470054           252.9339988
Sample St. Dev.  7.787935101          92.19007977
We are interested in analyzing the distribution of the monthly payment
First, let us look at the simulated values of monthly data usage.
Since the data usage values were drawn from a known probability distribution (normal, with a mean of 23 and a standard deviation of 5), we can compare the sample mean and standard deviation with the true values.
In this simulation, the 10 values of monthly data usage “drawn” from the normal distribution with a mean of 23 and a standard deviation of 5 averaged about 25.047 and produced a sample standard deviation of about 7.788.
Comparing Results for n = 10 and n = 1000 Simulation Runs (seed = 123)
With the same seed, runs 1 to 9 shown for n = 1000 match the n = 10 values (displayed to more decimal places).
Simulation Run   n = 10: U (GB)   n = 10: P ($)   n = 1000: U (GB)   n = 1000: P ($)
1                11.9319952       160             11.93199518        160
2                24.0282690       220.4240354     24.02826903        220.4240354
3                25.6828047       245.242071      25.68280473        245.242071
4                21.7321587       185.9823805     21.7321587         185.9823805
5                34.2335329       373.5029929     34.23353286        373.5029929
6                16.5820597       160             16.58205969        160
7                30.7079676       320.619514      30.7079676         320.619514
8                36.9010808       413.5162123     36.90108082        413.5162123
9                20.3471859       165.2077878     20.34718585        165.2077878
10               28.3229996       284.8449946     ...                ...
1000             n/a              n/a             23.1895728         207.843592
Sample Mean      25.0470054       252.9339988     23.28418394        220.1594691
Sample St. Dev.  7.787935101      92.19007977     4.877547328        58.23620041
The sample mean and sample standard deviation for monthly data usage simulated for n=1000 runs
(approximately, 23.2842 and 4.8775) are much closer to the true values of 23 and 5 than the corresponding
sample mean and standard deviation for n=10 simulation runs
In a similar way, the sample mean and sample standard deviation for monthly payment simulated for n=1000
runs (approximately $220.1595 and $58.2362) should be much closer to the true (unknown to us) values than the
corresponding sample mean and standard deviation for n=10 simulation runs.
Random Seed Value?
So, what random seed value should one use
when running a simulation?
Comparing Results for n = 10 and n = 1000 simulation runs for different seed values
Simulated data usage values
n=10 seed = 123 seed = 1826 seed = 19104
Sample Mean, GB 25.05 19.48 24.72
Sample St. Dev., GB 7.79 5.21 3.20
n=1000 seed = 123 seed = 1826 seed = 19104
Sample Mean, GB 23.28 23.08 23.04
Sample St. Dev., GB 4.88 4.90 4.96
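The choice of random seed matters little when a simulation uses a large number of runs: with n = 1000, all three seeds give sample means within about 0.3 GB of 23 and sample standard deviations within about 0.12 GB of 5, whereas with n = 10 the sample means range from 19.48 to 25.05 GB and the standard deviations from 3.20 to 7.79 GB.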
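To see the same effect outside Excel, this sketch repeats the seed comparison in Python (the function name usage_sample_stats is hypothetical). Its numbers will differ from the table because the generator differs, but the pattern of a wide spread across seeds at n = 10 and close agreement at n = 1000 should hold.

```python
import random
import statistics

def usage_sample_stats(n_runs, seed, mean_u=23, sd_u=5):
    """Sample mean and st. dev. of n_runs simulated usages U ~ Normal(mean_u, sd_u)."""
    rng = random.Random(seed)
    usages = [rng.gauss(mean_u, sd_u) for _ in range(n_runs)]
    return statistics.mean(usages), statistics.stdev(usages)

for n_runs in (10, 1000):
    for seed in (123, 1826, 19104):
        sample_mean, sample_sd = usage_sample_stats(n_runs, seed)
        print(f"n={n_runs:>4}  seed={seed:>5}  "
              f"mean={sample_mean:6.2f} GB  st. dev.={sample_sd:5.2f} GB")
```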
Visualizing Simulation Results Using Histograms
Histograms offer an intuitive way to understand the distributions of
both inputs and outputs in a simulation.
Random Input
In our data plan example, the random input is the data
usage (U).
Random Output
The random output, directly influenced by U, is the monthly
payment (P).
Visualizing these distributions helps in grasping the spread and
concentration of values, aiding in risk assessment.
Histogram of Simulated Values of Data Usage U (n=1000, seed = 123)
This histogram, generated from 1000 simulation runs, illustrates the frequency of data usage values. Each bar
represents the number of occurrences for a specific range of U (e.g., 10 < U ≤ 11). For further details, refer to the
DataPlan1000_Histogram.xlsx file.
Histogram of Simulated Values of Monthly Payment P (n=1000, seed = 123)
The input (values of U) was drawn from a normal distribution, but the output looks nothing like a normal distribution: every run with U at or below 20GB produces exactly $160, so many runs pile up at the flat fee, followed by a long right tail of overage payments.
In general, one must use simulation to understand the shape of the distribution
and the parameters of an output random variable.
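A rough text-based histogram in Python (an illustration, not the DataPlan1000_Histogram.xlsx workbook) makes the non-normal shape visible. The $25 bin width is an arbitrary choice, and the generator differs from Excel's, so counts will not match the Excel histogram.

```python
import random
from collections import Counter

def superior_share_payment(u):
    return 160 + (15 * (u - 20) if u > 20 else 0)

rng = random.Random(123)  # Python's generator, not Excel's
payments = [superior_share_payment(rng.gauss(23, 5)) for _ in range(1000)]

BIN_WIDTH = 25  # $ per histogram bar (arbitrary choice)
bins = Counter(int(p // BIN_WIDTH) * BIN_WIDTH for p in payments)

# One '#' per 5 runs in each $25 bin
for left in sorted(bins):
    print(f"{left:>4}-{left + BIN_WIDTH:<4} {'#' * (bins[left] // 5)}")

share_at_fee = sum(p == 160 for p in payments) / len(payments)
print(f"share of runs paying exactly $160: {share_at_fee:.1%}")
```

Since P(U ≤ 20) for a normal distribution with mean 23 and standard deviation 5 is about 0.27, roughly a quarter of the runs should land exactly on $160.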
Thank You