Spreadsheet Simulation
Andrew F. Seila
features are available in most spreadsheets to make the process quick and reliable:
1. A large number of functions to do mathematical, statistical, database, date/time, financial and other calculations.
2. Database representations and database access.
3. Charting and graphing.
4. Display and documentation features such as fonts, colors and geometric shapes to improve presentation.
5. Automation through scripting languages such as VBA (in the case of Excel).

The table structure of spreadsheets allows the developer to organize the computations and results in a natural and intuitive manner. Spreadsheets are ubiquitous - almost everybody has one - and file formats have become standardized, so files written by one spreadsheet can usually be imported by others. As a result, developers and users can easily pass simulation models from one to another. For these reasons, the spreadsheet is an attractive platform for simulation.

A number of publications discuss spreadsheet simulation. See (Winston 1996) and (Seila, Ceric, and Tadikamalla 2003), Chapters 2 through 4, for very readable tutorials.

3 WHEN IS SIMULATION USING A SPREADSHEET APPROPRIATE?

Certain modeling situations lend themselves well to implementation in a spreadsheet. Indeed, any set of calculations in a spreadsheet can be considered a model. Usually, these models have parameters or variables whose values are unknown and assumed.

3.1 Stochastic Models

In some cases, the unknown parameters are actually random variables whose values cannot be predicted, i.e., the models are stochastic models. Many stochastic models in finance (including real estate and insurance), logistics and engineering can be conveniently set up in a spreadsheet for simulation. Spreadsheets are frequently used by actuaries, for example, to evaluate insurance rating methods. Consider, as another example, an inventory model in which the demand for the product is stochastic. In order to evaluate a particular replenishment policy, demand must be sampled when the simulation experiment is run. An experiment would consist of sampling demand for the product and applying the inventory policy over a long period of time to compute observations of the periodic costs resulting from excess inventory and shortages associated with the policy. These observations would then be used to estimate the mean cost for the policy. The experiment would be repeated for several policies to find the inventory policy that produces the lowest mean cost. This is a typical stochastic model that can be analyzed using simulation. Interesting spreadsheet implementations of queueing simulations have also been developed (Grossman 1999).

3.2 Sensitivity Analysis for Spreadsheet Models

Another situation where spreadsheet simulation is useful involves doing a “what-if” or sensitivity analysis for models having unknown parameters that are not necessarily random. Modelers often want to determine how sensitive the performance measure is to variations in these parameters. For example, in a model that concerns the leasing or purchase of a piece of real estate, the mortgage interest rate at the time the contract is signed is an unknown parameter. The present value of each decision (lease vs. buy) will depend upon this parameter value. If only one or two parameters are involved, modelers can use the “Table” command of the “Data” menu to evaluate the performance measure when each parameter value in a collection of possible values is substituted into the model. Excel and other spreadsheets will support this calculation with one unknown parameter and many performance measures, or with two unknown parameters and one performance measure. For example, one could vary the interest rate from 5.5% to 11.0% in steps of 0.5% and, for each value, compute the present value of the lease decision and the present value of the buy decision. Or, one could vary the interest rate and also vary the value of the property over a discrete set of values, and for each combination of these two values, compute the difference between the present values of the two decisions.

In real spreadsheet models, there are normally many unknown parameters, as well as multiple performance measures. This type of what-if analysis can become unwieldy when the number of parameters is more than a few. For example, suppose that the number of unknown parameters in the model is 10, and the number of possible values for each of these parameters is 3, denoting the minimum, most likely and maximum values. Then, the number of recalculations that must be performed in order to assess all combinations of these possible values is 3^10 = 59,049. Clearly, this is possible only if the process of recalculation is automated, and even then it is rather time-consuming. If the number of parameters increases to just 15, the number of recalculations grows to 3^15, or about 14 million, an infeasible amount of computing on most desktop systems. The solution to this conundrum is simulation. By sampling these unknown values from appropriate distributions, one can do a “what-if” analysis on a model with a large number of unknown parameters. In fact, 1000 replications generally provide enough observations to assess the variation in the output measures, regardless of the number of combinations of values of unknown parameters.
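
The counts quoted above, and the sampling alternative, can be sketched in a few lines of Python. This is only an illustration and not part of the original tutorial: the toy model `performance`, the parameter ranges in `RANGES`, and the choice of triangular distributions over (minimum, most likely, maximum) are all assumptions made for the example.

```python
import random
import statistics

N_PARAMS = 10
# Hypothetical (minimum, most likely, maximum) values for each unknown parameter.
RANGES = [(0.5 * (i + 1), 1.0 * (i + 1), 2.0 * (i + 1)) for i in range(N_PARAMS)]


def full_grid_size(n_params, n_levels=3):
    """Number of recalculations needed to enumerate every combination."""
    return n_levels ** n_params


def performance(params):
    """Toy stand-in for the spreadsheet's output cell (an assumption)."""
    return sum(params)


def sample_outputs(n_reps, seed=12345):
    """Run n_reps replications, resampling every parameter on each one."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_reps):
        # random.triangular takes (low, high, mode) in that order.
        params = [rng.triangular(low, high, mode) for low, mode, high in RANGES]
        outputs.append(performance(params))
    return outputs


if __name__ == "__main__":
    print(full_grid_size(10))  # 59049
    print(full_grid_size(15))  # 14348907
    outs = sample_outputs(1000)
    print(min(outs), statistics.mean(outs), max(outs))
```

The number of replications is chosen independently of the number of parameters, which is the point of the sampling approach: adding five more parameters lengthens each replication slightly but does not multiply the number of recalculations.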
Thus, simulation is a useful technique when the number of unknown parameters is moderate or large.

The mechanics of setting up and running a spreadsheet simulation are very much the same in both of these cases, but there is one important difference in the way the output data are analyzed: When simulating a stochastic model, you are usually interested in using the output data to estimate an unknown performance measure for the model; when doing sensitivity analysis, you are interested in using the output data to assess the amount of variation in one or more output quantities.

4 HOW DOES ONE SET UP A SPREADSHEET SIMULATION?

Generally, each cell in a spreadsheet model can be classified as containing one of three types of quantities:
• Inputs to the model. These cells can contain parameters that are part of the model, such as unit costs or mean demand. The contents can also be sampled values of the random variables that represent uncertain quantities in the model such as demand or price paid, or they can be assumed values of unknown parameters when one is doing a sensitivity analysis.
• Intermediate computations. These cells contain calculations that are involved in the model. For example, in an inventory model, they might compute the inventory levels or backlogs at the end of each period.
• Outputs from the model. These cells contain the observations on quantities of interest one seeks from the model. For example, in an inventory model, these observations could be the costs incurred during each period.

Most models that can be organized in this way can be simulated in a spreadsheet. The following steps are described in more detail in Chapter 2 of (Seila, Ceric, and Tadikamalla 2003).

4.1 Set Up the Model

The first step is to build the model in the spreadsheet using definite values for all parameters and other inputs. This allows one to check the computations and ensure the correctness of the model before the simulation-specific components are added.

Second, replace the values in the cells that represent random or unknown quantities with formulas that sample these quantities from appropriate distributions. Appropriate formulas can be found in any reference on random variate generation. See (Cheng 1998) for example. At this point, all random variates will be resampled each time the spreadsheet is recalculated.

Third, identify the “output data” for the model. The modeler should already know the desired performance measures when the model is created. For example, in an inventory model, one might use the mean cost per period as a performance measure, so the output data for the model would be the costs incurred in each period. Here, you want to identify those cells that contain the values of these performance measures. At this point, you can observe the values of these cells change (i.e., being resampled) each time the spreadsheet is recalculated.

4.2 Create the Simulation Run

It is useful to distinguish two types of simulation experiments: (1) static simulations that replicate the experiment independently, producing independent, identically distributed observations and (2) dynamic simulations that produce a time series of dependent observations. The setup for each of these types is different.

Where independent replications are performed, the model computations are normally contained in some region or group of regions of the spreadsheet. Since a recalculation produces a replication, we need to do a series of recalculations of the spreadsheet and save the outputs after each recalculation to perform the replications. There are several ways to accomplish this. If the model computations can be put in a single row, we can just copy this row the appropriate number of times, so each recalculation produces all replications at once. If the computations in the model are more involved and cannot be placed on a single row, we can use the Table command in the Data menu to tell the spreadsheet to go through an iterative recalculation, storing the values of the outputs after each recalculation. To set up the data table, first create a column of numbers having values from 1 to the number of replications you will perform. Excel and most other spreadsheets have an easy way to create a column or row of consecutive numbers. At the top of each adjacent column to the right, place a formula that will produce the value of a specific output. The design is for each row of this table to contain the replication number in the first column and the outputs, i.e., observations on each performance measure, for that replication in the adjacent columns.

Actually running the replications involves using a command such as the Data-Table command to tell the spreadsheet to run the recalculations. This command was originally created to perform “what-if” scenarios as described above by substituting each value in the first column into a specific cell, recalculating the spreadsheet, and recording, adjacent to the substituted value, the values of other cells that depend upon it. In our case, the content of each cell in the first column of the data table is just a replication number, and since the replications are independent and identically distributed, the replication number is not actually used in recalculation. However, the recalculation will cause all random variate sampling formulas to re-execute, producing a new value for all random variates and thus new values for all outputs that are independent of those for all other replications. The Data-Table command produces a dialog asking where you want to put the input value. You can select any unused cell and click “OK”. The data table containing the results of the replications will fill quickly. When it is finished, each column of this table except the first will contain all of the observations on a specific performance measure. It is not difficult to do thousands, or even tens of thousands, of replications in this way.

In a dynamic simulation, the output values are observed periodically over time. For example, in an inventory model, the costs incurred might be observed at the end of each week. In addition, each output observation will depend in some way on the previous outputs. If each period’s computations can be placed in a single row, then the next period’s computations are constructed from the contents of the cells in the previous row. Once the computations for a representative set of periods are set up, i.e., once a representative row is entered, the row(s) can be copied, thus extending the time span of the model and producing the desired number of periodic observations. As a result, the sequence of dependent output observations in the simulation will be contained in one or more columns of the spreadsheet. Chapter 3 of (Seila, Ceric, and Tadikamalla 2003) has some examples of dynamic models implemented in a spreadsheet.

4.3 Analyze the Data

Each simulation has its own analysis requirements (Alexopoulos and Seila 1998). For stochastic models, analysis normally involves applying statistical procedures to compute estimates of population parameters as well as confidence intervals for these estimates. When sensitivity analysis is the objective, data analysis is concerned with evaluating the range of values of the output data. This can involve computing extreme values of the data such as quantiles and graphically displaying the distribution of the data. Most spreadsheets have formulas for computing the sample mean, sample variance and quantiles of well known distributions such as the normal distribution, so the usual confidence interval formulas can be applied. Spreadsheets also have a rich selection of other statistical computations such as regression analysis and quantile computation, which can be applied too.

Presentation generally includes some tables and graphs. Spreadsheets have extensive facilities that make it easy to produce these types of presentations in high quality.

Some example spreadsheet models can be found at http://seila.terry.uga.edu/spreadsheetSim. These models illustrate the concepts and techniques just discussed.

5 SIMULATION ADD-INS FOR SPREADSHEETS

The process of developing and running a simulation in a spreadsheet can be simplified somewhat by using one of the available commercial add-in packages for Excel such as @RISK (http://www.palisade.com) or Crystal Ball (http://www.decisioneering.com). PopTools (http://www.cse.csiro.au/poptools/) is a free Excel add-in. Another free add-in for Excel called SIMTOOLS.XLA by Professor Roger Myerson is available at http://home.uchicago.edu/~rmyerson. These packages provide several features that are not included in the basic spreadsheet:
• Random number generation using documented and tested algorithms.
• Extensive functions for generating random variates from a variety of distributions.
• Features to automate the setup and running of the simulation experiment.
• Features to automate analysis and presentation of the output data from the simulation experiment.
• Optimization procedures for the model.

The random number function, which is called “RAND()” in most spreadsheets, produces a pseudo-random sample from a uniform distribution between 0 and 1. Unfortunately, many spreadsheet publishers do not document the algorithm used in RAND(). Frequently, these are just the functions that are distributed as part of the C or C++ compiler. Research has shown that some algorithms for generating random numbers have better statistical properties than others (Fishman 1996, L’Ecuyer 1998). Thus, using the built-in RAND() function carries some risk that the random numbers will not behave as truly independent, random numbers. In @RISK, Crystal Ball and PopTools, the random number generators have been tested and documented, and therefore are recommended over RAND().

It is easy to write functions that generate observations from some distributions such as the triangular, exponential and normal distributions, starting from independent uniform random variates (Cheng 1998). However, observations from some distributions such as the gamma and Weibull are difficult or impossible to generate using just the built-in functions of the spreadsheet. These add-ins provide easy, intuitive functions for all common distributions.

If you use the Data-Table method described above to run replications, some effort is required to set it up and the method uses space in the spreadsheet to store the results. These packages implement their own iterative procedure to run replications and store the resulting summary statistics or raw data. Often, you do not need to store all of the raw data; only the summary statistics are needed. Thus, these add-ins can simplify the problems of setting up and running simulations, and analyzing the output. Examples of the use of these add-ins can be found on their websites
and examples of the use of @RISK for financial modeling can be found in Chapter 3 of (Seila, Ceric, and Tadikamalla 2003).

6 WHEN IS SPREADSHEET SIMULATION NOT ADVISED?

Spreadsheets are powerful, convenient tools for simulation modeling, but they do have four important limitations: (1) only simple data structures are available, (2) complex algorithms are difficult to implement, (3) spreadsheets are slower than some alternatives and (4) data storage is limited. Let’s examine each of these limitations.

The spreadsheet consists of a group of pages, each of which has a table consisting of rows and columns of cells. Each cell can contain data or a formula. One can treat a column or row of cells as a vector, and a two-dimensional range of cells can be treated as a two-dimensional array, or matrix. In some simulation models, more elaborate data structures such as lists and trees are needed. One case in point is that of discrete event simulation, where lists are needed for the event list and waiting lines. These structures can be built in a spreadsheet, but they are contrived and inefficient.

For the most part, formulas in spreadsheet cells are static computations that are executed once when the cell is recalculated. Spreadsheets do not have convenient facilities to implement a while-loop or a for-loop. Loops can be implemented, but the implementation is often inefficient and inflexible. For example, if a computation needs to be done 10 times, it can be implemented in a column or row of 10 cells. But what if it needs to be done 100,000 times? Most spreadsheets do not allow a column this long. Moreover, how would you implement a loop that must be executed until a particular value is obtained? For example, in an actuarial ruin model, the value of the firm is computed until it is negative. Since you do not know the maximum number of periods needed to guarantee ruin (it might even be infinite), you do not know how many cells to include. VBA in Excel can be used to implement more complex logic, but this is a more advanced tool that is seldom used by casual spreadsheet users. So, models that require complex loops and other conditional computations may not work well in a spreadsheet.

Consider what a spreadsheet must do to recalculate. Formulas are stored in “source code.” That is, the spreadsheet must interpret the formula before it can be executed. This interpretation action normally takes much longer than the execution. Some spreadsheets are sophisticated enough to store the executable code so the interpretation does not have to be repeated each time, but it is nevertheless a much less efficient setup than one would have with a compiled language. Moreover, spreadsheets use much more of the computer’s resources to support the elaborate user interface and provide all of the features. Thus, spreadsheets are inefficient in their use of memory. A model that is very large and/or requires long simulation runs would need to be programmed in a compiled language in order to execute in a feasible length of time or use a reasonable amount of main memory.

Finally, since the output data must be stored in the spreadsheet, usually in a column, the length of the output series is limited. In many spreadsheets, column lengths can be tens of thousands or even hundreds of thousands of cells. However, some models, such as models that evaluate the reliability of highly reliable systems, require very large sample sizes - in the millions of observations. There are ways to circumvent this restriction. One could use multiple columns to store output data for the same performance measure, or one could accumulate sample statistics without actually storing the raw data. All of these solutions require a more complex approach to the simulation and result in more inefficiency in the execution of the simulation. When this is the case, it is appropriate to ask if another platform would be a better solution.

These four limitations seem to restrict considerably the range of models that can be implemented in a spreadsheet. However, many models are not subject to these restrictions, and they are often built to get “quick and dirty” results. This is the place where a spreadsheet really earns its stripes. Prototypes can be quickly built and run in a spreadsheet. If the prototype shows that the simulation does not work well in the spreadsheet, then it can be moved to a more appropriate platform.

7 CONCLUSIONS

Spreadsheets provide a useful platform for many simulation models. The attractiveness of this platform comes from its availability, intuitive interface, ease of use and powerful features. Of the situations where a simulation can provide valuable information for decision making, a simulation is actually built and used in only a small percentage of cases. The reasons for this underutilization of simulation are many. Sometimes, the managers or analysts are not familiar with simulation software. Spreadsheets provide a means to use familiar, intuitive software to do simulation, and if the computations for the model are already implemented in a spreadsheet, this approach avoids the trouble of moving them to another simulation platform.

Many simulations do not need to be extensive. They are designed to provide ball-park estimates and to show general system behavior. This is often true of financial models. These models can usually be implemented most efficiently in a spreadsheet. Simulation problems for which spreadsheets are a useful platform also include prototype models which are relatively small and designed to provide a proof of concept.
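
To make the idea of moving a prototype to another platform concrete, the actuarial ruin example from Section 6 can be sketched in a general-purpose language. The sketch below is an illustration under stated assumptions, not a model from this paper: the surplus process (a fixed premium minus an exponentially distributed claim each period), every parameter value, and the names `time_to_ruin` and `RunningStats` are hypothetical. It shows the two things that are awkward in a worksheet: a loop of unknown length, and summary statistics accumulated without storing the raw observations (here with Welford's running-variance method).

```python
import random


def time_to_ruin(rng, capital=10.0, premium=1.0, mean_claim=1.1,
                 max_periods=100_000):
    """Simulate one path; return the period of ruin, or None if capped.

    Hypothetical model: each period the firm collects `premium` and pays
    an exponentially distributed claim with mean `mean_claim`.
    """
    period = 0
    while capital >= 0.0:              # loop length is not known in advance
        if period >= max_periods:      # safety cap: ruin may never occur
            return None
        period += 1
        capital += premium - rng.expovariate(1.0 / mean_claim)
    return period


class RunningStats:
    """Welford's method: mean and variance without storing observations."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


if __name__ == "__main__":
    rng = random.Random(2024)
    stats = RunningStats()
    for _ in range(10_000):
        t = time_to_ruin(rng)
        if t is not None:
            stats.add(t)
    print(stats.n, stats.mean, stats.variance)
```

Only three numbers are kept per performance measure, so the number of replications is limited by run time rather than by the length of a worksheet column.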
Commercial and free spreadsheets are continuing to be developed. Future versions will undoubtedly allow larger worksheets and perform computations more efficiently. As computing power continues to grow, the limitations to spreadsheet simulation will be removed and this platform will be even more attractive. Excel comes bundled with an optimization tool (solver). Perhaps there will be a time when it also comes bundled with a simulation tool!

REFERENCES

AUTHOR BIOGRAPHY

Program at the University of Georgia. He is a member of the American Statistical Association, INFORMS and the Healthcare Information Management Systems Society. His e-mail address is <[email protected]>.