
Course Notes for

CS 1538
Introduction to Simulation
By
John C. Ramirez
Department of Computer Science
University of Pittsburgh
• These notes are intended for use by students in CS1538 at
the University of Pittsburgh and no one else
• These notes are provided free of charge and may not be sold
in any shape or form
• These notes are NOT a substitute for material covered during
course lectures. If you miss a lecture, you should definitely
obtain both these notes and notes written by a student who
attended the lecture.
• Material from these notes is obtained from various sources,
including, but not limited to, the following:
Discrete-Event System Simulation, Fourth Edition by Banks,
Carson, Nelson and Nicol (Prentice Hall)
• Also (same title and authors) Third Edition
Object-Oriented Discrete-Event Simulation with Java by Garrido
(Kluwer Academic/Plenum Publishers)
Simulation Modeling and Analysis, Third Edition by Law and
Kelton (McGraw Hill)
A First Course in Monte Carlo by George S. Fishman (Thomson
Brooks/Cole)
Goals of Course

• To understand the basics of computer


simulation, including:
Simulation concepts and terminology
When it is useful
Why it is useful
How to approach a simulation
How to develop / run a simulation
How to interpret / analyze the results

Goals of Course

• To understand and utilize some of the


mathematics required in simulations
Statistical models and probability
distributions
• How various models are defined
• Which models are correct for which situations
Simple queuing theory
• Characteristics
• Performance measures
• Markovian models

Goals of Course

Random number theory


• Generating and testing pseudo-random
numbers
• Generating pseudo-random values within
various distributions
Analysis / generation of input data
• How is input data generated?
• Is the data correct and appropriate for the
simulation?
Analysis / measurement of output data
• What does the output data mean and what
can be derived from it?
• How confident are we in our results?
Goals of Course

• To implement some simulation tools


and some simulation projects
What enhancements do typical
programming languages need to facilitate
simulation?
Programming will be done in Java
• Review if you are rusty
• Find / keep a good Java reference
• There are special-purpose simulation
languages, but we will probably not be using
them

Introduction to Simulation

• What is simulation?
Banks, et al:
• "A simulation is the imitation of the operation
of a real-world process or system over time".
It "involves the generation of an artificial
history of a system, and the observation of
that artificial history to draw inferences … "
Law & Kelton:
• "In a simulation we use a computer to
evaluate a model (of a system) numerically,
and data are gathered in order to estimate the
desired true characteristics of the model"

Introduction to Simulation

More specifically (but still superficially)


• We develop a model of some real-world
system that (we hope) represents the
essential characteristics of that system
– Does not need to exactly represent the system
– just the relevant parts
• We use a program (usually) to test / analyze
that model
– Carefully choosing input and output
• We use the results of the program to make
some deductions about the real-world system

Introduction to Simulation

• Why (or when) do we use simulation?


• This is fairly intuitive
Consider arbitrary large system X
• Could be a computer system, a highway, a
factory, a space probe, etc.
We'd like to evaluate X under different
conditions
• Option 1: Build system X and generate the
conditions, then examine the results
– This is not always feasible for many reasons:
> X may be difficult to build
> X may be expensive to build

Introduction to Simulation

> We may not want to build X unless it is


"worthwhile"
> The conditions that we are testing may be
difficult or expensive to generate for the real
system
• For example:
– A company needs to increase its production
and needs to decide whether it should build a
new plant or it should try to increase
production in the plants it already has
> Which option is more cost-effective for the
company?
– Clearly, building the new plant would be very
expensive and would not be desirable to do
unless it is the more cost-effective solution
– But how can we know this unless we have built
the new plant?
Introduction to Simulation

• Option 2: Model system X, simulate the


conditions and use the simulation results to
decide
– Continuing with the same example:
– Model both possibilities for increasing
production and simulate them both
> We then choose the solution that is most
economically feasible
• Clearly, this is itself not a trivial task
– Simulations are often large, complex and
difficult to develop
– Just developing the correct system model can
be a daunting task
– However, if a new plant costs hundreds of
millions or even billions of dollars, spending on
the order of thousands (or even hundreds of
thousands) of dollars on a simulation could be a
bargain
Introduction to Simulation

When is simulation NOT a good idea?


– See Section 1.2 of Banks text
• Don't use a simulation when the problem can
be solved in a "simpler" or more exact way
– Some things that we think may have to be
simulated can be solved analytically
– Ex: Given N rolls of a fair pair of dice, what are
the relative expected frequencies of each of
the possible values {2, 3, 4, … 12} ?
> We could certainly simulate this, "rolling" the
dice N times and counting
> However, based on the probability of each
possible result, we can derive a more exact
answer analytically

Introduction to Simulation

> How many ways do we have of obtaining each


outcome?
2:1, 3:2, 4:3, 5:4, 6:5, 7:6, 8:5, 9:4, 10:3, 11:2,
12:1
Total of 36 possible outcomes
For N "rolls", the expected frequency of value i is
N * (Pi) = N * (outcomes yielding i / total
outcomes)
> For example, for 900 rolls, the expected number
of 9s generated would be 900 * (4 / 36) = 100
> Note that the expected value may not be a
whole number (nor should it necessarily be)
> Given 500 rolls, the expected number of 9s is
500 * (4 / 36) ≈ 55.55
– Note: You should be familiar with the general
approach above from CS 0441
> We will be looking at some more complex
analytical models later on
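The analytical approach above can be checked with a few lines of Java (a sketch, not from the text; the class and method names are mine):

```java
// Expected frequencies of two-dice sums, computed analytically
// rather than by "rolling" the dice N times.
public class DiceFrequencies {
    // Number of ways two fair dice can sum to v (v in 2..12)
    static int ways(int v) {
        int count = 0;
        for (int d1 = 1; d1 <= 6; d1++)
            for (int d2 = 1; d2 <= 6; d2++)
                if (d1 + d2 == v) count++;
        return count;
    }

    // Expected number of times value v appears in n rolls: n * P(v)
    static double expectedCount(int v, int n) {
        return n * ways(v) / 36.0;
    }

    public static void main(String[] args) {
        for (int v = 2; v <= 12; v++)
            System.out.println(v + ": " + expectedCount(v, 900));
    }
}
```

For 900 rolls this reproduces the expected 100 nines from the example; for 500 rolls it gives the non-integer expectation of about 55.56.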
Introduction to Simulation

• Don't use a simulation if it is easier or cheaper


to experiment directly on a real system
– Ex: A 24 hour supermarket manager wants to
know how to best handle the cash register
during the "midnight shift":
> Have one cashier at all times
> Have two cashiers at all times
> Have one cashier at all times, and a second
cashier available (but only working as cashier if
the line gets too long)
– Each of these can be done during operating
hours
> An extra employee can be used to keep track of
queue data (and would not be too expensive)
> Differences are (likely) not that drastic so that
customers will be alienated
Introduction to Simulation

• Don't use a simulation if the system is too


complex to model correctly / accurately
– This is often not obvious
– Can depend on cost and alternatives as well
– Ex: Simulation of damage to the space shuttle –
results were disputed but what was the
alternative?

Some Definitions

• System
"A group of objects that are joined
together in some regular interaction or
interdependence toward the
accomplishment of some purpose"
(Banks et al)
• Note that this is a very general definition
• We will represent this system in our simulation
using variables (objects) and operations
The state of a system is the variables
(and their values) at one instant in time

Some Definitions

• Discrete vs. Continuous Systems


Discrete System
• State variables change at discrete points in
time
– Ex: Number of students in CS 1538
> When a registration or add is completed, number
of students increases, and when a drop is
completed, number of students decreases

Continuous System
• State variables change continuously over time
– Ex: Volume of CO2 in the atmosphere
> CO2 is being generated via people (breathing),
industries and natural events and is being
consumed by plants
Some Definitions

– Models of continuous systems typically use


differential equations to indicate rate of change
of state variables
– Note that if we make the time increment and
the unit of measurement small enough, we may
be able to convert a continuous system into a
discrete one
> However, this may not be feasible to do
> Why?
– Also note that systems are not necessarily
exclusively discrete or exclusively continuous
• We will be primarily concerned with
Discrete Systems in this course

Some Definitions

• System Components
Entities
• Objects of interest within a system
– Typically "active" in some way
– Ex: Customers, Employees, Devices, Machines,
etc
• Contain attributes to store information about
them
– Ex: For Customer: items purchased, total bill
• May perform activities while in the system
– Ex: For Customer: shopping, paying bill
– In normal cases it is really just the period of
time required to perform the activity
• Note how nicely this meshes with
object-oriented programming
Some Definitions

Events
• Instantaneous occurrences that may change
the state of a system
– Note that the event itself does not take any
time
– Ex: A customer arrives at a store
– Note that they "may" change the state of the
system
> Example of when they would not?
• Endogenous event
– Events occurring within the system
– Ex: Customer moves from shopping to the
check-out
• Exogenous event
– Events relating / connecting the system to the
outside
Some Definitions

• System Model
A representation of the system to be used /
studied in place of the actual system
• Allows us to study a system without actually
building it (which, as we discussed previously,
could be very expensive and time-consuming to
do)
Physical Model
• A physical representation of the system (often
scaled down) that is actually constructed
– Tests are then run on the model and the results
used to make decisions about the system
– Ex: Development of the "bouncing bomb" in WWII
> http://www.bbc.co.uk/dna/ww2/A2163827
> http://www.computing.dundee.ac.uk/staff/irmurray/bigbounc.asp
Some Definitions

Mathematical Model
• Representing the system using logical and
mathematical relationships
• Simulations are run using the mathematical
model, and, assuming it is valid, the results
are interpreted for the system in question
• Simple ex: d = v0t + ½at^2
– This equation can be used to predict the
distance traveled by an object at time t
– However, will acceleration always be the same?
• More often this model is fairly complex and
defined by the entities and events
• So this is the model we will be using
Some Definitions

• Analytical evaluation
– If the model is not too complex we can
sometimes solve it in a closed form using
analytical methods
– One type of analytical evaluation is the Markov
process (or Markov chain)
– Nice simple example at:
http://en.wikipedia.org/wiki/Examples_of_Markov_chains

– We will see this more in Section 6.4


– Often problems that are too complex, even if
they can be modeled analytically, are too
computation intensive to be practical
• Simulation evaluation
– More often we need to simulate the
behavior of the model
Some Definitions

Deterministic Model
• Inputs to the simulation are known values
– No random variables are used
– Ex: Customer arrivals to a store are monitored
over a period of days and the arrival times are
used as input to the simulation
Stochastic Model
• One or more random variables are used in the
simulation
– Results can only be interpreted as estimates (or
educated guesses) of the true behavior of the
system
– Quality of the simulation depends heavily on
the correctness of the random data distribution
> Different situations may require different
distributions
Some Definitions

– Ex: Customers arrive at a store with


exponentially distributed interarrival times
having a mean of 5 minutes
• In most cases we do not know all of the input
data in advance, and at least some random
data is required
– Thus, our simulations will typically use
the stochastic model

Some Definitions

Static Model
• Models a system at a single point in time,
rather than over a period of time
• Sometimes called Monte Carlo simulations
– We'll briefly discuss these shortly
Dynamic Model
• Models a system over time
• Our simulations will typically use this model
• In summary our models will typically
be: discrete, mathematical,
stochastic and dynamic

The Clock

• Since we are using the dynamic


model, we need to represent the
passage of time
We need to use a clock
Three fundamental approaches to time
progression
• Next-event time advance
– Clock initialized to zero
– As the times of future events are determined,
they are put into the future event list (FEL)
– Clock is advanced to the time of the next most
imminent event, the event is executed and
removed from the list
– See example in Section 3.1.1
The Clock

Ex: People (P) using a MAC machine
• Event A == arrival of a customer at the MAC machine
• Event C == completion of a transaction by a customer

Clock | FEL              | Event | Action
0     | (A2,t1), (C1,t2) | A1    | P1 arrives, is served; Events A2 and C1 generated, placed in FEL
t1    | (C1,t2), (A3,t3) | A2    | P2 arrives, waits; Event A3 generated, placed in FEL
t2    | (A3,t3), (C2,t4) | C1    | P1 completes; P2 is served; Event C2 generated, placed in FEL
t3    | (A4,t5), (C2,t4) | A3    | P3 arrives, waits; Event A4 generated, placed in FEL (note: t5<t4)
t5    | (C2,t4), (A5,t6) | A4    | P4 arrives, waits; Event A5 generated, placed in FEL
t4    | (A5,t6), (C3,t7) | C2    | P2 completes; P3 is served; Event C3 generated, placed in FEL
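The next-event scheme traced above can be sketched in a few lines of Java (illustrative names, not the course code): the FEL is a priority queue ordered by event time, and the clock jumps directly to each most imminent event.

```java
import java.util.PriorityQueue;

// Minimal next-event time advance: remove the most imminent event,
// advance the clock to its time, process it.
public class NextEventClock {
    static class Event implements Comparable<Event> {
        final double time;
        final String kind;
        Event(double time, String kind) { this.time = time; this.kind = kind; }
        public int compareTo(Event o) { return Double.compare(time, o.time); }
    }

    public static void main(String[] args) {
        PriorityQueue<Event> fel = new PriorityQueue<>();
        // Seed the FEL out of order; the queue keeps them in time order
        fel.add(new Event(4.0, "C1"));
        fel.add(new Event(1.0, "A2"));
        fel.add(new Event(2.5, "A3"));

        double clock = 0.0;
        while (!fel.isEmpty()) {
            Event e = fel.remove();   // most imminent event
            clock = e.time;           // clock jumps, no ticking in between
            System.out.println("clock=" + clock + " event=" + e.kind);
        }
    }
}
```

A real simulation would, while processing each event, schedule further events into the FEL (as the A and C events do in the table above).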
The Clock

• Fixed-increment time advance (activity


scanning)
– Clock initialized to zero
– Clock is incremented by a fixed amount (ex. 1)
– With each increment, list of events is checked
to see which should occur (could be none)
– Clock is typically easier to implement in this
way
– However, execution is less efficient
> Potentially many scans for each event
• Process-interaction approach
– Entities are associated with processes
– Processes interact as entities progress through
system
– Could delay while waiting for a resource, or
during an interaction with another process
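The fixed-increment (activity-scanning) approach described above can be sketched as follows (a toy example with invented event times; note the repeated scanning that makes it less efficient than the next-event approach):

```java
// Fixed-increment time advance: the clock ticks by a fixed step and
// every tick scans the event list for events due at that time.
public class FixedIncrementClock {
    // Returns how many events fire up to the given horizon.
    static int run(int[] eventTimes, int horizon) {
        int fired = 0;
        for (int clock = 0; clock <= horizon; clock++)   // fixed tick of 1
            for (int t : eventTimes)
                if (t == clock) fired++;                 // scan for due events
        return fired;
    }

    public static void main(String[] args) {
        System.out.println(run(new int[] {1, 3, 3, 7}, 10)); // prints 4
    }
}
```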
Simple Example

• Let's consider a very simple example:


Single-Channel Queue (Example 2.1 in
text)
• Small grocery store with a single checkout
counter
• Customers arrive at the checkout at random
between 1 and 8 minutes apart (uniform)
• Service times at the counter vary from 1 to 6
minutes
– P(1) = 0.1, P(2) = 0.2, P(3) = 0.3, P(4) = 0.25
P(5) = 0.1, P(6) = 0.05
• Start with first customer arriving at time 0
• Run for a given number of customers (text
uses 100)
• Calculate some results that may be useful
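Generating the random inputs above can be sketched in Java (class and method names are mine): the service-time distribution is sampled by an inverse-cdf table lookup, and interarrival times are drawn uniformly from 1 to 8 minutes.

```java
import java.util.Random;

// Sampling the grocery example's input distributions.
public class GroceryInputs {
    // Cumulative probabilities for service times of 1..6 minutes:
    // P(1)=0.1, P(2)=0.2, P(3)=0.3, P(4)=0.25, P(5)=0.1, P(6)=0.05
    static final double[] CDF = {0.10, 0.30, 0.60, 0.85, 0.95, 1.00};

    // Map a uniform(0,1) draw to a service time via the cdf table
    static int serviceTime(double u) {
        for (int i = 0; i < CDF.length; i++)
            if (u < CDF[i]) return i + 1;
        return CDF.length;
    }

    // Interarrival time: uniform over 1..8 whole minutes
    static int interarrival(Random r) {
        return 1 + r.nextInt(8);
    }

    public static void main(String[] args) {
        Random r = new Random(1538);
        for (int i = 0; i < 5; i++)
            System.out.println(serviceTime(r.nextDouble()));
    }
}
```

For example, a draw of u = 0.59 falls below the 0.60 cumulative bound, so it maps to a 3-minute service.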
Simple Example

The entities are the customers


The system is discrete since states are
changed at specific points in time
• ex: a customer arrives or leaves
The model is mathematical (since we
don't have real customers)
The model is stochastic since we are
generating random arrivals and random
service times
The model is dynamic since we are
progressing in time

Simple Example

What results are we interested in?


• In this simple case we may want to know
– What fraction of customers have to wait in line
– What is the average amount of time that they
wait
– What is the fraction of time the cashier is idle
(or busy)
• We probably want to do several runs and get
cumulative results over the runs (ex:
averages)
• There are more complex statistics that may be
relevant
– We will discuss some of these later

Simple Example

We can program this example, but in this


simple case we could also use a table or
spreadsheet to obtain our results
• Let's first look at an "Excel novice" approach
to this
– See sim1.xls
• Although some of the spreadsheet formulas
require some thought, this is fairly simple to
do
• Note that each row in the spreadsheet
depends only on some local data (generated in
that row) and the data in the previous row
– We do not need a "memory" of all rows
• Authors have a much nicer spreadsheet with
macros
Programming a Simple Example

• If we do program it, how would we do


it?
Using Java, it is logical to do it in an
object-oriented way
Let's think about what is involved
• We need to represent our entities
– As text indicates, for this simple example we do
not have to explicitly represent them
– However, we can do it if we want to – and have
our Customers and CheckOut as simple Java
objects
• We need to represent our events
– We need to store events in our Future Event
List (FEL), and we have two different kinds of
events (arrival of a customer, completion of a
checkout)
Programming a Simple Example

> We need to distinguish between the different


event types (since different actions are taken for
different events)
> We need to order our events based on the
simulation clock time that they will occur
– Thus we probably need to explicitly represent
the events in some way
> Use classes and inheritance to represent the
different events
> This enables events to share characteristics but
also to be distinguished from each other
> So we need an event time instance variable and a
method to compare event times
> Look at SimEvent.java, ArrivalEvent.java,
CompletionEvent.java

Priority Queue to Represent the FEL

• We need to represent the FEL itself


– Since we are inserting items and then removing
them based on priority (earliest next time of an
event is removed first), we should use a priority
queue (PQ) with the following operations:
> add (Object e) – add a new Object to the PQ
> remove() – remove and return the Object with
the min (best) priority value
> peek() – return the Object with the min (best)
priority value without removing it
– It's also a good idea to have some helper
methods
> size() – how many items are in the PQ
> isEmpty() – is the PQ empty
– There are variations of these ops depending on
the implementation, but the idea is the same
Priority Queue to Represent the FEL

– How to efficiently implement a Priority Queue?


> How about an unsorted array or linked list?
> add is easy but remove is hard – why? – discuss
> How about a sorted array or linked list?
> removeMin is easy but add is hard – why? –
discuss
– Neither implementation is adequate in terms of
efficiency
> Note that the premise of a PQ is that everything
that is inserted is eventually removed
> Thus, with N adds you have N removes
> Discuss / show on board overall time required for
both implementations
> You may have seen this already in CS 1501
– Thus we need a better approach
> Implementation of choice is the Heap
Heap Implementation of a Priority Queue

– Idea of a Heap:
> Store data in a partially ordered complete binary
tree such that the following rule holds for EACH
node, V:
Priority(V) betterthan Priority(LChild(V))
Priority(V) betterthan Priority(RChild(V))
> This is called the HEAP PROPERTY
> Note that betterthan here often means smaller
> Note also that there is no ordering of siblings –
this is why the overall ordering is only a partial
ordering
– ex:
              10
          30      20
        40  70  85  35
       90  45  80
Heap Implementation of a Priority Queue

– How to do our operations?


> peek() is easy – return the root
> add() and remove() are not so obvious
> Let's look at them separately
– add(Object e)
> We want to maintain the heap property
> However, we don't know where in advance the
new object will end up
> We also don't want a lot of rearranging or
searching if we can avoid it – remember time is
key
> Solution: Add new object at the next open leaf in
the last level of the tree, then push the node UP
the tree until it is in the proper location
> This operation is called upHeap
> See example on board

Heap Implementation of a Priority Queue

– remove()
> Clearly, the min node is the root
> However, removing it will disrupt the tree greatly
> How can we solve this problem?

• Remember BST delete?


– Did not actually delete the root, but
rather the _______________ (fill in blank)
• We will do a similar thing with our Heap
– Copy the last leaf to the root and delete
(easily) the leaf node
– Then re-establish the heap property by a
downHeap
– See example on board

Heap Implementation of a Priority Queue

– Run-Time?
> Since our tree is complete, it is balanced and
thus for N nodes has a height of ~ lgN
> Thus upHeap and downHeap require no more
than ~lg N time to complete
> Thus, if we have N adds and N removeMins, our
total run-time will be N lg N
> This is a SIGNIFICANT improvement over the
simpler implementations, especially for a long
simulation
> Ex: Compare N^2 with N lg N for N = 1M (= 2^20)
– Note:
> For our simple example, a heap is probably not
necessary, since we have few items in our FEL at
any given time
> However, for more complex simulations, with
many different event types, a heap is definitely
preferable
Implementing a Heap

– How to Implement a Heap?


> We could use a linked binary tree, similar to that
used for BST
Will work, but we have overhead
associated with dynamic memory
allocation and access
> But note that we are maintaining a complete
binary tree for our heap
> It turns out that we can easily represent a
complete binary tree using an array
We simply must map the tree locations
onto the array indexes in a reasonable /
consistent way
– Idea:
> Number nodes row-wise starting at 0 (some
implementations start at 1)
> Use these numbers as index values in the array
Implementing a Heap

> Now, for node at index i

Parent(i) = floor((i-1)/2)
LChild(i) = 2i+1
RChild(i) = 2i+2

> See example on board


– Now we have the benefit of a tree structure
with the speed of an array implementation
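The index arithmetic above is short enough to verify directly in Java (a sketch; class name is mine):

```java
// Index mapping for a complete binary tree stored in an array,
// nodes numbered row-wise starting at 0.
public class HeapIndex {
    static int parent(int i) { return (i - 1) / 2; }  // floor((i-1)/2), valid for i >= 1
    static int lChild(int i) { return 2 * i + 1; }
    static int rChild(int i) { return 2 * i + 2; }

    public static void main(String[] args) {
        // Node 4's children are nodes 9 and 10; node 9's parent is node 4
        System.out.println(lChild(4) + " " + rChild(4) + " " + parent(9));
    }
}
```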
• So now should we write the code?
– No! Luckily, in JDK 1.5 a heap-based
PriorityQueue class has been provided!
– It's still a good idea to understand the
implementation, however
– Look at API
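With the library class, an FEL needs very little code. A sketch (the event class here is illustrative, not the course's SimEvent): elements only need to be comparable by event time.

```java
import java.util.PriorityQueue;

// Using java.util.PriorityQueue (heap-based, added in JDK 1.5) as an FEL.
public class FelDemo {
    static class SimEvent implements Comparable<SimEvent> {
        final double time;
        final String name;
        SimEvent(double time, String name) { this.time = time; this.name = name; }
        public int compareTo(SimEvent o) { return Double.compare(time, o.time); }
    }

    public static void main(String[] args) {
        PriorityQueue<SimEvent> fel = new PriorityQueue<>();
        fel.add(new SimEvent(3.5, "completion"));
        fel.add(new SimEvent(1.2, "arrival"));
        System.out.println(fel.peek().name);   // earliest event, not removed
        System.out.println(fel.remove().time); // removes the min-time event
    }
}
```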
Queue for Waiting Customers

• We need to represent the queue (or line) of


customers waiting at the checkout
– This is a FIFO queue and can simply be
implemented in various ways
> We can use a circular array
> We can use a linked-list
– You should be already familiar with queue
implementations from CS 0445
– In JDK 1.5 Queue is an interface which is
implemented by the LinkedList class
> See API
> Q: Would a similar approach using an ArrayList
also be good?
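A sketch of the FIFO line (class name mine), and one answer to the question above: a LinkedList removes from the front in constant time, while an ArrayList would shift every remaining element on each front removal, so it is a poorer fit for a queue.

```java
import java.util.LinkedList;
import java.util.Queue;

// FIFO line of waiting customers via the Queue interface.
public class WaitingLine {
    public static void main(String[] args) {
        Queue<String> line = new LinkedList<>();
        line.add("P1");                     // P1 joins the line first
        line.add("P2");
        System.out.println(line.remove());  // P1 is served first (FIFO)
        System.out.println(line.peek());    // P2 is now at the front
    }
}
```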

Programming a Simple Example

• We need to represent the clock


– This is fairly easy – we can do it with an integer
> In some cases it might be better to use a double
• We need to implement some activities
– These are actually better defined as the time
required for activities to execute
– Typically interarrival times or service times,
either specified exactly (with deterministic
model) or by probability distributions (with
stochastic model)
> In our case, we have the interarrival times of
customers and the time required for checkout,
specified by the distributions shown on p. 28 of
the text
> We will discuss various distributions in more
detail later
Programming a Simple Example

Let's put this all together:


GrocerySim.java
• This is a fairly object-oriented implementation,
using newer JDK 1.5 features
Note that there is also a Java version from
authors in Chapter 4
• Look over this one as well
• Does not utilize JDK 1.5 and not quite as
object-oriented
• The author also switches distributions in this
implementation
– Uses an exponential distribution for arrivals
– Uses a normal distribution for service times
> We will look at these later
One More Example

• Newspaper Seller's Problem


Example 2.3 in text
Simple inventory problem
• Each day new inventory is produced and used,
but is not carried over to successive days
• Thus, time is more or less removed from this
problem
Used where goods are only useful for a
short time
• Ex: newspaper, fresh food
In this case, our goal is to try to optimize
our profit
Newspaper Seller's Problem

Specifics of the Newspaper Seller's


Problem
• Seller buys N newspapers per day for 0.33
each
• Seller sells newspapers for 0.50 each
• Unused papers are "scrapped" for 0.05 each
• If seller runs out, lost revenue is 0.17 for each
unsold paper
– Text says this is controversial, which is true
– How to predict how many would have been
sold?
> Perhaps seller goes home when he/she runs out
> May be a goal to run out every day – easier than
returning the papers for scrap
• See sim2.xls
Newspaper Seller's Problem

In fact, do we really need to simulate


this problem at all?
• The data is simple and highly mathematical
• Time is not involved
Let's try to come up with an analytical
solution to this problem
• We have two distributions, the second of
which utilizes the result of the first
• Let's calculate the expected values for
random variables using these distributions
– For a given discrete random variable X, the
expected value is
E(X) = Sum [xi p(xi)] over all i (more soon in Chapter 5)
Newspaper Seller's Problem

• Let our random variable, X, be the number of


newspapers sold
– Let's first consider the expected value for each
of the demands of good, fair and poor

Demand Probability Distribution


Demand Good Fair Poor
40 0.03 0.10 0.44
50 0.05 0.18 0.22
60 0.15 0.40 0.16
70 0.20 0.20 0.12
80 0.35 0.08 0.06
90 0.15 0.04 0.00
100 0.07 0.00 0.00
Newspaper Seller's Problem

• Egood(X) = (40)(0.03) + (50)(0.05) + (60)(0.15) +
(70)(0.20) + (80)(0.35) + (90)(0.15) +
(100)(0.07) = 75.2
• Efair(X) = (40)(0.10) + (50)(0.18) + (60)(0.40) +
(70)(0.20) + (80)(0.08) + (90)(0.04) +
(100)(0.00) = 61
• Epoor(X) = (40)(0.44) + (50)(0.22) + (60)(0.16) +
(70)(0.12) + (80)(0.06) + (90)(0.00) +
(100)(0.00) = 51.4
Newspaper Seller's Problem

• E(X) = (Egood(X))(0.35) + (Efair(X))(0.45) +
(Epoor(X))(0.20) = 64.05
Now we utilize the expected number of
newspapers sold to find results for each
of the potential number that we stock
• Let sales = expected value calculated above
• Let stock = number vendor purchases
• Let left = stock – sales (only if stock > sales,
else 0)
• Let lost = sales – stock (only if sales > stock,
else 0)
• Profit = (Min(sales,stock))(0.5) – (stock)(0.33) +
(left)(0.05) – (lost)(0.17)
Newspaper Seller's Problem

Stock   Profit
40      2.71
50      6.11
60      9.51
70      9.2
80      6.4
90      3.6
100     0.82

Expected profit values for given stock amounts.
Note that this table shows that 60 is the best
choice (more or less agreeing with the
simulation results)
Newspaper Seller's Problem

Is this analytical solution correct?


• Not entirely
• We are using an expected value to derive
another expected value – oversimplifying the
actual analysis
• The variance from the expected value will
cause our actual results to differ
• Note that the simulation results are almost
identical to the analytical for small and large
inventories
• In the middle there is more variation and this
is where using the expected value is
inadequate
• However, as a basis for choosing the best
number of papers to stock, it still works
Other Simulation Examples

There are other examples in Chapters 2


and 3
• Read over them carefully
• We may look at some of these types of
simulations later on in the term

Simulation Software

• Simulations can be written in any


good programming language
• However, many things that need to be
done in simulations can be built into
languages to make them easier
Random values from various probability
distributions
Tools for modeling
Tools for generating and analyzing output
Graphical tools for displaying results

Simulation Software

Look at the various described languages


Our simple queueing example (Example
2.1) is shown using many of the
languages
• Even if you don't completely understand all of
the code, look it over to note some differences
We may look at one of these packages
later in the term if we have time

Probability and Statistics in Simulation

• Why do we need probability and


statistics in simulation?
Needed to validate the simulation model
Needed to determine / choose the input
probability distributions
• Needed to generate random samples / values
from these distributions
Needed to analyze the output data /
results
Needed to design correct / efficient
simulation experiments

Experiments and Sample Space

• Experiment
A process which could result in several
different outcomes
• Sample Space
The set of possible outcomes of a given
experiment
• Example:
Experiment: Rolling a single die
Sample Space: {1, 2, 3, 4, 5, 6}
• Another example?
Random Variables

• Random Variable
A function that assigns a real number to
each point in a sample space
Example 5.2:
• Let X be the value that results when a single
die is rolled
• Possible values of X are 1, 2, 3, 4, 5, 6
• Discrete Random Variable
A random variable for which the number
of possible values is finite or countably
infinite
• Example 5.2 above is discrete – 6 possible
values
Random Variables and Probability Distribution

• Countably infinite means the values can be


mapped to the set of integers
– Ex: Flip a coin an arbitrary number of times.
Let X be the number of times the coin comes
up heads

• Probability Distribution
For each possible value, xi, for discrete
random variable X, there is a probability
of occurrence, P(X = xi) = p(xi)
p(xi) is the probability mass function
(pmf) of X, and obeys the following
rules:
– p(xi) >= 0 for all i
– Sum [p(xi)] over all i = 1
Random Variables and Probability Distribution

The set of pairs (xi, p(xi)) is the probability


distribution of X
Examples:
• For Example 5.2 (assuming a fair die):
– Probability Distribution:
> {(1, 1/6), (2, 1/6), (3, 1/6), (4, 1/6), (5, 1/6), (6, 1/6)}
• From Example 2.1 for Service Times
– Probability Distribution:
> {(1, 0.1), (2, 0.2), (3, 0.3), (4, 0.25), (5, 0.1), (6,
0.05)}
• From Example 2.3 for Type of Newsday
– Probability Distribution:
> {(0, 0.35), (1, 0.45), (2, 0.20)}
> Note in this case we are assigning the values 0, 1,
and 2 to the good, fair, and poor types of newsday
Cumulative Distribution

• Cumulative Distribution Function


The pmf gives probabilities for individual
values xi of random variable X
The cumulative distribution function (cdf),
F(x), gives the probability that the value
of random variable X is <= x, or
F(x) = P(X <= x)
For a discrete random variable, this can
be calculated simply by addition:
F(x) = ∑ p(xi) over all xi ≤ x

Cumulative Distribution

Properties of cdf, F:
1) F is non-decreasing
2) lim of F(x) as x → ∞ is 1
3) lim of F(x) as x → −∞ is 0
and
P(a < X ≤ b) = F(b) – F(a) for all a < b
Ex: Probability that a roll of two dice will
result in a value > 7?
• Discuss
Ex: Probability that 10 flips of a fair coin
will yield between 6 and 8 (inclusive)
heads?
• Discuss
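Both exercises above can be worked out with short counting code (a sketch, names mine). For the dice, P(X > 7) = F(12) − F(7) is found by counting outcomes; for the coin, P(6 ≤ H ≤ 8) = F(8) − F(5) sums three binomial terms.

```java
// cdf-style probability calculations for the two exercises.
public class CdfExamples {
    // P(sum of two fair dice > 7), by counting the 36 outcomes
    static double diceAbove7() {
        int count = 0;
        for (int d1 = 1; d1 <= 6; d1++)
            for (int d2 = 1; d2 <= 6; d2++)
                if (d1 + d2 > 7) count++;
        return count / 36.0;
    }

    // P(6 <= heads <= 8 in 10 fair flips): sum C(10,k)/2^10 for k = 6..8
    static double headsSixToEight() {
        double p = 0;
        for (int k = 6; k <= 8; k++) p += choose(10, k) / 1024.0;
        return p;
    }

    // Binomial coefficient C(n,k); each division is exact
    static long choose(int n, int k) {
        long c = 1;
        for (int i = 1; i <= k; i++) c = c * (n - k + i) / i;
        return c;
    }

    public static void main(String[] args) {
        System.out.println(diceAbove7());      // 15/36
        System.out.println(headsSixToEight()); // 375/1024
    }
}
```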
Expected Value
• Expected Value (for discrete random variables)
E(X) = ∑all i xi p(xi)
Also called the mean
Ex: Expected value for roll of 2 fair dice?
E(X) = (2)(1/36) + (3)(2/36) + (4)(3/36) + (5)(4/36) + (6)(5/36) + (7)(6/36) + (8)(5/36) + (9)(4/36) + (10)(3/36) + (11)(2/36) + (12)(1/36)
= 7
• Note that in this case the expected value is an actual possible value, but that is not necessarily the case
65
Expected Value and Variance
If each value has the same "probability", we often add the values together and divide by the number of values to get the mean (average)
• Ex: Average score on an exam
• Variance
V(X) = E[(X − E[X])^2]
= E(X^2) − [E(X)]^2 (Equation 5.10)
We won't prove the identity, but it is useful
66
Expected Value and Variance
• In the original definition, we need to subtract the mean from each of the X values before squaring
– So we need each X value both to calculate the mean AND again after the mean has been calculated
> Must look at them twice
• In the right side of the equation (Equation 5.10 in the text), we need to calculate the mean of X and the mean of the squares of X
– We can do this as we process the individual X values and need to look at them only one time
• Ex: What is the variance of the following group of exam scores: { 75, 90, 40, 95, 80 }
– Since each value occurs once, we can consider this to have a uniform distribution
67
Expected Value and Variance
• V(X) using original definition:
E(X) = (75+90+40+95+80)/5 = 76
V(X) = E[(X – E[X])^2] = [(75-76)^2 + (90-76)^2 + (40-76)^2 + (95-76)^2 + (80-76)^2]/5
= (1 + 196 + 1296 + 361 + 16)/5 = 374
• V(X) using Equation 5.10
E(X) = (75+90+40+95+80)/5 = 76
E(X^2) = (5625+8100+1600+9025+6400)/5 = 6150
V(X) = 6150 – (76)^2 = 374
– Note that in this case we can add each number to one sum and its square to another, so we can calculate our overall answer with a single "look" at each number
68
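The single-pass idea can be shown directly. This sketch (names are mine, not from the course's code) accumulates ∑x and ∑x² in one loop and then applies Equation 5.10:

```java
public class OnePassVariance {
    // V(X) = E(X^2) - [E(X)]^2, with both expectations accumulated in one pass
    static double variance(double[] xs) {
        double sum = 0, sumSq = 0;
        for (double x : xs) {        // single "look" at each value
            sum += x;
            sumSq += x * x;
        }
        double mean = sum / xs.length;
        return sumSq / xs.length - mean * mean;
    }

    public static void main(String[] args) {
        double[] scores = {75, 90, 40, 95, 80};
        System.out.println(variance(scores)); // 374.0, matching the two-pass result
    }
}
```

One caveat worth knowing: with very large values this one-pass formula subtracts two nearly equal quantities and can lose precision, which is why production statistics code often uses Welford's online algorithm instead.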
Discrete Distributions
• Discrete Distributions of interest:
Bernoulli Trials and the Bernoulli
Distribution
• Consider an experiment with the following
properties
– n independent trials are performed
– each trial has two possible results – success or
failure
– the probability of success, p and failure, q (= 1
– p) is constant from trial to trial
– for random variable X, X = 1 for a success and
X = 0 for a failure
• Probability Distribution:
P(X = 1) = p
P(X = 0) = 1 – p = q
69
Bernoulli Distribution
• Expected Value
– E(X) = (0)(q) + (1)(p) = p
• Variance
– V(X) = [0^2·q + 1^2·p] – p^2 = p(1 – p)
A single Bernoulli trial is not that
interesting
• Typically, multiple trials are performed, from
which we can derive other distributions:
– Binomial Distribution
– Geometric Distribution

70
Binomial Distribution
Binomial Distribution
• Given n Bernoulli trials, let random variable X denote the number of successes in those trials
• Note that the order of the successes is not important, just the number of successes
– Thus, we can achieve the same number of successes in various different ways
p(x) = (n C x) p^x q^(n−x), x = 0, 1, …, n
p(x) = 0, otherwise
– Since the trials are independent, we can multiply the probabilities for each trial to get the overall probability for the sequence
71
Binomial Distribution
• Recall that the number of combinations of n items taken x at a time is
(n C x) = n! / (x!(n − x)!)
• E(X) = np
– Discuss
• V(X) = npq
• Consider an example:
– Exercise 5.1
– Read
– Do solution on board
72
Binomial Distribution
• Consider again coin-flip ex. on slide 63
• Generally speaking binomial distributions can be used to determine the probability of a given number of defective items in a batch, or the probability of a given number of people having a certain characteristic
– Ex: The trait of having a klinkled flooje occurs on average in 10% of Kreptoplomians (krep-tō-plō'-mē-əns). Given a group of 20 Kreptoplomians, what is the probability that 3 of them have klinkled floojes?
P(3) = (20 C 3)(0.1)^3(0.9)^17 = (1140)(0.001)(0.1668) = 0.1902
73
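The Kreptoplomian calculation is a direct pmf evaluation. A sketch (hypothetical class, not from the course materials; note the slide's 0.1902 comes from rounding 0.9^17 to 0.1668 first, while the unrounded answer is ≈ 0.1901):

```java
public class BinomialPmf {
    // n choose x, computed multiplicatively
    static double choose(int n, int x) {
        double c = 1;
        for (int i = 1; i <= x; i++) c = c * (n - x + i) / i;
        return c;
    }

    // p(x) = C(n, x) p^x q^(n-x) for X ~ Binomial(n, p)
    static double pmf(int n, int x, double p) {
        return choose(n, x) * Math.pow(p, x) * Math.pow(1 - p, n - x);
    }

    public static void main(String[] args) {
        // 3 of 20 Kreptoplomians with the trait, p = 0.1
        System.out.printf("%.4f%n", pmf(20, 3, 0.1)); // ≈ 0.1901
    }
}
```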
Geometric Distribution
Geometric Distribution
• Given a sequence of Bernoulli trials, let X represent the number of trials required until the first success
p(x) = q^(x−1) p, x = 1, 2, …
p(x) = 0, otherwise
– i.e. we have x – 1 failures, followed by a success
– Note that the maximum probability for this is at X = 1, regardless of p and q
• E(X) = 1/p
• V(X) = q/p^2
– We will omit the proofs of the above, since they are fairly complex (involving series solutions)
74
Geometric Distribution
• Ex: What is the probability that the first Kreptoplomian found to have a klinkled flooje will be the 5th Kreptoplomian overall?
(0.9)^4(0.1) = 0.0656
• Ex: The probability that a certain computer will
fail during any 1-hour period is 0.001
– What is the probability that the computer will
survive at least 3 hours?
> Here p = 0.001 and q = (1 – p) = 0.999
– Using a geometric distribution, we want to
solve
P(X >= 4) = 1 – P(1) – P(2) – P(3)
= 1 – (0.001) – (0.999)(0.001)
– (0.999)2(0.001)
= 0.997
75
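Both geometric examples reduce to powers of q. A sketch (hypothetical helper names) using the pmf q^(x−1)p and the tail probability P(X > x) = q^x:

```java
public class GeometricDist {
    // p(x) = q^(x-1) p : first success occurs on trial x
    static double pmf(int x, double p) {
        return Math.pow(1 - p, x - 1) * p;
    }

    // P(X > x) = q^x : no success in the first x trials
    static double tail(int x, double p) {
        return Math.pow(1 - p, x);
    }

    public static void main(String[] args) {
        System.out.printf("%.4f%n", pmf(5, 0.1));    // first flooje on the 5th: ≈ 0.0656
        System.out.printf("%.4f%n", tail(3, 0.001)); // computer survives at least 3 hours: ≈ 0.9970
    }
}
```

Note that the computer example's 1 − P(1) − P(2) − P(3) is exactly tail(3, p) = 0.999^3.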
Geometric Distribution
• The Geometric Distribution is memoryless
– Consider the following two scenarios where p = probability that a component will fail in the next hour. Assume the current hour is hour 0.
> What is the probability that the component will
fail by the end of hour 3?
> What is the probability that the component will
fail by the end of hour 6, given that it has not
failed by the end of hour 3 ?
> For 1) the solution is P(1) + P(2) + P(3)
> For 2), since the component did NOT fail by the
end of hour 3, and since the probability is for
the next hour (whatever that hour may be), the
solution is the same
– We can prove this property with fairly simple
algebra
> First we need one additional definition
76
Geometric Distribution
• The conditional probability of an event, A, given that another event, B, has occurred is defined to be:
P(A | B) = P(A ∩ B) / P(B)
• Applying this to the geometric distribution we get
P(X > s + t | X > s) = P([X > s + t] ∩ [X > s]) / P(X > s)
– Clearly, if X > s + t, then X > s, so we get
P(X > s + t | X > s) = P(X > s + t) / P(X > s)
77
Geometric Distribution
– Consider that P(X > s) =
∑j=s+1..∞ q^(j−1) p = ∑j=s+1..∞ q^(j−1)(1 − q) = ∑j=s+1..∞ (q^(j−1) − q^j) =
(q^s − q^(s+1)) + (q^(s+1) − q^(s+2)) + (q^(s+2) − q^(s+3)) + … = q^s
– We can use similar logic to determine that P(X > s + t) = q^(s+t)
– Now our conditional probability becomes
P(X > s + t | X > s) = q^(s+t) / q^s = q^t = P(X > t)
– and thus we have shown that the geometric distribution is memoryless
> We will see shortly that the exponential distribution is also memoryless
78
Poisson Distribution
Poisson Distribution
• Often used to model arrival processes with constant arrival rates
• Gives (probability of) the number of events that occur in a given period
• Formula looks quite complicated (and NOT discrete), but it is discrete and using it is not that difficult
p(x) = e^(−α) α^x / x!, x = 0, 1, …
p(x) = 0, otherwise
• Where α is the mean arrival rate. Note that α must be positive
• E(X) = V(X) = α
79
Poisson Distribution
F(x) = ∑i=0..x e^(−α) α^i / i!
• Note: The Poisson Distribution is actually the convergence of the Binomial Distribution as the number of trials, n, approaches infinity
– If we think of n as the number of subintervals of a given unit of time
– As n → ∞, the subintervals get smaller and smaller
– We will skip the detailed math here
– One nice feature of this is that we can use a Poisson Distribution to approximate a Binomial Distribution when n is large
80
Poisson Distribution

• Example
– Number of people logging onto a computer per
minute is Poisson Distributed with mean 0.7
– What is the probability that 5 or more people
will log onto the computer in the next 10
minutes?
– Solution?
>Must first convert the mean to the 10 minute
period – if mean is 0.7 in 1 minute, it will be
(0.7)(10) = 7 in a ten minute period
>Now we can plug in the formula
>P(X >= 5) = 1 – P(0) – P(1) – P(2) – P(3) – P(4)
= 1 – F(4) (where F is the cdf)
= 1 – 0.173 (from Table A.4 in the text)
= 0.827

81
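The login example can be reproduced without the table by summing cdf terms directly. A sketch (hypothetical class name) that builds each α^i/i! term from the previous one:

```java
public class PoissonCdf {
    // F(x) = e^{-a} * (1 + a + a^2/2! + ... + a^x/x!), built term by term
    static double cdf(int x, double a) {
        double term = Math.exp(-a), sum = term;
        for (int i = 1; i <= x; i++) {
            term *= a / i;   // a^i / i! from the previous term
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        double a = 0.7 * 10;                        // mean rescaled to the 10-minute period
        System.out.printf("%.3f%n", 1 - cdf(4, a)); // P(X >= 5) ≈ 0.827
    }
}
```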
Continuous Random Variables
• Continuous Random Variable
Random variable X is continuous if its sample space is a range or collection of ranges of real values
More formally, there exists non-negative function f(x), called the probability density function, such that for any set of real numbers, S
a) f(x) >= 0 for all x in the range space
b) ∫range space f(x) dx = 1 – the total area under f(x) is 1
c) f(x) = 0 for all x not in the range space
– Note that f(x) does NOT give the probability that X = x
82
Continuous Random Variables
• The probability that X lies in a given interval [a,b] is
P(a ≤ X ≤ b) = ∫a..b f(x) dx
– We see this visually as the "area under the curve"
– Note that for continuous random variables, P(X = x) = 0 for any x (see from formula above)
– Rather we always look at the probability of x within a given range (although the range could be very small)
• The cumulative density function (cdf), F(x) is simply the integral from -∞ to x or
F(x) = ∫−∞..x f(t) dt
83
Continuous Random Variables
• Ex: Consider the uniform distribution on the range [a,b] (see text p. 166)
f(x) = 1/(b − a) if a ≤ x ≤ b
f(x) = 0 otherwise
F(x) = ∫a..x f(y) dy = ∫a..x 1/(b − a) dy = (x − a)/(b − a) if a ≤ x ≤ b
– Look at plots on board for example range [0,1]
> What about F(x) when x < a or x > b?
Expected Value for a continuous random variable
E(X) = ∫−∞..∞ x f(x) dx
– Compare to the discrete expected value
84
Continuous Random Variables
Variance for continuous random variables
• Defined in same way as for discrete variables
– Calculating it will clearly be different, however
Ex: Uniform Distribution
E(X) = ∫a..b x/(b − a) dx = x^2/(2(b − a)) |a..b = (b^2 − a^2)/(2(b − a))
= (b + a)(b − a)/(2(b − a)) = (b + a)/2
V(X) = E(X^2) − [E(X)]^2 (Eq 5.10)
= ∫a..b x^2/(b − a) dx − [E(X)]^2 = x^3/(3(b − a)) |a..b − [E(X)]^2
85
Continuous Random Variables
V(X) = x^3/(3(b − a)) |a..b − [E(X)]^2 = (b^3 − a^3)/(3(b − a)) − [(b + a)(b − a)/(2(b − a))]^2
= (b^3 − a^3)/(3(b − a)) − (b + a)^2(b − a)^2/(4(b − a)^2)
= 4(b^3 − a^3)/(12(b − a)) − 3(b + a)^2(b − a)/(12(b − a))
= (4b^3 − 4a^3 − 3[b^2 + 2ab + a^2](b − a)) / (12(b − a))
= (4b^3 − 4a^3 − 3b^3 − 6ab^2 − 3a^2b + 3ab^2 + 6a^2b + 3a^3) / (12(b − a))
= (b^3 − a^3 − 3ab^2 + 3a^2b) / (12(b − a)) = (b − a)^3 / (12(b − a)) = (b − a)^2 / 12
86
Uniform Distribution
• Generally speaking we will not be calculating these values from scratch (good news, in all likelihood!)
– However, it is good to see how it can be done for at least one (simple) distribution
The Uniform Distribution will be useful
primarily in the generation of other
distributions
• Ex: Most random number generators on
computers minimally will provide a uniform
value from [0,1)
– We will later see how that can be used to
generate other random variates (See Chapter
8)
87
Exponential Distribution
Given a value λ > 0, the pdf of the Exponential Distribution is
f(x) = λ e^(−λx), x ≥ 0
f(x) = 0, otherwise
• Since the exponent is negative, the pdf will decrease as x increases – see shape on board and p. 168
• λ is the rate: number of occurrences per time unit
– Ex: arrivals per hour; failures per day
• Note that 1/λ can thus be considered to be the time between events or the duration of events
– Ex: 10 arrivals per hour → 1/10 hr (6 min) average between arrivals
– Ex: 20 customers served per hour → 1/20 hr (3 min) average service time
88
Exponential Distribution
• Some more definitions
E(X) = 1/λ    V(X) = 1/λ^2
F(x) = 0, x < 0
F(x) = ∫0..x λ e^(−λt) dt = 1 − e^(−λx), x ≥ 0
– See shape of cdf on board and p. 168
• Ex: Assume the hard drives manufactured by Herb's Hard Drives have a mean lifetime of 4 years, exponentially distributed
a) What is the probability that one of Herb's Hard Drives will fail within the first two years?
b) What is the probability that one of Herb's Hard Drives will last at least 8 years (or twice the mean)?
89
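Both parts of the hard-drive question follow from the exponential cdf with λ = 1/mean. A sketch (hypothetical class name):

```java
public class ExponentialCdf {
    // F(x) = 1 - e^{-lambda x} for x >= 0; lambda = 1 / mean
    static double cdf(double x, double lambda) {
        return x < 0 ? 0 : 1 - Math.exp(-lambda * x);
    }

    public static void main(String[] args) {
        double lambda = 1.0 / 4.0;                          // mean lifetime 4 years
        System.out.printf("a) %.4f%n", cdf(2, lambda));     // fail within 2 years ≈ 0.3935
        System.out.printf("b) %.4f%n", 1 - cdf(8, lambda)); // last at least 8 years ≈ 0.1353
    }
}
```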
Exponential Distribution
• Like the geometric distribution, the exponential distribution is memoryless
– Implies that P(X > s+t | X > s) = P(X > t)
– The proof of this is similar in nature to that for
the geometric distribution
> See pp. 169 in text
• Ex: Exercise 5.19 in the text
– Component has exponential time-to-failure
with mean 10,000 hrs
a) Component has already been in operation for its
mean life. What is the prob. that it will fail by
15,000 hrs?
b) Component is still ok at 15,000 hrs. What is the
prob. that it will operate for another 5,000 hrs
The first thing we need to do here is be sure we
understand the problem correctly
90
Exponential Distribution
– First, let's determine our distribution
> X is the time to failure, and is exponentially distributed with mean 10,000 hours
> This gives us a cumulative distribution
F(x) = 1 − e^(−x/10000), x > 0
> Here the failure rate = λ = 1/10000
– a) is pretty clear – we want the probability that it lasts at most 15,000 given that it has lasted 10,000
> We want P(X <= 15000 | X > 10000)
> Due to the memoryless property of the exponential distribution, this reduces to P(X <= 5000) which is
1 – e^(−1/2) = 0.3935
91
Exponential Distribution
– b) is trickier (and actually not phrased well)
> Do they mean the prob. that it will last exactly
5000 more hours?
If so, the probability is 0, since continuous
distributions have 0 prob. for a specific
value
> Do they mean it will last at most 5000 more
hours?
If so, the answer is the same as a) due to
the memoryless property
> Do they mean it will last at least 5000 more
hours?
If so, we want 1 – F(5000) = 0.6065

92
Gamma Distribution
Gamma Distribution
• More general than exponential
• Based on the gamma function, which is a continuous generalization of the factorial function
Γ(β) = (β − 1)Γ(β − 1)
= (β − 1)! when β is an integer
base case Γ(1) = 1
β is called the shape parameter
– The gamma function also has θ, the scale parameter
– This leads to the following pdf:
f(x) = βθ (βθx)^(β−1) e^(−βθx) / Γ(β), x > 0
f(x) = 0, otherwise
93
Gamma Distribution
E(X) = 1/θ    V(X) = 1/(βθ^2)
F(x) = 1 − ∫x..∞ βθ (βθt)^(β−1) e^(−βθt) / Γ(β) dt, x > 0
F(x) = 0, x ≤ 0
– These formulas look complicated (and they are)
– However, they allow for more flexibility in the distributions
> Allow more different curves (See Fig. 5.10a)
– Note that if β = 1, the equations simplify to the exponential distribution with rate θ
> Look at the formulas to see this
94
Erlang Distribution
When β is an arbitrary positive integer, k, the Gamma Distribution is also called the Erlang Distribution of order k
• In general terms, it represents the sum of k independent, exponentially distributed variables, each with rate kθ (= λ)
X = X1 + X2 + … + Xk leading to
E(X) = 1/(kθ) + 1/(kθ) + … + 1/(kθ) = 1/θ
V(X) = 1/(kθ)^2 + 1/(kθ)^2 + … + 1/(kθ)^2 = 1/(kθ^2)
F(x) = 1 − ∑i=0..k−1 e^(−kθx) (kθx)^i / i!, x > 0
F(x) = 0, x ≤ 0
95
Erlang Distribution
• This allows us to determine probabilities for sequences of exponentially distributed events
– Note that the rates for all events in the sequence must be the same
• Ex: Exercise 5.21 in text
– Time to serve a customer at a bank is
exponentially distributed with mean 50 sec
a)Probability that two customers in a row will each
require less than 1 minute for their transaction
b)Probability that the two customers together will
require less than 2 minutes for their transactions
– It is important to recognize the difference
between these two problems

96
Erlang Distribution
– For a) we are looking at the probability that each of two independent events will be < 1 minute
> In this case, the probability overall is the product of the two probabilities, each of which is exponential
λ = rate = 1/mean = 1/50
We want P(X < 60) = F(60) = 1 – e^(−(1/50)(60)) = 0.6988
> Thus the total probability is (0.6988)^2 = 0.4883
– For b) we are looking at the probability that two events together will be < 2 minutes
> In this case the probability overall is an Erlang distribution with k = 2 and kθ = 1/50, and we want to determine P(X < 120) = F(120)
> Substituting into our equation for F, we get
97
Erlang Distribution
F(120) = 1 − ∑i=0..1 e^(−120/50) (120/50)^i / i!
= 1 − (0.0907) − (0.2177) = 0.6916
– Note that these results are fairly intuitive
> Requiring both to be < 1 minute is a more restrictive condition than requiring the sum to be < 2 minutes, and would seem to have a lower probability
– How about if we add another part: Probability that the next 3 customers will have a cumulative time of more than 2.5 minutes?
> Now we want P(X > 150) = 1 – F(150)
> But F has changed since we now have 3 events
> Let's do this one on the board
98
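Both bank-teller parts, including the 3-customer board exercise, follow from the Erlang cdf above. A sketch (hypothetical class name; under these assumptions the board exercise works out to about 0.4232):

```java
public class ErlangCdf {
    // F(x) = 1 - sum_{i=0}^{k-1} e^{-(kTheta)x} ((kTheta)x)^i / i!,
    // where kTheta is the per-event exponential rate
    static double cdf(double x, int k, double kTheta) {
        double a = kTheta * x;
        double term = Math.exp(-a), sum = term;
        for (int i = 1; i < k; i++) {
            term *= a / i;   // next (a^i / i!) from the previous term
            sum += term;
        }
        return x <= 0 ? 0 : 1 - sum;
    }

    public static void main(String[] args) {
        double rate = 1.0 / 50;                             // one service per 50 sec on average
        System.out.printf("%.4f%n", cdf(120, 2, rate));     // two together < 2 min ≈ 0.6916
        System.out.printf("%.4f%n", 1 - cdf(150, 3, rate)); // three together > 2.5 min ≈ 0.4232
    }
}
```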
Normal Distribution
A very common distribution is the Normal Distribution
• It has some nice properties
limx→−∞ f(x) = 0 = limx→∞ f(x)
f(µ + x) = f(µ − x)
max(f(x)) = f(µ)
– Discuss these
• The pdf for the normal distribution is also quite complex
– We won't even show it here
– However, we can use tables and another nice property to allow solution for arbitrary normal distributions
99
Normal Distribution
• Define φ(z) to be the normal distribution with mean (µ) 0 and variance (σ^2) 1
= N(0, 1)
– We call this the standard normal distribution
– This can be calculated using numerical
methods, and its cdf is typically provided in
tables in statistics (and simulation) textbooks
(see Table A.3 in text)
– We use the notation Z ~ N(0, 1) to mean that Z
is a random variable with a standard normal
distribution
• Naturally, most normal distributions of interest
will not be the standard normal distribution
– However, Eqs 5.42 & 5.43 in the text relate any normal distribution to the standard normal distribution in the following way
100
Normal Distribution
– Given an arbitrary normal distribution, X ~ N(µ, σ^2), let Z = (X – µ)/σ
– Through Eq. 5.43, we know that
F(x) = Φ((x − µ)/σ)
> where Φ(z) is the cumulative density function for the standard normal distribution φ
– Thus we can use the tabulated values of the standard normal distribution to determine probabilities for arbitrary normal distributions
– Ex: Student GPA's are approximately normally distributed with µ=2.4 and σ=0.8. What fraction of students will possess a GPA in excess of 3.0?
Example is from Mathematical Statistics with Applications, Second Edition by Mendenhall, Scheaffer and Wackerly
101
Normal Distribution
– Let Z = (X – 2.4)/0.8
– We want the area under the normal curve with mean 2.4 and standard deviation 0.8 where x > 3.0
This will be 1 – F(3.0)
F(3.0) = Φ[(3.0 – 2.4)/0.8] = Φ(0.75)
– Looking up Φ(0.75) in Table A.3 we find 0.77337
– Recall that we want 1 – F(3.0), which gives us our final answer of 1 – 0.77337 = 0.2266
• The idea in general is that we are moving from the mean in units of standard deviations
– The relationship of the mean to standard deviation is the same for all normal distributions, which is why we can use the method indicated
102
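The table lookup can be replaced in code by a standard rational approximation to erf, since Java's Math class has no built-in erf. This sketch (hypothetical class; the erf approximation is the classic Abramowitz & Stegun formula 7.1.26, accurate to about 1.5e-7) reproduces the GPA answer:

```java
public class NormalCdf {
    // Abramowitz & Stegun 7.1.26 rational approximation to erf (|error| < 1.5e-7)
    static double erf(double x) {
        double sign = x < 0 ? -1 : 1;
        x = Math.abs(x);
        double t = 1 / (1 + 0.3275911 * x);
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
        return sign * (1 - poly * Math.exp(-x * x));
    }

    // cdf of N(mu, sigma^2) via the standard normal: F(x) = Phi((x - mu)/sigma)
    static double cdf(double x, double mu, double sigma) {
        return 0.5 * (1 + erf((x - mu) / (sigma * Math.sqrt(2))));
    }

    public static void main(String[] args) {
        System.out.printf("%.4f%n", 1 - cdf(3.0, 2.4, 0.8)); // GPA > 3.0 ≈ 0.2266
    }
}
```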
Other Distributions
There are a LOT of probability distributions
• More in the text that we did not discuss
• Many others not in the text
For simulation, the idea for using them is:
• How well does the distribution of choice model
the actual distribution of events / times that
are relevant to our model
• The more possibilities and variations, the more
closely we can model our actual behavior
• However, we need to be able to determine if a
distribution fits observed data
– We will look at this in Chapter 9
103
Poisson Arrival Process
Before we finish Ch. 5, let's revisit the Poisson Distribution
p(x) = e^(−α) α^x / x!, x = 0, 1, …
p(x) = 0, otherwise
F(x) = ∑i=0..x e^(−α) α^i / i!
• In this case, α indicates the mean value, or number of arrivals (total)
– Does not factor in arrivals over time
– However, this can be done, and in this case we say the arrivals follow a Poisson Process
> In this case we are counting the number of arrivals over time
– However, some rules must be followed
104
Poisson Arrival Process
1) Arrivals occur one at a time
2) The number of arrivals in a given time period depends only on the length of that period and not on the starting point
– i.e. the rate does not change over time
3) The number of arrivals in a given time period does not affect the number of arrivals in a subsequent period
– i.e. the numbers of arrivals in given periods are independent of each other
– Discuss if these are realistic for actual "arrivals"
• We can alter the Poisson distribution to include time
– Only difference is that λt is substituted for α
P[N(t) = n] = e^(−λt) (λt)^n / n!, n = 0, 1, …
P[N(t) = n] = 0, otherwise
105
Poisson Arrival Process
• Just like the Poisson Distribution, V = E = α = λt
• In fact, if you look at the example from slide 81,
we are in fact using the Poisson Arrival Process
there
The Poisson Arrival Process has some nice
properties
• The three required from the previous slide
(obviously)
– These imply that the arrivals follow an
exponential distribution
• Random Splitting
– Consider a Poisson Process N(t) with rate λ
– Assume that arrivals can be divided into two groups, A and B, with probability p and (1-p), respectively
> Show on board
106
Poisson Arrival Process
– In this case N(t) = NA(t) + NB(t)
– NA is a Poisson Process with rate λp and NB is a
Poisson Process with rate λ(1-p)
– Splitting can be used in situations where
arrivals are subdivided to different queues in
some way
> Ex: At immigration US citizens vs. non-US
citizens
• Pooled Process
– Consider two Poisson Processes N1(t) and N2(t),
with rates λ1 and λ2
– The sum of the two, N1,2(t) is also a Poisson
Process with rate λ1 + λ2
– Pooling can be used in situations where multiple arrivals processes feed a single queue
> Ex: Cars arrive in New York City from many
107
Poisson Arrival Process
• Ex: Exercise 5.28 in text:
– An average of 30 customers per hour arrive at the Sticky Donut Shop in accordance with a Poisson process. What is the probability that more than 5 minutes will elapse before both of the next two customers walk through the door?
> As usual, the first thing is to identify what it is
that we are trying to solve.
> Discuss (and see Notes)
– [I added this part] If (on average) 75% of Sticky
Donut Shop's customers get their orders to go,
what is the probability that 3 or more new
customers will sit in the dining room in the next
10 minutes?
> Discuss

108
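One way to work both donut-shop parts, under the stated assumptions: "both of the next two arrive after 5 minutes have elapsed" is the event that fewer than 2 arrivals occur in 5 minutes, and the sit-down customers form a split Poisson process with rate 30 × 0.25 per hour. A sketch (hypothetical class; the numeric answers here are my computation, not from the text):

```java
public class DonutShop {
    // P(N = n) for a Poisson count with mean a
    static double poissonPmf(int n, double a) {
        double term = Math.exp(-a);
        for (int i = 1; i <= n; i++) term *= a / i; // build a^n / n!
        return term;
    }

    public static void main(String[] args) {
        // Part 1: 2nd arrival takes more than 5 min <=> fewer than 2 arrivals in 5 min
        double a1 = 0.5 * 5;   // 30/hr = 0.5 per minute, over 5 minutes
        double p1 = poissonPmf(0, a1) + poissonPmf(1, a1);
        System.out.printf("%.4f%n", p1); // ≈ 0.2873

        // Part 2: sit-down arrivals are a split process with rate 30 * 0.25 per hour
        double a2 = 30 * 0.25 * (10.0 / 60); // mean sit-down arrivals in 10 min = 1.25
        double p2 = 1 - (poissonPmf(0, a2) + poissonPmf(1, a2) + poissonPmf(2, a2));
        System.out.printf("%.4f%n", p2); // ≈ 0.1315
    }
}
```

Part 1 is equivalently the Erlang(k=2, rate 0.5/min) tail probability P(X > 5).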
Brief Intro. to Monte Carlo Simulation
• We discussed previously that our simulations will typically follow the dynamic model
Progress over time
• Stochastic simulations using the static
model are often called Monte Carlo
Simulations
Idea is to determine some quantity / value
using random numbers that could be very
difficult to do by other means
• Ex: Evaluating an integral that has no closed
analytical form
109
Brief Intro. to Monte Carlo Simulation
• Before any formal definitions, let's consider a simple example
Let's assume we don't know the formula for the area of a circle, but we do know the formula for the area of a square
We'd like to somehow find the area of a circle of a given radius (let's say 1)
110
Brief Intro. to Monte Carlo Simulation
Let's generate a (large) number of random points known to be within the square
• We then test to see if each point is also within
the circle
– Since we know the circle has a radius of 1, we
can put its center at the origin and any random
point a distance <= 1 from the origin is within
the circle
• The ratio of points in the circle to total points
generated should approximate the ratio of the
area of circle to the area of the square
• We can then calculate the area of the circle by
multiplying the area of the square by that ratio
• See Circle.java
111
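Circle.java itself is not reproduced in these notes; a minimal sketch of the idea (names and structure are mine) might look like this:

```java
import java.util.Random;

public class CircleArea {
    // Estimate the area of the unit circle: generate points uniformly in the
    // enclosing square [-1,1] x [-1,1] (area 4) and count those inside the circle.
    static double estimate(int n, long seed) {
        Random r = new Random(seed);
        int inside = 0;
        for (int i = 0; i < n; i++) {
            double x = 2 * r.nextDouble() - 1;
            double y = 2 * r.nextDouble() - 1;
            if (x * x + y * y <= 1) inside++;  // distance from origin <= radius 1
        }
        return 4.0 * inside / n;               // square area * (hits / total)
    }

    public static void main(String[] args) {
        System.out.println(estimate(1_000_000, 42)); // should be close to pi
    }
}
```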
Brief Intro. to Monte Carlo Simulation
• Some informal theory behind M.C.
Empirical probability
• Consider a random experiment with possible
outcome C
• Run the experiment N times, counting the
number of C outcomes, NC
• The relative frequency of occurrence of C is the ratio NC/N
• As N → ∞, NC/N converges to the probability of C, or
p(C) = limN→∞ NC/N
112
Brief Intro. to Monte Carlo Simulation

Axiomatic probability
• Set theoretic approach that determines
probabilities of events based on the number of
ways they can occur out of the total number of
possible outcomes
• Gives the "true" probability of a given event,
whereas empirical probability only gives an
estimate (since we cannot actually have N be
infinity)
• However, for complex situations this could be
quite difficult to do
When axiomatic probability is not practical, empirical probability (via Monte Carlo sims) can often be a good substitute
113
Let's Make a Deal
• Ex: Famous Let's Make a Deal problem
Player is given choice of 3 curtains
• One has a grand prize
• Other two are duds
After player chooses a curtain, Monty
shows one of the other two, which has a
dud
• Now player has option to keep the same curtain
or to switch to the remaining curtain
What should player do?
At first thought, it seems like it should not matter
• However, it DOES matter – player should always switch
114
Let's Make a Deal
We can look at this axiomatically
• Initially there is a 1/3 probability that the
player's choice is correct and 2/3 that it is
incorrect
• Revealing an incorrect curtain does not
change that probability, so if the user does not
switch his/her chance of winning is still 1/3
• However, now what do we know?
– There is a 2/3 chance that the winning curtain
is NOT the one originally picked
– Of that 2/3, there is a 0 chance that it is the
curtain already revealed
– Therefore, there is a 2/3 chance that the
remaining curtain is the winner, so we should
switch to it
115
Let's Make a Deal
In case we are still skeptical, we can verify this result with a Monte Carlo Simulation
• See MontyMonte.java
Note that the larger our number of trials,
the better our result agrees with the
axiomatic result
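MontyMonte.java is not reproduced here; a compact sketch of the same experiment (names are mine) exploits the fact that, since Monty always opens a dud, switching wins exactly when the first pick was wrong:

```java
import java.util.Random;

public class MontyHall {
    // Fraction of wins over n games when the player always switches.
    // Monty always opens a dud curtain, so switching wins exactly when
    // the player's first pick was NOT the prize.
    static double winRateSwitching(int n, long seed) {
        Random r = new Random(seed);
        int wins = 0;
        for (int i = 0; i < n; i++) {
            int prize = r.nextInt(3);
            int pick = r.nextInt(3);
            if (pick != prize) wins++;
        }
        return (double) wins / n;
    }

    public static void main(String[] args) {
        System.out.println(winRateSwitching(1_000_000, 7)); // ≈ 2/3
    }
}
```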
Monte Carlo Integration
• Let's apply this idea to another common problem – evaluating an integral
Many integrals have no closed form and
can also be very difficult to evaluate with
"traditional" numerical methods
How can we utilize Monte Carlo
simulation to evaluate these?
Let's look at this in a somewhat simplified
way (i.e. we will be light on the theory)

117
Monte Carlo Integration
Consider function f(x) that is defined and continuous on the range [a,b]
• The first mean value theorem for integral calculus states that there exists some number c, with a < c < b, such that:
(1/(b − a)) ∫a..b f(x) dx = f(c), or
∫a..b f(x) dx = (b − a) f(c)
– The idea is that there is some point within the range (a,b) that is the "average" height of the curve
– So the area of the rectangle with length (b-a) and height f(c) is the same as the area under the curve
118
Monte Carlo Integration
So now all we have to do is determine f(c) and we can evaluate the integral
We can estimate f(c) using Monte Carlo methods
• Choose N random values x1, … , xN in [a,b]
• Calculate the average (or expected) value, f̄(x), in that range:
f̄(x) = (1/N) ∑i=1..N f(xi) ≈ f(c)
• Now we can estimate the integral value as
∫a..b f(x) dx ≈ (b − a) f̄(x)
119
Monte Carlo Integration
There is some error in this, but as N → ∞ the error approaches 0
• It is inversely proportional to the square root of N
• Thus we may need a fairly large N to get satisfactory results
Let's look at a few simple examples
• In practice, these would be solved either
analytically or through other numerical
methods
• Monte Carlo methods are most useful for
multiple integrals that are not analytically
solvable
• See Monte.java
120
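Monte.java is not reproduced in these notes; a sketch of the estimator just described (names are mine, and x² over [0,1] stands in for an arbitrary integrand) might look like this:

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

public class MonteIntegral {
    // Estimate the integral of f over [a,b] as (b - a) times the sample mean
    // of f at n uniformly chosen points (the Monte Carlo estimate of f(c)).
    static double integrate(DoubleUnaryOperator f, double a, double b, int n, long seed) {
        Random r = new Random(seed);
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += f.applyAsDouble(a + (b - a) * r.nextDouble());
        }
        return (b - a) * sum / n;
    }

    public static void main(String[] args) {
        // integral of x^2 over [0,1] is exactly 1/3; the estimate should be close
        System.out.println(integrate(x -> x * x, 0, 1, 1_000_000, 1));
    }
}
```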
Simple Queueing Theory
• Many simulations involve use of one or more queues
People waiting in line to be served
Jobs in a process or print queue
Cars at a toll booth
Orders to be shipped from a company
• Queueing Theory can get quite
complex
We are interested in a few of the more
important results / guidelines

121
Simple Queueing Theory
• First, we should define queue characteristics in a consistent way
Standard Queueing Notation:
A/B/c/N/K
• where A is the interarrival time distribution
• where B is the service-time distribution
– A and B can follow any distribution (ex: the ones
we discussed in Chapter 5):
> D → Deterministic
Distribution is not random (ex: real data that has been measured / calculated)
122
Simple Queueing Theory

> M → Exponential (Markov)
This is probably the most common and most studied of the random distributions – more on this below
> Ek → Erlang of order k
> G → General
So why is an exponential distribution called Markov?
• Relates to Markov processes (continuous time)
and Markov chains (discrete time)
• Let's consider a Markov chain for now (it is
easier to conceptualize)

123
Simple Queueing Theory
• A set of random variables (or states) X1, X2, … forms a Markov Chain if the probability that we transition from state Xi to state Xi+1 does not depend on any of the previous states X1, … Xi-1
– In other words, the past history of the chain
does not affect its future
– The idea is the same for a continuous time
Markov process
• Let's consider now a random variable Y that
describes how long it will be in one state
before transitioning to a different state
– For example, in a queueing system, this could
model how long before another arrival into the
system (which changes the system state)

124
Simple Queueing Theory
– This time should not depend on how long the process has been in the current state
> i.e. it must be memoryless
– As we discussed in Chapter 5, the exponential
distribution is the only continuous distribution
that is memoryless
– Thus, when arrivals or services times are
exponentially distributed, they are often called
Markovian
– We will touch on a bit more of this theory later
– Now back to our description of the terminology…
A/B/c/N/K

125
Simple Queueing Theory
• where c is the number of parallel servers
– A single queue could feed into multiple servers
• where N is the system capacity
– Could be "infinite" if the queue can grow
arbitrarily large (or at least larger than is ever
necessary)
> Ex: a queue to go up the Eiffel Tower
– Space could be limited in the system
> Ex: a bank or any building
> This can affect the effective arrival rate, since
some arrivals may have to be discarded
• where K is the size of the calling population
– How large is the pool of customers for the
system?
– It could be some relatively small, fixed size
> Ex: The computers in a lab that may require service
126
Simple Queueing Theory
– It could be very large (effectively infinite)
> Ex: The cars coming upon a toll booth
– The size of the calling population has an
important effect on the arrival rate
> If the calling population is infinite, customers
that are removed from the population and enter
the queueing system do not affect the arrival
rate of future customers (∞-1 = ∞)
> If the calling population is finite, removal of
customers from the population (and putting
them into and later out of the system) must
affect future arrival rates
Ex: In a computer lab with 10 computers, each
has a 10% chance of going down in a given day.
If a computer goes down, the repair takes a
mean of 2 days, exponentially distributed

127
Simple Queueing Theory
 In the first day, the expected number of failures is 10*0.1 = 1
 However, once a failure occurs, the faulty computer
is out of the calling population, so the expected
number of failures in the next day is 9 * 0.1 = 0.9
 Clearly this changes again if another computer fails
• What about the system you are testing in
Programming Assignment 1?
– We cannot actually classify it with this notation, due
to the single initial queue branching into multiple
queues for the toll booths
– However, if we consider only the toll booth queues,
we could make each queue M/M/1/10/∞
> However, since a single queue buffers all of the
individual queues, the arrival rate into the queues will
change if traffic backs up onto the single queue
128
Long-Run Measures of Performance
• What are some important queueing measurements?
L = long-run average number of customers in the system
LQ = long-run average number in queue
w = long-run average time spent in system
wQ = long-run average time spent in queue
ρ = server utilization (fraction of time server is busy)
Time-Average Number in System
• Let's discuss these in more detail
Time-Average Number in the System, L
• Given a queueing system operating for some period of time, T, we'd like to determine the time-weighted average number of people in the system, L̂
• We'd also like to know the time-weighted average number of people in the queue, L̂Q
– Note that for a single queue with a single server, if the system is always busy, L̂Q = L̂ − 1
> However, this is not the case when the server is idle part of the time
– The "hats" indicate that the values are "estimators" rather than analytically derived long-term values
• We can calculate L̂ for an interval [0,T] in a fairly straightforward manner using a sum:

L̂ = (1/T) · Σ (i = 0 to ∞) of i·Ti

– Note that each Ti here represents the total time that the system contained exactly i customers
> These may not be contiguous
– i is shown going to infinity, but in reality most queueing systems (especially stable queueing systems) will have all Ti = 0 for i > some value
> In other words, there is some maximum number in the system that is never exceeded
– See GrocerySimB.java
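As a small sketch of this bookkeeping (not code from the text — the event times and counts below are made-up illustrative data), the sum can be accumulated by walking the points at which the number in the system changes:

```java
// Sketch: computing the time-average number in system, L-hat, from a
// piecewise-constant L(t). The event data here is made up for illustration.
public class TimeAverage {
    // times[j] = clock time of the j-th change; counts[j] = number in system
    // from times[j] until the next change (or until T for the last entry)
    static double timeAverage(double[] times, int[] counts, double T) {
        double area = 0.0;                       // integral of L(t) over [0, T]
        for (int j = 0; j < times.length; j++) {
            double end = (j + 1 < times.length) ? times[j + 1] : T;
            area += counts[j] * (end - times[j]);
        }
        return area / T;                         // divide by T for the average
    }

    public static void main(String[] args) {
        double[] t = {0.0, 2.0, 5.0, 9.0};       // change points
        int[] n   = {0,   1,   2,   1};          // number in system per interval
        // area = 0*2 + 1*3 + 2*4 + 1*1 = 12, so L-hat = 12/10 = 1.2
        System.out.println(timeAverage(t, n, 10.0));
    }
}
```

Equivalently, this is the integral of L(t) divided by T, which the next slides derive.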
Let's think of this value in another way:


• Consider the number of customers in the
system at any time, t
– L(t) = number of customers in system at time t
• This value changes as customers enter and
leave the system
• We can graph this with t as the x-axis and L(t)
as the y-axis
• Consider now the area under this plot from [0,
T]
– It represents the sum of all of the customers in the system over all times from [0, T], which can be determined with an integral:

Area = ∫ (0 to T) L(t) dt
• Now to get the time-average we just divide by T, giving:

L̂ = (1/T) · Σ (i = 0 to ∞) of i·Ti = (1/T) · ∫ (0 to T) L(t) dt

– See on board
• For many stable systems, as T → ∞ (or, practically speaking, as T gets very large) L̂ approaches L, the long-run time-average number of customers in the system
– However, initial conditions may determine how large T must be before the long-run average is reached
 The same principles can be applied to L̂Q, the time-average number in the queue
Average Time in System Per Customer
Average Time in System Per Customer, w
• This is also a straightforward calculation during our simulations:

ŵ = (1/N) · Σ (i = 1 to N) of Wi

– where N is the number of arrivals in the period [0,T]
– where each Wi is the time customer i spends in the system during the period [0,T]
• If the system is stable, as N → ∞, ŵ → w
– w is the long-run average system time
• We can do similar calculations for the queue alone to get the values ŵQ and wQ
– We can think of these values as the observed delay and the long-run average delay per customer
Arrival Rates, Service Rates and Stability
 We have seen a few times now
• "for stable systems…"
• What does this mean and what are the implications?
 For simple queueing systems such as those we have been examining, stability can be determined in a fairly easy way
• The arrival rate, λ, must be less than the service rate
– i.e. customers must arrive with less frequency than they can be served
• Consider a simple single-queue system with a single server (G/G/1/∞/∞)
– Define the service rate to be µ
– This system is stable if λ < µ
– If λ > µ, then, over a period of time there is a net rate of increase in the system of λ – µ
> This will lead to an increase in the number in the system (L(t)) without bound as t increases
– Note that if λ == µ some systems (ex: deterministic) may be stable while others may not be
• If we have a system with multiple servers (ex:
G/G/c/∞/∞) then it will be stable if the net
service rate of all servers together is greater
than the arrival rate
– If all servers have the same rate µ, then the
system is stable if λ < cµ
• See GrocerySimB.java
• Note that if our system capacity or calling


population (or both) are fixed, our system can
be stable even if the arrival rate exceeds the
service rate
– Ex: G/G/c/k/∞
> With a fixed system capacity, in a sense the
system is unstable until it "fills" up to k. At this
point, excess arrivals are not allowed into the
system, so it is stable from that point on
> The idea here is that the net arrival rate
decreases once the system has filled
– Ex: G/G/c/∞/k
> With a fixed calling population, we are in effect
restricting the arrival rate
> As arrivals occur, the arrival rate decreases and
the system again becomes stable
Conservation Law
 An important law in queueing theory states:

L = λw
• where L is the long-run number in the system,
λ is the arrival rate and w is the long-run time
in the system
– Discuss intuitively what this means
• Often called "Little's Equation"
• This holds for most queueing systems
• Text shows the derivation
– Read it but we will not cover it in detail here
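A quick numeric sanity check of Little's Equation (a sketch using the M/M/1 results derived later in these notes; the rates are arbitrary example values): for λ = 0.5 and µ = 1, L = ρ/(1−ρ) = 1 and w = 1/(µ−λ) = 2, so L = λw holds.

```java
// Sanity check of Little's Equation L = lambda * w on an M/M/1 queue,
// where L = rho/(1-rho) and w = 1/(mu - lambda) are standard results.
public class LittlesLaw {
    public static void main(String[] args) {
        double lambda = 0.5, mu = 1.0;         // made-up example rates
        double rho = lambda / mu;              // server utilization
        double L = rho / (1 - rho);            // long-run number in system
        double w = 1.0 / (mu - lambda);        // long-run time in system
        System.out.println(L + " vs " + lambda * w);  // both sides equal 1.0
    }
}
```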
Server Utilization
• What fraction of the time is the server busy?
• Clearly it is related to the other measures we have discussed (we will see the relationship shortly)
• As with our other measures we can calculate this for a given system (G/G/1/∞/∞):

ρ̂ = (1/T) · Σ (i = 1 to ∞) of Ti
– We assume that if at least 1 customer is in the


system, the server will be busy (which is why
we start at T1 rather than T0)
• However, we can also calculate the server utilization based on the arrival and service rates
G/G/1/∞/∞ systems
• Consider again the arrival rate λ and the
service rate µ
• Consider only the server (without the queue)
• With a single server, it can be either busy or
idle
• If it is busy, there is 1 customer in the "server
system", otherwise there are 0 customers in
the "server system" (excluding the queue)
– Thus we can define Ls = ρ = average number of
customers in the "server system"
• Using the conservation equation this gives us
– Ls = λsws
> where λs is the rate of customers coming into the server and ws is the average time spent in the server
– For the system to be stable, λs = λ, since we cannot serve faster than customers arrive, and if we serve more slowly the line will grow indefinitely
– The average time spent in the server is simply
1/µ (i.e. 1/(rate of the server))
– Putting these together gives us
– ρ = Ls = λsws = λ (1/µ) = λ/µ
– Note that this indicates that a stable queueing
system must have a server utilization of less
than 1
> The closer we get to one, the less idle time for
the server, but the longer the lines will be
(probably)
> Actual line length depends a lot not just on the rates, but also on the variance – we will discuss this shortly
G/G/c/∞/∞ systems
• Applying the same techniques we did for the
single server, recalling that for a stable system
with c servers:
λ < cµ
• we end up with the final result:
ρ = λ/(cµ)
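A tiny illustration of the formula (the rates are made-up example values, not from the text): with λ = 3 customers/min and c = 4 servers of rate µ = 1/min each, ρ = λ/(cµ) = 0.75 and the system is stable; with only c = 2 servers it is not, since λ ≥ cµ.

```java
// Server utilization for a G/G/c queue: rho = lambda / (c * mu).
// Rates below are made-up example values.
public class Utilization {
    static double rho(double lambda, double mu, int c) {
        return lambda / (c * mu);
    }

    public static void main(String[] args) {
        System.out.println(rho(3.0, 1.0, 4));        // 0.75 -> stable (rho < 1)
        System.out.println(rho(3.0, 1.0, 2) < 1.0);  // false -> unstable
    }
}
```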
Markov Systems in Steady-State
• Consider stable M/G/c queueing systems
• Exponential interarrival times
• Arbitrary service times
• 1 or more servers
 As long as they are kept relatively simple, we can calculate some of the long-run steady-state performance measures for these analytically
 May enable us to avoid a simulation for simple systems
 May give us a good starting point even if the actual system is more complicated
 First, what do we mean by steady-state?
• The probability that the system is in a
particular state is not time-dependent
• For example, consider a stable queueing
system S
– What is the probability that fewer than 10
people are in the queue?
– If we start with an empty queue, this probability
is initially 1
– However, as the system runs over time, the
length of the queue will approach its long-run
average length LQ
> It no longer depends on the initial state
> The actual length will still vary, but only due to
variations in the arrival and service times
 Let's look at it another way
• Consider a stable M/G/1 queueing system with
a given long-run average customer delay wQ
• We start the system with a certain number of
customers, s, in the queue
• For each customer processed, we re-calculate
the mean delay up to that point
– In other words (ŵQ)j = average customer delay
up to and including customer j
• As j increases, (ŵQ)j  wQ , despite the
different starting values of s
– When this occurs the steady-state has been
reached
• See graph from Law text on board
 We will not concentrate on the derivation of the formulas, but it is useful to know them and how to use them
 M/G/1 Queue (see p. 222 of text for complete list):

ρ = λ/µ
L = ρ + ρ²(1 + σ²µ²) / (2(1 − ρ))
LQ = ρ²(1 + σ²µ²) / (2(1 − ρ))

– Note that ρ is as we previously defined it (and is equal to the average number in the server)
– Note that σ² is the variance in the service times
– Note that (intuitively) the long-run queue length is equal to the long-run number in the system minus the average number in the server
 Let's look at these a bit more closely
• Consider first if σ² = 0
– i.e. the service times are all the same (= mean)
> For example a deterministic distribution
– In this case the equations for L and LQ greatly simplify to:

LQ = ρ²(1 + 0·µ²) / (2(1 − ρ)) = ρ² / (2(1 − ρ))

– In this case LQ is dependent solely upon the server utilization, ρ
– Note as ρ → 0 (low server utilization) LQ → 0
– Note as ρ → 1 (high server utilization) LQ → ∞
• However, as σ² increases with a fixed utilization, LQ also increases
– Other measures such as w and wQ increase with σ² as well
– This indicates that, all other factors being equal, a system with a lower variance will tend to have better performance
> In fact (see Ex. 6.9) in some cases a lower µ will give shorter lines than a higher µ, if it also has a lower σ²
Able: 1/µ = 24 min, σ² = 400 min²
Baker: 1/µ = 25 min, σ² = 4 min²
Able: LQ = 2.711, P0 = 0.20
Baker: LQ = 2.097, P0 = 0.167
> Note that Able has a longer long-run queue length despite his faster rate
> However, he also has a higher P0, indicating that more people experience no delay
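Plugging into the M/G/1 formula above reproduces these numbers. The arrival rate is not stated on this slide; the figures imply a mean interarrival time of 30 minutes (λ = 1/30 per minute), which is the setup of Example 6.9 in the text.

```java
// M/G/1 queue: LQ = rho^2 (1 + sigma^2 mu^2) / (2 (1 - rho)), P0 = 1 - rho.
// Comparing Able (faster but high-variance) and Baker (slower, low-variance).
public class AbleBaker {
    static double lq(double lambda, double meanService, double var) {
        double rho = lambda * meanService;   // rho = lambda / mu
        double mu = 1.0 / meanService;
        return rho * rho * (1 + var * mu * mu) / (2 * (1 - rho));
    }

    public static void main(String[] args) {
        double lambda = 1.0 / 30.0;          // one arrival per 30 min (implied)
        System.out.printf("Able  LQ = %.3f%n", lq(lambda, 24, 400)); // 2.711
        System.out.printf("Baker LQ = %.3f%n", lq(lambda, 25, 4));   // 2.097
    }
}
```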
 We can generalize this idea, comparing various distributions using the coefficient of variation, cv:

(cv)² = V(X) / [E(X)]²

– i.e. it is the ratio of the variance to the square of the expected value
• Distributions that have a larger cv have a
larger LQ for a given server utilization, ρ
– Ex: Consider an exponential service distribution: V(X) = 1/µ² and E(X) = 1/µ, so (cv)² = (1/µ²) / (1/µ)² = 1
– See Figure 6.12 for chart of other distributions
 Consider again the special case of an exponential service distribution (M/M/1)
• The mean service time is 1/µ and the variance of the service time is 1/µ²
 This simplifies our equations to:

L = ρ + ρ²(1 + (1/µ²)µ²) / (2(1 − ρ)) = ρ + ρ²/(1 − ρ) = ρ/(1 − ρ)
LQ = ρ²(1 + (1/µ²)µ²) / (2(1 − ρ)) = ρ²/(1 − ρ)
• The M/M/1 queue gives us a closed-form expression for a measure that the M/G/1 queue does not:
– Pn, the long-run probability that there will be exactly n customers in the system:

Pn = (1 − ρ)ρⁿ

– Note: We can use this value to show L for this system:

L = Σ (i = 0 to ∞) of i·Pi = 0[1 − ρ]ρ⁰ + 1[1 − ρ]ρ¹ + 2[1 − ρ]ρ² + 3[1 − ρ]ρ³ + …
  = 0 + ρ − ρ² + 2ρ² − 2ρ³ + 3ρ³ − 3ρ⁴ + …
  = ρ(1 + ρ + ρ² + ρ³ + …) = ρ · 1/(1 − ρ)

> The last equality is based on the solution to an infinite geometric series when the base is < 1
> We know ρ < 1 since the system must be stable
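A quick numeric check of this result (a sketch with an arbitrary ρ < 1): truncating the infinite sums at a large n, Σ Pn approaches 1 and Σ n·Pn approaches ρ/(1 − ρ).

```java
// Numerically verifying Pn = (1 - rho) * rho^n for an M/M/1 queue:
// the probabilities sum to 1 and the mean number in system is rho/(1-rho).
public class MM1Probs {
    public static void main(String[] args) {
        double rho = 0.8;                  // arbitrary utilization < 1
        double sumP = 0, L = 0;
        for (int n = 0; n < 1000; n++) {   // truncate the infinite sums
            double pn = (1 - rho) * Math.pow(rho, n);
            sumP += pn;
            L += n * pn;
        }
        System.out.println(sumP);          // approx 1.0
        System.out.println(L);             // approx rho/(1-rho) = 4.0
    }
}
```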
 The text has formulas for a number of different queue possibilities:
• M/G/1  already discussed
• M/M/1  already discussed
• M/M/c  Exponential arrivals and service with
multiple servers
• M/G/∞  "Infinite" servers (or service capacity
>> arrivals, or "how many servers are
needed"?)
• M/M/c/N/∞  Limited capacity system
• M/M/c/K/K  Finite population
Let's look at an example where these
formulas would be used
Exercise 6.6
– Patients arrive for a physical exam according to
a Poisson process at the rate of 1/hr
– The physical exam requires 3 stages, each one independently and exponentially distributed with a mean service time of 15 min
– A patient must go through all 3 stages before
the next patient is admitted to the facility
– Determine the average number of delayed
patients, LQ, for this system
– Hint: The variance of the sum of independent random variables is the sum of the variances
> Discuss and develop the solution
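A worked sketch of the solution: the three-stage exam is the sum of three independent exponentials (an Erlang-3 service time), so the mean service time is 45 min with variance 3 · 15² = 675 min², and the M/G/1 LQ formula applies with λ = 1/60 per minute.

```java
// Exercise 6.6 sketch: M/G/1 with Erlang-3 service (three exponential
// stages of mean 15 min each) and Poisson arrivals at one patient per hour.
public class PhysicalExam {
    public static void main(String[] args) {
        double lambda = 1.0 / 60.0;          // arrivals per minute
        double meanService = 3 * 15.0;       // 45 min total exam time
        double var = 3 * 15.0 * 15.0;        // variances add: 675 min^2
        double rho = lambda * meanService;   // 0.75
        double mu = 1.0 / meanService;
        double lq = rho * rho * (1 + var * mu * mu) / (2 * (1 - rho));
        System.out.println(lq);              // approx 1.5 delayed patients
    }
}
```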
Exercise 6.23
– Copy shop has a self-service copier
– Room in the shop for only 4 people (including
the person using the machine)
> Others must line up outside the shop
> This is not desirable to the owners
– Customers arrive at rate of 24/hr
– Average use time is 2 minutes
– What impact will adding another copier have on this system?
• How to solve this?
– Determine the systems in question
– Calculate for each system the probability that 5
or more people will be in the system at steady-
state
Exercise 6.21:
– A large consumer shopping mall is to be
constructed
– During busy periods the expected rate of car
arrivals is 1000 per hour
– Customers spend 3 hrs on average shopping
– Designers would like there to be enough spots
in the lot 99.9% of the time
– How many spaces should they have?
• We will model this as an M/G/∞ queue, where
the spaces are the servers
– We want to know how many servers, c, are
necessary so that the probability of having c+1
or more customers in the system is < 0.001
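A solution sketch (using a normal approximation that I am adding here — the text's exact method may differ): in an M/G/∞ model the steady-state number of occupied spaces is Poisson with mean λ/µ = 1000 · 3 = 3000, and for a mean this large the Poisson is well approximated by a normal, so c ≈ 3000 + z·√3000 with z ≈ 3.09 (the 99.9th percentile of the standard normal).

```java
// Exercise 6.21 sketch: spaces needed so P(more than c cars parked) < 0.001.
// Steady-state occupancy of M/G/infinity is Poisson(lambda/mu) = Poisson(3000);
// we use a normal approximation rather than summing the exact Poisson tail.
public class ParkingLot {
    public static void main(String[] args) {
        double mean = 1000 * 3;              // lambda/mu = 3000 cars on average
        double z = 3.09;                     // approx 99.9th percentile of N(0,1)
        int c = (int) Math.ceil(mean + z * Math.sqrt(mean));
        System.out.println(c);               // about 3170 spaces
    }
}
```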
 Text also has some examples showing the M/M/c/N Queue and the M/M/c/K/K Queue
• The idea is similar but the formulas and applications are different
• Before we finish with this topic…
 Let's just talk a LITTLE bit about how these formulas are derived (being light on theory)
 Let's look at the simplest case, an M/M/1 queue with arrival rate λ and service rate µ
 Consider the state of the system to be the number of customers in the system
 We can then form a state-transition diagram for this system
• A transition from state k to state k+1 occurs at rate λ, and from state k to state k−1 at rate µ
0 ⇄ 1 ⇄ 2 ⇄ … ⇄ k−1 ⇄ k ⇄ k+1   (forward transitions at rate λ, backward transitions at rate µ)

 From this we can obtain:

Pk = P0 · Π (i = 0 to k−1) of (λ/µ) = P0 (λ/µ)ᵏ     (Eq. 138.1)
 We also know that:

Σ (k = 0 to ∞) of Pk = 1

• Since the sum of the probabilities in the distribution must equal 1
• This will allow us to solve for P0
Σ (k = 0 to ∞) of Pk = 1
Σ (k = 0 to ∞) of P0 (λ/µ)ᵏ = 1
P0 · [1 + Σ (k = 1 to ∞) of (λ/µ)ᵏ] = 1
P0 = 1 / (1 + Σ (k = 1 to ∞) of (λ/µ)ᵏ)

• All that is needed for this derivation is fairly simple algebra
• Before completing the derivation, we must note an important requirement: to be stable, the system utilization λ/µ < 1 must be true
• This allows the series to converge
P0 = 1 / (1 + (λ/µ)/(1 − λ/µ)) = 1 / ((1 − λ/µ)/(1 − λ/µ) + (λ/µ)/(1 − λ/µ))
P0 = 1 / (1/(1 − λ/µ)) = 1 − λ/µ

• Which is the solution for P0 from the M/G/1 Queue in Table 6.3
 Utilizing Eq. 138.1, we can substitute back to get:

Pk = P0 (λ/µ)ᵏ = (1 − λ/µ)(λ/µ)ᵏ = (1 − ρ)ρᵏ

• Which is the formula indicated in Table 6.4
• The other values can also be derived in a similar way
 A lot of theory has been left out of this process
 If you want to read more about queueing theory, there are a number of books on the topic:
• Queueing Systems Volume 1: Theory by L. Kleinrock (Wiley and Sons, 1975)
– Old but still relevant and still available – a well-known text on the topic
• Many newer textbooks as well – Go to Amazon.com, select "Books" and search "Queueing Theory"
 One final note: spelling!
• Both "Queueing Theory" and "Queuing Theory" seem to be acceptable (and both spellings are in use)
Random Numbers
• Stochastic simulations require random data
If we want true random data, we CANNOT use
an algorithm to derive it
• We must obtain it from some process that itself has
random behavior
– http://en.wikipedia.org/wiki/Hardware_random_number_genera
• Ex: thermal noise
– http://noosphere.princeton.edu/reg.html
• Ex: atmospheric noise
– http://random.org/essay.html
• Ex: radioactive decay
– http://www.fourmilab.ch/hotbits/
Pseudo-Random Numbers
 However, these require either purchase of hardware or a service
For stochastic simulations we also need
very large amounts of random data
quickly
• Possibly more than can be obtained from a true
random source in a reasonable amount of time
Often in testing and running our
simulations, we also want to reuse the
same "random" values
• If we are debugging it helps to be able to see
the execution repeatedly with the same data
• If we are comparing systems, we may want to use the same data on both systems
• Thus, more often than not, simulations rely on pseudo-random numbers
These numbers are generated
deterministically (i.e. can be reproduced)
However, they have many (most, we
hope) of the properties of true random
numbers:
• Numbers are distributed uniformly on [0,1]
– Assuming a generator from [0,1), which is the
most common
• Numbers should show no correlation with each
other
– Must appear to be independent
 Let's look at a decidedly non-random distribution in the range [0,1)
• Ex: Assume 99 numbers, X1…X99 are
generated
X1 = 0.01;
Xi = Xi-1 + 0.01
– This will generate the sequence of numbers
0.01, 0.02, 0.03, … , 0.99
– Clearly these numbers are uniformly distributed
throughout the range of values
– However, also as clearly they are not
independent and thus would not be considered
to be "random"
– Often, uniformity and independence are not as obvious, and must be determined mathematically
Linear Congruential Generators
These are perhaps the best known
pseudo-random number generators
• Easy and fairly efficient
– Depending upon parameter choices
• Simple to reproduce sequences
• Give good results (for the most part, and when
used properly)
– Again depending upon parameter choices
Based on mathematical operations of
multiplication and modulus
 Standard Equation:
X0 = seed value
Xi+1 = (aXi + c) mod m,  for i = 0, 1, 2, …
– where
a = multiplier
c = increment
m = modulus
• For c == 0 it is called a multiplicative
congruential generator
• For c != 0 it is called a mixed congruential
generator
– Both can achieve good results
• Initially proposed by Lehmer and studied
extensively by Knuth
– See references at end of the chapter
 Note that the generator as shown will produce integers
If we want numbers in the range [0,1) we
will have to convert the integer to a float
in some way
• This can be done in a fairly straightforward
way by dividing by m
• However, sophisticated generators can do the
conversion in a better (more efficient) way
• Clearly, however, the more possible integers,
the denser the values will be in the range [0,1)
For now, consider three properties
• The density of the distribution
– How many different values in the range can be
generated?
• The period of the generator
– How many numbers will be generated before
the generator cycles?
> Since the values are deterministic this will
inevitably happen
> Clearly, a large period is desirable, especially if a
lot of numbers will be needed
> A large period also implies a denser distribution
• The ease of calculation
– We'd like the numbers to be generated
reasonably quickly, with few complex
operations
 Consider a multiplicative linear congruential generator (i.e. c == 0)
Xi+1 = (aXi) mod m
• If m is prime, this will produce a maximum period of (m−1) if aᵏ − 1 is not divisible by m for all k < (m−1)
– The period and density here are good, but with m
as a prime, the mod calculation is somewhat
time-consuming
• If m = 2ᵇ for some b, this will produce a maximum period of 2ᵇ⁻² if X0 is odd and either a = 3 + 8k or a = 5 + 8k for some k
– This allows more efficient calculation of the numbers, but gives a smaller period
– The importance of a good seed also makes this less practical (the programmer must understand how to choose one)
 Consider a mixed linear congruential generator (i.e. c != 0)
Xi+1 = (aXi + c) mod m
• If m = 2ᵇ for some b, this will produce a maximum period of 2ᵇ if c and m are relatively prime and a = 1 + 4k for some k
• This allows a good period, an easy mod
calculation with the relatively small overhead
of an addition
• This is the generator used by Java in the JDK
• Let's take a look at that in more detail
• See JDKRandom.java
• See TestRandom.java
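A minimal sketch of such a mixed LCG using the parameters documented for java.util.Random (a = 0x5DEECE66D, c = 0xB, m = 2⁴⁸). The real JDK class also scrambles the initial seed and returns only the high-order bits of the state; this sketch omits both and uses a crude divide-by-m conversion to [0,1).

```java
// Minimal mixed LCG with the parameters documented for java.util.Random:
// X_{i+1} = (a*X_i + c) mod 2^48, with the modulus applied via a bit-mask.
public class MiniLCG {
    private static final long A = 0x5DEECE66DL, C = 0xBL;
    private static final long MASK = (1L << 48) - 1;   // mod 2^48
    private long x;

    MiniLCG(long seed) { x = seed & MASK; }

    long nextInt48() { x = (A * x + C) & MASK; return x; }

    double nextDouble() {                  // crude conversion to [0,1)
        return nextInt48() / (double) (1L << 48);
    }

    public static void main(String[] args) {
        MiniLCG g = new MiniLCG(42);       // same seed -> same sequence
        for (int i = 0; i < 3; i++) System.out.println(g.nextDouble());
    }
}
```

The reproducibility property discussed above falls out directly: two generators built with the same seed produce identical streams.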
Quality of Linear Congruential Generators
 The previous criteria for m, a and c can guarantee a full period of m (or m−1 for multiplicative congruential generators)
• However, this does not guarantee that the
generator will be good
• We must also check for uniformity and
independence in the values generated
• General criteria for good values of m, a and c are still not fully settled
• If you are creating a new LCG, a good rule of
thumb is:
– Choose m, a and c to guarantee a good period
– Test the resulting generator for uniformity and
independence
Chi Square Testing for Uniformity

• Idea:
Consider discrete random variable X,
which has possible values x1, x2, …, xk
and probabilities p1, p2, …, pk (which sum
to 1)
Assume n random values for X are
chosen
 Then the expected number of times each value will occur is Ei = n·pi
Now assume n random values of
distribution Y thought to be the same as
X are chosen
• How can we tell if distribution Y is the same as X?
 Ex: Consider our dice-throwing example from earlier this term (see Slides 12-13)
E2 = (n/36), E3 = (2n/36), E4 = (3n/36), E5 =
(4n/36)
E6 = (5n/36), E7 = (6n/36), E8 = (5n/36), E9 =
(4n/36)
E10 = (3n/36), E11 = (2n/36), E12 = (n/36)
• Now consider a sequence of n = 360 rolls with
the following distribution
N2 = 8, N3 = 19, N4 = 35, N5 = 38, N6 = 53, N7 = 64
N8 = 46, N9 = 40, N10 = 27, N11 = 17, N12 = 13
– Question: Are these dice "fair"?
• We need a way of comparing empirical results
with theoretical predictions
– We can use the Chi-Square (χ²) test for this
Chi Square Test
 Consider the following formula:

V = Σ (i = 1 to k) of (Yi − Ei)² / Ei
• Where Yi is the observed number of
occurrences of value xi and Ei is the expected
number of occurrences of value xi
• This is summing the squares of the differences between the observed values and the expected values (divided by the expected values to normalize it)
• Now consider two hypotheses:
– Null Hypothesis, H0: Y matches the distribution of X
– Hypothesis H1: Y does not match the distribution of X
• If V is too large, we reject the null hypothesis (i.e. the distributions do not match)
• If V is small(ish) we do not reject the null hypothesis (i.e. the distributions may match)
 Staying with our dice example:
V = (8−10)²/10 + (19−20)²/20 + (35−30)²/30 + (38−40)²/40 + (53−50)²/50 + (64−60)²/60 + (46−50)²/50 + (40−40)²/40 + (27−30)²/30 + (17−20)²/20 + (13−10)²/10
  = 4/10 + 1/20 + 25/30 + 4/40 + 9/50 + 16/60 + 16/50 + 0/40 + 9/30 + 9/20 + 9/10
  = 228/60 = 3.8
• Ok, so what does this number mean?
• Should we accept the null hypothesis or reject it?
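The computation above can be scripted directly; a small sketch using the slide's observed counts and the theoretical two-dice probabilities:

```java
// Chi-square statistic V for the dice example: V = sum (Yi - Ei)^2 / Ei,
// with n = 360 rolls and expected counts Ei = n * (ways to roll i) / 36.
public class DiceChiSquare {
    static double chiSquare(double[] observed, double[] expected) {
        double v = 0;
        for (int i = 0; i < observed.length; i++) {
            double d = observed[i] - expected[i];
            v += d * d / expected[i];
        }
        return v;
    }

    public static void main(String[] args) {
        double[] y = {8, 19, 35, 38, 53, 64, 46, 40, 27, 17, 13}; // sums 2..12
        int[] ways = {1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1};           // ways per sum
        double[] e = new double[11];
        for (int i = 0; i < 11; i++) e[i] = 360.0 * ways[i] / 36.0;
        System.out.println(chiSquare(y, e));  // approx 3.8
    }
}
```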
• To answer this question we need to look at the test a bit more closely
• The test is called the Chi Square Test because
for n large enough (standard rule is that each
Yi should be at least 5) the value for V gives a
Chi Square distribution under the null
hypothesis
– Since V is normalized it is not dependent on the
original distributions – only the differences
between them and the "degrees of freedom" (k
– 1, where k is the number of possible values of
the random variable).
– Thus a standard chart can be used – see p. 584
in text to see how it looks
– The idea is that, given the null hypothesis, it is more likely that V will fall into the "middle" part than either end
– The closer we get to the ends (the "tails") the less likely it is that V should fall here, given the null hypothesis
– Thus if V is too large (or in some cases small),
we reject the null hypothesis
• How to determine values for "too large" or
"too small"?
– We must set a level of significance, α
α = Pr(reject H0 | H0 true)
> Or, α is the probability that we reject the null
hypothesis even though it is true (i.e. the
probability that we reject it by mistake)
> Clearly, the lower the value for α the less likely
we are to reject a distribution by mistake
> Ex: if α = 0.1, we are saying we are willing to
reject the null hypothesis even though there is a 10% chance that it is true
– Mapping this onto our Chi Square graph, we locate the point of the graph that corresponds to our level of significance, α
> For example, if α = 0.1, which values of V would
cause us to reject the null hypothesis?
> This corresponds to the area in the Chi Square
graph where there is less than 0.1 chance that V
would legitimately be mapped
> See handout
– Depending upon the application, we can use
the full α on one "tail" of the graph (one-sided
test)
> Usually the right tail – in this case V can be 0 but
should not be too large
– We may also want to split α between the left
and right "tails" of the graph
> If it is either too big or too small we reject the
null hypothesis
 Consider again our die-tossing experiment
• We have 10 degrees of freedom (since we have
11 possible values)
• Our value for V = 228/60 = 3.8
• Assume our level of significance is 0.05, using a
two-tailed test (so 0.025 on each side)
– In other words, if the expected values are varied
from by too much, OR if they are matched too
closely, we will assume that the sequence of
throws is not random
• Looking at the charts in the handout, we see:
– For the right tail, with critical value 0.025, the
value of V for 10 degrees of freedom is 20.483
> Since our value of V is much less than this, we do
not reject the null hypothesis on this basis
– For the left tail, with critical value 0.025, we actually want the critical value for (1 − 0.025) = 0.975
> The value of V here for 10 degrees of freedom is
3.247
> Since our value of V is GREATER than this (but
not by a lot) we do not reject the null hypothesis
on this basis either
• Note that under repeated testing, there is a
reasonable chance that we WILL occasionally
reject a null hypothesis (perhaps) improperly
– Ex: if α = 0.1 and we do 100 trials, we are likely
to reject 10 of them even if the distribution is
valid
– We need to take this into account during
testing
 So how do we apply this test to our uniform distribution?
• After all, the values shown are discrete, and the
uniform distribution is continuous!
We simply divide U[0,1) into subranges
that represent each discrete value
• Ex: For 10 subranges we can have {[0,0.1),
[0.1,0.2), …, [0.9,1.0)}
• Then we count the values in each subrange and
perform the Chi Square test as indicated
– Note that for a uniform distribution all of the Ei
will be the same = n/(# of subranges)
• We probably want the two-tail test here (why?)
 Look at handout RandTest1.java
• Here some linear congruential generators are
tested with the Chi Square Test
• Note the comments and why some of the
generators are poor in some situations
The Chi Square test is not perfect
• Doesn't work well for small number of trials
• As shown does not directly test independence
– Thus we need other tests as well before
deciding upon quality of a random number
generator
• However, Chi Square can be applied in
conjunction with other tests – we may see this
later
Kolmogorov-Smirnov Test
• Another test for uniformity is the Kolmogorov-Smirnov test
Has a different approach than Chi-Square
• Does not count number of values in each
subrange of U[0,1)
• Instead it compares the empirical distribution
(i.e. the numbers generated) to the cumulative
distribution function (cdf) for a uniform
distribution
– Recall from Chapter 5 and Slide 84:
F(x) = x where 0 <= x <= 1
– Think about what this means
> Discuss
– Now consider the empirical distribution generated by our random number generator (assume N values are generated)
> Call it SN(x)
– Of the N total values generated by our random
number generator, say that k of them are <= x
> Then SN(x) in this case is k/N
– For any value x, we can determine SN(x) by
counting the number of values <= x and dividing
by N
– For example, consider the following 10 values:
(0.275, 0.547, 0.171, 0.133, 0.865, 0.112, 0.806, 0.155, 0.572,
0.222)
SN(0.25) = 5/10 since 5 values above are <= 0.25
SN(0.5) = 6/10 since 6 values above are <= 0.5
SN(0.75) = 8/10 since 8 values above are <= 0.75
– The K-S test checks to see how much SN(x) differs from F(x)
More specifically:
• Sort the empirical data
• Calculate the maximum "SN(x) above F(x)"
value, D+ for the empirical data
• Calculate the maximum "SN(x) below F(x)"
value, D- for the empirical data
– Idea is that if the empirical data matches the
uniform distribution, it should not ever be too
much above or below the cdf – See Figure 7.2
• Test max(D+, D-) for the desired level of
significance against the table of critical values
for the given degree of freedom (see Table
A.8)
– If it is too high, reject the null hypothesis
• Ex: For previous data, call it R
(0.275, 0.547, 0.171, 0.133, 0.865, 0.112, 0.806, 0.155, 0.572,
0.222)
• Sorted, and compared to F(x) (actually, i/N), we
get
(0.112, 0.133, 0.155, 0.171, 0.222, 0.275, 0.547, 0.572, 0.806,
0.865)
(0.100, 0.200, 0.300, 0.400, 0.500, 0.600, 0.700, 0.800, 0.900,
1.0)
• i/N – Ri
( ----- , 0.067, 0.145, 0.229, 0.278, 0.325, 0.153, 0.228, 0.094,
0.135)
• Ri – (i-1)/N
(0.112, 0.033, ----- , ----- , ----- , ----- , ----- , ----- , 0.006, -----
)
• D+ = 0.325; D- = 0.112
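The D⁺/D⁻ calculation above, sketched in code (the data is the slide's ten sample values):

```java
import java.util.Arrays;

// Kolmogorov-Smirnov statistics against the uniform cdf F(x) = x:
// D+ = max(i/N - R(i)) and D- = max(R(i) - (i-1)/N) over the sorted sample.
public class KSTest {
    static double[] dPlusMinus(double[] r) {
        double[] s = r.clone();
        Arrays.sort(s);                        // K-S works on the sorted sample
        int n = s.length;
        double dPlus = 0, dMinus = 0;
        for (int i = 1; i <= n; i++) {
            dPlus = Math.max(dPlus, (double) i / n - s[i - 1]);
            dMinus = Math.max(dMinus, s[i - 1] - (double) (i - 1) / n);
        }
        return new double[]{dPlus, dMinus};
    }

    public static void main(String[] args) {
        double[] r = {0.275, 0.547, 0.171, 0.133, 0.865,
                      0.112, 0.806, 0.155, 0.572, 0.222};
        double[] d = dPlusMinus(r);
        // D+ is approx 0.325 and D- approx 0.112 for this data
        System.out.println("D+ = " + d[0] + ", D- = " + d[1]);
    }
}
```

max(D⁺, D⁻) = 0.325 is then compared against the tabled critical value for N = 10 at the chosen level of significance.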
Chi-Square vs. Kolmogorov-Smirnov
 There is debate about which test is more effective
• Each seems to work better under certain
circumstances
• It is probably best to run both tests on the
data to thoroughly test for uniformity
Tests for Independence
• There are many tests for independence
There are a lot of ways data can correlate
From one point of view it could appear
independent, while from another it could
appear very non-independent
For best results, many different
independence tests should be run on the
data
• Only if it "passes" them all should the
generator be accepted
We will briefly look at a few of these tests
• Most are not taken from the text
Runs Tests
 A run can be considered a sequence of events whose outcome is "the same"
• For example, a sequence of increasing values
– Old game show "Card Sharks" was based on
this
> 2, 5, 8, J, Q, A
 In random data the number and length of various runs should not be too great (or too small)
• If so, it indicates some non-independence of
the data
Many types of runs can be tested
• We'll look at a few
Runs Tests

Runs up and Runs down


• In a random sequence of N values in U[0,1),
some properties of the runs up and runs down
are
Max = N – 1 (every other number switches)
Min = 1 (monotone increasing or decreasing)
µ = (2N – 1)/3
σ2 = (16N – 29)/90
(don't worry about why this is true)
• If N > 20, this distribution is about normal
• Assume we observed r actual runs in our data
– If our data is random, r should not be too far
from the mean
> If it is we reject it
– We normalize using our standard formula:
Z0 = (r - µ)/σ
– And look up the result in our normal table
> If α is our level of significance, our value r should
not be less than α/2 standard deviations from
either side of the normal curve
> If it is we reject the null hypothesis
– See example in handout and show on board
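The counting and normalization steps above can be sketched in Java as follows. This is a minimal illustration with our own naming, and it assumes N > 20 so that the normal approximation applies:

```java
// Runs up/down test: count direction changes in a sequence and
// normalize the run count against its expected mean and variance.
public class RunsTest {
    // A new run starts whenever the direction of change
    // (increasing vs. decreasing) switches.
    public static int countRuns(double[] a) {
        int runs = 1;
        boolean up = a[1] > a[0];
        for (int i = 2; i < a.length; i++) {
            boolean nowUp = a[i] > a[i - 1];
            if (nowUp != up) {
                runs++;
                up = nowUp;
            }
        }
        return runs;
    }

    // Z0 = (r - mu) / sigma, with mu = (2N - 1)/3 and
    // sigma^2 = (16N - 29)/90
    public static double z0(int r, int n) {
        double mu = (2.0 * n - 1) / 3.0;
        double sigma = Math.sqrt((16.0 * n - 29) / 90.0);
        return (r - mu) / sigma;
    }
}
```

The resulting Z0 would then be checked against the standard normal table at the chosen level of significance.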
Runs above and below the mean
• Runs up and down don't indicate any absolute
organization of the data (or lack thereof)
• We may also want to test the number of runs
(i.e. consecutive values) above and below the
mean
– Too many or too few is not a good sign
• Calculation for the mean and variance is more complex, but we handle this test in more or less the same way as the runs up and down test
– We don't want our number to be too far from the mean
– Use the standard normal distribution to test
– See handout for formulas
We can also test the lengths of runs for
both of our previous options
• The math in this case is more complex, but the
idea is similar
– Given a sequence of random numbers, we can
expect a certain number of runs of each length
1, 2, 3, …
> Clearly the probability should get lower as the run length increases
– If the run lengths in our empirical tests differ
too much from the expected distribution, we
can reject the generator
– Idea is to determine the empirical distribution
of runs of various lengths, then compare it with
the expected distribution given random data
> The distributions are compared using the Chi-
Square test
– See handout for details
Other Tests
Other tests for independence include
• Autocorrelation tests
– Tests relationship of items m locations apart
> See simple ex on board
• Gap tests
– Tests the gap between repetitions of the same
digit
> For U[0,1) we can discretize it and test for the
gap between values in the same bin
> See simple ex on board
• Poker test
– Tests frequencies of digits within numbers
– Looks at possible poker hands and their
frequencies vs. the expected values
> See simple ex on board
• Serial Test
– Let's start with an example here:
– Given a sequence of random values in U[0,1):
X1, X2, X3, X4, …
– Consider the values in non-overlapping pairs
(X1, X2), (X3, X4), …
– Now let's consider the distribution of the pairs
of values in a plane
> We can approach this in a similar way to what
we did with the Chi-Square frequency test
> Now, however, rather than dividing U[0,1) into
(for example) 10 bins we instead divide U[0,1)2
into (for example) 100 bins, one for each pair
> We then check the empirical results with the
expected values for each bin (using Chi-Square)
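The two-dimensional version of the serial test can be sketched as below. The class and method names are our own; pairs of consecutive values are binned into a d-by-d grid and compared against the uniform expectation via the Chi-Square statistic:

```java
// 2-D serial test: bin non-overlapping pairs (X1,X2), (X3,X4), ...
// into a d-by-d grid over U[0,1)^2 and compute the Chi-Square
// statistic against equal expected counts per cell.
public class SerialTest {
    public static double chiSquare2D(double[] values, int d) {
        int pairs = values.length / 2;
        int[][] counts = new int[d][d];
        for (int p = 0; p < pairs; p++) {
            int row = (int) (values[2 * p] * d);      // bin of first member
            int col = (int) (values[2 * p + 1] * d);  // bin of second member
            counts[row][col]++;
        }
        double expected = (double) pairs / (d * d);   // uniform expectation
        double chi2 = 0.0;
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) {
                double diff = counts[i][j] - expected;
                chi2 += diff * diff / expected;
            }
        }
        return chi2;
    }
}
```

For real use, d and the sample size must be chosen so each cell's expected count is at least about 5, as noted below.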
– We can generalize this to an arbitrary number of dimensions
– However, note that the number of bins
increases dramatically as we increase
dimensions
> Recall the Chi-Square is only effective if the
number of values in each bin is ~5 or more
> For large dimensions we have to generate an
incredibly large number of values to obtain
accurate results
– However, if our generator "passes" in 2 and 3
dimensions, for many purposes it should be ok
> As long as it passes other tests as well
There are many other empirical tests that
we can run on our generators
There are also theoretical ways to test
generators
• In this case we do not generate any actual
values with a generator
• Rather, we use the generation algorithm to
determine how well (or poorly) it "can
perform"
Two famous theoretical tests are the
spectral test and the lattice test
– Both are similar in what they are trying to do
• Measure the theoretical randomness /
distribution of sequences of values generated
Spectral Test & Lattice Test
These tests are (more or less) theoretical extensions of the serial test
Let's look at a simple example with sets
of pairs
• (Xn, Xn+1) for n = 1 up to period – 1.
• If we consider these sets in two-dimensional
space, we get points in a plane
– These points tend to fall on a set of parallel
lines
> Show on board
– The more lines required to touch all of the
points, the better the distribution (for d = 2)
– If few lines are required, it indicates that the
generator does not disperse the pairs well in 2
dimensions
• We can extend this test to an arbitrary number of dimensions
– Generally, for dimension d we will look at (d-1)-
tuples and see how many d-1 dimension
hyperplanes are needed to capture all of the
points
> Actually we are looking at "how far apart" the
hyperplanes are, but it is more or less the same
idea
• See handout from Knuth
– Note that the process is highly mathematical
– However, if a generator does not pass this test
in at least the first few dimensions, it is
probably not a good generator to use
Other Random Number Generators
• Other random number generation techniques have been tried, with varying degrees of success
Middle Square method
• Original method proposed by von Neumann
Start with a four-digit positive integer Z0 and square it to
obtain an integer with up to eight digits; if necessary,
append zeros to the left to make it exactly eight digits.
Take the middle four digits of this eight-digit number as the
next four-digit number, Z1. Place a decimal point at the left
of Z1 to obtain the first U[0,1) random number, U1. Then let
Z2 be the middle four digits of Z12 and let U2 be Z2 with a
decimal point at the left, and so on.
– Try an example
• Interesting idea that at first glance looks good, but has been shown to be quite poor
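The middle-square step described above can be sketched as follows (four-digit seeds assumed; class and method names are ours):

```java
// von Neumann's middle-square method: square a four-digit seed,
// pad to eight digits, and take the middle four as the next value.
public class MiddleSquare {
    public static int next(int z) {
        long sq = (long) z * z;            // up to 8 digits
        return (int) ((sq / 100) % 10000); // drop last 2 digits, keep next 4
    }

    // Ui is Zi with a decimal point at the left: Ui = Zi / 10000
    public static double uniform(int z) {
        return z / 10000.0;
    }
}
```

For example, starting from Z0 = 7182: 7182² = 51581124, whose middle four digits give Z1 = 5811 and U1 = 0.5811. The weakness is that many seeds quickly degenerate toward zero or fall into short cycles.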
Combined LCGs
• In this case multiple LCGs are combined (in a
specific way) to produce generators with
longer periods
– See Section 7.3.2
• These must be handled very carefully
– Oddly enough, the best random generators
tend to be those that follow strict mathematical
formulas
Additive Generators
• These generators add previously generated
values to generate new values
• They typically differ on how many values are
added and which previous values are chosen
to add
Additive Generators
Famous initial attempt: the Fibonacci Generator:
• Initialize X0 and X1 using an LCG
Xi = (Xi-1 + Xi-2) mod 2^32
– The name is fairly obvious
• Unfortunately, this generator does not do too
well in empirical tests
Researchers determined better choices
for terms to add
Ex: The generator used for Unix BSD
random()
• Initialize X0, X1, … X31 using an LCG
Xi = (Xi-31 + Xi-3) mod 2^32
– Has a HUGE period – good for long simulations
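A lagged additive generator in this style can be sketched with a ring buffer of the last 31 values. This is our own rendering, not the actual BSD source: the LCG constants used for seeding are an arbitrary choice, and 32-bit int overflow supplies the mod 2^32 for free:

```java
// Additive generator in the style of BSD random():
// X(i) = X(i-31) + X(i-3), reduced mod 2^32 by int wraparound.
public class AdditiveGen {
    private final int[] state = new int[31];
    private int pos = 0;  // index of X(i-31), the oldest entry

    public AdditiveGen(int seed) {
        // Seed the 31-entry table with a simple LCG (arbitrary constants)
        int x = seed;
        for (int i = 0; i < 31; i++) {
            x = x * 1103515245 + 12345;  // wraps mod 2^32 automatically
            state[i] = x;
        }
    }

    public int next() {
        int i31 = pos;               // 31 steps back in the ring
        int i3 = (pos + 28) % 31;    // 3 steps back in the ring
        int value = state[i31] + state[i3];
        state[pos] = value;          // new value replaces the oldest
        pos = (pos + 1) % 31;
        return value >>> 1;          // nonnegative 31-bit result
    }
}
```

Two instances seeded identically produce the same sequence, as a pseudo-random generator must.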
Random Variate Generation
• Generating Random Variates
Once we have obtained / created and
verified a quality random number
generator for U[0,1), we can use that to
obtain random values in other
distributions
• Ex: Exponential, Normal, etc.
A variety of techniques exist for
generating random variates
• Some are more efficient than others
• Some are easier to implement than others
We will briefly look at just a few
• Inverse Transform Technique
Distributions that have a closed
mathematical formula can be generated
in this way
Idea is that some function F-1 will map
values from U[0,1) into the desired
distribution
An obvious example is the exponential
distribution that you used in Project 1
Let's look at this one in more detail
• Recall what the distribution is
Inverse Transform Technique
Recall the cdf, as shown below
F(x) = 0,              x < 0
F(x) = 1 − e^(−λx),    x ≥ 0
Now consider a uniform distribution over
U[0,1) – we will call this R
We then set F(x) = R and solve for x
• This will give us the transformation needed to
convert from R to F(x)
1 − e^(−λx) = R
e^(−λx) = 1 − R
−λx = ln(1 − R)
x = (−1/λ) ln(1 − R) = (−1/λ) ln(R)
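The derivation above becomes a one-line generator. This is a minimal sketch with our own naming; it uses the 1 − R form, which also conveniently avoids ln(0) since R can be 0:

```java
// Exponential variate via the inverse transform technique:
// x = (-1/lambda) * ln(1 - R), where R is uniform on [0, 1).
public class ExpVariate {
    public static double next(double lambda, double r) {
        return (-1.0 / lambda) * Math.log(1.0 - r);
    }
}
```

For instance, with lambda = 1 and R = 1 − e^(−2), the transform returns exactly 2, as the algebra predicts.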
The last equality seems incorrect, but note that if R is uniformly distributed on U[0,1), then so is 1 − R, so we can use the slightly simpler form
• Let's look at another example:
The Uniform Distribution over an arbitrary
range
• We could have it continuous, such as U(1,2)
or U(0,4)
• We could have it discrete, such as
– Uniform distribution over integers 1-10
– Uniform distribution over integers 100-200
Uniform Distributions on Different Ranges
• Continuous uniform distributions
– Given R ∈U[0,1), we may need to do 2 different
things to get an arbitrary uniform distribution
> Expand (or compress) the range to be larger (or
smaller)
> Shift it to get a different starting point
– The text shows a derivation of the formula, but
we can do it intuitively, based on the two goals
above
> To expand the range we need to multiply R by
the length of the range desired
> To get the correct starting point, we need to add
(to 0) the starting point we want
– Consider the desired range [a,b]. Our
transformations yield
X = a + (b – a)R
• Discrete uniform distributions
– The formula in this case is quite similar to that
for the continuous case
– However, care must be paid to the range and
any error due to truncation
– Consider a discrete uniform range [m, n] for
integers m, m+1, … , n-1, n
– If we use
X = m + (n – m)R
> The minimum value is correct, since the
minimum result for X is m (since R can be 0)
> Now we need to be sure of two things:
1) The maximum value is correct
2) The probabilities of all values are the same
– First let's consider the maximum value
> Although the continuous range (b – a) worked as
we expected, the discrete range (n – m) does
NOT work
> Since R ∈U[0,1), we know it can never actually
be 1
> Thus, (n – m)R will always be strictly less than n – m, and thus m + (n – m)R will always be strictly less than n (at most n – 1, since it is discrete)
– To solve this problem, we need to increase the
range by 1
> (n – m + 1)
– So we get overall
X = m + (n – m + 1)R
– Now are all values uniform (i.e. equal
probability)?
> Consider breaking U[0,1) up into (n – m + 1) intervals
[0, 1/(n–m+1)), [1/(n–m+1), 2/(n–m+1)), … , [(n–m)/(n–m+1), (n–m+1)/(n–m+1))
• Each sub-interval is equal in range and thus
has equal probability
• Given R in each sub-interval, a distinct value in
the range [m,n] will result
– Ex: Assume R is in the second sub-interval
above
– We then have
m + (n – m + 1)(1/(n – m + 1)) = m + 1 ≤ X <
m + (n – m + 1)(2/(n – m + 1)) = m + 2
– Since X is an integer, and is strictly less than m
+ 2, the entire sub-interval maps into m + 1
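Both uniform transformations discussed above can be sketched in a few lines (class and method names are ours). Note the discrete case stretches by (n − m + 1), not (n − m):

```java
// Uniform variates over arbitrary ranges, given R uniform on [0, 1).
public class UniformVariate {
    // Continuous U[a, b): stretch R by (b - a), then shift by a
    public static double continuous(double a, double b, double r) {
        return a + (b - a) * r;
    }

    // Discrete uniform over the integers m..n inclusive; the range
    // must be (n - m + 1) since R can never actually reach 1
    public static int discrete(int m, int n, double r) {
        return m + (int) ((n - m + 1) * r);
    }
}
```

For example, over [1, 10], R = 0 maps to 1, R = 0.95 maps to 10, and each sub-interval of width 1/10 maps to a distinct integer with equal probability.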
The inverse transform technique can be applied to other distributions as well
• See text for more examples
Note that in some situations, the inverse
transform technique may NOT be the
most efficient way of generating random
variates
• However for relatively simple cdfs such as the
exponential distribution, it can be easily done
Convolution Technique
Recall that the Erlang distribution can be
thought of as the sum of K independent
exponential random variables, each with
the same mean 1/KΘ
• Thus, since we can already calculate the
exponential distribution using the inverse
transform technique, we should be able to
easily also generate an Erlang distribution
• However, for efficiency, we can generate the
answer in a slightly different way
• Let's look at it in some detail
Erlang Distribution
• From the text we see:
X = Σ(i=1..k) Xi = Σ(i=1..k) ( −(1/(kΘ)) ln Ri )
– where the expression in parentheses is an exponential variate with mean 1/kΘ
– We could calculate the sum directly, but that
would require calculating a log k times
> Since logarithms are typically time-consuming,
we'd like to calculate fewer if possible
– The equation continues in the text
X = Σ(i=1..k) Xi = Σ(i=1..k) ( −(1/(kΘ)) ln Ri ) = −(1/(kΘ)) ln( Π(i=1..k) Ri )
– Now only one logarithm is required
> But how did they get that last step?

• Recall properties of logarithms
Given values x1 and x2
logb(x1) + logb(x2) = logb(x1x2)
– Ex: log10(1000) + log10(100) = log10(100000) = 5
• Applying the rule multiple times gives us our
desired result
• We haven't talked too much about efficiency
up to this point, but note that for large
simulations, very high numbers of variates
may be required
– If we save even a few cycles in each generation
overall it can add up to substantial savings
• In fact there are more efficient ways than this as well, which would probably be used in a large simulation
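The single-logarithm form derived above can be sketched as follows (our own naming; the caller supplies k uniforms from a tested U[0,1) generator):

```java
// Erlang variate via convolution, using the product form so that
// only one logarithm is computed:
// X = -(1/(k*theta)) * ln( R1 * R2 * ... * Rk )
public class ErlangVariate {
    public static double next(int k, double theta, double[] r) {
        double product = 1.0;
        for (int i = 0; i < k; i++) {
            product *= r[i];              // multiply the uniforms first...
        }
        return -Math.log(product) / (k * theta);  // ...then one log at the end
    }
}
```

As a sanity check: with k = 2, Θ = 0.5 (so kΘ = 1), and both uniforms equal to e^(−1), the product is e^(−2) and X = 2.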
Other Techniques
Not all distributions can be easily transformed
• Some may not have a closed form inverse
• We can still use the inverse transform
technique if we are willing to approximate the
distribution
• However, if we want to be more precise we
may have to use other means
In these cases we can use alternative
techniques to generate the variates
• There are many different techniques, some of
which are quite specific to a particular
distribution
Normal Variates
Recall that the cdf for a normal
distribution is complex and cannot be
inverted in a closed form
• However, with a different approach we can still
generate normal variates with good results
• The text shows a polar technique that
generates two normal values at a time
– There are many other techniques as well
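A polar technique in the Marsaglia style can be sketched as below. This is our own rendering, not necessarily the exact algorithm in the text; it rejects points outside the unit circle and produces two independent standard normal variates per acceptance:

```java
import java.util.Random;

// Polar method for standard normal variates: sample (v1, v2)
// uniformly inside the unit circle, then scale by
// sqrt(-2 ln(S) / S) where S = v1^2 + v2^2.
public class PolarNormal {
    public static double[] nextPair(Random rng) {
        double v1, v2, s;
        do {
            v1 = 2.0 * rng.nextDouble() - 1.0;  // uniform on (-1, 1)
            v2 = 2.0 * rng.nextDouble() - 1.0;
            s = v1 * v1 + v2 * v2;
        } while (s >= 1.0 || s == 0.0);         // reject points outside the circle
        double mult = Math.sqrt(-2.0 * Math.log(s) / s);
        return new double[] { v1 * mult, v2 * mult };
    }
}
```

Over a large sample, the output should have mean near 0 and variance near 1, which is an easy empirical check on an implementation like this.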
Let's take a look at these four
distributions, using the implementations
just discussed
• Look at Variates.java
Variate Generation Summary
• In a professional simulation
Software used will have predefined
functions for all of these variates
• However, it is good to know some of
the theory for how they are derived
You may perhaps have to derive one
You may need to examine the
implementation to perhaps improve the
efficiency
Input Modeling
• In the real world, input data
distributions are not always obvious or
even clearly defined
Often we have some sample data but not
enough for our entire simulation
Thus we'd like to determine the
distribution of the sample data, then
generate an arbitrary number of values
within that distribution for our simulation
• However, the input data may not even be of a
single distribution
• May differ at different times / days
• Also, in some cases we may not be able to

generate real data, because the means to do so
do not yet exist
– Ex: If we are building a new network or road
system, we don't yet have a system to provide us
with the real data
Assuming we can get sample input data,
determining the distribution is not easy
• Usually requires multiple steps, and a
combination of computer and "by hand" work
Let's try to do this for a small real example
• Arrivals at Panera downstairs
• We'll monitor Panera for the rest of class
• We will stand at the door and record the

interarrival times of customers over a 15-20
minute period
– We could do this by hand, but we'll use a
computer since that is easier and more precise
• We'll then look at the data to see if we can fit
it to a distribution
– We'll do this next class
Once we have the data, how do we fit it to a distribution?
• The first step typically involves creating one or
more histograms of the data and graphing
them, to see the basic "shape" of the
distribution
• This requires some skill and finesse
– How "wide" is each group to be?
– How can we know if a shape is of a particular
distribution?
> Sometimes we just "eyeball it"
> There are also some analytical techniques for
verification
• If we have an idea of a possible distribution,
then we can try to verify it
• Estimate parameters based on the distribution

– Ex: lambda for exponential
– Ex: beta and theta for gamma
– Ex: mu and sigma for normal
• We can then try some goodness of fit tests to
see if the distribution works
– As with the random number testers, we can use
the Chi-Square test and the Kolmogorov-
Smirnov test
Let's try this process with our collected
data
Panera Example
Consider the initial data points acquired
• First try to form one or more histograms to
“eyeball” the distribution versus known
distributions
– In this case, the exponential distribution seems
to best match the data
– Note that we may have to try a few different
intervals for the histogram
• Once a family of distributions has been
determined, we’d like to determine the
parameter(s) to the distribution
– With exponential, we’d like to estimate the
value of the rate, lambda
– Since this should be 1/mean, we can calculate
it easily using the sample data
• Next we should try one or more “goodness of fit” tests to see if the proposed distribution
does in fact reasonably match our sample data
– In this case we will use Chi-Square or
Kolmogorov-Smirnov
– Consider in this case Chi-Square as shown in
the panera.xls handout
– First we will consider an “uneven” distribution
based on the histogram shown
> We generate the cumulative distribution with the
given lambda at the interval endpoints
> Next we determine the expected values for each
interval
> We then run the test on the results, with the
following modifications:
> The degrees of freedom will be n – s – 1, where n is
the number of intervals, and s is the number of
estimated parameters (in this case s = 1)
> Groups with fewer than 5 expected values will be
merged into a single group
– Note that the p-values described in the Banks text
(section 9.4.4) can be easily calculated in Excel
using the CHITEST function
> However there is an issue with degrees of freedom –
see panera.xls
– As indicated in the Banks text (section 9.4.2) we
should also consider an equal probability Chi
Square test
> Note that for our data this test does not perform
nearly as well as the “uneven” probability test (in fact
it fails fairly badly)
• We could also use a K-S test to test the
distribution
– See Example 9.16 in Banks text
If we find the Exponential Distribution
unacceptable (based on the even prob.
Chi-Square test) it might behoove us to try
a different distribution to see if it matches
better
• Some options are
– Gamma
– Weibull
> Exponential is a special case of both of these
– Lognormal
> This is actually tried in the panera.xls
spreadsheet, but it is not good
• Clearly this is not a simple process
• Also note that we really need more data here
What if we don’t have sample data to use?
• We must do the best that we can, relying on
– Experts
> What do people in the area in question think,
based on their knowledge and experience
– Engineering specs
> Ex: A device is built to have some mean time to
failure, based on the production environment.
We can use that value as a starting point for
mean time to failure of the device in our real
environment
– Similarity to something we already know
> Ex: What is the distribution of people wanting to
ride the new “Super Wacky Crazy Loopmeister”
ride? We can use as a starting point the
measured distribution from last year’s “Bizarro Zany Upsidedowner” ride