Queueing Theory Notes
A Modeling Perspective
Glenn Ledder
July 2019, revised October 2019
Contents
A Note on Notation
5 The Erlang Distribution (see Hillier and Lieberman 17.7)
5.1 Significance of the Erlang distribution
5.2 Probabilities for M/E2/1
5.3 Results
This document contains an introduction to queueing theory with emphasis on using queueing
theory models to make design decisions. It therefore combines probability with optimization.
These concepts are contrasted in a statement I once heard in a talk:
Probability is the study of the typical for issues of chance.
Optimization is the study of the exceptional for issues of choice.1
The first five sections of these notes develop the concepts and results of queueing theory.
This topic is about issues of chance, such as the amount of time one must wait in line before
reaching the checkout counter in a store. Hence, our goal will be to characterize the typical
behavior of such systems, and we must keep in mind that the actual behavior in any one instance
will not necessarily be close to the typical behavior. Sections 6 and 7 use the results of queueing
theory in the context of optimization problems. Our goal in these sections will be to identify the
choices that are available in a given setting and determine which choice produces the exceptional
result. We must keep in mind that choices involving design of probabilistic systems are less
certain than choices involving design of deterministic systems. In a deterministic system, we
can say exactly what will be the result of any design choice. In a probabilistic system, the
best we can say is what will be the average result of a design choice. The optimal decision
may not yield the best result in a particular instance, but it will be the decision that results
from rational analysis of the options and therefore the best decision that can be made with the
information at hand.
These notes assume familiarity with basic calculus and probability theory. In particular,
queueing theory makes extensive use of probability distributions and expected value. These
concepts are reviewed here to some extent, but the reader may need to look elsewhere for more
information.
A Note on Notation
The enormous number of quantities in all of mathematics has to be represented with only a
handful of symbols, generally the Latin and Greek alphabets with subscripts. Inevitably this
means that many symbols get used differently in different contexts. The symbol λ is used to
denote Lagrange multipliers in optimization, adjoints in control theory, eigenvalues in linear
algebra and partial differential equations, and the mean customer arrival rate in queueing
theory. Clearly, symbols do not have fixed meanings; rather, they mean what we define them
to mean. This has implications for both the reading and writing of mathematics. When reading
mathematics, one has to be careful to look for information about symbol meanings and be aware
that one author’s W and Wq might be another author’s S and W ;2 not only are the symbols for
a given quantity different, but the symbol “W ” has different meanings in the different systems.
This point is particularly important when reading supplementary material on the internet. It is
very likely that the material you are reading has some notational differences from your textbook
or lecture notes, and you have to be able to translate from one system to the other. When
writing mathematics, it is necessary to be clear about terms and notation to spare your reader
any confusion. In general, it is a good idea to define every symbol other than π or e when
writing about mathematics and modeling. This is the only way to make sure that the reader
will understand what you have written.
1. I regret not knowing the name of the author to whom this memorable statement should be attributed.
2. This specific example arises in Section 1.
1 Queueing Theory Basics (see Hillier and Lieberman 17.2, 17.7)
Learning Objectives
1. Know the goals of queueing theory.
2. Be able to identify the defining characteristics of a queue system from the standard 5-
character identifiers.
3. Be able to calculate the arrival-service ratio γ and the utilization factor ρ from a given
narrative and explain their significance.
4. Know the four principal performance measures of a queue system and be able to calculate
them from the steady-state probabilities.
5. Be able to calculate the four performance measures for an M/G/1/∞/∞ system using λ,
µ, and σ.
3. To develop protocols that allow system design decisions to be made so as to optimize the
overall cost associated with the system.
1. The type of probability distribution used for the arrival process, with one or more parameters.
2. The type of probability distribution used for the service process, with one or more parameters.
3. The number of servers in the service station (for example, the number of check-out stations in service at a grocery store).
Some of this information is presented in compact form as an identifier of the form A/S/s/K/N, where
• A designates the type of distribution used for arrival times,
• S designates the type of distribution used for service times,
• s designates the number of servers,
• K is the maximum number of customers that can be in the system at any one time, and
• N is the size of the population of potential customers.
Typical designators for the distributions are
• M for the exponential distribution (Markovian),
• Ek for an Erlang distribution (which we will use later),
• D for the deterministic distribution (constant times), and
• G for a general distribution (unspecified except for mean and standard deviation).
The size of the calling population is important because customers that enter a queue system
should be removed from the list of potential customers. This decreases the mean arrival rate,
but usually the decrease is too small to worry about. Unless the calling population is small
enough for the decrease to matter, it is best to consider it to be infinite. In this case, the fifth
designator N is often omitted. Similarly, the number of customers that can be in the system
at any one time is usually limited, but most of the time the capacity is large enough that it
is never actually reached. In this case, it is best to consider the maximum system size to be
infinite. As with infinite calling population size, it is common to omit the fourth designator K
when the queue size is unlimited.
The most commonly used systems are of the form M/M/s, meaning that the arrival and
service processes are exponentially distributed and the system size and calling population are
unlimited. We will study these systems in Section 4.
The ratio ρ = λ/(sµ), where λ is the mean arrival rate, µ is the mean service rate for each server, and s is the number of servers,
then represents the fraction of the service capacity that is used. For this reason, it is called the
utilization factor for the queue system. The quantity is often used even in cases where λ is not
fixed, but the interpretation as utilization factor no longer holds.
For both modeling and computation, it is also helpful to define the arrival-service ratio
γ = λ/µ. (1.2)
This parameter represents the expected number of arrivals during the average amount of time
for a service completion, which we might call the “load” of the system. In modeling, the most
frequent scenario is one in which the rates λ and µ are fixed and the problem is to choose the
optimal number of servers. The parameter γ is much more useful than ρ in this context because
it is strictly a property of the scenario while ρ combines elements of the scenario data (λ and
µ) with the independent variable of the optimization problem (s). Computationally, we'll find
γ more useful than ρ in cases where the arrival rate depends on the system state.
There are four principal performance measures for a queue system.
1. The mean number of customers in the system over time, including those who are in the
queue as well as those being served. This quantity is usually designated as L.
2. The mean number of customers in the queue, usually denoted Lq . This quantity is seldom
of special interest in modeling, but it is mathematically important because it is usually
the easiest of the four performance measures to determine.
The other two performance measures involve the average amount of time spent by customers.
There are two common choices for terminology and notation, so one must be careful to identify
which system is being used, both when reading what others have written and when writing for
the benefit of others.
3. The mean amount of time that a customer spends in the system. Some authors call this
the “sojourn” time and denote it with the symbol S. Others call it the “waiting” time
and use the symbol W .
4. The mean amount of time that a customer spends in the queue. Authors who use “so-
journ” time for mean time in the system usually call this the “waiting” time and denote
it as W , while authors who use “waiting” time for mean time in the system usually call
this the “waiting time in the queue” and denote it as Wq .
The choice between the two notation/terminology systems is a matter of taste. I have two
reasons for preferring the S and W system. First, the phrase “waiting time in the queue” is
unnecessarily complicated. Second, in everyday language we only think of ourselves as “waiting”
when we are in the queue, not when we are being served. Given a choice, it is better to have
the mathematical meaning of a word match its nonmathematical meaning.
Any of these four performance measures can be used to quantify the functioning of a queue
system. Which we emphasize depends on the context. For internal queue systems, where the
customers are machines in a factory and the servers are repair crews, the most logical choice
is L because customers in the system represent lost productivity. For external systems, think
of an auto repair shop as an example. As a customer, you don’t really care how many fellow
customers are in the shop, and you are as inconvenienced by slow service as by a long wait to
begin service. Assuming your car gets repaired properly at a reasonable cost, it is the sojourn
time that will determine if you return to that shop.
There are three simple formulas that relate the four performance measures. First, the
mean amount of time spent in service is 1/µ, so the mean time in the system exceeds the mean
time in the queue by the mean service time:
S = W + 1/µ. (1.3)
In addition to this obvious relationship, there are the more subtle relationships that go by the
name of Little’s formulas:
L = λ̄S, Lq = λ̄W. (1.4)
The symbol λ̄ represents the expected value of the arrival rate.1 One might easily guess these
formulas from dimensional consistency (L is customers and S is time, so L/S is customers per
time), but they require some effort to prove. Notice that formulas (1.3) and (1.4) can also
be combined to get a less obvious formula relating L and Lq :
L = Lq + γ. (1.5)
Taken together, formulas (1.3) through (1.5) mean that if we can calculate one of the
performance measures, then we can calculate all of them. Usually the easiest of the four is Lq ,
which we can write as a weighted average of the number of customers in the queue for each
state of the system.
Lq = Ps+1 + 2Ps+2 + 3Ps+3 + · · · = Σ_{n=s+1}^∞ (n − s)Pn , (1.6)
1. In many systems, the arrival rate is always the specific value denoted as λ, but in other systems the arrival rate depends on the system size.
where Pn is the steady-state probability that there are n customers in the system.2
Example
An M/M/2/3/3 queue system has mean service rate µ = 2. The arrival rates depend on the
system state: λ0 = 3, λ1 = 2, λ2 = 1, and λ3 = 0. The resulting steady-state probabilities
are P0 = 0.29, P1 = 0.44, P2 = 0.22, and P3 = 0.05.3 The expected number of customers in
the queue is a weighted average of the numbers of customers in the queue for each state, which
(given that there are 2 servers) are 0, 0, 0, and 1 for the states 0, 1, 2, and 3. Hence,
Lq = (0)(0.29) + (0)(0.44) + (0)(0.22) + (1)(0.05) = 0.05.
Of course the length of the queue is never exactly 0.05, since queue length can only be an integer.
But if we make frequent counts of the queue length, we should expect the average of those
measurements to be approximately 0.05.
We can use formula (1.4) to get W, but first we need to calculate λ̄ as a weighted average
of the λn :
λ̄ = (3)(0.29) + (2)(0.44) + (1)(0.22) + (0)(0.05) = 1.97.
Thus,
W = 0.05/1.97 = 0.0254, S = W + 1/µ = 0.0254 + 0.5 = 0.5254, L = λ̄S = (1.97)(0.5254) = 1.035.
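The arithmetic in this example is easy to script. The following Python sketch (the variable names are my own; the data are those of the example) computes the performance measures from the steady-state probabilities:

# M/M/2/3/3 example: s = 2, mu = 2, state-dependent arrival rates.
P = [0.29, 0.44, 0.22, 0.05]          # steady-state probabilities P0..P3
lam = [3, 2, 1, 0]                    # arrival rates lambda_0..lambda_3
s, mu = 2, 2

Lq = sum(max(n - s, 0) * p for n, p in enumerate(P))    # weighted average queue length
lam_bar = sum(l * p for l, p in zip(lam, P))            # mean arrival rate
W = Lq / lam_bar                                        # mean time in queue
S = W + 1 / mu                                          # mean time in system, formula (1.3)
L = lam_bar * S                                         # expected system size, formula (1.4)
print(Lq, lam_bar, W, S, L)   # 0.05, 1.97, 0.0254..., 0.5254..., 1.035...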
For an M/G/1 queue with arrival rate λ, mean service rate µ, and service-time standard deviation σ, the steady-state analysis leads to
L = (2ρ − ρ²(1 − µ²σ²)) / (2(1 − ρ)). (1.8)
Formula (1.8) is a form of the Pollaczek-Khintchine formula.
For the specific case where the service times are exponentially distributed (M/M/1), the
standard deviation is σ = 1/µ and the result reduces to
L = ρ/(1 − ρ), (1.9)
while for deterministic service times (M/D/1), σ = 0 and the result is
L = ρ(2 − ρ) / (2(1 − ρ)). (1.10)
Note that formula (1.10) has an extra factor (1 − ρ/2) compared to formula (1.9). Given that
this factor is less than one, we see that greater uniformity in service times improves system
performance by as much as 50%.7 See Figure 1.1. This is a general characteristic of queue
systems (although the maximum amount of improvement might be different from 50%). Less
variability with the same mean service rate is always better. There are no obvious design
implications of this result, since we don’t get to choose the characteristics of service jobs. It
may influence the decision to add a server, as additional servers have more benefit when service
times have a higher variability.
Figure 1.1: Dependence of expected system size on arrival-service ratio for exponential and
deterministic service distributions.
7. Other service distributions generally produce results that fall between M/M/1 and M/D/1.
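These formulas are easy to evaluate numerically. The short Python sketch below (my own code, not part of the original notes) evaluates formula (1.8) and its two special cases so that comparisons like the one in Figure 1.1 can be made:

def L_mg1(rho, mu, sigma):
    # Expected system size for M/G/1 from formula (1.8).
    return (2*rho - rho**2 * (1 - mu**2 * sigma**2)) / (2 * (1 - rho))

mu = 1.0
for rho in [0.2, 0.4, 0.6, 0.8]:
    L_mm1 = L_mg1(rho, mu, 1/mu)   # exponential service, sigma = 1/mu: reduces to rho/(1 - rho)
    L_md1 = L_mg1(rho, mu, 0.0)    # deterministic service, sigma = 0: reduces to rho(2 - rho)/(2(1 - rho))
    print(rho, L_mm1, L_md1)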
2 Stochastic Processes (see Hillier and Lieberman 17.4)
Learning Objectives
1. Understand what is meant by the term stochastic process.
2. Be able to explain why the arrival process for a queue system is a stochastic process.
3. Be able to discuss the lack of history property that we expect to be valid for arrival
processes.
4. Be able to calculate the probability of the next event occurring within some specified time
interval for the exponential distribution.
5. Be able to show that the exponential distribution has the lack of history property.
6. Be able to explain why the exponential distribution is probably not a good model for
queue service processes.
the choice of a model; for example, we can estimate the average arrival rate by counting the
number of arrivals that occur in a 6-hour period.
• The time at which the next arrival occurs, as measured from the current time, does not
depend on the time at which the previous arrival occurred.
Suppose the mean arrival rate is 15 customers per hour, so that the average time between
arrivals is 4 minutes. Now suppose you start your stopwatch at exactly 1:00. The expected
time for the next arrival is 1:04. Suppose no customers arrive in the first 3 minutes. When do
we now expect the next arrival? It might seem logical that the answer is still 1:04, but that is
not correct. It is now 1:03, and our best guess for the next arrival is that it will happen in 4
minutes, at 1:07. If we get to 1:16 without an arrival, the expected time for the next arrival is
still 4 minutes away. To summarize: additional elapsed time does not change the expectation
for the next arrival time, because the arrival time of customer n is unrelated to the arrival
time of customer n − 1. This is hard to conceptualize because it is so different from everyday
experience, when events become more likely as the wait for them lengthens.11
8. In the queueing theory context, think of a customer as encompassing groups of individuals who are shopping together as well as single individuals.
9. It is important to note the dimensions of quantities, as dimensional reasoning is very helpful in mathematical modeling.
10. In a real situation, the average arrival rate is probably not a well-defined quantity, since the arrival rate of customers probably depends on the time of day and the weather. But remember that we are talking here about a stochastic process as a model for a real situation.
11. I know of only one example other than stochastic processes that is like this. When I was in college in the 1970's, scientists thought that we would see a practical fusion reactor in about 20 years. When I started teaching in 1989, scientists at that time still estimated 20 years for a practical fusion reactor. Scientists in 2019 still estimate that a practical fusion reactor is unlikely within the next 20 years. Of course the story here is not about a history-less property, but about how hard it is to predict the difficulty of something we don't yet know how to do.
2.3 The Exponential Distribution
There are three ways to define a probability distribution, two that are easier to understand and
one that is more broadly useful. One of the two easier ones is the survival function, which we
define as
S(t) = P {T > t}; (2.1)
that is, S(t) is the probability for any time t that the next event has not yet occurred at that
time.12 There are some simple properties that are common to the survival functions in queueing
theory:
S(0) = 1, S′(t) ≤ 0, S(∞) = 0; (2.2)
these say that the next event always occurs after time 0, that the probability the event has not
yet occurred decreases over time, and that the next event always occurs at some finite time.13
Given a survival function S(t), the probability density function is defined as
f(t) = −S′(t). (2.3)
Note that
∫_t^∞ f(τ) dτ = −∫_t^∞ S′(τ) dτ = S(t) − S(∞) = S(t) = P{T > t}; (2.4)
Thus, a probability distribution can also be defined through a probability density function,
using a definite integral to determine probabilities. Similarly,
∫_{t1}^{t2} f(τ) dτ = S(t1) − S(t2) = P{t1 < T < t2}. (2.5)
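For the exponential distribution with rate λ, the survival function is S(t) = e^{−λt} and the density is f(t) = λe^{−λt}. A quick numerical check of relationship (2.4), with my own choice of parameters, is sketched below:

from math import exp

lam, t = 1.5, 0.7
# Midpoint-rule approximation of the integral of f(tau) = lam*exp(-lam*tau) from t upward.
d = 0.0005
integral = sum(lam * exp(-lam * (t + (i + 0.5) * d)) * d for i in range(200000))
print(integral, exp(-lam * t))   # both approximately 0.3499, i.e. S(t)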
2.4 Key Properties of the Exponential Distribution
Why are we studying the exponential distribution? Because it has the history-free property
that we identified as characteristic of arrival times for independent customers. To see this, note
first that the probability of T > t is given by the exponential survival function S(t) = e^{−λt}. Now suppose a time t1
passes without an arrival. Then the probability that the next arrival requires more than an additional time t is given by the conditional probability formula
P{T > t1 + t | T > t1} = P{T > t1 + t} / P{T > t1} = e^{−λ(t1+t)} / e^{−λt1} = e^{−λt} = P{T > t}.
This says that the probability that the next arrival will require a time greater than t, given that
the interval t1 has already passed, is the same as the initial probability that the next arrival
requires a time greater than t from the start.
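The lack-of-memory property is also easy to check by simulation. The sketch below (parameter choices are mine) draws exponential interarrival times and compares P{T > t1 + t | T > t1} with P{T > t}:

import random

random.seed(1)
lam, t1, t = 1.0, 2.0, 1.5
draws = [random.expovariate(lam) for _ in range(200000)]

survived_t1 = [x for x in draws if x > t1]
cond = sum(x > t1 + t for x in survived_t1) / len(survived_t1)   # P{T > t1 + t | T > t1}
uncond = sum(x > t for x in draws) / len(draws)                  # P{T > t}
print(cond, uncond)   # both close to exp(-lam*t) = 0.223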
This lack of history property is what will allow us to analyze queue models theoretically.
Because of this convenience, we’ll want to use the exponential distribution for service times as
well as arrival times. There are some difficulties in using an exponential distribution service
model, however. Note that the function f is always decreasing. This means that the probability
that the next arrival will occur between times 0 and t1 is larger than the probability that the
next arrival will occur between times t1 and 2t1 , for any t1 . See Figure 2.2. This is a reasonable
property for arrival processes, but it is questionable for service processes. Think about the
amount of time it takes to make your purchase in the convenience store. It is certainly more
likely to be in the range 0 to 5 minutes than 5 to 10 minutes. But is it more likely that the
time will be between 0 and 10 seconds than between 10 and 20 seconds? Almost certainly not.
This is an issue we’ll address in Section 5.
Figure 2.2: Histogram of 40000 values drawn from the exponential distribution with mean rate
µ = 0.5. Note that the mean time is µT = 1/µ = 2.
2.5 Building Intuition
It is easy to have too much faith in theoretical results and too little understanding of the
unpredictability of individual events. One should experiment with computer simulations to see
how much the actual mean rate can deviate from the theoretical mean rate when the duration
of the simulation is short. Figure 2.3 shows the means for samples of size 100 drawn from the
exponential distribution (see the R program ExpDistTest.R or the python program ExpDistTest.py). Note that it is not especially unusual for the actual mean of 100
trials to be less than 80% or more than 120% of the theoretical value.
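A program along the lines of ExpDistTest.py might look like the sketch below (my own reconstruction, not the actual file), which repeatedly draws samples of size 100 and records how far each sample mean strays from the theoretical mean:

import random

random.seed(0)
alpha, sample_size, n_samples = 1.0, 100, 10000
means = [sum(random.expovariate(alpha) for _ in range(sample_size)) / sample_size
         for _ in range(n_samples)]

# Fraction of sample means that are off by more than 20% from the theoretical mean 1/alpha.
off = sum(abs(m - 1/alpha) > 0.2/alpha for m in means) / n_samples
print(min(means), max(means), off)   # roughly 0.65 to 1.45, with a few percent beyond +/-20%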
Figure 2.3: Histograms of means of 10000 samples of size 100, with rates α = 1 (left) and α = 2
(right).
2. Be able to sketch rate diagrams and use them to determine steady-state probabilities for
systems with limited queue size.
We’ve seen that the four steady-state performance measures are related by three simple
equations,
L = λ̄S, Lq = λ̄W, S = W + 1/µ, (3.1)
where λ̄ is the mean arrival rate for the system, leaving us one equation short. The usual way
to complete the set is to compute the queue length in terms of the steady-state probabilities
for the system states:
Lq = Ps+1 + 2Ps+2 + 3Ps+3 + · · · = Σ_{n=s+1}^∞ (n − s)Pn . (3.2)
In general, there is no way to compute these steady-state probabilities other than to esti-
mate them with simulations. However, in the case where both arrival and service times are
exponentially distributed, we can compute the probabilities as the solutions of the steady-state
versions of a set of differential equations. We’ll do this in two stages. In this section we’ll
consider the relatively straightforward case that occurs when the queue size is limited. Then
in Section 4, we’ll deal with the additional mathematics needed for queues that are unlimited
in size. Keep in mind that the formulas require careful interpretation for systems with limited
queue sizes because customers can be turned away. This means that the mean arrival rate λ̄ is
actually less than the usual arrival rate λ.
Figure 3.1: Rate diagram for an M/M/2 queue with capacity 3: arrivals occur at rate λ from states 0, 1, and 2 (and not at all from state 3), and service completions occur at rates µ, 2µ, and 2µ from states 1, 2, and 3.
Each arrow in the figure is labeled with the rate parameter for the corresponding
transition. On the upper arrows, we see that customers enter at rate λ when the state is 0, 1,
or 2, but they do not enter at all when the state is 3. The model assumes that any customer
who tries will leave, never to return.1 On the lower arrows, the service rate parameter is µ
when the state is 1, but 2µ when the state is either 2 or 3. This is because both servers are
busy at these states, so the overall service rate when the system is full is twice as large as the
rate for one server.
Consider state 0. The rate diagram shows an arrow with parameter λ for arrivals, corresponding to the rate λP0 , and an arrow with
parameter µ for service completions at state 1, corresponding to the rate µP1 . These two rates
change the probability of state 0 according to the differential equation
dP0/dt = µP1 − λP0 .
If the queue system opens at time 0 with no customers, then P0 (0) = 1 and P1 (0) = 0, so
the probability P0 will initially decrease at rate −λ. Over time, P0 will decrease and P1 will
increase until they reach steady-state values. Since they are no longer changing at that point,
the differential equation reduces to the steady state equation µP1 = λP0 , which we write in
terms of the single parameter γ = λ/µ as
P1 = (λ/µ)P0 = γP0 . (3.3)
The two rates that affect P0 also affect P1 , but in the opposite way. Two other rates affect
P1 as well. In sum,
dP1/dt = 2µP2 − λP1 − µP1 + λP0 ,
or
dP1/dt = 2µP2 − λP1 − dP0/dt.
At steady state, both P1 and P0 are unchanging, so the last equation reduces to 2µP2 = λP1 ,
which we write as
P2 = (λ/(2µ))P1 = (γ/2)P1 = (γ²/2)P0 . (3.4)
A similar study of the changes in P2 yields
P3 = (γ/2)P2 = (γ³/4)P0 . (3.5)
The overall change in P3 is already 0 from equation (3.5), so we do not get a fourth equation.
As an example, take λ = 2 and µ = 2, so that γ = 1 and formulas (3.3)-(3.5) give P1 = P0 , P2 = P0/2, and P3 = P0/4. Setting the sum of the probabilities equal to 1 then gives
P0 = 4/11, P1 = 4/11, P2 = 2/11, P3 = 1/11,
from which L = ΣnPn = 1 and Lq = P3 = 1/11. If we try to use Little's formulas in the form
L = λS, Lq = λW,
with λ = 2, we get
S = L/2 = 1/2, W = Lq/2 = 1/22.
But these results cannot be correct, because they do not satisfy S − W = 1/µ = 1/2. The
problem here is that the λ̄ in Little’s formulas (3.1ab) is the mean arrival rate for the system
whereas the λ we've been using in Figure 3.1 and our calculation of the probabilities is the mean
arrival rate when the system is not full.2 Instead we can compute λ̄ as a weighted average of
the arrival rates for the different states:
λ̄ = λ(P0 + P1 + P2 ) = λ(1 − P3 ) = 2 · (10/11) = 20/11.
The fraction λ̄/λ = 1 − P3 is another critical performance measure of the system, representing
the fraction of customers who are lost because of limited system capacity. In the present
example, 1/11 of potential customers are lost, so the actual mean arrival rate is (10/11)*2, or
20/11. Little’s formulas then give
W = Lq/λ̄ = (1/11)/(20/11) = 1/20, S = L/λ̄ = 1/(20/11) = 11/20.
Figure 3.2: A schematic diagram of the most general form of M/M/s/K queue, with states 0 through K, arrival rates λ0 , λ1 , . . . , λK−1 , and service rates µ1 , µ2 , . . . , µs , the rate remaining µs for all states above s.
The differential equations are constructed in the same way as in the previous example.
There are K + 1 unknown probabilities related by K steady-state equations, giving us each of P1 through PK as a multiple of P0 .
Then we get P0 by setting the sum of the probabilities P0 through PK equal to 1. Once the
probabilities are known, we get L and Lq from
L = Σ_{n=0}^K n Pn , Lq = Σ_{n=s}^K (n − s)Pn , (3.14)
and then
λ̄ = Σ_{n=0}^{K−1} λn Pn , W = Lq/λ̄ , S = L/λ̄ . (3.15)
We can confirm our results by checking S − W = 1/µ. Note that in the usual case where λn = λ
for n < K, we have
λ̄ = (1 − PK )λ (3.16)
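The computational scheme in formulas (3.14)-(3.16) is easy to implement. The sketch below (my own function; it assumes the usual service rates µn = min(n, s)µ) reproduces the M/M/2 capacity-3 example above, giving W = 1/20 and S = 11/20:

def mmsk_measures(lam, mu, s, K):
    # lam[n] is the arrival rate in state n (n = 0..K-1); service rate in state n is min(n, s)*mu.
    p = [1.0]
    for n in range(K):
        p.append(p[-1] * lam[n] / (min(n + 1, s) * mu))   # balance equations give Pn as multiples of P0
    total = sum(p)
    P = [x / total for x in p]                            # normalize so the probabilities sum to 1
    L = sum(n * P[n] for n in range(K + 1))               # formula (3.14)
    Lq = sum((n - s) * P[n] for n in range(s, K + 1))
    lam_bar = sum(lam[n] * P[n] for n in range(K))        # formula (3.15)
    return P, L, Lq, lam_bar, Lq / lam_bar, L / lam_bar   # ..., W, S

print(mmsk_measures([2, 2, 2], mu=2, s=2, K=3))           # W = 0.05, S = 0.55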
The same method applies to the rate diagram with arrival rates 3λ, 2λ, λ, and 0 and service rates µ, 2µ, and 2µ for states 0 through 3; we can use it to confirm the specific probabilities given in the example of Section 1 for the case λ = 1, µ = 2.
2. Use a computer program that calculates Lq and L for given λ, µ, and s for an M/M/s
queue system to obtain results and plot them as graphs.
In Section 3 we learned the basic procedure for analyzing the steady-state probability dis-
tribution and performance measures for M/M queues with a limited queue size. The limited
queue size means that there are only a finite number of unknowns. Now we consider queue
systems with unlimited queue size. We further assume that the arrival rates are the same for
all states and each server has the same service rate. We can use the same method as before, but
we will have infinitely many unknown probabilities, leading to a formula for P0 that involves an
infinite sum. The needed sum can be computed analytically (although the algebra is tedious).
We therefore consider systems that have the rate diagram shown in Figure 4.1. As is commonly
done, we will refer to these systems in this section as M/M/s rather than M/M/s/∞/∞.
Some general infinite sum formulas will be needed. First is the geometric series formula:
1 + x + x² + x³ + · · · = 1/(1 − x), 0 < x < 1. (4.1)
Note that the formula works for |x| < 1, but we will only use it with positive x. The second
formula comes from differentiating formula (4.1) term by term:
1 + 2x + 3x² + · · · = 1/(1 − x)², 0 < x < 1. (4.2)
We’ll work out the details for M/M/1 and M/M/2 as preludes to the general case.
Rate diagram for the M/M/1 queue: every arrival arrow has rate λ and every service arrow has rate µ.
P0 = 1 − ρ, Pn = ρ^n (1 − ρ). (4.4)
Next we can compute Lq using these results. Note that the queue length is 0 when the system
state is either 0 or 1 and is n − 1 for larger states.
We can now use the sum formula (4.2) to obtain the result
Lq = P2/(1 − ρ)² = ρ²(1 − ρ)/(1 − ρ)² = ρ²/(1 − ρ). (4.5)
Rate diagram for the M/M/2 queue: every arrival arrow has rate λ, and the service rates are µ, 2µ, 2µ, . . . .
after which the other probabilities can be calculated from formulas (4.8). Following the same
method as for s = 1 (note that the queue length is 0 for states n ≤ 2 and is n − 2 for larger states), the
expected queue size is
Lq = P3 + 2P4 + 3P5 + · · · = (1 + 2ρ + 3ρ² + · · · )P3 = P3/(1 − ρ)² = · · · = 2ρ³/(1 − ρ²). (4.11)
Figure 4.3: Rate diagram for the M/M/4 queue: every arrival arrow has rate λ, and the service rates are µ, 2µ, 3µ, 4µ, 4µ, . . . .
In general, we have
ρ = λ/(sµ), γ = λ/µ. (4.13)
Figure 4.3 illustrates the M/M/4 queue system as an example for the general case. We
begin by looking at just the portion of the rate diagram that starts at n = 3. In terms of P3 ,
we have
P4 = ρP3 , P5 = ρ²P3 , . . . . (4.14)
On the theory that it is best to do the easier parts of a problem first, we now derive a simple
formula that calculates Lq in terms of P3 :
Lq = P5 + 2P6 + 3P7 + · · · = (1 + 2ρ + 3ρ² + · · · )P5 = P5/(1 − ρ)² = ρ²P3/(1 − ρ)² .
This formula is convenient because it generalizes easily to the case where s is unspecified. The
current version uses P3 because P3 begins the chain appearing in formula (4.14). In general,
the result is
Lq = ρ²Ps−1/(1 − ρ)² . (4.15)
Hence, we need only find a formula for Ps−1 , and then we’ll have Lq .
To calculate Ps−1 with s = 4, we can start by adding the probabilities P3 and up:
P3 + P4 + P5 + · · · = (1 + ρ + ρ² + · · · )P3 = P3/(1 − ρ).
In the case of general s, the corresponding formula will have to start at state s − 1:
Ps−1 + Ps + Ps+1 + · · · = Ps−1/(1 − ρ).
The hard part of calculating Ps−1 is the portion of the probability sum that precedes Ps−1 . For
s = 4, we have
P1 = γP0 , P2 = (γ/2)P1 = (γ²/2)P0 , P3 = (γ³/6)P0 .
Thus,
P0 + P1 + P2 = (1 + γ + γ²/2)P0 .
Combining the two sums yields
1 = (P0 + P1 + P2 ) + (P3 + P4 + · · · ) = (1 + γ + γ²/2)P0 + P3/(1 − ρ).
The usual procedure here would be to use P3 = (γ 3 /6)P0 to replace P3 and then have an
equation for P0 . But since it is P3 that we want, we can save effort by instead using the
relationship between P3 and P0 to replace P0 by (6/γ 3 )P3 , giving us
1 = (P0 + P1 + P2 ) + (P3 + P4 + · · · ) = (6/γ³)(1 + γ + γ²/2)P3 + P3/(1 − ρ).
We can now solve this equation for P3 , with the messy result
P3 = 1 / [ 1/(1 − ρ) + (6/γ³)(1 + γ + γ²/2) ] .
While it has taken a lot of work to get there, the result is a set of relatively simple formulas
that we can use to calculate L: Given s, λ, and µ, we can calculate γ and ρ and then get Rs
from formula (4.17), Ps−1 from formula (4.16), and Lq from formula (4.15). L then follows from
L = Lq + γ. (4.18)
Example
Suppose ρ = 0.8 for the M/M/3 queue. Since ρ = γ/s, this means γ = 2.4. Formulas (4.17),
(4.16), (4.15), and (4.18) yield
R3 = (2/γ²)(1 + γ) ≈ 1.181,
P2 = 1/(5 + 1.181) ≈ 0.1618,
Lq = ρ²P2/(1 − ρ)² = 2.589, L = Lq + γ = 4.989.
For cases where s > 3 or values are desired for a variety of ρ values, one would not want to
calculate the results manually. It is better to write a computer program that defines a function
that implements these formulas to calculate L from input values γ and s. Repeatedly calling
this program with a range of γ values (making sure γ < s) allows a large number of results
to be presented in the form of a graph. The results of such a program are displayed in Figure
4.4. If we wanted to emphasize the relationships between the formulas for different numbers
of servers, it would be better to use ρ as the horizontal coordinate. Using γ as the horizontal
coordinate produces a plot that better illustrates the effect of adding servers, making the plot
more useful in a decision-making context.
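A program of the kind just described might look like the following sketch (the function name and the use of the standard normalization sum for Ps−1 are my own choices); it reproduces the M/M/3 example above:

from math import factorial

def mms_L(gamma, s):
    # Expected system size L for an M/M/s queue with arrival-service ratio gamma = lambda/mu.
    rho = gamma / s
    if rho >= 1:
        raise ValueError("need gamma < s for a steady state")
    # P0 from the normalization sum; P_{s-1} then follows.
    norm = sum(gamma**k / factorial(k) for k in range(s)) + gamma**s / (factorial(s) * (1 - rho))
    P0 = 1 / norm
    Ps_minus_1 = gamma**(s - 1) / factorial(s - 1) * P0
    Lq = rho**2 * Ps_minus_1 / (1 - rho)**2          # formula (4.15)
    return Lq + gamma                                # formula (4.18)

print(mms_L(2.4, 3))   # about 4.99, matching the example value L = 4.989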
Figure 4.4: Dependence of expected system size on arrival-service ratio γ = λ/µ for M/M/s
queues.
The Erlang distribution (known outside of queueing theory as the gamma distribution) is
defined by its probability density function
f(t) = ((kµ)^k/(k − 1)!) t^{k−1} e^{−kµt} , (5.1)
leading to the survival function
P{T > t} = ∫_t^∞ ((kµ)^k/(k − 1)!) τ^{k−1} e^{−kµτ} dτ. (5.2)
Note that if k = 1, the probability density function is just
f(t) = µe^{−µt} ,
(Histogram panels: k = 1, σ = 2.0; k = 2, σ = 1.414; k = 3, σ = 1.155; k = 4, σ = 1.0.)
Figure 5.1: Histograms of 40000 values drawn from the Erlang distribution Ek , all with µ = 0.5.
Note that the mean in each case is µT = 1/µ = 2.
which is the exponential distribution; hence, the Erlang distribution generalizes the latter by
adding the parameter k, which must be a positive integer.
Figure 5.1 shows histograms of E1 to E4 , all with mean time µT = 2. The plots show
that choosing k > 1 changes the shape of the distribution. The exponential distribution,
corresponding to k = 1, is highly skewed, meaning that the peak at t = 0 is far from the mean
value of µT = 1/µ. The skewness decreases as k increases.
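One way to generate Erlang samples like those in Figure 5.1 is to add k independent exponential phases, each with rate kµ; the sketch below (parameter choices are mine) confirms that the mean stays at 1/µ while the standard deviation shrinks by a factor of √k:

import random

random.seed(0)
mu, k, n = 0.5, 2, 40000
samples = [sum(random.expovariate(k * mu) for _ in range(k)) for _ in range(n)]

mean = sum(samples) / n
sd = (sum((x - mean)**2 for x in samples) / n) ** 0.5
print(mean, sd)   # close to 1/mu = 2 and 2/sqrt(2) = 1.414, as in the k = 2 panel of Figure 5.1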
The parameter k can be computed from the mean time µT and the standard deviation as
k = (µT /σ)². (5.4)
With empirical data, this calculation will not produce an integer, but we can round the result
off to the nearest integer to obtain a suitable Erlang model for the unknown distribution.1
Even when the standard deviation is quite a bit less than the mean, it is common in practice
to use the exponential distribution without any justification, although this could yield poor
results.2 The decision between a distribution that makes for easy calculation and one that
better approximates reality should be based on the difference seen in the final results. If the
simpler distribution gives results that are not much different than the better-fitting distribution,
then there is no harm in using it, but the choice should not be made without investigating this
question.
Among the alternatives to the exponential distribution, the Erlang distribution has a sig-
nificant theoretical advantage. Unless the queue system can be represented by a rate diagram,
the only way to obtain results for s > 1 is with a simulation. This is not very satisfactory, as
simulations take a long time to converge to the mean.3 But since M/Ek /s queue systems can be
represented using rate diagrams, they ultimately lead to formulas that allow the probabilities
to be calculated using large sums rather than simulations. (See Section 5.2 and Appendix B.)
Figure 5.2: Rate diagram for the M/E2/1 queue, with states (n, p) giving the number of customers n in the system and the service phase p of the customer in service, arrival rate λ, and phase-completion rate 2µ.
of arrivals and service. Arrival from state 0 results in state (1,1) because the new customer
will begin phase 1 service. Any arrivals in other states will be customers who must wait in the
queue; thus, arrivals when n > 0 increase n while leaving p unchanged. Arrivals are marked
by left-right arrows on each horizontal row of nodes in the graph. To see what happens with
service completions, consider the example with current state (3,1), meaning that there is one
customer in phase 1 of service and two customers waiting in the queue. Now suppose we have
a string of service completions with no intervening arrivals. The first service completion moves
the customer being served from phase 1 to phase 2; hence, the system moves from (3,1) to (3,2).
The next completion finishes that customer, which decreases the system size by one. The next
customer in line begins phase 1 service, so the system moves from (3,2) to (2,1). Similarly,
consecutive service completions move the system from (2,1) to (2,2) to (1,1) to (1,2) to 0. Of
course the chain can be broken by an arrival, but that merely serves to move the system to
a state we have already studied. Service arrows in the rate diagram point from right to left,
always moving from one row to the other. These arrows are labeled with the rate 2µ because
the mean completion time for half of a full service sequence is µT /2 = 1/(2µ). Alternatively,
we can think of a half service process as running twice as fast as a full service process.
Once we have the rate diagram, we can write down the steady-state equations, working from
left to right through the diagram:
for states 0, (1,2), (1,1), (2,2), (2,1), (3,2), and (3,1), respectively, and so on.
Our goal is to obtain formulas for the total probability Pn for each system size. We don’t
actually care about how that probability is divided between phases. After a fair amount of
algebra (see Appendix A), we obtain a multi-formula computational scheme to compute the
probabilities Pn in terms of intermediate quantities an and bn :
a1 = δ, δ ≡ λ/(2µ), (5.12)
  n      an       bn       Pn (M/E2/1)   Pn (M/M/1)
  0                         0.4000        0.4000
  1    0.3000   0.6900      0.2760        0.2400
  2    0.2070   0.3861      0.1544        0.1440
  3    0.1158   0.2043      0.0817        0.0864
  4    0.0613   0.1062      0.0425        0.0518
  5    0.0319   0.0549      0.0220        0.0311
  6    0.0165   0.0283      0.0113        0.0187
  7    0.0085   0.0146      0.0058        0.0112
  8    0.0044   0.0075      0.0030        0.0067
  9    0.0023   0.0039      0.0016        0.0040
 10    0.0012   0.0020      0.0008        0.0024
 11    0.0006   0.0010      0.0004        0.0015
 12    0.0003   0.0005      0.0002        0.0009
Table 5.1: Intermediate values and probabilities for M/E2 /1 as compared to M/M/1 for ρ = 0.6.
a2 = (1 + δ)²a1 − δ, (5.13)
a3 = (1 + δ)²a2 − 2δ(1 + δ)a1 , (5.14)
an = (1 + δ)²an−1 − 2δ(1 + δ)an−2 + δ²an−3 , n > 3, (5.15)
b1 = δ(2 + δ), (5.16)
bn = (2 + δ)an − δan−1 , n > 1, (5.17)
P0 = 1/(1 + Σ_{n=1}^∞ bn ), (5.18)
Pn = bn P0 , n ≥ 1. (5.19)
The only difficulty with the implementation of this scheme is that the infinite sum must be
approximated numerically. The good news is that this problem is significant only when ρ is too
close to 1. For ρ ≤ 0.8, thirty-one terms are enough to produce results that have less than a
0.1% error.
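A sketch of this computational scheme in Python (the function name and the truncation handling are mine) is given below; with ρ = 0.6 it reproduces the values in Table 5.1:

def me21_probs(rho, n_terms=31):
    # Steady-state probabilities for M/E2/1 from formulas (5.12)-(5.19); rho = lambda/mu < 1.
    delta = rho / 2                              # delta = lambda/(2 mu), formula (5.12)
    a = [delta]                                  # a[0] holds a_1
    for n in range(2, n_terms + 1):
        if n == 2:
            nxt = (1 + delta)**2 * a[-1] - delta                                       # (5.13)
        elif n == 3:
            nxt = (1 + delta)**2 * a[-1] - 2*delta*(1 + delta)*a[-2]                   # (5.14)
        else:
            nxt = (1 + delta)**2 * a[-1] - 2*delta*(1 + delta)*a[-2] + delta**2*a[-3]  # (5.15)
        a.append(nxt)
    b = [delta * (2 + delta)]                                                          # (5.16)
    b += [(2 + delta) * a[n] - delta * a[n - 1] for n in range(1, n_terms)]            # (5.17)
    P0 = 1 / (1 + sum(b))                                                              # (5.18), truncated
    return [P0] + [bn * P0 for bn in b]                                                # (5.19)

P = me21_probs(0.6)
print(P[0], P[1], P[2])   # about 0.4000, 0.2760, 0.1544, as in Table 5.1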
Other M/Ek /s systems can be analyzed in the same fashion; however, there are more sub-
divisions for each state n. The M/E2 /2 system is analyzed in Appendix B; more servers or a
larger k make for a much messier derivation.
Example
A queue system has exponentially distributed arrival times with a mean rate of 18 per hour
and one server with an average service completion time of 2 minutes and standard deviation
of 1.2 minutes. Since the standard deviation is significantly less than the mean service time,
we model this system with an Erlang distribution. Formula (5.4) yields a nominal value of 2.8.
The E3 distribution is probably the best choice for accuracy, but we choose E2 as a reasonable
compromise between accuracy and tractability. The actual system performance should be
slightly better than what we find with our model because our model has a larger standard
deviation of 1.414.
From the data in the narrative, the parameter values are λ = 18 per hour = 0.3 per minute and
µ = 0.5 per minute, so that ρ = λ/µ = 0.6.
Figure 5.3: Steady-state probabilities for the M/E2 /1 and M/M/1 systems with ρ = 0.8.
Table 5.1 shows the results of the computation, along with the corresponding probabilities for
the M/M/1 system.
Figure 5.3 presents a visual comparison of M/E2 /1 and M/M/1 for ρ = 0.8. Note that the
probabilities for large states are greater for M/M/1 than for M/E2 /1. This is because higher
variability of service times results in worse performance. Curiously, P0 is the same for both
service distributions. This would seem to be a coincidence, but it appears to hold for all values
of ρ, so there is probably some subtle reason. While the other probabilities seem very similar,
they are enough different that the expected system sizes (for the ρ = 0.6 case of Table 5.1) differ by 15%, with 1.5 for M/M/1
and 1.275 for M/E2 /1.
5.3 Results
Figure 5.4 compares the performance of systems with 1, 2, and 3 servers and service distributions
M, E2 , and D. Four different computation methods were used to plot these curves:
1. The M/M/s analytical results were used for M/M/1, M/M/2, and M/M/3.
2. The formulas developed in this section were used for M/E2 /1, and similar formulas were
used for M/E2 /2.
4. Simulations with durations equivalent to 100000 service completions were used for M/E2 /3,
M/D/2, and M/D/3. These three curves are jagged in appearance because a duration of
100000 service completions is not quite enough to guarantee that the observed mean will
match the theoretical mean. These curves would eventually be smooth if the simulations
were run for a much longer period of time.
Figure 5.4: Dependence of expected system size on arrival-service ratio γ = λ/µ for M/M/s,
M/E2 /s and M/D/s queue systems.
2. Be able to identify specific direct and indirect costs for queue system examples.
3. Be able to choose an appropriate model to represent direct and indirect costs for queue
system examples.
4. Be able to compute indirect costs using weighted averages with either analytical summa-
tion formulas or numerical computation as appropriate.
Now that we have acquired some familiarity with the mathematics of queue systems, it is
time to consider questions about how to use the results to make decisions. We’ll do this in
two steps: this section is about ways to quantify the cost of a queue system, and the following
section is about types of queue system optimization problems that use this section’s cost models
to derive the objective function to be optimized. There are two types of costs involved in a
queue system. Direct costs are those that are associated with the operation of the system, such
as the wages paid to servers or routine maintenance for machines used for service. Indirect
costs are those associated with performance, such as lost business caused by slow service.1 We
consider each of these in turn.
1. The cost is actually based on the "poorness" of the performance, but there is no simple way to say this.
6.1 Direct (Operational) Cost
For most queue system design problems, the direct cost (DC) is simply proportional to the
number of servers:
DC = Cs s, (6.1)
where Cs is the cost per unit time for each server. In some cases, however, the cost per unit
time for each server is not a fixed quantity. For example, we might buy better tools so that
service is more efficient, in which case the cost per unit server depends in some way on the
mean service completion rate:
DC = Cs f (µ)s. (6.2)
This form assumes that there is some standard service rate µ0 for which the cost of one server
per unit time is Cs . Then f is chosen so that f (µ0 ) = 1. It should also be an increasing
function, since faster service should cost more.
It is also possible that the arrival parameter λ is not fixed. This would occur for an internal
queue system in which the potential customers are machines in a factory. The factory operator
could institute a preventive maintenance program that decreases λ. The cost might then be of
the form
DC = Cs s + Cm f (λ)N, (6.3)
where N is the number of potential customers and Cm f (λ) is the cost per unit time per potential
customer.2 This function assumes a fixed cost per unit time per server for the service system
itself and a preventive maintenance cost of Cm f (λ) per unit time per potential customer. The
modeling is most convenient if there is a maximum arrival rate λ0 associated with no preventive
maintenance program and a minimum arrival rate λm for which the cost is Cm . Then f must
be chosen so that f (λ0 ) = 0 and f (λm ) = 1.
6.2.1 Indirect cost proportional to system size
The simplest choice is to make the indirect cost (IC) a linear function of the system size n. As
an example, consider the case of customers that are machines in a factory. Here it makes sense
to associate the cost of being in the system with the lost productivity of the machines. If each
working machine produces a value of Cn per time unit, then the total cost of lost productivity
for n machines is
IC = Cn n. (6.4)
This is conceptually more complicated than it sounds. While the actual direct cost is pre-
dictable, the actual indirect cost is not. The cost in this formula assumes that we know the
state n of the system. The actual indirect cost of the system depends on the mix of system
states, which is a random variable. As with everything governed by probability distributions,
we need to combine the costs of the different states together into an expected indirect cost,
using the probabilities of the states as the weights for the averaging. Fortunately, this works
out nicely in this case because we already have a performance measure that is the expected
value of n. Thus,
E(IC) = E(Cn n) = Cn E(n) = Cn L. (6.5)
where Ct is the cost per unit time for a customer who spends total time t in the system. Thus,
Ct t has the dimension of cost per unit time multiplied by time per customer. So (IC)c is the
cost per customer. What we actually need is the cost per unit time, so we need to multiply by
the number of customers per unit time, which is λ. Thus,
6.2.3 Indirect cost function g(n)
Suppose the indirect cost of a system is associated with business that is lost when customers
see a long line ahead of them. The probability that a given customer balks might be a linear
function of the state n, but it is more likely that the function is concave up; that is, the
probability a customer leaves increases faster as the system size increases. Put another way,
perhaps the indirect cost when there are four customers in the system is more than double the
indirect cost when there are only two. In such cases, the indirect cost is a nonlinear function
of n. The modeling issue in this form of indirect cost is to choose a function g(n) to represent
the cost per unit time for system state n. There are some standard properties we expect such a
function to have. There should be no indirect cost when the system is empty, so g(0) = 0. The
cost cannot go down as the system size increases, so g′(n) ≥ 0. Most likely the cost increases
at a growing rate, so g″(n) ≥ 0.
For an optimization model, what we need is the expected value of the cost, which is a
weighted average of the values g(0), g(1), and so on, using the probability of each state as the
weight. Thus,
E(IC) = E(g(n)) = Σ_{n=0}^∞ g(n)Pn . (6.9)
The sum in formula (6.9) can sometimes be computed analytically. This would always be
true for an M/M/s/K system because the probabilities can always be computed analytically
(Section 3) and the sum is finite. For an M/M/s queue, only a few cost functions permit an
analytical solution formula. One example is the cost function
g(n) = Cn (n − s) for n > s and g(n) = 0 for n ≤ s, (6.10)
which simply says that only customers in the queue count toward the cost. In this case, we
have a similar calculation to formula (6.5), with the result
E(IC) = Cn Lq . (6.11)
Another example is to use the cost function
g(n) = Cn n² (6.12)
with an M/M/1 or M/M/2 system. Both of these queue systems have a simple relationship
among the probabilities:
Pn = ρ^{n−1} P1 .3 (6.13)
The expected value of g(n) can then be written as
E(Cn n²) = Cn Σ_{n=1}^∞ n²Pn = Cn (1 + 4ρ + 9ρ² + 16ρ³ + · · · )P1 .
We can complete the calculation using the sum formula (see Appendix C)
1 + 4ρ + 9ρ² + 16ρ³ + · · · = (1 + ρ)/(1 − ρ)³ (6.14)
3. See formulas (4.4) and (4.8).
to get
E(Cn n²) = Cn (1 + ρ)P1/(1 − ρ)³ . (6.15)
For M/M/s queues with s > 2, the probabilities can be computed analytically, but then the
sum in formula (6.9) must be computed numerically.
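As an illustration of the numerical route, the sketch below (my own code and truncation point) evaluates the sum in formula (6.9) for an M/M/1 queue with the quadratic cost (6.12) and compares it with the analytical result (6.15):

Cn, rho, N = 1.0, 0.6, 2000                            # truncation point N is ample for rho = 0.6

P = [(1 - rho) * rho**n for n in range(N)]             # M/M/1 probabilities from formula (4.4)
numeric = sum(Cn * n**2 * P[n] for n in range(N))      # formula (6.9), truncated
exact = Cn * (1 + rho) * P[1] / (1 - rho)**3           # formula (6.15), with P1 = rho(1 - rho)
print(numeric, exact)                                  # both about 6.0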
The extra factor of λ is needed because E(h(t)) is the expected waiting cost per customer while
E(IC) is the expected waiting cost per unit time.
In practice, there are several different ways to evaluate the expected waiting cost from
formula (6.16):
2. If the queue system is M/M/s with infinite queue size and calling population, then we
can use the formulas
fs(t) = µ(1 − γ)e^{−µ(1−γ)t} , s = 1, (6.18)
fs(t) = µe^{−µt} [ 1 + (γ^s P0/(s!(1 − ρ))) · (1 − (s − γ)e^{−µ(s−1−γ)t})/(s − 1 − γ) ] , s > 1, γ ≠ s − 1, (6.19)
and
fs(t) = µe^{−µt} [ 1 + (γ^s P0/(s!(1 − ρ))) (µt − 1) ] , s > 1, γ = s − 1, (6.20)
which we will not derive here. Given these analytical forms for the probability density
function, we might be able to calculate the integral by hand and can use numerical
integration otherwise.
3. If the waiting times are given by a simulation, then we have a finite list of waiting times
instead of a theoretical probability distribution. In this case, the method used for g(n)
works:
E(IC) = (λ/J) Σ_{j=1}^J h(tj ), (6.21)
where J is the number of customers in the database and tj is the waiting time for customer
j. Note that the sum is divided by J to get an average waiting cost per customer and
then multiplied by λ to convert the result into an expected waiting cost per unit time.
where fw is the probability density function for waiting times. Similar to formula (6.17), the
linear case ends up as
E(IC) = λ ∫_0^∞ Cw tq fw(tq) dtq = Cw λW. (6.23)
In place of the formulas (6.18)–(6.20) for the probability density function, we have just one
formula,
fw(tq) = Pq sµ(1 − ρ)e^{−sµ(1−ρ)tq} , (6.24)
where Pq is the overall probability that a new arrival has to enter the queue rather than being
served immediately; that is,
Pq = 1 − Σ_{n=0}^{s−1} Pn . (6.25)
Example
Consider a system with one server and a service cost that depends on the speed of service. We
assume that the cost is 2 per unit time at standard service rate µ = 1 and increases linearly
up to a maximum service rate of µ = 2 with cost 6 per unit time. The indirect cost for each
customer is 1 unit for each unit of time spent in the system. The mean arrival rate is 0.8. We
assume that the service times are exponentially distributed.
The description of the direct cost matches formula (6.2) with s = 1. With costs of 2 for
µ = 1 and 6 for µ = 2, we have a slope of 4 cost units per speed unit. Thus, the total direct
cost is
DC = 2 + 4(µ − 1) = 4µ − 2.
The indirect cost is a linear function of the time in system, which matches formula (6.7). Ct = 1
and λ = 0.8 were given. For this M/M/1 system, we have the formula
L = ρ/(1 − ρ) = λ/(µ − λ) = 0.8/(µ − 0.8);
thus, the expected indirect cost is
E(IC) = Ct λS = Ct L = (1)(0.8/(µ − 0.8)) = 0.8/(µ − 0.8).
Combining these gives the expected total cost
E(TC) = 4µ − 2 + 0.8/(µ − 0.8), 1 ≤ µ ≤ 2. (6.26)
This result appears in Figure 6.1.
Figure 6.1: Expected total cost for the variable service speed example from formula (6.26).
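A few lines of code are enough to locate the minimum of (6.26); the sketch below (my own grid search) finds it near µ ≈ 1.25:

mus = [1 + 0.001 * i for i in range(1001)]             # grid over the allowed range 1 <= mu <= 2
costs = [4*mu - 2 + 0.8/(mu - 0.8) for mu in mus]      # expected total cost (6.26)
best = min(zip(costs, mus))
print(best)                                            # minimum cost about 4.78 at mu about 1.247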
7 Queue System Optimization (see Hillier and Lieberman 26.2, 26.4)
Learning Objectives
1. Understand the difference between design parameters and fixed parameters.
3. Understand how to obtain an objective function from an expected total cost by removing
parameters.
4. Be able to solve some optimization problems with a discrete design parameter by plotting
curves that mark indifference between alternatives.
5. Be able to use calculus to solve optimization problems with a continuous design parameter
and a simple formula for the objective function.
Design issues arise because of trade-offs. An alternative that is better than the others in
every way is clearly the best choice, with no need for mathematical decision making, but that
seldom occurs in practice. In the case of queue systems, it is reasonable to expect that better
performance costs more. There is usually a principle of diminishing returns: each increment
of additional spending produces progressively less improvement in indirect cost. Thus it is
common that small improvements are worth the cost while large improvements are not.
In any design problem, there are certain features of the setting that can be chosen, per-
haps with constraints on their possible values. There are generally also some features that are
inherent and cannot be altered. The critical modeling tasks in a design problem are to dis-
tinguish these elements and to identify a mathematical quantity that can be used to compare
alternatives. Specifically, the modeler must
1. Identify qualitative features and parameters in the real world setting that can be chosen
and determine whether or not there is a limited range of possible values;
2. Identify features of the real world setting that are unalterable, keeping in mind that
unalterable features may still have a range of values that need to be considered;
Once these modeling tasks are completed, we are left with a mathematical optimization problem.
In this phase we need to
4. Use calculation or simulation to determine the value of the objective function for any
given set of parameter values;
5. Use some mathematical, graphical, or numerical method to identify the design parameter
values that yield the best value of the objective function.
A common issue in modeling is the trade-off between accuracy and tractability. Sometimes
we have a more accurate model that is hard to analyze and a simplified model that is easy
to analyze. Which of these we choose depends on how certain we are of the scenario facts.
In the case of queueing theory, the mathematical results are expected values of probabilistic
quantities rather than actual values of deterministic quantities. The probabilistic nature of
queueing theory also affects the reliability of the values we choose for the fixed parameters.
For example, given that simulations need to run for a duration on the order of 100,000 service
completion times in order to converge to the expected value, we cannot realistically expect to
measure the mean service completion time with a high degree of reliability. These considerations
make the connection between the input and output quantities less certain than would be the case
for a deterministic scenario, suggesting that tractability might sometimes be more important
relative to accuracy in queue systems than in other optimization settings. Accordingly, we will
want to try to make our models reasonably accurate, but we’ll accept a modest amount of error
in exchange for analysis that we can reasonably do without simulations.
Example 1 – Choosing the number of servers In the simplest case, the only design choice
is the number of servers. Other features, such as the distributions and mean rates for arrival
and service, are unalterable. Consider the case where the indirect cost is a linear function of
the system size. We can write the expected total cost as
E(TC) = Cs s + Cn L(s, γ), (7.1)
where the cost per time for a server Cs , the cost per time for each customer in the system Cn ,
and the arrival to service ratio γ are fixed parameters. Note that these parameters are “fixed”
in the sense that the designer cannot change them, not in the sense that they have one specific
value. This objective function represents a class of problems rather than a single problem with
a specific set of parameters. We might therefore want to know the solution for many sets of
parameter values or we might want to know how the solution changes as a parameter value
changes. For this reason, it is worth the effort needed to look over a problem to see if all
the parameters are really necessary. In this case, we could define a cost ratio parameter by
C = Cs /Cn and then rewrite the expected total cost as
E(TC) = Cn (C·s + L(s, γ)). (7.2)
In this formulation, the parameter Cn is a multiplicative factor. We need it to calculate the
expected total cost, but it plays no role in determining the optimal server number. Thus, we
can replace the expected total cost with the simpler objective function
Z(s; γ, C) = C·s + L(s, γ). (7.3)
In this notation, any quantities in front of the semicolon are true independent variables, while
quantities after the semicolon are fixed parameters whose effects we might want to study. This
notation serves to define a class of problems. In this case, we can think of the optimal strategy
as a function s(γ, C) that tells us how many servers to choose for a given set of parameter
values. We’ll return to this idea later when we analyze the model.
Note that each value of s has its own formula for L(s, γ). We’ll discuss methods for analyzing
this optimization problem below.
This particular example uses the simplest indirect cost model. Other scenarios might call
for different choices.
Example 2 – Choosing the service rate In some cases, it may be possible to improve the
service rate through better maintenance of machines used for service, better server training, or
some other modification. We briefly considered an example in Section 6 where there was only
one server with a range of possible µ values. Generalizing that example a little, suppose the
parameter µ is confined to the range µ0 ≤ µ ≤ µ1 , with the cost a linear function that runs
from Cs = C0 for µ0 to Cs = C1 for µ1 . The rate of change of Cs is
Cs′ = (C1 − C0)/(µ1 − µ0). (7.4)
We can then use the point-slope form for a straight line to write the direct cost as
DC = C0 + Cs′(µ − µ0 ) = [C0 − Cs′µ0 ] + Cs′µ ≡ A + Cs′µ. (7.5)
Taking the simplest form for the indirect cost, as in Example 1, we have
E(IC) = Cn L(λ, µ) = Cn ρ/(1 − ρ) = Cn λ/(µ − λ), (7.6)
where we have separated out the factors λ and µ since one of these is fixed and the other is
variable, and we have substituted in the formula for L when s = 1. We therefore have expected
total cost of
E(TC) = A + Cs′µ + Cn λ/(µ − λ). (7.7)
In addition to the design parameter µ, this formula has four fixed parameters, but only two of
them are necessary. We can define
C′ = Cs′/Cn (7.8)
so as to obtain the formula
E(TC) = A + Cn [ C′µ + λ/(µ − λ) ]. (7.9)
Only the portion of this formula inside the brackets depends on µ, so this is all we need for
the objective function. We therefore have
z(µ; λ, C′) = C′µ + λ/(µ − λ), µ0 ≤ µ ≤ µ1 . (7.10)
Example 1 – Choosing the number of servers For a complete analysis of problem (7.3),
we need to produce a result that identifies the correct number of servers for all possible values
of γ and C. To see how this can be done, let’s do a thought experiment. Suppose we solve
the problem with one set of parameters and find that 1 server is better than 2. We plot that
point in the γC plane with a red dot. We keep solving the problem with different sets of γ
and C, plotting red dots if 1 server is better and blue dots if 2 servers are better. Near the
γ axis, C is small, meaning that servers are really cheap; hence, 2 servers will be better than
1. Similarly, when C is large enough 1 server will be better than 2. For each value of γ there
will be a “purple” point that marks the boundary between red and blue. These purple points
combine to make a purple curve. If we can plot this purple curve, we’ll know that 1 server is
optimal for points above it and 2 servers are better than 1 for points below it. We won’t need
to keep solving the problem with different sets of parameters.
The purple curve in our thought experiment marks the points for which the objective func-
tion values are the same for both cases; that is, the values of γ and C satisfy the equation
C + L(1, γ) = 2C + L(2, γ). (7.11)
For any particular γ, there will be one value of C that satisfies this equation because higher
values of C always give greater preference to fewer servers. We can calculate this value of C as
a function of γ, by solving equation (7.11) for C, with result
C12 (γ) = L(1, γ) − L(2, γ). (7.12)
It is a simple matter to plot this curve using a computer by choosing a set of γ values and
calculating the corresponding C value.
With the same reasoning, we can identify the point of indifference between 2 servers and 3,
which is
C23 (γ) = L(2, γ) − L(3, γ). (7.13)
Two servers are optimal if C23 (γ) < C < C12 (γ) because the first inequality marks 2 servers
as better than 3 while the second marks 2 servers as better than 1. In a similar fashion, we
can compute curves that mark points of indifference between any adjacent server numbers. By
plotting these curves on common axes, we obtain a graphical solution for the model (see Figure
7.1). Given any particular set of parameter values, we obtain the optimal number of servers by
computing γ and C, plotting the point on the graph, and seeing which of the regions the point
lies in.
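These indifference curves are easy to tabulate numerically. The sketch below again assumes that L(s, γ) is the M/M/s value and reuses the mms_L helper from the earlier sketch; plotting C12, C23, and so on against γ on common axes reproduces a chart of the kind shown in Figure 7.1.

    import numpy as np

    def indifference_curve(s, gammas):
        # C_{s,s+1}(gamma) = L(s, gamma) - L(s+1, gamma): the value of C at which
        # s servers and s+1 servers give the same objective value.
        return np.array([mms_L(s, g) - mms_L(s + 1, g) for g in gammas])

    gammas = np.linspace(0.05, 4.0, 80)
    C12 = indifference_curve(1, gammas)         # boundary between s* = 1 and s* = 2
    C23 = indifference_curve(2, gammas)         # boundary between s* = 2 and s* = 3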
Example 2 – Choosing the service rate Example 2 differs from Example 1 in two impor-
tant ways: (1) the design parameter is continuous rather than discrete, which makes calculus a
possibility, and (2) the function to be optimized has just one formula used for all cases.
Before solving an optimization problem with parameters, it is helpful to look at some illus-
trations of what the function can look like. Figure 6.1 shows the expected total cost for one
particular set of parameter values: λ = 0.8 and C′ = 4. In this case, the minimum cost occurs
at a point between µ0 = 1 and µ1 = 2. This point can be found by setting the derivative equal
to 0. In other cases, however, the graph might be monotone increasing, as would be the case
with the same λ and C′ but having µ0 = 1.3. In this case, the optimal µ is at the left endpoint
rather than the point where the derivative is 0.
We start by finding the value µ = M where the derivative of the objective function is 0.
From formula (7.10), we have
dz/dµ = C′ − λ/(µ − λ)^2.
Setting this equal to 0 at µ = M eventually yields the equation
(M − λ)^2 = λ/C′,
so that M = λ + √(λ/C′), taking the root with M > λ.
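A minimal sketch of the resulting decision rule: compute M and then clip it to the allowed interval. Clipping is valid because z is convex in µ for µ > λ; the parameter values used here are the illustrative ones from the discussion above.

    import math

    def optimal_mu(lam, C_prime, mu0, mu1):
        # Minimizer of z(mu; lambda, C') = C'*mu + lambda/(mu - lambda) on [mu0, mu1].
        M = lam + math.sqrt(lam / C_prime)      # stationary point: (M - lam)^2 = lam/C'
        return min(max(M, mu0), mu1)            # clip to the allowed interval

    print(optimal_mu(0.8, 4.0, 1.0, 2.0))       # interior optimum, about 1.247
    print(optimal_mu(0.8, 4.0, 1.3, 2.0))       # optimum at the left endpoint, 1.3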
Figure 7.1: Graphical solution for the server-number model. The indifference curves divide the (γ, C) plane (C = service cost / waiting cost) into regions marked s* = 1 through s* = 7, each giving the optimal number of servers.
APPENDICES
A Detailed Analysis of the M/E2/1 System
This section presents the details in the derivation of the computational scheme (5.13)–(5.20)
from the balance equations (5.6)–(5.12).
We start by rearranging the balance laws to express the unknown in each successive equation
in terms of the previously known values, taking P0 to be known:
P12 = δP0
P11 = (1 + δ)P12
P22 = (1 + δ)P11 − δP0
P21 = (1 + δ)P22 − δP12
P32 = (1 + δ)P21 − δP11
P31 = (1 + δ)P32 − δP22
P42 = (1 + δ)P31 − δP21
P1 = (2 + δ)P12 , (A.6)
and
Pn = (2 + δ)Pn2 − δP(n−1)2 , n > 1. (A.7)
The formulas above express each of the other probabilities in terms of the unknown
probability P0 . If we define an = Pn2 /P0 and bn = Pn /P0 , we have
a1 = δ, (A.8)
a2 = (1 + δ)^2 a1 − δ, (A.9)
a3 = (1 + δ)^2 a2 − 2δ(1 + δ)a1, (A.10)
an = (1 + δ)^2 an−1 − 2δ(1 + δ)an−2 + δ^2 an−3, n > 3, (A.11)
and
b1 = δ(2 + δ), (A.12)
bn = (2 + δ)an − δan−1, n > 1. (A.13)
The sum of probabilities Pn must equal 1, so (dividing by P0),
1/P0 = 1 + b1 + b2 + · · · .
P0 is then given by the formula
P0 = 1/(1 + Σ_{n=1}^∞ bn). (A.14)
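The computational scheme (A.8)–(A.14) translates directly into code. The sketch below is one way to write it; the tolerance, the safety cap on the number of terms, and the function name are choices made for this sketch, and δ is the parameter appearing in the balance equations of Section 5.

    def me21_probabilities(delta, tol=1e-12, n_max=10000):
        # Steady-state probabilities for M/E2/1 via the recursion (A.8)-(A.14).
        # Assumes a stable queue, so that the terms b_n eventually fall below tol.
        a = [delta]                                      # a_1, eq. (A.8)
        a.append((1 + delta)**2 * a[0] - delta)          # a_2, eq. (A.9)
        a.append((1 + delta)**2 * a[1] - 2 * delta * (1 + delta) * a[0])   # a_3, eq. (A.10)
        b = [delta * (2 + delta)]                        # b_1, eq. (A.12)
        b.append((2 + delta) * a[1] - delta * a[0])      # b_2, eq. (A.13)
        b.append((2 + delta) * a[2] - delta * a[1])      # b_3, eq. (A.13)
        while b[-1] > tol and len(b) < n_max:
            a.append((1 + delta)**2 * a[-1]              # a_n for n > 3, eq. (A.11)
                     - 2 * delta * (1 + delta) * a[-2] + delta**2 * a[-3])
            b.append((2 + delta) * a[-1] - delta * a[-2])    # eq. (A.13)
        P0 = 1.0 / (1.0 + sum(b))                        # eq. (A.14)
        return P0, [bn * P0 for bn in b]                 # P0 and P_1, P_2, ...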
B Detailed Analysis of the M/E2/2 System
Figure B.1: Transition rate diagram for the M/E2/2 system, with states labeled (n, p). Arrivals occur at rate λ, and service phase completions occur at rates 2µ and 4µ, as described below.
As in M/E2 /1, arrivals occur at rate λ and increase the size n. If the new size is 1 or 2,
then service starts immediately in phase 1, so p also increases. There are a number of different
cases for service completions. From (1,1), completions occur at rate 2µ and move the system
to state (1,2). From (1,2) the rate is the same and the resulting state is 0. These are the same
as in M/E2 /1. From (2,2), (3,2), and so on, service completions move a customer from phase
1 to phase 2 without changing the size of the system, so (2,2) to (2,3), (3,2) to (3,3), and so
on. The rates are 4µ rather than 2µ because each of the two servers has rate 2µ. Along the
bottom row, service completions are always phase 2, so the size of the system decreases by
1. The phase decreases by 2 from (2,4) because the second server becomes idle, but decreases
by 1 from (3,4) and upwards because the second server changes from phase 2 on the previous
customer to phase 1 for the customer moving in from the queue. As with the top row, these
transitions have rate 4µ because there are two different servers whose rates are combined. The
middle row (not counting state (1,2)) is a bit more complicated because the two servers are in
different phases. If the completion is for the server in phase 1, then that server moves on to
phase 2 and the system has the same n while p increases by 1. If the completion is for the server
in phase 2, then n decreases by 1 while p decreases by 2 if the transitioning server becomes idle
and by 1 if a new customer comes in from the queue. All of the transitions from the middle
row are at rate 2µ because there is only one server corresponding to each transition.
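Those transition rules can be cross-checked numerically by building the rate matrix for a truncated version of the chain and solving the balance equations directly. The following sketch does exactly that; the truncation level N, the encoding of the empty system as (0, 0), and the function name are choices made for this sketch.

    import numpy as np

    def me22_probs(lam, mu, N=60):
        # Brute-force steady-state probabilities for the M/E2/2 chain described above,
        # truncated at n = N customers; returns a dict mapping states (n, p) to probabilities.
        states = [(0, 0), (1, 1), (1, 2)] + [(n, p) for n in range(2, N + 1) for p in (2, 3, 4)]
        idx = {s: i for i, s in enumerate(states)}
        Q = np.zeros((len(states), len(states)))

        def add(src, dst, rate):                # transition src -> dst at the given rate
            Q[idx[src], idx[dst]] += rate
            Q[idx[src], idx[src]] -= rate

        add((0, 0), (1, 1), lam)                # arrival to an empty system
        add((1, 1), (2, 2), lam)                # arrival; second server starts phase 1
        add((1, 2), (2, 3), lam)
        add((1, 1), (1, 2), 2 * mu)             # phase 1 -> phase 2
        add((1, 2), (0, 0), 2 * mu)             # departure
        for n in range(2, N + 1):
            if n < N:                           # arrivals join the queue
                for p in (2, 3, 4):
                    add((n, p), (n + 1, p), lam)
            add((n, 2), (n, 3), 4 * mu)         # either phase-1 server finishes phase 1
            add((n, 3), (n, 4), 2 * mu)         # the phase-1 server finishes phase 1
            add((n, 3), (1, 1) if n == 2 else (n - 1, 2), 2 * mu)   # phase-2 server: departure
            add((n, 4), (1, 2) if n == 2 else (n - 1, 3), 4 * mu)   # either server: departure

        # solve pi Q = 0 together with sum(pi) = 1
        A = np.vstack([Q.T, np.ones(len(states))])
        rhs = np.zeros(len(states) + 1)
        rhs[-1] = 1.0
        pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
        return dict(zip(states, pi))

Summing these probabilities over p for each n gives Pn, which can be compared with the output of the recursive scheme developed next.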
The procedure for obtaining equations to represent the M/E2 /2 system is similar to that
for M/E2 /1, except that there is a lot more algebra. We start by defining
anp = Pnp/P0, bn = Pn/P0, n > 0. (B.1)
Once we have formulas for anp , we can complete the specifications with
b1 = a11 + a12, bn = an2 + an3 + an4, n > 1, (B.2)
and
P0 = 1/(1 + Σ_{n=1}^∞ bn). (B.3)
The balance equation for state 0 easily yields
a12 = ρ = λ/(2µ). (B.4)
After that it gets difficult.
Working from left to right, we have the balance equation for state (1,2), which yields
a11 + 2a24 = (1 + ρ)a12 = ρ(1 + ρ). (B.5)
This equation has two unknowns, so we need to combine it with the balance equations for the
states in the next column: (1,1) and (2,4). These are
(1 + ρ)a11 = a23 + ρ, (2 + ρ)a24 = a23; (B.6)
subtracting the second from the first yields
(1 + ρ)a11 − (2 + ρ)a24 = ρ. (B.7)
Equations (B.5) and (B.7) are a pair of equations for the unknowns a11 and a24. Some careful
algebra yields the results
a11 = (4ρ + 3ρ^2 + ρ^3)/(4 + 3ρ), a24 = (2ρ^2 + ρ^3)/(4 + 3ρ), (B.8)
and then
a23 = (2 + ρ)a24 . (B.9)
The procedure for obtaining a22 , a34 , and a33 is similar, albeit messier. We need the balance
equations for states (2,3), (2,2), and (3,4):
Finally, we can add and subtract these two equations to get a22 and a34 respectively:
a22 = [(2 + ρ)/4]a23 − (ρ/4)a12 + [ρ/(2(2 + ρ))](a11 − a24), (B.14)
a34 = [(2 + ρ)/4]a23 − (ρ/4)a12 − [ρ/(2(2 + ρ))](a11 − a24), (B.15)
and
an+1,3 = (2 + ρ)an+1,4 − ρan4 , n ≥ 2. (B.19)
The full set of a values is given in turn by formulas (B.4), (B.8), (B.9), and (B.14) through
(B.19), with the last three repeated for the increasing sequence n = 2, 3, . . . until the values are
small enough not to matter.
Figure B.2 shows the steady-state probabilities for the M/E2/2 and M/M/2 systems. The
differences are similar to those seen with one server in Figure 5.3.
Figure B.2: Steady-state probabilities Pn for the M/E2/2 and M/M/2 systems with ρ = 0.8.
C The Sum Formula Σ_{n=1}^∞ n^2 x^{n−1} (6.13)
Let
S = Σ_{n=1}^∞ n^2 x^{n−1} = 1 + 4x + 9x^2 + 16x^3 + · · · .
Then
xS = x + 4x^2 + 9x^3 + 16x^4 + · · · .
Subtracting these yields
(1 − x)S = 1 + 3x + 5x^2 + 7x^3 + · · · ,
or
(1 − x)S = (1 + x)(1 + 2x + 3x^2 + 4x^3 + · · · ).
The infinite sum in this formula is known:
1 + 2x + 3x^2 + 4x^3 + · · · = d/dx (1 + x + x^2 + x^3 + · · · ) = d/dx [1/(1 − x)] = 1/(1 − x)^2.
Thus,
(1 − x)S = (1 + x)/(1 − x)^2,
or
S = (1 + x)/(1 − x)^3.
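A quick numerical check of this identity, with an arbitrarily chosen x inside the interval of convergence:

    x = 0.3
    partial_sum = sum(n**2 * x**(n - 1) for n in range(1, 200))
    closed_form = (1 + x) / (1 - x)**3
    print(partial_sum, closed_form)             # both are approximately 3.7901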