Rate and Time in Queuing Model
If we have an idea of the customer arrival rate, that information might be used to determine an
optimal service system configuration. If we have too much capacity, such as with too many
servers working at too fast of a collective rate, then the servers could wind up spending most of
their time in idleness. However, if we have too little capacity–with too few servers–then
customers may spend much of their time in idleness waiting to be served.
For example, think of the last time you visited a grocery store. Do you recall how many of the
cash registers had cashiers to operate them? Although modern grocery stores typically have
numerous check-out lanes, they are rarely if ever all utilized at the same time. Why is that the
case? Because the grocery store does not want to pay cashier labor when no customers are
present. What if one or two customers are “inventoried” in a cashier waiting line? Should
capacity be increased by bringing on additional cashiers? Probably not, since it is not
unreasonable to have customers wait a few minutes to be served. But, what if customers wait ten
minutes or more? Then, the cost of customer waiting is likely larger than the cost of an
additional cashier or two. These are not easy tradeoffs.
As mentioned above, the factors that influence the amount of waiting that customers are
subjected to include the rate at which customers arrive, how fast the servers serve, and the way
the service system is configured. The first two items can be expressed mathematically as
probability distributions, as explained below.
Customer Arrivals
Customers can arrive in many ways. They can arrive individually or in groups. They can arrive
at a steady rate or in spurts. They can arrive and then depart because the expected wait is too
long, which is called balking. Other customers may arrive and wait for a while, only to be
frustrated and leave, which is called reneging.
We may characterize random customer arrivals by a probability distribution. The most common
distribution used for arrivals is the Poisson distribution. The Poisson distribution is a discrete
distribution, meaning that only certain numerical values will come from the distribution. The
values that come from the Poisson distribution are whole numbers greater than or equal to zero.
A value from the Poisson distribution represents a number of customers arriving in a particular
time period. We assume that only whole numbers of customers arrive; there are no fractional customers.
The Poisson probability function is:
    P_T(n) = (λT)^n e^{-λT} / n!
This is interpreted to mean that the probability that exactly n customers will arrive during a
time period of length T is P_T(n). The parameter λ is called lambda, and represents the
average number of customers arriving per single time period. (Recall from your statistics class
that e is a constant of approximately 2.718, and n! = n×(n-1)×(n-2)×...×2×1.)
For example, if a single time period is defined as one hour, and an average of 4 customers arrive
each hour, then λ=4. Note that lambda is a “rate,” which is expressed in units (i.e. customers)
per time period. If the arrival of customers is random according to a Poisson probability
distribution, then the probability that exactly 3 customers will arrive in a one-hour time period is
P_1(3) = (4×1)^3 e^{-4×1} / 3! = 0.1954, or a 19.54% probability.
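The calculation above can be checked with a few lines of Python. This is a minimal sketch; the function name poisson_pmf and its parameter names are illustrative assumptions, not notation from this text.

```python
import math

def poisson_pmf(n: int, lam: float, T: float = 1.0) -> float:
    """Probability of exactly n arrivals in a period of length T,
    given an average arrival rate of lam customers per time period."""
    mean = lam * T  # expected number of arrivals in the period
    return mean ** n * math.exp(-mean) / math.factorial(n)

# Exactly 3 arrivals in one hour when an average of 4 customers arrive per hour
print(round(poisson_pmf(3, lam=4, T=1), 4))  # 0.1954
```

Summing poisson_pmf over n = 0, 1, 2, ... gives a total that approaches 1, as it must for a probability distribution.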
Perhaps a more insightful way of looking at the arrival of customers is by considering the
average time between any two customers arriving. If one customer arrives at exactly 10:03 and
the next customer arrives at exactly 10:25 then the time between those arrivals is 22 minutes.
The time between adjacent arrivals is called the interarrival time. It is important to note that the
average arrival rate is simply the inverse of the average interarrival time.
If the arrival rate, expressed as customers per time period, is random and follows a Poisson
distribution, then the interarrival time is also random and follows a negative-exponential
distribution (sometimes just called an exponential distribution). The exponential distribution is
continuous, meaning that values from the distribution can be any nonnegative real number. In fact, the
amount of time between arrivals could be 3 minutes, 3.2 minutes, 3.21
minutes, 3.21958382349438234823 minutes, etc.
The mean of the negative exponential distribution is 1/λ and is in units of time period. For
example, if the average time between customer arrivals is 2 minutes then lambda is ½. Note that
this corresponds with the Poisson mean of ½ of a customer per minute. In this sense, lambda in
the Poisson distribution is the same lambda in the corresponding exponential distribution.
The probability density function for the exponential distribution is:
    f(t) = λ e^{-λt}, for t ≥ 0
This probability density function only has relative interpretation, and the f (t) value does not
directly represent the probability of occurrence. For example, if lambda equals 4 then
f (1)=0.0733, which does not mean a 7.33% probability of the interarrival time being exactly 1.
In fact, since there are infinitely many real numbers, the actual probability of
the interarrival time being any particular real value is zero. (Technically, a continuous
distribution assigns positive probability only to ranges of values, not to single points.)
What we might be interested in is the probability that an interarrival time will be in a given range
of values. For example, we may desire to know the probability that a customer will arrive in the
next five minutes, given that a customer just arrived. This value comes from the cumulative
probability distribution for the exponential distribution:
    F(t) = 1 - e^{-λt}, for t ≥ 0
If the average interarrival time is 4 minutes then lambda equals 0.25. The probability that the
next customer will arrive in the next 5 minutes is F(5) = 1 - e^{-0.25×5} = 0.7135, or a 71.35 percent
probability.
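The density and cumulative values quoted above can be reproduced as follows. This is a sketch only; exp_pdf and exp_cdf are hypothetical helper names chosen for illustration.

```python
import math

def exp_pdf(t: float, lam: float) -> float:
    """Height of the exponential density at t (a relative value, not a probability)."""
    return lam * math.exp(-lam * t)

def exp_cdf(t: float, lam: float) -> float:
    """Probability that the interarrival time is at most t."""
    return 1.0 - math.exp(-lam * t)

print(round(exp_pdf(1, lam=4), 4))     # 0.0733
print(round(exp_cdf(5, lam=0.25), 4))  # 0.7135

# Probability that the interarrival time falls between 4 and 5 minutes (lam = 0.25)
print(round(exp_cdf(5, 0.25) - exp_cdf(4, 0.25), 4))  # 0.0814
```

Only the differences of exp_cdf values are probabilities; exp_pdf values are useful for comparing how likely nearby ranges are relative to one another.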
Now, let’s imagine that no customer arrives in the next 5 minutes (which has a 28.65 percent
probability of occurring). At that point, what is the probability of a customer arriving in the next
5 minutes? Is it greater than or less than the original 71.35 percent probability? Some people
would argue that it is less, since customers must be generally delayed. Others would argue that it
is more, since we need more customers to reach our average of 4 minute interarrival times (a
concept called “regression to the mean”). Rather than speculate, we can calculate the
probability.
If we assume interarrival times follow the exponential distribution, and 5 minutes have passed,
then we can calculate a new F(t) function that starts at t = 5. This new F(t) will be the same as
the old F(t) except that it will be “normalized,” meaning that it will be scaled by a factor so that
the probability of a customer arriving between t=5 and t=∞ will be 100 percent. This must be,
since it means that the interarrival time will be some value. (Ignoring the possibility that no
customer will ever arrive.)
The probability of the next customer arriving between t=5 and t=∞ from our original function
was 28.65 percent. So, to scale that to 100 percent we simply subtract 0.7135 from the original
F(t), since no customer arrived during that period, and multiply the remaining value by 1/0.2865.
The resulting cumulative probability function is:
    F_{t+5}(t) = (F(t) - 0.7135) / 0.2865, for t ≥ 5
You will note that F_{t+5}(5) = 0, since we already know that no customer arrived in the first 5
minutes. What we are interested in knowing is the probability of a customer arriving in the next
5 minutes, given that no customer arrived in the first five minutes. That is simply F_{t+5}(10) which,
when accounting for rounding error, is (0.9179-0.7135)/0.2865=0.7135. Amazing! What that
means is that the probability of a customer arriving in the second 5 minutes given that no
customer arrived in the first 5 minutes is exactly the same as the original probability of a
customer arriving in the first 5 minutes! The passage of time has absolutely no impact on the
probability distribution of future interarrival times. This is known as the memoryless property of
the distribution.
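The memoryless property can be verified numerically. The sketch below assumes exponential interarrival times with lambda = 0.25 per minute, as in the example; the function names are illustrative assumptions.

```python
import math

def exp_cdf(t: float, lam: float) -> float:
    """Probability that the interarrival time is at most t."""
    return 1.0 - math.exp(-lam * t)

def prob_arrival_within(extra: float, already_waited: float, lam: float) -> float:
    """P(arrival within the next `extra` minutes | no arrival in the first `already_waited`)."""
    no_arrival_yet = 1.0 - exp_cdf(already_waited, lam)  # normalizing factor
    arrives_in_window = exp_cdf(already_waited + extra, lam) - exp_cdf(already_waited, lam)
    return arrives_in_window / no_arrival_yet

lam = 0.25  # average interarrival time of 4 minutes
print(round(exp_cdf(5, lam), 4))                 # 0.7135
print(round(prob_arrival_within(5, 5, lam), 4))  # 0.7135
```

Changing already_waited to any other nonnegative value leaves the second result unchanged, which is exactly what memoryless means.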
The memoryless property indicates that the probability of a customer arriving in the future is not
a function of when customers arrived in the past. If each customer arrival is independent of each
other customer arrival, the memoryless property makes perfect sense. It is a common
assumption that customers exist independently of one another, and arrive at the service provider
when they please.
In summary, if customer arrivals are completely independent then the memoryless property
makes sense. The exponential distribution is a good distribution to assume, since it possesses the
memoryless property (it is the only continuous distribution that does; its discrete counterpart is the geometric distribution). If we assume an exponential
distribution for interarrival times, then we are inherently assuming a Poisson distribution for
arrival rates. This is an important assumption that we will come back to later.
Service Rates
The “service rate” is the average rate at which customers can be served, and is expressed as
customers per time period. This is different from the “service time,” which is the average
amount of time it takes to serve one customer. We can easily calculate one from the other by
inverting the value.
It makes sense that if servers serve faster then customers will have to wait a shorter time to be
served. How long does it take to serve a customer? Sometimes the service rate is constant, as
are rates in many manufacturing processes. In other cases, service times vary based on
variations in customer inputs and variations in service requirements. For example, we might
assume that service rates also follow a Poisson distribution. This implies that the time between
completing the service for individual customers follows an exponential distribution.
Recall that we used lambda to represent the average arrival rate. We will use μ (spelled mu and
pronounced “m-you” like the sound of a cat) to represent average service rate. When the service
rate is constant, then μ is that service rate.
As just mentioned, service rates can be converted to or from service times by inverting the
value. For example, if we have a service rate of 2 customers per minute, then the service time
would be ½ minute per customer. Note that we invert not only the numbers, but also the units
of measure. For this example we inverted “customers per minute” to come up with “minutes per
customer.”
As another example, if we have a service time of 20 minutes per customer, then we have a
service rate of 1/20 customer per minute. What is 1/20th of a customer? That means that on
average we are able to complete 1/20th of a customer’s service in one minute. It would probably
be clearer to express this service rate in customers per hour. This conversion is simply done by
recognizing that 60 minutes/hour is a unity, since the top and the bottom of the fraction are equal
(60 minutes = 1 hour). Therefore, we can multiply our service rate by the time conversion unity:
    (1/20 customer/minute) × (60 minutes/hour) = 3 customers/hour
You will observe that the equation has “minute” in the numerator and in the denominator, so the
“minutes” cancel.
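The same inversion and unit conversion can be sketched in Python using exact fractions. The helper names below are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

MINUTES_PER_HOUR = 60  # the "unity": 60 minutes = 1 hour

def time_to_rate(minutes_per_customer) -> Fraction:
    """Invert a service time (minutes per customer) into a rate (customers per minute)."""
    return 1 / Fraction(minutes_per_customer)

def per_minute_to_per_hour(customers_per_minute: Fraction) -> Fraction:
    """Multiply by 60 minutes/hour so that the minutes cancel."""
    return customers_per_minute * MINUTES_PER_HOUR

rate_per_minute = time_to_rate(20)              # 20 minutes per customer
print(rate_per_minute)                          # 1/20
print(per_minute_to_per_hour(rate_per_minute))  # 3
```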
I recommend that when doing time unit conversions you always include the units of measure in
the conversion. If the units of measure do not convert properly, you have probably made a
mistake. The following is an example of a mistake while trying to convert a service time of 5
minutes per customer to hours as the time unit:
    (5 minutes/customer) × (60 minutes/hour) = 300 minutes²/(customer·hour)   (incorrect)
You will note that the minutes do not cancel because they are both in the numerator. A correct
way to do that conversion is:
    (5 minutes/customer) × (1 hour/60 minutes) = 1/12 hour/customer
where the “minutes” cancel because they appear in both the numerator and the denominator.
Recall that the average arrival rate is simply the inverse of the average interarrival time. This
discussion about converting service rates to or from service times also applies to converting
average arrival rates to or from average interarrival times.
• Times are different from rates. If you have an equation that calls for the service rate,
and you use the service time instead, it will produce erroneous results. You must invert
the time first.
• Be careful about units of measure. If you have an equation that asks for both the service
rate and the arrival rate, then they must both have the same units of measure. Using a
service rate in “customers per minute” with an arrival rate in “customers per hour” will
cause problems.
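One way to guard against both mistakes is to carry the time unit along with each rate and convert to a common unit before combining values. The following is a minimal sketch under that assumption; the Rate class, its field names, and the example figures (6 arrivals per hour, 5 minutes per service) are hypothetical.

```python
from dataclasses import dataclass

MINUTES_IN = {"minute": 1, "hour": 60}

@dataclass(frozen=True)
class Rate:
    customers: float  # customers per one unit of time
    per: str          # "minute" or "hour"

    def per_hour(self) -> float:
        """Express the rate in customers per hour."""
        return self.customers * 60 / MINUTES_IN[self.per]

def rate_from_time(time_per_customer: float, unit: str) -> Rate:
    """Invert a time per customer into a rate, keeping the same time unit."""
    return Rate(1.0 / time_per_customer, unit)

arrival = Rate(6, "hour")              # assumed: 6 customers arrive per hour
service = rate_from_time(5, "minute")  # assumed: 5 minutes per customer

# Both rates are converted to customers per hour before they are compared
print(round(service.per_hour(), 6))                       # 12.0
print(round(arrival.per_hour() / service.per_hour(), 6))  # 0.5
```

Dividing 6 by the raw service time of 5, or by the per-minute rate of 0.2, would give 1.2 or 30, neither of which means anything.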
The configuration of the service system includes the following elements:
• Queue configuration, which is how the queue is organized. There might be one queue
feeding multiple servers, or each server might have its own queue.
• Queue discipline, or the way the next customer is selected from the queue to be served.
A common queue discipline is first-in-first-out (FIFO), which is the same as first-come-first-served (FCFS).
Hospital emergency rooms take patients based on urgency of need,
which is often not FIFO. Restaurants and hair salons may serve customers with
reservations first, based on the time of their reservation.
• Queue size limits. Some queues will hold an unlimited number of customers. Others will
only hold a fixed number of customers, after which subsequent customers will be turned
away.
• Number of service phases. Some services have the customer wait for a single service.
With others, the customer waits to see the first server, then may wait to see another
server, and perhaps a third, etc. For example, at some fast food restaurants the customer
waits to place an order, then gets in a cashier line to wait to pay for the order.
• Number of servers. There could be one server who serves all of the customers, or
multiple servers. In some situations, such as self-serve, there are as many servers as there
are customers (since each customer is a server, although there might be a limited number
of self-service stations). Each server who provides essentially the same service to
customers at a given phase of the service process is called a “channel.” If three servers
serve customers waiting in a single queue, then we have a three-channel system.
“Queuing theory” is an analytical method for estimating the performance of a queuing system
under certain assumptions. System performance measures may include the length of the queue, in terms of
time or number of customers, and the utilization of workers, which is the percentage of the time they are
busy serving customers.
Herein we will only consider a relatively simple set of assumptions. For more complex systems
you will need to see books on queuing theory. We may make the following assumptions:
These assumptions define what is known as an M/G/1 queuing system. M means there is a
Poisson arrival process (exponential interarrival times), G means there is a general probability distribution for the service
process, and 1 means there is one server.
We might also define the “system” to include the queue and the server station. Therefore, the
number of customers in the system includes the customers waiting in the queue and any customer
being served.
With all of these equations, it is essential that you use consistent units of measure. If, for
example, your lambda is in customers per minute and your mu is in customers per hour, the
resulting calculations will be meaningless. (To convert customers per minute to customers per
hour multiply by 60.)
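For reference, the standard steady-state results for an M/G/1 queue (the Pollaczek-Khinchine formula) can be sketched as below. The symbol names rho, Lq, Wq, W, and L follow common queuing-theory convention and are assumptions here rather than this text's own notation; sigma is the standard deviation of the service time, in the same time unit as 1/mu.

```python
def mg1_metrics(lam: float, mu: float, sigma: float) -> dict:
    """Standard M/G/1 steady-state measures.

    lam   : average arrival rate (customers per time period)
    mu    : average service rate (customers per time period, same period as lam)
    sigma : standard deviation of the service time (in time periods)
    """
    if lam >= mu:
        raise ValueError("the arrival rate must be less than the service rate")
    rho = lam / mu                                         # server utilization
    lq = (lam**2 * sigma**2 + rho**2) / (2 * (1 - rho))    # average number waiting in queue
    wq = lq / lam                                          # average time waiting in queue
    w = wq + 1 / mu                                        # average time in the system
    l = lq + rho                                           # average number in the system
    return {"rho": rho, "Lq": lq, "Wq": wq, "W": w, "L": l}

# Assumed example: 4 arrivals per hour, 6 services per hour,
# service-time standard deviation of 1/6 hour
print(mg1_metrics(lam=4, mu=6, sigma=1 / 6))
```

With sigma equal to the mean service time 1/mu, as in this example, the results match the familiar exponential-service case: about 1.33 customers in the queue and 2 in the system.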
A special case of the M/G/1 model is the M/D/1 model, which shares all of the same assumptions
except for now assuming that the service time is a constant amount, not varying from one
customer to the next. (The “D” stands for deterministic, meaning that the service time has no
randomness.) A constant service time simply implies that there is no variance in the service
time, that is, σ² = 0. If we substitute σ² = 0 into the M/G/1 equations above we get the
M/D/1 queuing equations:
The M/D/1 queuing equations might be used in a highly-automated service process, such as an
automatic car wash. If multiple services are offered and customers randomly choose different
services of different service times, then it would probably be necessary to use the M/G/1
equations.
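The substitution can be written out as follows. This is a sketch using the conventional M/G/1 symbols (ρ for utilization, L_q and W_q for the average number and time in queue, L and W for the average number and time in the system), which are assumptions rather than this text's own notation.

```latex
\begin{align*}
\rho &= \frac{\lambda}{\mu} \\
L_q &= \frac{\lambda^2 \sigma^2 + \rho^2}{2(1-\rho)}
      \;\xrightarrow{\;\sigma^2 = 0\;}\;
      \frac{\rho^2}{2(1-\rho)} = \frac{\lambda^2}{2\mu(\mu-\lambda)} \\
W_q &= \frac{L_q}{\lambda} = \frac{\lambda}{2\mu(\mu-\lambda)} \\
W &= W_q + \frac{1}{\mu} \\
L &= L_q + \rho
\end{align*}
```

For the same λ and μ, the constant-service queue is half as long as the queue with exponentially distributed service times, where σ = 1/μ.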
If the average service rate (μ) is greater than the average arrival rate (λ), will customers still wait
in line? Initially we may think that we are serving customers faster than they are arriving, so
customers will not have to wait, but that is incorrect. The fact is, λ and μ are average rates.
During some time periods, more than λ customers will randomly arrive, and it is possible that
some customers will take longer than 1/μ to be served. Over time, this random nature of arrivals
and service means that the average line length will be greater than zero.
If, on the other hand, arrivals were not random but occurred exactly λ per time period, and if the
service rate was exactly μ all the time, then customers would not have to wait. However, that
situation is unlikely to occur in interactive service processes.
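A small simulation illustrates the contrast. The sketch below assumes a single first-come-first-served server with λ = 4 and μ = 6 customers per hour (values chosen only for illustration) and tracks each customer's wait with the standard Lindley recursion; the function name and the random seed are arbitrary.

```python
import random

def average_wait(gaps, services):
    """Average time spent waiting in queue at a single first-come-first-served server.

    gaps[i]     : time between the arrivals of customer i and customer i+1
    services[i] : service time of customer i
    """
    wait = 0.0   # the first customer finds an empty system
    total = 0.0
    for gap, service in zip(gaps, services):
        total += wait
        # Lindley recursion: work left over when the next customer arrives
        wait = max(0.0, wait + service - gap)
    return total / len(services)

lam, mu, n = 4.0, 6.0, 50_000
rng = random.Random(42)

# Random case: exponential interarrival and service times
random_wait = average_wait(
    [rng.expovariate(lam) for _ in range(n)],
    [rng.expovariate(mu) for _ in range(n)],
)

# Steady case: arrivals exactly 1/lam apart, every service takes exactly 1/mu
steady_wait = average_wait([1 / lam] * n, [1 / mu] * n)

print(f"random arrivals and service: {random_wait:.3f} hours average wait")
print(f"perfectly steady arrivals and service: {steady_wait:.3f} hours average wait")
```

The steady case never produces a wait because each service finishes before the next arrival, while the random case builds up a queue even though the server is faster than the arrivals on average.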
Quantitative Tools for Service Operations Management by Dr. Scott Sampson ©2012