Evolving Heuristics For Dynamic Vehicle Routing With Time Windows Using Genetic Programming

This document summarizes a research paper on evolving heuristics for solving the dynamic vehicle routing problem with time windows (DVRPTW) using genetic programming. The paper proposes a meta-algorithm that uses heuristics to generate routes for new requests in real-time. It manually designs some initial heuristics and uses genetic programming to automatically evolve new heuristics that outperform the manual ones. The goal is to develop an approach that can generate effective heuristics for solving DVRPTW without requiring manual design.

Evolving Heuristics for Dynamic Vehicle Routing with Time Windows Using Genetic Programming

Josiah Jacobsen-Grocott, Yi Mei, Gang Chen, Mengjie Zhang
School of Engineering and Computer Science, Victoria University of Wellington
PO Box 600, Wellington 6140, New Zealand
{yi.mei, aaron.chen, mengjie.zhang}@ecs.vuw.ac.nz

Abstract—The dynamic vehicle routing problem with time windows is an important combinatorial optimisation problem in many real-world applications. The most challenging part of the problem is to make real-time decisions (i.e. whether to accept the newly arrived service requests or not) during the execution of the routes. Optimisation methods such as mathematical programming and evolutionary algorithms, which are competitive for static problems, are hardly applicable, since they are usually time-consuming and cannot give real-time responses. In this paper, we consider solving this problem using heuristics. A heuristic gradually builds a solution by adding the requests to the end of the route one by one. This way, it can take advantage of the latest information when making the next decision, and give an immediate response. We propose a meta-algorithm to generate a solution given any heuristic. The meta-algorithm maintains a set of routes throughout the scheduling horizon. Whenever a new request arrives, it tries to re-generate new routes to include the new request by the heuristic. It accepts the new request if successful, and rejects it otherwise. Then we manually designed several heuristics, and proposed a genetic programming-based hyper-heuristic to automatically evolve heuristics. The results showed that the heuristics evolved by genetic programming significantly outperformed the manually designed heuristics.

I. INTRODUCTION

The Vehicle Routing Problem (VRP) [1] is an important combinatorial optimisation problem that has a wide range of real-world applications in supply chain and logistics-related areas. It is to design a set of routes, each for a vehicle, to serve the given requests from different locations subject to some constraints such as the capacity constraint. VRP has a number of variations, among which the VRP with Time Windows (VRPTW) [2], [3] is one of the most commonly investigated ones. In VRPTW, each request has a time window that specifies the period during which the customer would like the service to occur. VRPTW then aims to design routes that minimise the violation of the time windows of the requests. There have been extensive studies on solving VRPTW, and many effective algorithms (e.g. [4]–[8]) have been proposed to solve it.

In reality, the environment is usually dynamic and new requests may arrive in real time during the execution of the routes. A typical example is the same-day pick-up and delivery service provided by delivery companies. In this case, the vehicles are sent out to serve the existing requests, and new requests may arrive while the vehicles are still on their way. The company aims to accept as many of the new requests as possible subject to the capacity of the vehicles and the fulfillment of the time windows. Whenever a new request arrives, it is necessary to decide whether to accept or reject the new request immediately so that the customer will be notified in time. This gives rise to the Dynamic VRPTW (DVRPTW) [9].

In the dynamic environment, the commonly used solution-optimisation algorithms such as evolutionary algorithms [10], variable neighborhood search [11] and ant colony optimisation [12] are hardly applicable due to their high computational complexity. They are based on an iterative search framework, which normally takes time to reach high-quality solutions. In contrast, heuristics (e.g. the savings heuristic [13]) can give immediate and reasonably good responses, and thus are well suited for the dynamic environment. Recent advances in telecommunication technologies also make it much easier to communicate between the management centre and the vehicles. Therefore, more and more studies consider solving dynamic VRPs using heuristics (or so-called routing policies) [14].

There have been a number of works on designing different types of heuristics for constructing routes or modifying the predetermined routes in real time [14]. However, the effectiveness of a heuristic largely depends on the scenario, objective(s), and even the graph topology. Therefore, it is hard to manually design an effective heuristic for a particular problem scenario of interest.

Recently, automated design of heuristics using hyper-heuristics [15] has attracted more and more research interest. Genetic Programming (GP) [16], [17] is one of the most commonly used hyper-heuristic methods due to its flexible representation. It has been used for automatically evolving heuristics for a wide range of combinatorial optimisation problems such as production scheduling [18], the knapsack problem [19], the timetabling problem [20] and the arc routing problem [21]. However, no effort has been made for DVRPTW so far.

In this paper, we aim to develop a GP-based hyper-heuristic approach for automatically evolving heuristics for DVRPTW. To be specific, the paper has the following goals:
• Develop a meta-algorithm that can generate a solution given any heuristic and any problem instance;
• Manually design heuristics for DVRPTW based on domain knowledge. These manually designed heuristics will be used as the baseline benchmark heuristics;
• Develop a GP-based Hyper-Heuristic (GPHH) for automatically evolving heuristics;

978-1-5090-4601-0/17/$31.00 ©2017 IEEE


• Investigate the effectiveness of the GP-evolved heuristics in comparison with the manually designed heuristics;
• Investigate the effectiveness of different routing strategies in the meta-algorithm, including the driving-first and the waiting-first strategies [22].

The rest of the paper is organised as follows: Section II gives the background introduction of the problem and related work. Section III describes the proposed GPHH method for evolving heuristics for DVRPTW. Then, Section IV shows the experimental studies and discussions. Finally, Section V concludes the paper and discusses future directions.

II. BACKGROUND

A. Dynamic Vehicle Routing Problem with Time Windows

DVRPTW was formally defined in [23]. In DVRPTW, suppose we have a connected graph $G = (V, E)$, where $V$ is the vertex set and $E$ is the edge set. There is a special vertex $v_0 \in V$ which is called the depot vertex. There are $k$ vehicles, each with capacity $Q$, located at the depot vertex for serving the requests. Each pair of vertices $(v_i, v_j) \in V \times V$ is associated with a travel time $c(v_i, v_j)$. Without loss of generality, we assume that the graph $G$ is fully connected, and $c(v_i, v_j)$ indicates the travel time of the shortest path from $v_i$ to $v_j$, $\forall v_i, v_j \in V$.

The request arrival process is defined as a discrete event simulation within a time horizon $[0, T]$. Each request $\tau$ is characterised as $\tau = (v(\tau), t(\tau), s(\tau), d(\tau), l(\tau), u(\tau))$, where $v(\tau) \in V \setminus \{v_0\}$ is the vertex (location) where the request is invoked, $t(\tau)$ is the time when the request is invoked, $s(\tau)$ is the service time of the request, $d(\tau)$ is the demand of the request, and $l(\tau)$ and $u(\tau)$ are the lower and upper bounds of the time window of the request. Obviously, $0 \le t(\tau) \le l(\tau) \le u(\tau) \le T$. The request set $\Upsilon$ contains a subset $\Upsilon_1$ of requests that already exist at the beginning of the period and a subset $\Upsilon_2$ of requests that arrive during the execution of the routes. That is, $t(\tau) = 0$, $\forall \tau \in \Upsilon_1$ and $t(\tau) > 0$, $\forall \tau \in \Upsilon_2$. The information of a request is not known until it arrives.

The solution of DVRPTW can be seen as a decision making process, in which a decision is made whenever a new request arrives or a vehicle becomes available to serve the next request. When a new request arrives, a decision needs to be made on whether to accept or reject the new request while not changing the decisions that have already been made for the previous requests (including the requests in $\Upsilon_1$). When a vehicle becomes available and there are requests waiting in the queue, a decision needs to be made on which request it serves next. The solution has to satisfy the following constraints:
• Each vehicle starts and ends its route at the depot;
• Each request is served exactly once by one vehicle (no interruption);
• The total demand of the requests served by each vehicle does not exceed its capacity (capacity constraint);
• The starting time of each service cannot be outside (earlier than the lower bound or later than the upper bound of) the time window of the request (time window constraint);

If a request cannot be served feasibly (i.e. in terms of capacity and time window constraints), then it has to be rejected. The objective of the problem is to find a feasible solution that maximises the number of accepted requests.

A solution $S$ to DVRPTW can be represented as a set of routes $S = \{R_1, R_2, \dots, R_k\}$. Each route $R_i = (\tau_0, \tau_{i1}, \dots, \tau_{i,l_i}, \tau_0)$ is a sequence of requests, where $\tau_0 = (v_0, 0, 0, 0, 0, T)$ represents a special request of visiting the depot node. Then, the problem can be formulated as follows:

$$\max \sum_{i=1}^{k} l_i, \tag{1}$$
$$\text{s.t.: } \tau_{ij} \neq \tau_{uv}, \quad \forall i \neq u \text{ or } j \neq v, \; i \in \{1, \dots, k\}, \; j \in \{1, \dots, l_i\}, \tag{2}$$
$$\sum_{j=1}^{l_i} d(\tau_{ij}) \le Q, \quad \forall i \in \{1, \dots, k\}, \tag{3}$$
$$l(\tau_{ij}) \le t_s(\tau_{ij}) \le u(\tau_{ij}), \quad i \in \{1, \dots, k\}, \; j \in \{1, \dots, l_i\}, \tag{4}$$
$$t_s(\tau_{ij}) \ge \mathrm{arr}(\tau_{ij}), \quad i \in \{1, \dots, k\}, \; j \in \{1, \dots, l_i\}, \tag{5}$$
$$\tau_{ij} \in \Upsilon, \quad i \in \{1, \dots, k\}, \; j \in \{1, \dots, l_i\}, \tag{6}$$

where Eq. (1) is the objective function, which maximises the number of accepted requests. Note that in [23], the objective was defined as minimising the number of rejected requests. Given the same total number of requests, these two objectives are equivalent. Eq. (2) means that each request is served exactly once by one vehicle. Eq. (3) indicates the capacity constraint, and Eq. (4) refers to the time window constraint, where $t_s(\tau)$ stands for the time to start serving $\tau$. Eq. (5) specifies that the service of each request cannot be started until the vehicle arrives at its location, where $\mathrm{arr}(\tau)$ indicates the time that the vehicle arrives at $v(\tau)$. It is decided as follows:

$$\mathrm{arr}(\tau_{i1}) \ge \max\{0, t(\tau_{i1})\} + c(v_0, v(\tau_{i1})), \quad \forall i \in \{1, \dots, k\}, \tag{7}$$
$$\mathrm{arr}(\tau_{ij}) = \mathrm{dep}(\tau_{i,j-1}) + c(v(\tau_{i,j-1}), v(\tau_{ij})), \quad \forall i \in \{1, \dots, k\}, \; j \in \{2, \dots, l_i\}, \tag{8}$$
$$\mathrm{dep}(\tau_{ij}) \ge t_s(\tau_{ij}) + s(\tau_{ij}), \quad \forall i \in \{1, \dots, k\}, \; j \in \{1, \dots, l_i - 1\}, \tag{9}$$
$$\mathrm{dep}(\tau_{i,j-1}) \ge t(\tau_{ij}), \quad \forall i \in \{1, \dots, k\}, \; j \in \{2, \dots, l_i\}, \tag{10}$$

where $\mathrm{dep}(\tau)$ represents the departure time of the vehicle after finishing the service of $\tau$. Eq. (7) means that for each route, the arrival time of the first request is no earlier than the travel time from $v_0$ to its location, counted from the time the request arrived (0 if it already exists initially). Eq. (8) indicates that for each subsequent request,
the arrival time equals the departure time of its predecessor plus the travel time in between. Eq. (9) means that the vehicle cannot depart until it finishes serving the request. Eq. (10) means that the vehicle cannot go to the next request before that request has arrived.

B. Related Work on Dynamic Vehicle Routing

Pillac et al. [9] and Ritzinger et al. [14] gave two comprehensive surveys for DVRP. They introduced different problem variations such as deterministic and stochastic DVRPs, as well as commonly used approaches for solving DVRP. For example, one can divide the whole time horizon into short time slices, and periodically solve the corresponding optimisation problem at the beginning of each time slice (e.g. [24]). Another alternative is to keep a pool of good and diversely distributed solutions during the search process, so that when the environment changes, at least one of the solutions in the pool still performs well. An example is the adaptive memory proposed by Taillard et al. [25].

During the execution process, a number of heuristics have been proposed to decide the next customer to visit, such as the consensus, expectation and regret methods. The consensus method [26] selects the customer that appears the most frequently in the past history or sampled scenarios. The expectation method [26] evaluates the cost of visiting each customer by forcing its visit and then optimising for the remaining customers. The regret method [27] is an approximation of the expectation method.

Note that after deciding the next customer to visit, the departure time still has to be determined. For example, the vehicle can depart immediately to arrive at the next customer as soon as possible (i.e. drive-first) or wait at the current location for a while (i.e. wait-first), as long as all the remaining customers can still be served in time. The waiting strategy has been demonstrated to be effective in many scenarios (e.g. [28]).

Saint-Guillain et al. [23] proposed the DVRPTW, and proposed to solve the problem using a Multi Scenario Approach (MSA). The main idea is to randomly sample a number of scenarios based on a certain distribution, and conduct robust optimisation so that the obtained solution performs well on all the sampled scenarios. Then, whenever a decision needs to be made (i.e. a new request arrives or a vehicle becomes available), a short re-optimisation is carried out by means of an online decision rule called the Global Stochastic Assessment (GSA) rule.

C. Genetic Programming for Evolving Heuristics

GP has been used for evolving heuristics for many challenging combinatorial optimisation problems. For example, Branke et al. gave a review of evolving production scheduling dispatching rules with GP [18]. Burke et al. [17] explored the potential of GP as a hyper-heuristic for evolving heuristics, and showed the applications to the SAT problem and online bin packing. This paper pointed out the key steps for designing a hyper-heuristic: (1) design a framework (meta-algorithm) for heuristics to operate in; (2) decide on the terminal and function sets; and (3) identify the fitness function.

Weise et al. [21] proposed a GPHH approach for the Arc Routing Problem (ARP), which is the counterpart of VRP that serves edges/arcs instead of nodes. They proposed a meta-algorithm that can generate a feasible solution given any instance, and defined a heuristic as a priority function of each unserved arc. The proposed GPHH showed promising results on both static benchmark instances and stochastic instances in comparison with manually designed heuristics.

Sim and Hart [29] proposed a hyper-heuristic approach for VRP, which is a combination of a GP-evolved generative heuristic and a perturbative heuristic to further improve the solution generated by the generative heuristic. The results showed that the proposed hyper-heuristic performed competitively when applied to solve a wide range of different types of VRP instances. However, there is no GPHH proposed for DVRPTW so far. In this paper, we fill this gap by proposing a new GPHH that considers both dynamic request arrivals and the time window constraint.

III. GENETIC PROGRAMMING HYPER-HEURISTIC FOR DVRPTW

There are two major differences between DVRPTW and other VRP variants. First, in VRP, while adding the new customers, it is possible to remove some less profitable customers from the remaining routes. However, in DVRPTW, when deciding whether to accept each new request or not, all the requests that have already been accepted cannot be rejected again. That is, rejecting a request that was previously accepted will cause an infinite penalty. Second, in many VRP variants, it may not be necessary to make acceptance/rejection decisions for all the existing customers. However, in DVRPTW, such a decision has to be made for each request as soon as it arrives. Due to these two differences, the existing meta-algorithms (i.e. frameworks that heuristics operate on) cannot be applied directly to DVRPTW. Therefore, we developed a new meta-algorithm for DVRPTW.

A. Meta-algorithm

The proposed meta-algorithm can be seen as a decision process. Whenever a new request arrives, a decision needs to be made on whether to accept or reject the new request without rejecting any requests that have already been accepted before. Note that at the beginning of the scheduling horizon, there may already exist a set of initial requests (i.e. $\Upsilon_1 \neq \emptyset$). In this case, the acceptance/rejection decisions are made for all the initial requests at the beginning.

Note that in DVRPTW, the requests that have been accepted before cannot be rejected again. In other words, when accepting a new request, one has to make sure all the previously accepted requests can still be served. For this purpose, we maintain a complete feasible solution for serving all the accepted requests at all times. At the beginning, we generate a feasible solution (by heuristic or optimisation method) that serves as many initial requests as possible, and reject the initial
requests that cannot be served. Then, whenever a new request arrives, we generate a new solution based on the current vehicle states and request set plus the new request. If the new solution successfully serves all the requests, then we accept the new request, otherwise we reject it.

In this algorithm, a solution $S$ is represented as a sequence of actions, i.e. $S = (\pi_1, \pi_2, \dots)$, where each action $\pi = \langle ty_\pi, st_\pi, et_\pi, ve_\pi, \tau_\pi \rangle$ is characterised by its type $ty_\pi$, start time $st_\pi$, end time $et_\pi$, vehicle $ve_\pi$ and request $\tau_\pi$. Here we define two types of actions: (1) travel from one node to another and (2) serve a request.

Given the above representations, the proposed meta-algorithm is described in Algorithm 1.

Algorithm 1: The meta-algorithm of the GPHH for DVRPTW.
Input: A problem instance I, a heuristic h(·).
Output: A solution S.
    // Initialise the state
 1  Set t ← 0, Υ_t ← Υ_1;
 2  for i = 1 → k do
 3      Set vehicle location loc_i ← v_0, remaining capacity Q̄_i ← Q, available time a_i ← 0;
 4  end
 5  Set the state Ω ← ∏_{i=1}^{k} ⟨loc_i, Q̄_i, a_i⟩;
    // Generate a solution for initial requests
 6  if Υ_t ≠ ∅ then S ← InitialSolution(Ω, Υ_t);
    // Online decision making
 7  while t ≤ T do
 8      Extract the next arrived request τ′ from the random arrival process;
 9      for unfinished action π ∈ S do
10          if st(π) < t(τ′) then Execute π and update Ω;
11      end
12      Υ_t ← Υ_t ∪ τ′, S′ ← S;
13      S ← GenerateSolution(t, Ω, Υ_t);
14      if S serves all the requests in Υ_t then
15          Accept τ′;
16      else
17          Reject τ′, Υ_t ← Υ_t \ τ′, S ← S′;
18      end
19      t ← t(τ′);
20  end
21  return S;

In line 10 of Algorithm 1, an action updates the state $\Omega$ as follows:
• When a vehicle travels from one node to another, its location is changed to the new node and its available time becomes the end time of the action;
• When a vehicle serves a request, its remaining capacity is reduced by the demand of the request and its available time becomes the end time of the action. The request is removed from the request set.

In Algorithm 1, a key component is the function GenerateSolution(·), which generates a complete solution based on the current state $\Omega$ and request set $\Upsilon_t$. Since the acceptance/rejection decision of the new request needs to be made immediately, we choose an efficient heuristic which is similar to those used for VRP [29] and ARP [21]. The algorithm is described in Algorithm 2. At each step, the simulation time is advanced to the time when the next vehicle becomes available. Then, the servable requests are identified (line 4). A request is servable by a vehicle if (1) the vehicle has sufficient remaining capacity to serve it, and (2) the vehicle can arrive at the location no later than the upper bound of the time window. If there is no servable request for the vehicle, it goes back to the depot, refills its capacity and waits for potential future requests. Otherwise, we select the request with the minimal heuristic value $h(\cdot)$.

After deciding the next request, the departure time of the vehicle is decided (line 10). The decision on the departure time is not a trivial task, since there may be a wide feasible range to choose from. It is known that $t \le \mathrm{dep} \le u(\tau') - c(loc_{i^*}, v(\tau'))$. In this case, we consider two different strategies for deciding the departure time: (1) driving-first and (2) wait-first. The two strategies are described as follows:
• Driving-first strategy: the vehicle departs as soon as possible, and waits at the request location if necessary. That is, $\mathrm{dep} \leftarrow t$;
• Wait-first strategy: the vehicle waits at the current location if it is expected to arrive earlier than the time window. That is, $\mathrm{dep} \leftarrow \max\{l(\tau') - c(loc_{i^*}, v(\tau')), t\}$.

Algorithm 2: S ← GenerateSolution(t, Ω, Υ_t)
Input: Current time t, state Ω and request set Υ_t.
Output: A solution S.
 1  while t ≤ T do
 2      Find the next available vehicle and time (i*, a*) = min_{i=1}^{k} {a_i};
 3      t ← a*;
 4      Ῡ_t ← ServableRequests(Υ_t, i*);
 5      if Ῡ_t = ∅ then
            // Go back to depot
 6          S ← (S, ⟨travel, t, t + c(loc_{i*}, v_0), i*, ∅⟩);
 7          Q̄_{i*} ← Q, a_{i*} ← t + c(loc_{i*}, v_0), loc_{i*} ← v_0;
 8      else
            // Select the request with the minimal heuristic value
 9          τ′ ← arg min_{τ ∈ Ῡ_t} {h(τ)};
10          Decide the departure time dep;
11          arr ← dep + c(loc_{i*}, v(τ′));
12          t_s ← max{arr, l(τ′)};
13          S ← (S, ⟨travel, dep, dep + c(loc_{i*}, v(τ′)), i*, ∅⟩);
14          S ← (S, ⟨serve, t_s, t_s + s(τ′), i*, τ′⟩);
15          loc_{i*} ← v(τ′), Q̄_{i*} ← Q̄_{i*} − d(τ′), a_{i*} ← t_s + s(τ′);
16      end
17  end
18  return S;

The initial solution $S$ is generated by the function InitialSolution(·). The function can be defined as a constructive heuristic or a search-based optimisation method. For the sake of simplicity, in our experiment we generate the initial solution using Algorithm 2. That is, InitialSolution(Ω, Υ_t) = GenerateSolution(0, Ω, Υ_t).
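
The greedy construction of Algorithm 2, together with the two departure-time rules, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Request`, `Vehicle`, `generate_solution`, `travel_time` and `urgent_first`, the heuristic signature `h(request, vehicle, t)`, and the tiny three-vertex line graph are all hypothetical. One further assumption is added so the loop terminates: a vehicle that is already at the depot with full capacity and still has no servable request is retired, because with a fixed request set it can never gain one later.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Request:
    vertex: int     # v(tau)
    service: float  # s(tau)
    demand: float   # d(tau)
    lower: float    # l(tau)
    upper: float    # u(tau)


@dataclass
class Vehicle:
    loc: int            # current vertex (the depot is vertex 0 by default)
    cap: float          # remaining capacity
    avail: float = 0.0  # time at which the vehicle is next available


# (type, start time, end time, vehicle index, request or None)
Action = Tuple[str, float, float, int, Optional[Request]]


def generate_solution(
    vehicles: List[Vehicle],
    pending: Sequence[Request],
    c: Callable[[int, int], float],
    Q: float,
    T: float,
    h: Callable[[Request, Vehicle, float], float],
    wait_first: bool = True,
    depot: int = 0,
) -> Tuple[List[Action], List[Request]]:
    """Greedy construction in the spirit of Algorithm 2.

    Returns the action list and the requests left unserved. The vehicle
    objects are updated in place. Assumes at least one vehicle.
    """
    todo = list(pending)
    actions: List[Action] = []
    while todo:
        # Next available vehicle and the time it becomes available.
        i = min(range(len(vehicles)), key=lambda j: vehicles[j].avail)
        veh = vehicles[i]
        t = veh.avail
        if t > T:  # also true once every vehicle has been retired
            break
        # Servable: enough capacity and arrival no later than u(tau).
        servable = [r for r in todo
                    if r.demand <= veh.cap and t + c(veh.loc, r.vertex) <= r.upper]
        if not servable:
            if veh.loc == depot and veh.cap >= Q:
                veh.avail = math.inf  # assumption: retire this vehicle
            else:
                back = t + c(veh.loc, depot)  # go back to the depot and refill
                actions.append(("travel", t, back, i, None))
                veh.loc, veh.cap, veh.avail = depot, Q, back
            continue
        nxt = min(servable, key=lambda r: h(r, veh, t))  # minimal heuristic value
        travel = c(veh.loc, nxt.vertex)
        # Wait-first: dep = max{l - c, t}. Drive-first: dep = t.
        dep = max(nxt.lower - travel, t) if wait_first else t
        arr = dep + travel
        ts = max(arr, nxt.lower)
        actions.append(("travel", dep, arr, i, None))
        actions.append(("serve", ts, ts + nxt.service, i, nxt))
        veh.loc, veh.cap, veh.avail = nxt.vertex, veh.cap - nxt.demand, ts + nxt.service
        todo.remove(nxt)
    return actions, todo


# Hypothetical toy instance: three vertices on a line, depot at position 0.
POSITIONS = {0: 0.0, 1: 10.0, 2: 20.0}


def travel_time(a: int, b: int) -> float:
    return abs(POSITIONS[a] - POSITIONS[b])


def urgent_first(req: Request, veh: Vehicle, t: float) -> float:
    return req.upper  # the UF heuristic: h(tau) = u(tau)
```

In terms of Algorithm 1, a newly arrived request would be accepted exactly when the second return value is empty, i.e. when the regenerated solution still serves every request in the set.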

B. Manually Designed Heuristics

From Algorithm 2, it can be seen that the heuristic function $h(\cdot)$ plays an important role in generating a high-quality solution. Here, we design three simple heuristic functions manually based on our domain knowledge and intuition. They are the earliest-first (EF), urgent-first (UF) and linear-combination (LC) heuristic functions. They are defined as follows:
• EF heuristic: first serve the request that can be started the earliest, i.e. $h(\tau) = \max\{t + c(loc, v(\tau)), l(\tau)\}$, where $loc$ is the current location of the vehicle;
• UF heuristic: first serve the request that is the most urgent, i.e. whose time window will close earliest: $h(\tau) = u(\tau)$;
• LC heuristic: a more sophisticated heuristic that considers a number of attributes, including $A_1 = \max\{t + c(loc, v(\tau)), l(\tau)\}$ (EF heuristic), $A_2 = u(\tau)$ (UF heuristic), $A_3 = c(loc, v(\tau))$ (travel time from the current location), $A_4 = (1 - \bar{Q}/Q)\, c(v(\tau), v_0)$ (travel time to the depot times the current normalised load), $A_5 = d(\tau)$ (demand), and $A_6 = \min_{i=1}^{k} \{c(loc_i, v(\tau))\}$ (travel time to the nearest vehicle). The heuristic function is defined as a weighted linear combination as follows: $h(\tau) = A_1 + A_2 + A_3 + 0.1 A_4 + 0.1 A_5 - A_6$.

In the experimental studies, these three heuristics will be used as the baseline heuristics to compare with the GP-evolved heuristics.

C. Genetic Programming-based Hyper-Heuristic

In the GPHH, a heuristic function is represented as a syntax tree. The terminal set is given in Table I. The vertex density reflects the density of vertices around the given vertex, where the scaling parameter $s$ was set to $s = \max\{d(v_i, v_j)\}/100$ after some parameter tuning. The vehicle density indicates the density of vehicles around the given vertex. It was defined in the same way as the vertex density, except that the scaling parameter was set to $s' = \max\{d(v_i, v_j)\}/8$.

TABLE I
THE TERMINALS USED IN THE GPHH FOR DVRPTW

Notation  Description                                   Definition
TD        Normalised travel time to the depot           $c(v(\tau), v_0)/T$
T         Normalised travel time from current location  $c(loc, v(\tau))/T$
OT        Normalised relative open time                 $\max\{l(\tau) - t, 0\}/T$
CT        Normalised relative close time                $(u(\tau) - t)/T$
ST        Normalised service time                       $s(\tau)/T$
Q         Normalised remaining capacity                 $\bar{Q}/Q$
DEM       Normalised demand                             $d(\tau)/Q$
VTD       Vertex density                                $\sum_{v' \in V,\, v' \neq v(\tau)} e^{-\frac{1}{2}\left(\frac{c(v', v(\tau))}{s}\right)^2}$
VHD       Vehicle density                               $\sum_{i=1,\, i \neq i^*}^{k} e^{-\frac{1}{2}\left(\frac{c(loc_i, v(\tau))}{s'}\right)^2}$
TOV       Travel time from nearest other vehicle        $\min_{i=1,\, i \neq i^*}^{k} \{c(loc_i, v(\tau))\}$
1         The constant 1

In addition to the above terminals, we designed two more terminals about the probabilistic information of the requests as follows:
• Expected number of future requests at $v(\tau)$ (NFR)
• Probability that a new request arrives at $v(\tau)$ within the next $T/6$ time (PNR)

These two terminals are calculated based on the stochastic information (temporal distributions of requests during the horizon) given by the dataset. Based on the terminals, we propose the following two GP versions:
• GP1 with the terminals shown in Table I;
• GP2 with the terminals shown in Table I as well as NFR and PNR.

For the function set, we used the basic arithmetic operators including addition, subtraction, multiplication and protected division (returns 1 if the denominator is zero), along with the non-linear operators $\max(\cdot, \cdot)$ and $\exp(\cdot)$.

The fitness of a GP individual is simply defined as the average objective value over a set of training instances. Given a set of training instances $I_{tr}$, the fitness function of a heuristic function $h$ is defined as follows:

$$fit(h) = \frac{1}{|I_{tr}|} \sum_{I \in I_{tr}} f(S(I, h)), \tag{11}$$

where $S(I, h)$ is the solution obtained by applying the meta-algorithm (Algorithm 1) to the instance $I$ and heuristic $h$, and $f(S)$ indicates the number of rejected requests in the solution $S$.

IV. EXPERIMENTAL STUDIES

We use the benchmark instances designed by Saint-Guillain et al. [23]. The benchmark consists of 6 classes, with the degree of dynamism ranging from low to high. Our experiments focus on classes 4, 5 and 6, which are highly dynamic instances (with an average degree of dynamism of 57% for class 4, 81% for class 5 and 100% for class 6). Each class contains 3 scenarios (namely rc101, rc102 and rc104), which are based on the same graph with 100 nodes. Each scenario has the same distribution of request arrivals. In the datasets, for each scenario, 5 instances were randomly sampled from the distribution. More details of the benchmark instances can be found in [23].

In our experiments, we train a heuristic function using GP for each of the 9 scenarios (3 classes, each with 3 scenarios). For each scenario, the 5 instances given in the dataset were used as the test instances. At each generation of the GP process, we randomly sample 20 training instances from the distribution of the scenario. We change the random seed for sampling the training set at each generation to prevent overfitting.

The parameter setting of GP is as follows: the population size is set to 1024. The maximal depth is 8. The crossover, mutation and reproduction rates are 0.8, 0.15 and 0.05 respectively. The best 10 individuals are considered as elitists. The parents are selected by tournament selection of size 7. The number of generations is set to 25. The algorithms were run 30 times independently on desktops with Intel(R) Core(TM) i7 CPU @3.60GHz.
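
The primitives of Section III-C can be illustrated with a short Python sketch. It computes the simple Table I terminals for one request, defines protected division as described in the text, and averages a per-instance rejection count as in Eq. (11). All names here (`terminal_values`, `protected_div`, `ef_like`, `uf_like`, `example_tree`, `fitness`) are hypothetical, the density terminals and NFR/PNR are left out, and `example_tree` is a made-up expression that only shows how a syntax tree combines terminals; it is not a heuristic reported in the paper. The callback `rejected_requests(instance, h)` is an assumed stand-in for running the meta-algorithm and counting rejections.

```python
import math
from typing import Callable, Dict, Sequence

Terminals = Dict[str, float]


def protected_div(a: float, b: float) -> float:
    """Protected division: returns 1 if the denominator is zero."""
    return 1.0 if b == 0 else a / b


def terminal_values(t: float, c_from_loc: float, c_to_depot: float,
                    lower: float, upper: float, service: float,
                    demand: float, remaining: float,
                    Q: float, T: float) -> Terminals:
    """Simple terminals of Table I for one candidate request at time t."""
    return {
        "TD": c_to_depot / T,             # normalised travel time to the depot
        "T": c_from_loc / T,              # normalised travel time from current location
        "OT": max(lower - t, 0.0) / T,    # normalised relative open time
        "CT": (upper - t) / T,            # normalised relative close time
        "ST": service / T,                # normalised service time
        "Q": remaining / Q,               # normalised remaining capacity
        "DEM": demand / Q,                # normalised demand
        "1": 1.0,                         # the constant 1
    }


def ef_like(x: Terminals) -> float:
    # max{T, OT} equals (max{t + c, l} - t) / T when c >= 0,
    # so it ranks requests in the same order as the EF heuristic.
    return max(x["T"], x["OT"])


def uf_like(x: Terminals) -> float:
    # CT = (u - t) / T ranks requests in the same order as the UF heuristic.
    return x["CT"]


def example_tree(x: Terminals) -> float:
    # Hypothetical expression built from the function set, for illustration only.
    return max(x["T"], x["OT"]) + protected_div(x["CT"], x["Q"]) * math.exp(x["DEM"])


def fitness(h: Callable[[Terminals], float],
            training_instances: Sequence[object],
            rejected_requests: Callable[[object, Callable[[Terminals], float]], float]) -> float:
    """Eq. (11): mean number of rejected requests over the training instances."""
    total = sum(rejected_requests(inst, h) for inst in training_instances)
    return total / len(training_instances)
```

Since the fitness counts rejected requests, a lower value is better under this definition.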

A. Results and Discussions

Tables II–IV show the test performance (in the format of “mean(std)”) of the proposed GP1 and GP2 methods as well as the three manually designed heuristics on the Class 4–6 instances. As mentioned in Algorithm 2, we consider either the drive-first or the wait-first strategy for deciding the departure time. For each test instance, we conducted Wilcoxon’s rank sum test with a significance level of 0.05, and highlight the tables as follows:
• For the drive-first/wait-first strategy, if the results of GP1/GP2 are significantly better than all the manually designed heuristics, then the results are highlighted in bold;
• For the drive-first/wait-first strategy, if the results of one GP approach are significantly better than the other, then the better results are marked with ∗;
• For each heuristic, if the drive-first (wait-first) strategy is better than the other, then the better results are underlined.

From the tables, we have the following observations:
• GP1 and GP2 can evolve significantly better heuristics than the manually designed heuristics in most cases. As the degree of dynamism increases, this advantage becomes more obvious. For example, when using the drive-first strategy, GP1 obtained significantly better results for 5 Class-4 instances, 6 Class-5 instances and 14 Class-6 instances.
• GP1 and GP2 obtained similar test performance in most cases. The only exceptions are Class-5 rc101-4 with the drive-first strategy and rc101-2 with the wait-first strategy, and Class-6 rc101-3 and rc102-2 with the wait-first strategy. GP1 outperformed GP2 in three out of these four exceptions. This implies that simply including the probability information as terminals failed to help the discovery of better heuristics.
• The wait-first strategy is generally significantly better than the drive-first strategy. There are many more underlines on the wait-first side than on the drive-first side. This observation is consistent with intuition and with findings in the literature such as [22], [23].
• The UF heuristic is usually worse than the EF and LC heuristics. When the degree of dynamism is not high (Class-4), the EF heuristic performs better than LC more often. However, as the degree of dynamism increases (Classes 5 and 6), the LC heuristic performs better.

B. Further Analysis

Fig. 1 shows the convergence curves of GP1 on Class-6 rc101 using the wait-first strategy. The other instances and other algorithms show similar patterns. From the figure, we have two observations. First, the search almost converged at around generation 15. Therefore, 25 generations are enough to guarantee convergence. Second, the test curve is consistent with the training curve. This implies that the rotation of training instances successfully prevented overfitting. The test performance also converged after generation 15.

Fig. 1. Convergence curves of GP2 on Class-6 rc101 using wait-first strategy.

Fig. 2 shows the average frequency of terminals used in the GP2-evolved heuristics over the 30 runs on Class-6 rc104 using the wait-first strategy. All the other cases show similar patterns. It can be seen that CT and T are the most important terminals in the heuristic, followed by OT. This makes sense, since CT is equivalent to the UF heuristic, and max{T, OT} is essentially equivalent to the EF heuristic. In contrast, NFR and PNR were rarely used in the heuristics, which means GP failed to effectively use these terminals. A wiser way of incorporating the probability information is needed. Finally, the constant terminal was not often used, indicating that it can be removed without much affecting the performance of GP.

Fig. 2. The average frequency of terminals used in the GP2-evolved heuristics on Class-6 rc104 using wait-first strategy.

V. CONCLUSIONS AND FUTURE WORK

This paper solves the Dynamic Vehicle Routing Problem with Time Windows (DVRPTW), which requires an immediate decision on accepting/rejecting the newly arrived request in real time. We consider solving DVRPTW using a GP-based

TABLE II
THE TEST PERFORMANCE OF THE COMPARED ALGORITHMS ON THE CLASS-4 INSTANCES.

Instance | Drive-first: EF UF LC GP1 GP2 | Wait-first: EF UF LC GP1 GP2
rc101-1 11.0 13.0 20.0 12.47(4.03) 10.80(4.28) 12.0 8.0 20.0 5.23(1.45) 4.80(1.79)
rc101-2 14.0 15.0 13.0 12.87(3.77) 12.90(4.25) 10.0 12.0 21.0 7.70(2.91) 7.20(2.35)
rc101-3 7.0 12.0 17.0 8.80(3.67) 8.53(3.66) 5.0 12.0 11.0 4.23(1.74) 3.70(1.86)
rc101-4 6.0 15.0 17.0 12.07(4.70) 12.53(3.95) 7.0 8.0 5.0 5.97(1.94) 6.20(1.56)
rc101-5 24.0 22.0 23.0 14.67(5.01) 14.53(4.40) 14.0 12.0 16.0 7.47(1.36) 7.77(2.05)
avg 12.4 15.4 18.0 12.17(2.20) 11.86(1.97) 9.6 10.4 14.6 6.12(1.04) 5.93(0.93)
rc102-1 18.0 19.0 15.0 8.83(6.34) 7.40(4.57) 7.0 8.0 8.0 2.57(2.03) 1.97(1.38)
rc102-2 19.0 17.0 7.0 11.33(5.78) 9.77(5.30) 8.0 13.0 11.0 10.50(3.65) 9.90(4.00)
rc102-3 14.0 20.0 11.0 11.87(6.63) 12.13(4.92) 21.0 14.0 16.0 11.43(4.75) 11.23(3.04)
rc102-4 16.0 19.0 6.0 8.77(2.91) 8.90(2.58) 7.0 12.0 2.0 7.87(3.55) 7.50(3.71)
rc102-5 9.0 14.0 17.0 5.60(4.38) 7.27(4.76) 19.0 15.0 9.0 4.57(3.32) 4.23(3.49)
avg 15.2 17.8 11.2 9.28(2.65) 9.09(2.06) 12.4 12.4 9.2 7.39(1.36) 6.97(1.39)
rc104-1 14.0 46.0 16.0 8.60(5.79) 8.70(6.28) 6.0 46.0 8.0 6.80(3.93) 7.90(7.69)
rc104-2 20.0 38.0 7.0 6.67(5.84) 5.37(3.85) 17.0 33.0 7.0 9.80(5.73) 8.93(8.12)
rc104-3 33.0 36.0 19.0 8.00(5.02) 6.87(5.28) 19.0 39.0 34.0 8.30(5.34) 7.40(4.14)
rc104-4 24.0 45.0 2.0 10.67(6.26) 11.43(6.03) 24.0 44.0 12.0 8.17(4.23) 10.13(6.56)
rc104-5 11.0 38.0 23.0 10.20(5.10) 10.73(4.72) 8.0 41.0 2.0 8.70(4.19) 9.10(4.78)
avg 20.4 40.6 13.4 8.83(2.50) 8.62(2.34) 14.8 40.6 12.6 8.35(2.14) 8.69(2.94)

TABLE III
THE TEST PERFORMANCE OF THE COMPARED ALGORITHMS ON THE CLASS-5 INSTANCES.

Instance | Drive-first: EF UF LC GP1 GP2 | Wait-first: EF UF LC GP1 GP2
rc101-1 15.0 15.0 22.0 13.53(4.67) 13.23(5.48) 13.0 14.0 19.0 10.67(4.69) 10.13(3.59)
rc101-2 14.0 13.0 16.0 10.97(5.36) 8.90(3.32) 6.0 8.0 4.0 8.43(3.49) 6.23(2.66)*
rc101-3 9.0 21.0 10.0 10.90(7.48) 12.27(5.34) 8.0 8.0 19.0 5.60(3.51) 6.23(4.17)
rc101-4 8.0 17.0 19.0 7.80(4.32)* 9.10(3.69) 6.0 10.0 19.0 7.47(3.06) 7.37(3.27)
rc101-5 20.0 19.0 16.0 15.27(5.75) 13.47(4.61) 17.0 15.0 20.0 10.97(3.52) 9.83(3.68)
avg 13.2 17.0 16.6 11.69(3.23) 11.39(2.34) 10.0 11.0 16.2 8.63(1.73) 7.96(1.50)
rc102-1 14.0 27.0 6.0 11.50(6.21) 10.83(4.93) 10.0 19.0 2.0 5.97(4.68) 6.00(3.49)
rc102-2 19.0 33.0 30.0 5.93(3.78) 5.13(3.14) 18.0 15.0 8.0 3.17(2.96) 3.43(2.46)
rc102-3 8.0 20.0 5.0 6.60(4.85) 7.00(5.14) 5.0 3.0 3.0 3.00(2.12) 2.97(2.28)
rc102-4 7.0 17.0 7.0 3.90(2.67) 3.87(2.29) 2.0 10.0 7.0 3.30(3.06) 3.27(2.16)
rc102-5 22.0 16.0 14.0 1.77(2.08) 2.10(2.09) 10.0 14.0 6.0 4.57(3.40) 4.27(2.92)
avg 14.0 22.6 12.4 5.94(1.86) 5.79(1.95) 9.0 12.2 5.2 4.00(1.48) 3.99(1.25)
rc104-1 11.0 36.0 5.0 10.20(4.16) 9.87(4.25) 22.0 32.0 13.0 14.77(3.74) 13.57(4.22)
rc104-2 29.0 42.0 22.0 25.90(7.01) 23.30(6.90) 32.0 39.0 27.0 19.50(5.53) 21.30(8.17)
rc104-3 11.0 39.0 4.0 7.50(5.73) 8.63(7.82) 19.0 33.0 6.0 5.07(2.98) 5.07(3.42)
rc104-4 27.0 33.0 12.0 11.10(5.12) 11.80(5.44) 22.0 37.0 13.0 16.57(4.78) 16.90(5.62)
rc104-5 15.0 46.0 16.0 6.70(6.50) 7.03(6.03) 28.0 32.0 8.0 9.07(7.10) 12.13(8.81)
avg 18.6 39.2 11.8 12.28(2.54) 12.13(2.75) 24.6 34.6 13.4 12.99(1.60) 13.79(3.60)

Hyper-Heuristic (GPHH). To this end, we proposed a meta-algorithm that maintains a set of routes throughout the scheduling horizon and updates it with a heuristic in an attempt to accept new requests. We then designed three heuristics manually and developed a GPHH for automatically evolving heuristics. Experimental results show that the GP-evolved heuristics significantly outperformed the manually designed heuristics, and that this advantage becomes more obvious as the degree of dynamism increases. This demonstrates the efficacy of GPHH in designing heuristics for DVRPTW. On the other hand, simply including the probability information as terminals was not effective for finding good heuristics. In the future, we will investigate better ways of using such information, e.g. manually designing new composite terminals based on the raw information.

REFERENCES

[1] B. L. Golden, S. Raghavan, and E. A. Wasil, The Vehicle Routing Problem: Latest Advances and New Challenges. Springer Science & Business Media, 2008, vol. 43.
[2] O. Bräysy and M. Gendreau, “Vehicle routing problem with time windows, Part I: Route construction and local search algorithms,” Transportation Science, vol. 39, no. 1, pp. 104–118, 2005.
[3] O. Bräysy and M. Gendreau, “Vehicle routing problem with time windows, Part II: Metaheuristics,” Transportation Science, vol. 39, no. 1, pp. 119–139, 2005.

TABLE IV
THE TEST PERFORMANCE OF THE COMPARED ALGORITHMS ON THE CLASS-6 INSTANCES.

Instance | Drive-first: EF UF LC GP1 GP2 | Wait-first: EF UF LC GP1 GP2
rc101-1 35.0 41.0 26.0 16.17(6.08) 18.53(7.42) 20.0 15.0 7.0 12.97(3.44) 11.93(2.85)
rc101-2 32.0 35.0 34.0 12.93(4.35) 14.37(5.48) 20.0 19.0 12.0 11.63(2.76) 12.13(3.17)
rc101-3 38.0 39.0 34.0 18.17(5.50) 19.00(7.60) 19.0 21.0 22.0 12.27(4.09)* 14.67(4.68)
rc101-4 23.0 28.0 28.0 18.07(4.95) 19.70(6.75) 17.0 17.0 18.0 18.50(2.86) 17.73(2.80)
rc101-5 37.0 39.0 35.0 15.53(5.18) 17.10(4.04) 26.0 21.0 13.0 14.33(3.39) 14.13(3.37)
avg 33.0 36.4 31.4 16.17(2.80) 17.74(3.74) 20.4 18.6 14.4 13.94(1.76) 14.12(2.04)
rc102-1 32.0 42.0 32.0 18.17(6.16) 16.00(6.32) 26.0 28.0 26.0 16.73(6.07) 14.03(6.00)
rc102-2 41.0 37.0 37.0 15.70(5.15) 15.63(5.81) 25.0 23.0 10.0 12.87(3.39)* 14.57(3.78)
rc102-3 39.0 38.0 32.0 14.87(3.65) 13.17(3.22) 12.0 16.0 16.0 8.00(3.10) 7.30(3.63)
rc102-4 29.0 29.0 23.0 17.57(6.16) 17.67(6.05) 24.0 17.0 16.0 14.80(2.92) 14.07(3.31)
rc102-5 45.0 46.0 32.0 19.07(5.10) 17.77(4.52) 28.0 23.0 20.0 12.47(4.38) 13.13(5.14)
avg 37.2 38.4 31.2 17.07(3.26) 16.05(3.65) 23.0 21.4 17.6 12.97(1.71) 12.62(2.09)
rc104-1 46.0 56.0 31.0 13.27(8.20) 13.33(7.62) 20.0 43.0 24.0 18.13(4.02) 18.10(4.63)
rc104-2 33.0 56.0 21.0 18.13(8.62) 16.33(6.70) 18.0 44.0 25.0 14.67(2.54) 15.83(4.71)
rc104-3 40.0 55.0 43.0 23.77(7.11) 22.93(5.75) 25.0 50.0 25.0 18.00(6.15) 16.97(3.91)
rc104-4 29.0 52.0 28.0 21.13(6.94) 20.83(6.20) 33.0 44.0 28.0 24.30(3.99) 22.87(4.01)
rc104-5 22.0 51.0 16.0 21.07(5.94) 21.30(5.05) 22.0 38.0 16.0 22.20(5.63) 22.80(4.97)
avg 34.0 54.0 27.8 19.47(3.93) 18.95(2.87) 23.6 43.8 23.6 19.46(2.25) 19.31(1.99)
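
The claim that the advantage of the GP-evolved heuristics grows with the degree of dynamism can be checked roughly against the "avg" rows of Tables II to IV. The Python sketch below is illustrative only and is not part of the original experiments: it copies the drive-first group averages for EF, UF, LC and GP1, assumes that lower values are better (consistent with how the tables are discussed), and computes, for each class, the mean gap between the best manual heuristic and GP1. The names `AVG_DRIVE_FIRST`, `mean_gap` and `GAPS` are hypothetical.

```python
# Group averages ("avg" rows) copied from Tables II-IV, drive-first strategy.
# Each tuple is (EF, UF, LC, GP1). Assumption: lower values are better.
AVG_DRIVE_FIRST = {
    "Class-4": {"rc101": (12.4, 15.4, 18.0, 12.17),
                "rc102": (15.2, 17.8, 11.2, 9.28),
                "rc104": (20.4, 40.6, 13.4, 8.83)},
    "Class-5": {"rc101": (13.2, 17.0, 16.6, 11.69),
                "rc102": (14.0, 22.6, 12.4, 5.94),
                "rc104": (18.6, 39.2, 11.8, 12.28)},
    "Class-6": {"rc101": (33.0, 36.4, 31.4, 16.17),
                "rc102": (37.2, 38.4, 31.2, 17.07),
                "rc104": (34.0, 54.0, 27.8, 19.47)},
}


def mean_gap(groups):
    """Mean of (best manual heuristic average - GP1 average) over the groups."""
    gaps = [min(ef, uf, lc) - gp1 for ef, uf, lc, gp1 in groups.values()]
    return sum(gaps) / len(gaps)


# One gap per class; a larger gap means a larger GP1 advantage.
GAPS = {cls: mean_gap(groups) for cls, groups in AVG_DRIVE_FIRST.items()}

if __name__ == "__main__":
    for cls, gap in GAPS.items():
        print(f"{cls}: best manual average exceeds GP1 average by {gap:.2f}")
```

This compares averages of averages and ignores the spread over the 30 runs, so it is a sanity check on the trend rather than a substitute for the significance tests behind the tables.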

[4] L. M. Gambardella, É. Taillard, and G. Agazzi, “MACS-VRPTW: A multiple ant colony system for vehicle routing problems with time windows,” 1999.
[5] P. K. Nguyen, T. G. Crainic, and M. Toulouse, “A tabu search for time-dependent multi-zone multi-trip vehicle routing problem with time windows,” European Journal of Operational Research, vol. 231, no. 1, pp. 43–56, 2013.
[6] T. Vidal, T. G. Crainic, M. Gendreau, and C. Prins, “A hybrid genetic algorithm with adaptive diversity management for a large class of vehicle routing problems with time-windows,” Computers & Operations Research, vol. 40, no. 1, pp. 475–489, 2013.
[7] S. Belhaiza, P. Hansen, and G. Laporte, “A hybrid variable neighborhood tabu search heuristic for the vehicle routing problem with multiple time windows,” Computers & Operations Research, vol. 52, pp. 269–281, 2014.
[8] P. Garrido and M. C. Riff, “DVRP: a hard dynamic combinatorial optimisation problem tackled by an evolutionary hyper-heuristic,” Journal of Heuristics, vol. 16, no. 6, pp. 795–834, 2010.
[9] V. Pillac, M. Gendreau, C. Guéret, and A. L. Medaglia, “A review of dynamic vehicle routing problems,” European Journal of Operational Research, vol. 225, no. 1, pp. 1–11, 2013.
[10] C. Prins, “A simple and effective evolutionary algorithm for the vehicle routing problem,” Computers & Operations Research, vol. 31, no. 12, pp. 1985–2002, 2004.
[11] J. Kytöjoki, T. Nuortio, O. Bräysy, and M. Gendreau, “An efficient variable neighborhood search heuristic for very large scale vehicle routing problems,” Computers & Operations Research, vol. 34, no. 9, pp. 2743–2757, 2007.
[12] B. Yu, Z.-Z. Yang, and B. Yao, “An improved ant colony optimization for vehicle routing problem,” European Journal of Operational Research, vol. 196, no. 1, pp. 171–176, 2009.
[13] G. Laporte, M. Gendreau, J.-Y. Potvin, and F. Semet, “Classical and modern heuristics for the vehicle routing problem,” International Transactions in Operational Research, vol. 7, no. 4-5, pp. 285–300, 2000.
[14] U. Ritzinger, J. Puchinger, and R. F. Hartl, “A survey on dynamic and stochastic vehicle routing problems,” International Journal of Production Research, vol. 54, no. 1, pp. 215–231, 2016.
[15] E. K. Burke, M. Gendreau, M. Hyde, G. Kendall, G. Ochoa, E. Özcan, and R. Qu, “Hyper-heuristics: A survey of the state of the art,” Journal of the Operational Research Society, vol. 64, no. 12, pp. 1695–1724, 2013.
[16] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992, vol. 1.
[17] E. K. Burke, M. R. Hyde, G. Kendall, G. Ochoa, E. Ozcan, and J. R. Woodward, “Exploring hyper-heuristic methodologies with genetic programming,” in Computational Intelligence. Springer, 2009, pp. 177–201.
[18] J. Branke, S. Nguyen, C. W. Pickardt, and M. Zhang, “Automated design of production scheduling heuristics: A review,” IEEE Transactions on Evolutionary Computation, vol. 20, no. 1, pp. 110–124, 2016.
[19] R. Glanville, D. Griffiths, P. Baron, J. H. Drake, M. Hyde, K. Ibrahim, and E. Ozcan, “A genetic programming hyper-heuristic for the multidimensional knapsack problem,” Kybernetes, vol. 43, no. 9/10, pp. 1500–1511, 2014.
[20] M. Bader-El-Den, R. Poli, and S. Fatima, “Evolving timetabling heuristics using a grammar-based genetic programming hyper-heuristic framework,” Memetic Computing, vol. 1, no. 3, pp. 205–219, 2009.
[21] T. Weise, A. Devert, and K. Tang, “A developmental solution to (dynamic) capacitated arc routing problems using genetic programming,” in Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation. ACM, 2012, pp. 831–838.
[22] R. Bent and P. Van Hentenryck, “Waiting and relocation strategies in online stochastic vehicle routing,” in IJCAI, 2007, pp. 1816–1821.
[23] M. Saint-Guillain, Y. Deville, and C. Solnon, “A multistage stochastic programming approach to the dynamic and stochastic VRPTW,” in International Conference on AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems. Springer, 2015, pp. 357–374.
[24] R. Montemanni, L. M. Gambardella, A. E. Rizzoli, and A. V. Donati, “Ant colony system for a dynamic vehicle routing problem,” Journal of Combinatorial Optimization, vol. 10, no. 4, pp. 327–343, 2005.
[25] É. D. Taillard, L. M. Gambardella, M. Gendreau, and J.-Y. Potvin, “Adaptive memory programming: A unified view of metaheuristics,” European Journal of Operational Research, vol. 135, no. 1, pp. 1–16, 2001.
[26] R. Bent and P. Van Hentenryck, “The value of consensus in online stochastic scheduling,” in ICAPS, vol. 4, 2004, pp. 219–226.
[27] R. Bent and P. Van Hentenryck, “Regrets only! Online stochastic optimization under time constraints,” in AAAI, vol. 4, 2004, pp. 501–506.
[28] G. Ghiani, E. Manni, A. Quaranta, and C. Triki, “Anticipatory algorithms for same-day courier dispatching,” Transportation Research Part E: Logistics and Transportation Review, vol. 45, no. 1, pp. 96–106, 2009.
[29] K. Sim and E. Hart, “A combined generative and selective hyper-heuristic for the vehicle routing problem,” in Proceedings of the 2016 on Genetic and Evolutionary Computation Conference. ACM, 2016, pp. 1093–1100.
