Automatic Itinerary Planning For Traveling Services: Gang Chen, Sai Wu, Jingbo Zhou, and Anthony K.H. Tung
IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 3, March 2014
Abstract: Creating an efficient and economical trip plan is one of the most tedious tasks for a backpack traveler. Although travel agencies
can provide some predefined itineraries, these are not tailored to each specific customer. Previous efforts address the problem by
providing an automatic itinerary planning service, which organizes points of interest (POIs) into a customized itinerary.
Because the search space of all possible itineraries is too costly to explore fully, most work simplifies the problem by assuming that
the user's trip is limited to some important POIs and will be completed within one day. To address this limitation, in this paper we
design a more general itinerary planning service, which generates multiday itineraries for the users. In our service, all POIs are
considered and ranked based on the users' preferences. The problem of searching for the optimal itinerary is a team orienteering
problem (TOP), a well-known NP-complete problem. To reduce the processing cost, a two-stage planning scheme is proposed. In
its preprocessing stage, single-day itineraries are precomputed via MapReduce jobs. In its online stage, an approximate
search algorithm is used to combine the single-day itineraries. In this way, we transform the TOP, which has no polynomial-time
approximation, into another NP-complete problem (the set-packing problem) with good approximation algorithms. Experiments on real
data sets show that our approach can generate high-quality itineraries efficiently.
Index Terms: MapReduce, trajectory, team orienteering problem, itinerary planning, location-based service
1 INTRODUCTION
1. https://developers.google.com/maps/.
2. http://travel.yahoo.com/.
516 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 26, NO. 3, MARCH 2014
highest among all possible itineraries. The score of the itinerary is computed based on the POI weights. However, as shown in the following theorem, this is an NP-complete problem, and no polynomial-time algorithm is known for it.

Theorem 1. Finding the optimal k-day itinerary in a POI graph $G = (V, E)$ is an NP-complete problem.

Proof (Sketch). The optimal k-day itinerary can be reduced to the TOP [3], which is a well-known NP-complete problem. Consider a simple scenario where

1. k vehicles are created, which start from the same position.
2. Each vehicle has a time limit (1 day) for traveling the POIs.
3. Each vehicle collects the profit by visiting the POIs.
4. The POI accessed by a vehicle will not be considered by other vehicles.
5. The POI's profit is equal to its weight.

The TOP is to find the traveling plan that generates the most profit. The result of the TOP is also the best k-day itinerary. □

Due to the complexity of the TOP, it is impractical to find the exact solution. Instead, previous work focuses on proposing heuristic algorithms. The basic idea is to generate an initial plan and then adjust it based on some heuristic rules. Those algorithms have three drawbacks. First, the heuristic algorithms need many iterations to get a good enough result, which incurs a high computation cost [7]. Second, the adjusting rules are too complicated and the potential gains are unknown. Finally, there is no bound on the approximate result, which may be arbitrarily bad in some cases.

In this paper, we reduce the complexity of the TOP by transforming it into a set-packing [8] problem. As the transformation is done in an offline manner, the performance of online query processing is not affected.

3.2 Single-Day Itinerary
The basic idea of the transformation is to iterate over all possible single-day itineraries. This is done by a set of MapReduce jobs. In the first job, we generate $|P|$ initial itineraries for the POI set P. Each initial itinerary consists of only one POI. Iteratively, each subsequent MapReduce job tries to add one more POI to the itineraries. If no more single-day itineraries can be generated, the process terminates. In the current implementation, we allow at most m MapReduce jobs in the transformation process to reduce the overhead. Therefore, a single-day itinerary contains at most m POIs. This strategy is based on the assumption that users cannot visit too many POIs in one day. In our data set crawled from Yahoo Travel, setting m to 10 is enough for the Singapore data, which includes more than 400 POIs. Only a few single-day itineraries can contain more than 10 POIs.

Algorithms 1 and 2 show the pseudocode of the MapReduce job. The mappers load the partial paths from the DFS, which are generated in the previous MapReduce jobs. We try to append a new POI to the existing itineraries. For each new path, we test whether it can be completed within one day. If not, we discard the new path. If the old path cannot result in any new path, we output the old path. For the last MapReduce job (the mth job), all the candidate itineraries are used as the results. Each output key-value pair uses the sorted POIs of the itinerary as its key.

Algorithm 1. map(Object key, Text value, Context context).
// we allow at most m rounds of MapReduce jobs, i.e., the maximal length of a path is m
// value: existing path; each MapReduce job tries to add one more POI to the path
1: Path P = parsePath(value)
2: for i = 0 to POIGraph.POINumber do
3:   if isConnected(P, i) and !P.contains(i) then
4:     Path newPath = P.append(i)
5:     cost = P.cost + POIGraph.getCost(P.endPOI, i) + POIGraph.getCost(i)
6:     weight = P.weight + POIGraph.getWeight(i)
7:     newPath.cost = cost
8:     newPath.weight = weight
9:     if newPath.cost <= H then
10:      Key newKey = parsePath(newPath).sort();
11:      context.collect(newKey, newPath)
12:    else
13:      DFS.write(resultFile, P)

Algorithm 2. reduce(Key key, Iterable values, Context context).
1: bestCost = infinity
2: bestPath = NIL
3: for Path P : values do
4:   if P.cost < bestCost then
5:     bestPath = P
6:     bestCost = P.cost
7: context.collect(key, bestPath)

In the mappers, to compute the weight and cost of a new itinerary, we load the POI graph table from the DFS. As the graph table is small, each reducer maintains a copy in its memory. The table's schema is as follows:

(S_POI, E_POI, S_weight, E_weight, S_cost, E_cost, cost),

where S_POI and E_POI denote the two POIs linked by a specific edge, cost is the traveling cost from S_POI to E_POI, and S_POI is the primary key of the table.

In the reducers (Algorithm 2), all the paths received by one reducer contain the same POIs. We keep only the path with the smallest cost and output it for the next round. Note that since all these paths have the same POIs, they also have the same weight.

After all itineraries have been generated, a cleaning process is invoked to remove duplication. For two itineraries $L_0 = v_0 \to \cdots \to v_n$ and $L_1 = v'_0 \to \cdots \to v'_n$, $L_0$ contains $L_1$ iff

$\forall v'_j \in L_1 \;\; \exists v_i \in L_0 \; (v_i = v'_j)$.

Namely, all POIs in $L_1$ are also included in $L_0$. If $L_0$ contains $L_1$, we keep only $L_0$, as it provides more POIs for the users.
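The iterative extension described by Algorithms 1 and 2 can be illustrated with a small in-memory sketch. It is not the authors' Hadoop implementation: the toy graph, the budget `H = 8.0`, and the names `map_extend`, `reduce_min_cost`, and `run_round` are assumptions made for illustration, and the sketch follows the prose (an old path is output only when none of its extensions fits in the budget).

```python
# Hypothetical toy POI graph: travel cost per directed edge,
# visiting cost and weight per POI. H is an assumed one-day budget.
TRAVEL = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 2.0, (2, 1): 2.0,
          (0, 2): 4.0, (2, 0): 4.0}
VISIT = {0: 1.0, 1: 2.0, 2: 1.5}
WEIGHT = {0: 5.0, 1: 3.0, 2: 4.0}
H = 8.0


def map_extend(path):
    """Map step: try to append one unvisited, connected POI to `path`.

    `path` is a tuple (pois, cost, weight). Yields (key, new_path) pairs
    keyed by the sorted POI tuple; yields (None, path) when no extension
    fits in the budget, which stands in for writing the old path out.
    """
    pois, cost, weight = path
    extended = False
    for i in VISIT:
        if i in pois or (pois[-1], i) not in TRAVEL:
            continue
        new_cost = cost + TRAVEL[(pois[-1], i)] + VISIT[i]
        if new_cost <= H:
            extended = True
            new_pois = pois + (i,)
            yield tuple(sorted(new_pois)), (new_pois, new_cost, weight + WEIGHT[i])
    if not extended:
        yield None, path


def reduce_min_cost(pairs):
    """Reduce step: among paths over the same POI set, keep the cheapest."""
    best = {}
    for key, path in pairs:
        if key not in best or path[1] < best[key][1]:
            best[key] = path
    return best


def run_round(paths):
    """One map/reduce round; returns (paths for next round, finished paths)."""
    finished, pairs = [], []
    for p in paths:
        for key, new_path in map_extend(p):
            if key is None:
                finished.append(new_path)
            else:
                pairs.append((key, new_path))
    return list(reduce_min_cost(pairs).values()), finished


initial = [((i,), VISIT[i], WEIGHT[i]) for i in VISIT]
round1, finished1 = run_round(initial)
round2, finished2 = run_round(round1)
```

Keying on the sorted POI tuple is what lets the reduce step see all orderings of one POI set together; the trade-off is that only the cheapest ordering survives into the next round.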
Fig. 4. Itinerary index.

3.3 Itinerary Index
To efficiently locate the single-day itineraries, an inverted index is built. The key is the POI and the values are all itineraries involving the POI. By scanning the index, we can retrieve all the itineraries. Fig. 4 illustrates the index structure. We create an index file for each POI in the DFS. The file includes all single-day itineraries involving the POI, which are sorted based on their weights. For example, in Fig. 4, "1.idx" contains all itineraries for the first POI. The itinerary "1|5|20|12|40" is the most important itinerary in the index file, with weight 320.

The inverted index is constructed via a MapReduce job. Algorithms 3 and 4 show the process. The mappers load the single-day itineraries and generate key-value pairs for each involved POI. The reducers collect all itineraries for a specific POI and sort them based on the weights before creating the index file. In our system, the size of the index file may vary a lot. Some POIs may have an extremely large index file, due to their popularity and short visit time. In the reducers, those POIs may cause a memory overflow exception in the sorting process. To address this problem, in the map phase, instead of using the POI as the key, we generate a composite key by combining the POI and the itinerary weight.

Algorithm 3. map(Object key, Text value, Context context).
// value: single-day itinerary
1: Itinerary it = parse(value)
2: for i = 0 to it.POISize() do
3:   int nextPOI = it.getNext(i)
4:   Key key = new CompositeKey(nextPOI, it.weight/bucketSize)
5:   context.collect(key, it)

Algorithm 4. reduce(Key key, Iterable values, Context context).
1: CompositeKey ck = key, Set s = {}
2: for Itinerary it : values do
3:   s.add(it)
4: sort(s)
5: DFSFile f = new DFSFile(ck.first + " " + ck.second)
6: f.write(s)

In particular, we partition the itineraries into n buckets. The bucket ID is used as a part of the composite key. In this way, we split the itineraries of a POI into n groups, and each group can be efficiently sorted in memory. Each group results in an index file. However, it is not necessary to merge the files, as the files are partitioned based on the weights. By scanning all files from the nth bucket to the first bucket, we can get a sorted list of all itineraries involving a POI.

To simplify the index manipulation, an index manager is built in our query engine. The index manager provides only one interface, scan(POI), where POI denotes the owner of the index. The interface returns an iterator, which can be used to retrieve all itineraries of the POI. A memory buffer is established to cache the used itineraries, and the LRU strategy is applied to maintain the buffer.

3.4 Discussion: Why MapReduce
Although the input data set (POI graph) is small in size, the partial results of the possible itineraries are extremely large (more than 100 GB or even 1 TB). The computation is also intensive and cannot be completed by a single machine. MapReduce is the solution to partition the partial results and generate the itineraries in parallel. Its advantages are twofold:

1. Parallel computing effectively reduces the running time of preprocessing. The search space explodes when the number of POIs and traveling days increases. It is impractical to generate all possible itineraries. But by exploiting the power of MapReduce, we can share and balance the workload between multiple machines. Scalability is achieved by adding more nodes into the cluster. In our experiment, the running time of preprocessing is significantly reduced as the number of nodes increases (see Fig. 12).
2. MapReduce algorithms can remove the duplicated itineraries in a simple way. In Algorithm 2, by leveraging the framework of MapReduce, we map all the itineraries with the same POIs into the same reducer and keep only the one itinerary with the lowest cost. This approach can prune the low-benefit partial itineraries as early as possible and leads to less input for the next round of computation.

4 GREEDY-BASED APPROXIMATION ALGORITHM
After the itinerary indexes are constructed, the user request $(S_p, k)$ can be processed by selecting the k best itineraries from the indexes. Namely, the problem of generating the optimal k-day itinerary is transformed into a weighted set-packing problem, as shown in the following theorem.

Definition 3 (Weighted Set-Packing Problem). In a universe U, we assume that each element in U has a weight and the weight of any subset of U equals the sum of the element weights in the subset. Given a family S of U's subsets, the set-packing problem is to select a subfamily S' from S, where all subsets in S' are disjoint and the weight of S' is maximal among all possible selections.

Theorem 2. Finding the optimal k-day itinerary can be reduced to the weighted set-packing problem.

Proof (Sketch). By solving the set-packing problem, we can also get the optimal k-day itinerary, as

1. Each single-day itinerary can be considered as a subset of the POI set P.
2. The subsets selected by the set-packing problem are disjoint, and hence in the k-day itinerary, we will not visit a POI twice.
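The set-packing view of Definition 3 and Theorem 2 can be made concrete with a short sketch. The code below is a plain greedy heuristic written for illustration, not the paper's own initialization algorithm; the function name `greedy_set_packing` and the sample weights are assumptions.

```python
def greedy_set_packing(itineraries, k):
    """Pick up to k pairwise-disjoint itineraries, heaviest first.

    `itineraries` is a list of (pois, weight) pairs, where `pois` is a
    tuple of POI ids. Disjointness guarantees no POI is visited twice.
    """
    chosen, used = [], set()
    for pois, weight in sorted(itineraries, key=lambda it: it[1], reverse=True):
        if len(chosen) == k:
            break
        if used.isdisjoint(pois):
            chosen.append((pois, weight))
            used.update(pois)
    return chosen


# Hypothetical single-day itineraries with made-up weights.
candidates = [((1, 2, 4), 10.0), ((5, 1, 6), 9.0), ((2, 8, 9), 8.0), ((3, 7), 6.0)]
plan = greedy_set_packing(candidates, k=2)
```

On this toy input the greedy choice takes 1|2|4 first, which blocks both 5|1|6 and 2|8|9 even though those two are disjoint and together weigh more than the greedy plan. Gaps of this kind are what a later adjustment phase is meant to close.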
CHEN ET AL.: AUTOMATIC ITINERARY PLANNING FOR TRAVELING SERVICES 519
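The bucketed inverted index of Section 3.3 (Algorithms 3 and 4) can be sketched in memory as follows. This is an illustration under stated assumptions, not the authors' DFS-backed implementation: `build_bucketed_index` and `scan` are hypothetical names, a dictionary stands in for the per-bucket index files, and itineraries are assumed to be sorted by descending weight within each bucket.

```python
from collections import defaultdict


def build_bucketed_index(itineraries, bucket_size):
    """Group itineraries under composite keys (poi, bucket id).

    The bucket id is weight // bucket_size, so every group is small
    enough to sort on its own, and groups never need to be merged.
    """
    groups = defaultdict(list)
    for pois, weight in itineraries:
        for poi in pois:
            groups[(poi, int(weight // bucket_size))].append((pois, weight))
    for group in groups.values():
        group.sort(key=lambda it: it[1], reverse=True)
    return groups


def scan(index, poi):
    """Yield the itineraries of `poi` from the highest bucket to the lowest."""
    buckets = sorted({b for p, b in index if p == poi}, reverse=True)
    for b in buckets:
        yield from index[(poi, b)]


# Hypothetical itineraries with made-up weights.
data = [((1, 5, 20), 320.0), ((1, 2), 150.0), ((1, 3), 310.0), ((2, 3), 90.0)]
index = build_bucketed_index(data, bucket_size=100)
weights_for_poi_1 = [w for _, w in scan(index, 1)]
```

Because buckets partition the weight range, concatenating the sorted buckets from high to low yields a globally sorted list without a merge step, which is the property the index manager's scan(POI) interface relies on.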
Definition 4 (Neighborhood). Given an itinerary $L_i$, its neighborhood $ngb(L_i)$ is an itinerary set satisfying:

$ngb(L_i) = \bigcup_{v_j \in L_i} idx(v_j)$.

For example, in Fig. 5,

$ngb(1|2|4) = \{5|1|6,\ 5|2|4,\ 2|8|9,\ 4|2|1,\ 3|4|5\}$. (2)

The neighborhood of $L_i$ represents the candidate itineraries that can replace $L_i$. However, some itineraries share common POIs and cannot coexist in the result. Therefore, we define the independent set as follows.

Definition 5 (Independent Set). An independent set $IS(L_i)$ is a subset of $ngb(L_i)$. Any two itineraries in $IS(L_i)$ do not share a common POI. Namely, $\forall L_0, L_1 \in IS(L_i)$, $L_0$ and $L_1$ are disjoint.

The neighborhood of each itinerary can have multiple independent sets, and each set denotes a different adjustment strategy. Let S be the initial itinerary set returned by Algorithm 5. An alternative solution S' can be constructed from S by replacing the itineraries by their independent sets. More formally,

$S' = S - f(S, ngb(L_i)) + IS(L_i)$,

where $f(S_a, S_b)$ returns the subset of $S_a$ whose itineraries share at least one POI with itineraries in $S_b$.

For itinerary "1|2|4" in Fig. 5, its independent set is $\{2|8|9,\ 3|4|5\}$. If $S = \{1|2|4,\ 7|5|3\}$, after the adjustment, we will get $S' = \{2|8|9,\ 3|4|5\}$. All itineraries are replaced by new ones. To avoid cascading replacement, the size of $IS(L_i)$ should be less than k, as only k single-day itineraries are required. In our implementation, we limit the size of $IS(L_i)$ to $k/2$. Namely, at most half of the itineraries are replaced.

The benefit of itinerary adjustment is computed as

$B = weight(S') - weight(S)$.

If $B > 0$, we assume that the adjustment improves the quality of the results. Hence, a better itinerary can be produced by replacing the old itineraries with the corresponding independent sets.

Algorithm 6 summarizes the idea of the adjustment process. We set a threshold for the maximal number of adjustments. In each iteration, we find the independent sets for the existing itineraries. If one itinerary has multiple independent sets, we select the one with maximal weight (line 6). The new results are then computed by performing the replacement (line 7), and we record the benefit (line 8). After all possible replacement strategies have been checked, we select the one with maximal benefit. If the benefit is larger than 0, the result itineraries are updated to the new ones (line 13). Otherwise, we perform the update only with a small probability (lines 15-16). The idea is to simulate the hill-climbing algorithm to avoid a suboptimal solution. The algorithm guarantees the quality of the returned itinerary, as shown in the theorem below.

Algorithm 6. Adjustment(Set S, double P, int step).
1: int j = 0;
2: while j < step do
3:   Set cand = {}, int max = -infinity, int idx = -1
4:   for i = 0 to S.size() do
5:     Set ngb = S.get(i).getNeighborhood()
6:     Set ind = getIndependentSetWithMaximalWeight(ngb)
7:     Set S' = S - f(S, ngb) + ind
8:     double B = weight(S') - weight(S)
9:     cand.add(S')
10:    if B > max then
11:      max = B, idx = i
12:  if max > 0 then
13:    S = cand.get(idx)
14:  else
15:    if randProb() > P then
16:      S = cand.get(idx)
17:  j++

Theorem 4. Algorithm 6 returns a k-day itinerary, which approximates the optimal solution with the bound $\frac{2(m+1)}{3}$.

Proof (Sketch). In Algorithm 6, we add a virtual POI to each itinerary to mark its traveling day. Therefore, the adjustment algorithm returns at most k disjoint itineraries. Otherwise, there would be two itineraries sharing the same virtual POI. Namely, they would be traveled on the same day, which is not possible. If the algorithm returns fewer than k itineraries, we can still repeat the initialization and adjustment to fill in the remaining days. In this way, we guarantee that Algorithm 6 returns exactly a k-day itinerary. Based on Theorem 2, the problem of selecting the k-day itinerary can be reduced to the weighted set-packing problem. Therefore, in Algorithm 6, we simulate the heuristic set-packing algorithm. The heuristic algorithm has been analyzed in [8]. Suppose there are X iterations in the algorithm. Let $I_i$ be the result of the $(X - i + 1)$th iteration, so that $I_1$ is the final result. Let $d_i$ be the payoff factor of each iteration. We have

$(m + 1)\, w(I_1) \ge \left(2 - \frac{1}{d_1} + \frac{1}{2 d_1^2}\right) w(opt)$,

where $w(I_1)$ and $w(opt)$ represent the weights of the itinerary returned by the heuristic algorithm and the optimal itinerary, respectively. The right side of the inequality is minimized when $d_1 = 1$. In that case, we have

$(m + 1)\, w(I_1) \ge \left(1 + \frac{1}{2}\right) w(opt)$,

that is, $w(opt) / w(I_1) \le \frac{2(m+1)}{3}$. Therefore, we have a bound of $\frac{2(m+1)}{3}$ for the heuristic approach, where m is the maximal number of POIs in an itinerary (m is the number of MapReduce jobs in our preprocessing). □

The most expensive operations in Algorithm 6 are retrieving the neighborhood sets. We need to scan the indices of the involved POIs to find all itineraries. We find that, as Algorithm 6 selects only one independent set for each itinerary, we can save I/O costs by scanning a small portion of the index file. Therefore, in our implementation, we read
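A single replacement step of the adjustment (lines 5 to 8 of Algorithm 6) can be sketched as follows, using the itineraries from the Fig. 5 example. The weights are invented for illustration, the names `index_by_poi`, `neighborhood`, `independent_set`, and `adjust_once` are hypothetical, and the independent set is chosen greedily by weight, which is an assumption: the text does not spell out how the maximal-weight independent set is computed.

```python
from collections import defaultdict


def index_by_poi(itineraries):
    """Inverted index: POI id -> itineraries (tuples of POI ids) containing it."""
    index = defaultdict(list)
    for it in itineraries:
        for poi in it:
            index[poi].append(it)
    return index


def neighborhood(itinerary, index):
    """ngb(L): every indexed itinerary sharing a POI with L, excluding L itself."""
    found = set()
    for poi in itinerary:
        found.update(index.get(poi, ()))
    found.discard(itinerary)
    return found


def independent_set(ngb, weight, max_size):
    """Greedy stand-in for the maximal-weight independent set (an assumption)."""
    chosen, used = [], set()
    for it in sorted(ngb, key=lambda x: (-weight[x], x)):
        if len(chosen) < max_size and used.isdisjoint(it):
            chosen.append(it)
            used.update(it)
    return chosen


def adjust_once(S, i, index, weight, max_size):
    """Compute S' = S - f(S, ngb(L_i)) + IS(L_i) and the benefit B."""
    ngb = neighborhood(S[i], index)
    ind = independent_set(ngb, weight, max_size)
    ngb_pois = set().union(*ngb)
    kept = [it for it in S if ngb_pois.isdisjoint(it)]  # S - f(S, ngb)
    new_S = kept + ind
    benefit = sum(weight[it] for it in new_S) - sum(weight[it] for it in S)
    return new_S, benefit


# Itineraries from the Fig. 5 example; all weights are hypothetical.
weight = {(1, 2, 4): 7.0, (7, 5, 3): 6.0, (5, 1, 6): 6.0, (5, 2, 4): 5.0,
          (2, 8, 9): 9.0, (4, 2, 1): 4.0, (3, 4, 5): 8.0}
index = index_by_poi(weight)
S = [(1, 2, 4), (7, 5, 3)]
new_S, B = adjust_once(S, 0, index, weight, max_size=2)
```

With these made-up weights the step reproduces the replacement described in the text: both original itineraries are dropped because each shares a POI with the neighborhood, and the independent set {2|8|9, 3|4|5} takes their place with a positive benefit.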
Fig. 11. Indexing cost.
Fig. 12. Scalability of indexing.
Fig. 14. Effect of graph size (processing time).
Fig. 15. Effect of graph size (quality).
Fig. 17. Effect of selected POIs (quality).
Fig. 18. Effect of traveling time (processing time).
Fig. 20. Effect of adjustment (processing time).
Fig. 21. Effect of adjustment (quality).
Fig. 23. Effectiveness of single hotel selection.
Fig. 24. User study.
Although the adjustment phase incurs a high processing cost, it can significantly improve the result quality. As shown in Fig. 21, the adjustment phase can double the weight of the generated itinerary if more than 15 POIs are selected.4 With more POIs selected, the adjustment phase can generate more replacement itineraries and therefore has a better chance of finding a high-quality result.

4. In this figure, the weight ratio is computed between the MR-Set with adjustment and MR-Set without adjustment.

5.8 Effect of Single Hotel Selection
In this section, we justify the effectiveness of the hotel selection algorithm. In Algorithm 7, we adopt a "best-effort" solution to append the hotel to the end of each itinerary. To evaluate the performance of such a solution, we define a new metric, the hotel weight ratio. In particular, let $W_m$ and $W_s$ denote the total weights of the generated itineraries in the multiple hotel case and the single hotel case, respectively. The hotel weight ratio is defined as $W_s / W_m$. Our "best-effort" solution still provides high-quality results. Fig. 23 shows the change of the hotel weight ratio. We can see that, in the single hotel case, the total weight of the generated itineraries is penalized, as each single-day itinerary must end at the same hotel POI. However, the "best-effort" solution can provide an approximate result with 85-90 percent of the total weight of the multiple hotel case. This indicates that Algorithm 7 is still able to find good itineraries under the single hotel constraint.

5.9 User Study
To evaluate the quality of the generated itineraries, we conduct a user study, which asks the users to manually rank the itineraries. Our study recruits 20 undergraduate students as the users. Given a set of selected POIs, we use the TOP and MR-Set methods to generate 20 groups of itineraries (three-day itineraries in the experiment). Each participant assigns a score (ranging from 1 to 5) to each itinerary in their group. The average ranks are then computed for the itineraries generated by the different approaches. Fig. 24 shows the results. Most users prefer the results generated by MR-Set. We also observe that the ratings of both the TOP and MR-Set decrease when more POIs are selected as necessary POIs. This is because some of the user-selected POIs are missing from the itineraries due to the constraint on travel time.

6 RELATED WORK
Most existing work on itinerary generation takes a two-step scheme. It first adopts data mining algorithms to discover the users' traveling patterns from their published images, geolocations, and events [11], [12], [13]. Based on the relationships in those historical data, new itineraries are generated and recommended to the users [14], [15], [16]. This scheme leverages the user data to retrieve POIs and organize the POIs into an itinerary, which is an application scenario different from ours. We help the travel agency provide a customized itinerary service, where all details of the POIs are known and each user prefers a different itinerary instead of adopting the most popular ones. In our case, the itinerary generation problem is a search problem for the optimal POI combinations.

In fact, searching for the optimal single-day itinerary has been well studied. It can be transformed into the traveling salesman problem (TSP) [5], which is a well-known NP-complete problem. For example, in [17], given a set of POIs, the system generates a shortest itinerary to access all the POIs. If the distance measure is a metric and symmetric, the TSP has a polynomial-time approximate solution [18], but the approximate solution incurs high overhead for a large POI graph [19]. Therefore, some heuristic approaches [1] are adopted to simplify the computation.

Some interactive search algorithms [2], [20] have been proposed in recent years. These algorithms still focus on optimal single-day itinerary planning. To reduce the computation overhead and improve the quality of the generated itineraries, user feedback is integrated into the search algorithm. The search algorithm works iteratively. It proposes new itineraries for users based on their previous feedback, and the users can adjust the weights of POIs in the itinerary or select new POIs into the itinerary. In the next iteration, the algorithm refines its results based on the collected information. These works can be considered variants of the optimal single-day itinerary planning problem, whereas our algorithms focus on generating multiday itineraries. Moreover, interactive algorithms pose requirements for the users, who may be reluctant to provide feedback.

To the best of our knowledge, no previous work has studied the problem of generating multiday itineraries. This problem is more challenging than the single-day itinerary, because simply combining multiple optimal single-day itineraries may result in a suboptimal solution. The multiday itinerary problem, as shown in this paper, can be reduced to the team orienteering problem (TOP) [3], which is an NP-complete problem with no polynomial-time approximate solution. Therefore, many heuristic approaches have been proposed [6], [21], [22]. The heuristic approaches cannot guarantee the quality of the generated itineraries. To address the problem, in this paper, we apply
the MapReduce framework to generate the single-day itineraries. The parallel engine of MapReduce allows us to solve some NP-complete problems more efficiently. Other work [23], [24] also tries to leverage the power of MapReduce to reduce the processing cost of NP-complete problems. The beauty of our approach is that after the transformation, the itinerary planning problem is reduced to the weighted set-packing problem, which has approximate solutions under some constraints.

7 CONCLUSION
In this paper, we present an automatic itinerary generation service for backpack travelers. The service creates a customized multiday itinerary based on the user's preference. This problem is a famous NP-complete problem, the team orienteering problem, which has no polynomial-time approximation algorithm. To search for the optimal solution, a two-stage scheme is adopted. In the preprocessing stage, we iterate over and index the candidate single-day itineraries using the MapReduce framework. The parallel processing engine allows us to scan the whole data set and index as many itineraries as possible. After the preprocessing stage, the TOP is transformed into the weighted set-packing problem, which has efficient approximation algorithms. In the next stage, we simulate the approximation algorithm for the set-packing problem. The algorithm follows the initialization-adjustment model and can generate a result whose weight is within a factor of $\frac{2(m+1)}{3}$ of the optimal result. Experiments on a real data set from Yahoo's traveling website show that our proposed approach can efficiently generate high-quality customized itineraries.

ACKNOWLEDGMENTS
The work of Sai Wu was supported by the National Science Foundation of China (NSFC Grant 60970124, 61170034). The work of Sai Wu, Jingbo Zhou, and Anthony K.H. Tung was carried out at the SeSaMe Centre. It is supported by the Singapore NRF under its IRC@SG Funding Initiative and administered by the IDMPO. Sai Wu was the corresponding author.

REFERENCES
[1] S. Dunstall, M.E. Horn, P. Kilby, M. Krishnamoorthy, B. Owens, D. Sier, and S. Thiebaux, "An Automated Itinerary Planning System for Holiday Travel," Information Technology and Tourism, vol. 6, no. 3, pp. 195-210, 2004.
[2] S.B. Roy, G. Das, S. Amer-Yahia, and C. Yu, "Interactive Itinerary Planning," Proc. IEEE 27th Int'l Conf. Data Eng. (ICDE), pp. 15-26, 2011.
[3] I.-M. Chao, B.L. Golden, and E.A. Wasil, "The Team Orienteering Problem," European J. Operational Research, vol. 88, no. 3, pp. 464-474, Feb. 1996.
[4] J. Dean and S. Ghemawat, "MapReduce: A Flexible Data Processing Tool," Comm. ACM, vol. 53, pp. 72-77, Jan. 2010.
[5] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, Introduction to Algorithms, second ed. The MIT Press and McGraw-Hill Book Company, 2001.
[6] C. Archetti, A. Hertz, and M.G. Speranza, "Metaheuristics for the Team Orienteering Problem," J. Heuristics, vol. 13, pp. 49-76, Feb. 2007.
[7] P. Vansteenwegen, W. Souffriau, and D.V. Oudheusden, "The Orienteering Problem: A Survey," European J. Operational Research, vol. 209, pp. 1-10, Feb. 2011.
[8] M.M. Halldórsson and B. Chandra, "Greedy Local Improvement and Weighted Set Packing Approximation," J. Algorithms, vol. 39, pp. 223-240, May 2001.
[9] E.M. Arkin and R. Hassin, "On Local Search for Weighted K-Set Packing," Math. Operations Research, vol. 23, pp. 640-648, Mar. 1998.
[10] http://hadoop.apache.org/, 2013.
[11] T. Rattenbury, N. Good, and M. Naaman, "Toward Automatic Extraction of Event and Place Semantics from Flickr Tags," Proc. 30th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR '07), pp. 103-110, 2007.
[12] D.J. Crandall, L. Backstrom, D.P. Huttenlocher, and J.M. Kleinberg, "Mapping the World's Photos," Proc. 18th Int'l Conf. World Wide Web (WWW), pp. 761-770, 2009.
[13] M. Clements, P. Serdyukov, A.P. de Vries, and M.J. Reinders, "Using Flickr Geotags to Predict User Travel Behaviour," Proc. 33rd Int'l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), 2010.
[14] C.-H. Tai, D.-N. Yang, L.-T. Lin, and M.-S. Chen, "Recommending Personalized Scenic Itinerary with Geo-Tagged Photos," Proc. IEEE Int'l Conf. Multimedia and Expo (ICME), pp. 1209-1212, 2008.
[15] M.D. Choudhury, M. Feldman, S. Amer-Yahia, N. Golbandi, R. Lempel, and C. Yu, "Automatic Construction of Travel Itineraries Using Social Breadcrumbs," Proc. 21st ACM Conf. Hypertext and Hypermedia (HT), pp. 35-44, 2010.
[16] H. Yoon, Y. Zheng, X. Xie, and W. Woo, "Smart Itinerary Recommendation Based on User-Generated GPS Trajectories," Proc. Seventh Int'l Conf. Ubiquitous Intelligence and Computing (UIC), pp. 19-34, 2010.
[17] I. Hefez, Y. Kanza, and R. Levin, "TARSIUS: A System for Traffic-Aware Route Search under Conditions of Uncertainty," Proc. 19th ACM SIGSPATIAL Int'l Conf. Advances in Geographic Information Systems (GIS), pp. 517-520, 2011.
[18] N. Christofides, "Worst-Case Analysis of a New Heuristic for the Traveling Salesman Problem," Technical Report 388, Graduate School of Industrial Administration, Carnegie-Mellon Univ., 1976.
[19] G. Laporte, "The Traveling Salesman Problem: An Overview of Exact and Approximate Algorithms," European J. Operational Research, vol. 59, no. 2, pp. 231-247, June 1992.
[20] R. Levin, Y. Kanza, E. Safra, and Y. Sagiv, "Interactive Route Search in the Presence of Order Constraints," Proc. VLDB Endowment, vol. 3, no. 1, pp. 117-128, 2010.
[21] W. Souffriau, P. Vansteenwegen, G.V. Berghe, and D.V. Oudheusden, "A Path Relinking Approach for the Team Orienteering Problem," Computers and Operations Research, vol. 37, pp. 1853-1859, 2010.
[22] M.V.S.P. de Aragao, H. Viana, and E. Uchoa, "The Team Orienteering Problem: Formulations and Branch-Cut and Price," Proc. Algorithmic Approaches for Transportation Modeling, Optimization, and Systems (ATMOS), vol. 14, pp. 142-155, 2010.
[23] F. Chierichetti, R. Kumar, and A. Tomkins, "Max-Cover in Map-Reduce," Proc. 19th Int'l Conf. World Wide Web (WWW), pp. 231-240, 2010.
[24] Z. Zhao, G. Wang, A.R. Butt, M. Khan, V.A. Kumar, and M.V. Marathe, "SAHAD: Subgraph Analysis in Massive Networks Using Hadoop," IEEE Int'l Parallel and Distributed Processing Symp. (IPDPS), 2012.
Gang Chen received the BSc, MSc, and PhD degrees in computer science and engineering from Zhejiang University in 1993, 1995, and 1998, respectively. He is currently a professor at the College of Computer Science, Zhejiang University. He is also the executive director of Zhejiang University-Netease Joint Lab on Internet Technology. His research interests include database, information retrieval, information security, and computer supported cooperative work.

Sai Wu received the bachelor's and master's degrees from Peking University, and the PhD degree from the National University of Singapore in 2011. Now he is an assistant professor at the College of Computer Science, Zhejiang University. His research interests include P2P systems, distributed database, cloud systems, and indexing techniques. He has served as a program committee member for VLDB, ICDE, and CIKM.

Jingbo Zhou is currently working toward the PhD degree in the School of Computing, National University of Singapore. His research interests include indexing and query processing on the complex structure, such as trajectories, trees and graphs.

Anthony K.H. Tung received the BSc (second class honor) and MSc degrees in computer science from the National University of Singapore (NUS), in 1997 and 1998, respectively, and the PhD degree in computer sciences from Simon Fraser University in 2001. He is currently an associate professor in the Department of Computer Science (NUS). His research interests include various aspects of databases and data mining (KDD) including buffer management, frequent pattern discovery, spatial clustering, outlier detection, and classification analysis.