2016 IEEE International Conference on Services Computing

A Framework for Passengers Demand Prediction and Recommendation

Kai Zhang1,2, Zhiyong Feng1,2,3, Shizhan Chen1,2, Keman Huang1,2,*, Guiling Wang4,5

1 Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, China
2 School of Computer Science and Technology, Tianjin University, Tianjin, China
3 School of Computer Software, Tianjin University, Tianjin, China
4 Research Center for Cloud Computing, North China University of Technology, China
5 Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, Beijing, China
{zhang_kai, zyfeng, shizhan, keman.huang}@tju.edu.cn, [email protected]

Abstract: With the rapid development of mobile internet and wireless network technologies, more and more people use mobile apps to call a taxicab to pick them up. Understanding passengers' travel demand is therefore crucial to improving the utilization of taxicabs and reducing their cost. In this paper, based on spatio-temporal clustering, we propose a demand hotspot prediction framework that generates recommendations for taxi drivers. Specifically, an adaptive prediction approach is presented to identify demand hotspots and predict their hotness; then, by combining the driver's location and the hotness, the top candidates are recommended and visually presented to drivers. Based on the dataset provided by CAR Inc., the experiment shows that our approach gains a significant improvement in hotspot prediction and recommendation, with a 15.21% improvement in average F-measure for prediction and a 79.6% hit ratio for recommendation.

Keywords: spatio-temporal cluster, passenger demand hotspot, demand prediction, hotspot recommendation

I. INTRODUCTION
With the rapid development of mobile internet and wireless network technologies in recent years, the transportation industry has changed greatly. More and more passengers in cities rely on mobile apps, such as DiDi¹, Uber², CAR³ and Yongche⁴, to call a taxicab to pick them up. This makes knowledge about potential passengers' requirements important and valuable, yet many taxicab drivers, especially novice drivers, lack it. Understanding the travel requirements not only helps drivers pick up passengers more quickly and earn more money, but also reduces cruising time and energy waste. Therefore, how to understand travel requirements in order to improve efficiency has become an important issue for the transportation industry.
Many efforts have been made to address this problem. Clustering approaches [1-3] have been widely used in hotspot analysis, including k-means, hierarchical clustering and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Some papers focus on understanding traffic flow movement and the corresponding benefits for drivers based on GPS traces [4, 5]. Other approaches [6-8] recommend carpooling services to passengers based on traffic data analysis.
Generally speaking, taxi drivers expect to be given the places where passengers may take a taxi at a given time and around a given location. However, most research on GPS data focuses on recommendation, while the relationships between passenger demand and space-time are rarely captured.
Therefore, to deal with these issues, we present a demand hotspot prediction framework based on spatio-temporal analysis to predict and recommend hotspots for drivers. Based on the analysis of historical data, including when and where passengers get on a taxi, we generate the demand distribution to learn patterns that help improve the performance of the spatio-temporal clustering. Then the hotness score of each hotspot is predicted to represent the potential requirement of the passengers. Considering that it takes time for a driver to reach a given location while the requirement is dynamic, the top-k locations, which combine the hotness and the distance, are visually presented to each taxi driver to help improve efficiency.
Hence, the major contribution of this paper is a framework to predict passenger demand and generate recommendations for drivers to improve their efficiency, including the following:
• An adaptive prediction approach is proposed to identify the hotspots and predict the hotness of the passenger demands based on the historical GPS data;
• A method combining the hotness prediction and locations to calculate an attractive score is presented to generate recommendations for each driver;
• A visual prototype system is developed to demonstrate the effectiveness of the presented framework, showing the hotspot distribution by hotness score and the hit ratio with which taxi drivers succeed in picking up passengers in predicted places.
The rest of this paper is organized as follows. Section II presents our framework for passenger demand prediction and recommendation. Section III details the methodology to predict the hotness of passenger requirements and generate recommendations for drivers. Section IV reports the data and the experiment results. Section V discusses the related work and Section VI concludes the whole paper.

¹ http://www.xiaojukeji.com/en/index.html
² https://www.uber.com/
³ http://www.10101111.com/
⁴ http://www.yongche.com/

978-1-5090-2628-9/16 $31.00 © 2016 IEEE
DOI 10.1109/SCC.2016.51
II. FRAMEWORK
The Passenger Demands Prediction and Recommendation (PDPR) framework is illustrated in Figure 1. It consists of two parts, 1) offline prediction and 2) online recommendation, and can discover the passengers' hotspot regions and provide the top-k most valuable places to taxi drivers.

Figure 1. The system framework (blocks: Offline Prediction, Online Recommendation, Trajectories, Data Preprocessing, Pick-up Detection, Hotspots Extraction, EWMA Model, Knowledge of pick-up places, Execution Engine, Top-k places, Visualization)

The positions where passengers concentrate vary with time, and the location of a taxi changes dynamically. Hotspot prediction is performed offline. The process is as follows. First, we obtain historical taxi pick-up and drop-off points from GPS trajectories, process the pick-up points with a spatio-temporal clustering approach, and extract the hotspots in different time slots and different regions. Specifically, we divide the 24 hours of a day into 24 time segments and split all pick-up points into 24 subsets accordingly; an adaptive DBSCAN method is then used to obtain the locations with high density in each time segment. Second, for every historical date we obtain, step by step, a core point standing for the hotspot in each small region. The pick-up points of different time segments are stored in different files prepared for online recommendation. Third, the results are taken as input to the Exponentially Weighted Moving-Average (EWMA) model, which is a time series forecasting method. The online recommendation process is as follows: we get the current hotspots from the output of the EWMA model according to the taxi driver's request time and location, and rank them by their attractive score. We then recommend the top-k hotspots around the taxi's location to the driver.
In our proposed framework, the prediction and the evaluation of the recommendation are critical, and we discuss their details in the following sections.

III. PREDICTION AND RECOMMENDATION
A. Prediction

1) Definition
Definition 1 (Passenger Demand, PD):
PD contains three attributes: location, hotness and timestamp. Given the location L where a taxi picks up a passenger at time t, PD can be represented as
$$PD = \langle L, H, t \rangle \tag{1}$$
where L consists of longitude and latitude while H represents the hot degree of demand.
Definition 2 (Hotness Score, HS):
Given the set of clusters C and time t, for $c_i$ in C, the hotness score is calculated by
$$HS_{i,t} = \frac{|c_i|}{\sum_{j=1}^{n} |c_j|} \tag{2}$$
where $HS_{i,t}$ represents the hotness score of cluster $c_i$ at time t and the sum runs over the n clusters in C.
Definition 3 (Demand Hotspot, DH):
Given the time t and the set of clusters C, for each cluster $c_i \in C$ we can get a core point $p_i$ (lon, lat) standing for $c_i$, calculated as the arithmetic average of all points in $c_i$. Then $DH_i$ is defined as
$$DH_i = \langle p_i, HS_i, t \rangle \tag{3}$$
where $HS_i$ is the hotness score of cluster $c_i$.

2) Passenger Demand Identification
In order to predict passenger demands, we should identify when and where a demand happens.

a) Data Information
The taxi GPS data used in this paper is provided by CAR Inc.⁵ It was generated by about 3,760 taxis in Beijing from July 1 to July 29, 2015, with the taxis sampled at an interval of 30 to 40 s. The records indicate where and when passengers get on a taxi. Table I reports the dimensions of the taxi GPS data.

TABLE I. METADATA OF RECORD

| Field | Description |
|---|---|
| VehicleID | the unique taxi ID |
| Lon | longitude of the point |
| Lat | latitude of the point |
| Timestamp | the sampling time |
| PassengerState | current state of point, 0 indicates the taxi is vacant and 1 indicates the taxi is occupied |
| RecordState | current state of record, 0 indicates the record is correct and 1 indicates the record is incorrect |

⁵ http://www.zuche.com
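Definitions 2 and 3 can be illustrated with a short sketch. It assumes the clusters have already been produced by some clustering step and are given as lists of (longitude, latitude) pairs; the names `DemandHotspot` and `demand_hotspots` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


@dataclass
class DemandHotspot:
    core: Point      # p_i: arithmetic mean of the cluster's points (Definition 3)
    hotness: float   # HS_{i,t}: the cluster's share of all clustered points (Definition 2)
    t: int           # time segment the clusters belong to


def demand_hotspots(clusters: Dict[int, List[Point]], t: int) -> Dict[int, DemandHotspot]:
    """Build DH_i = <p_i, HS_i, t> for every non-empty cluster."""
    total = sum(len(points) for points in clusters.values())
    hotspots: Dict[int, DemandHotspot] = {}
    for cluster_id, points in clusters.items():
        if not points:
            continue  # an empty cluster has no core point
        lon = sum(p[0] for p in points) / len(points)
        lat = sum(p[1] for p in points) / len(points)
        hotspots[cluster_id] = DemandHotspot((lon, lat), len(points) / total, t)
    return hotspots


# Toy input: two clusters observed in the 8:00 time segment.
example = demand_hotspots(
    {0: [(116.40, 39.90), (116.42, 39.92)], 1: [(116.30, 39.80)]}, t=8
)
```

In this toy input, cluster 0 holds two of the three points, so its hotness score is 2/3 and its core point is the mean of its two coordinates.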
b) Data Preprocessing
Because of abnormalities such as GPS device failure, the GPS position may sometimes be incorrect. We focus on the passenger demands of taxis in the city, so the data must be cleaned first. To identify the real vacant and occupied trajectories, we carry out the data preprocessing as follows:
Step 1 Extract the raw taxi trajectories from GPS records
A shift of PassengerState means an occupied/vacant event: a change of PassengerState from 0 to 1 indicates an occupied event, while a change from 1 to 0 indicates a vacant event. An occupied trajectory is defined as a point sequence beginning with an occupied event and ending with a vacant event, while a vacant trajectory is defined as a point sequence beginning with a vacant event and ending with an occupied event. The occupied/vacant events are illustrated in (4).
$$\ldots 1\,\underbrace{0\,0\,0 \cdots 0}_{\text{vacant trajectory}}\,\underbrace{1\,1\,1 \cdots 1}_{\text{occupied trajectory}}\,0 \ldots \tag{4}$$
Step 2 Filtering incorrect records from a trajectory
Incorrect records, including flipping points, result in a non-smooth trajectory with abnormal movement. Given a trajectory record containing several points, the function $f(p_i, p_j)$ is defined as
$$f(p_i, p_j) = \begin{cases} 0, & Velocity_{i,j} > Threshold \\ 1, & Velocity_{i,j} \le Threshold \end{cases} \tag{5}$$
where $Velocity_{i,j}$ is calculated as the Manhattan distance divided by the time interval between point i and point j. Threshold is set to 120 km/h according to urban traffic regulation. For each record, if $f(p_i, p_j)$ is 0, the record is filtered out.
Step 3 Pick-up and Drop-off Detection
We filter out the abnormal trajectories whose average velocity is outside the normal range. During this stage, we detect the places where passengers get on a taxi and where taxis drop off a passenger. For a trajectory TR, the points of interest satisfy
$$P = \{ p_i \mid S_{p_{i-1}} = 0 \wedge S_{p_i} = 1 \}, \qquad D = \{ p_i \mid S_{p_{i-1}} = 1 \wedge S_{p_i} = 0 \} \tag{6}$$
where $S_{p_i}$ is the PassengerState of point i. By processing the trajectories, we get the set of pick-up points P and the set of drop-off points D. For example, Figure 2 shows the pick-up points around Wangfujing Street, which is the most famous commercial area in Beijing. A blue marker represents a pick-up point while a red marker indicates a drop-off point.

Figure 2. The pick-up points around Wangfujing Street

We get the dataset shown in Table II after extracting the points from the trajectories.

TABLE II. DATASET OF GPS

| Item | Value |
|---|---|
| Datasets | 2015.7.1-7.29 |
| Number of taxis | 3,760 |
| Effective days | 29 |
| Number of points (total) | 1,253,298 |
| Number of points (per day) | 43,216 |

Note that there are two types of points in the dataset, pick-up points and drop-off points. Each pick-up point corresponds to a drop-off point.
After the trajectories have been processed and the pick-up and drop-off points detected, we perform a range query according to the location and time of the taxi, which picks out the relevant request records for calculating hotspots. The records that satisfy the condition are selected and form the dataset for clustering. Since the data can sometimes be noisy, we apply the request filtering of Algorithm 1 to reduce false selections.

Algorithm 1 Request Filtering(location, time, expected)
Input: location, the current location of car
       time, the current time
       records, the records data
       expected, the expected number of filtered records
Output: filtered, the set of filtered data
Procedure:
1. filtered ← ∅, P ← ∅
2. n ← sizeof(filtered)
3. for ri in records
4.   if !is_around(location) /* distance between taxi and location of record more than threshold */
5.      || !is_timeslot(time) /* find out records in time segment of request */ then
6.     continue;
7.   end if
8.   if n < expected
9.     P ← ri
10.     filtered ← filtered ∪ P
11.     n ← sizeof(filtered)
12.   end if
13. end for
14. return filtered
Lines 4~7 execute the query in the database to get the relevant records according to the query condition, which are saved in set P. Lines 8~12 continue the iterative process while the size of the returned set is below the expected number.

3) Hotspot Prediction based on Adaptive DBSCAN
Demand hotspot prediction aims to provide the taxi driver with places to find potential passengers. In particular, given the current location L and timestamp T of a taxi driver, the system provides the driver with the top-k places that have a relatively large probability of finding passengers.
The DBSCAN algorithm can discover density-based clusters of arbitrary shape in a noisy spatial database; it takes a cluster to be the maximal density-reachable point set. DBSCAN with proper parameters handles outliers and noise better than pure hierarchical clustering or partition-based clustering. However, the input parameters eps and minPts must be set manually, so the clustering process requires user intervention and the accuracy of the clustering results depends directly on the user's selection of parameters. Therefore we propose the Passenger Demands Prediction and Recommendation (PDPR) framework based on spatio-temporal clustering, which avoids manual intervention and automates the clustering.
In DBSCAN, the larger the value of eps, the fewer clusters are found, because nearby clusters merge. Conversely, if the value of eps is too small, many records are labeled as noise or a cluster is split into multiple clusters. In our approach, we propose an adaptive method to calculate the parameters.
First, we compute the matrix of distance distribution, denoted $M_{n \times n}$:
$$M_{n \times n} = \{ dis(i, j) \mid 1 \le i \le n,\ 1 \le j \le n \} \tag{7}$$
where n is the number of objects in the data and dis(i, j) is the Manhattan distance between i and j. The matrix $M_{n \times n}$ is a real symmetric matrix with n rows and n columns. Let $d_{12}$ be the Manhattan distance between points $p_1$ and $p_2$, calculated by
$$d_{12} = |x_1 - x_2| + |y_1 - y_2| \tag{8}$$
where $x_i$ is the longitude of the point and $y_i$ is the latitude of the point.
Then we sort the values of each row in the array to get a new matrix, denoted $M'_{n \times n}$. The i-th value of each row in the array obeys the Poisson distribution [9]. The Poisson parameter for the i-th value is estimated with the maximum likelihood estimation method, which can model relative changes of the value very well:
$$\hat{\lambda} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{9}$$
The expected value of $\lambda$ is the value of eps. We use the historical data to learn the parameter eps. For example, as shown in Figure 3, the number of clusters and noise points detected by the algorithm changes with the value of $eps_i$ when i is between 0 and 20. As i increases, the number of clusters and isolated points shows a downward trend. The value of $eps_i$ increases with i, and after i = 7 the number of clusters and noise points converges. Thus we take $eps_i$ at i = 7 as the optimal parameter.

Figure 3. Cluster and noise with different eps (series: clusters, noise; x-axis: i from 1 to 19; y-axis: count from 0 to 200)

Finally, each point in the dataset has an influence radius equal to eps. We calculate the expected number of points within a radius of eps as the value of minPts. It is defined by summing the number of elements in the ∆-neighborhood of each point and dividing by the number of points:
$$minPts = \frac{1}{n} \sum_{i=1}^{n} |N_{\Delta}(p_i)| \tag{10}$$
where $N_{\Delta}(p_i)$ is the ∆-neighborhood of point $p_i$.
As described above, the parameters eps and minPts can be determined from the statistical features of the dataset itself, as shown in Algorithm 2.

Algorithm 2 ADBSCAN(D, t)
Input: D, the dataset clustered
       t, the current time
Output: C, the set of clusters
Procedure:
1. C ← ∅;
2. eps, minPts, N ← 0;
3. n ← sizeof(D);
4. for pi ∈ D
5.   eps ← getNewEps(pi);
6. end for
7. for pi ∈ D
8.   Ni ← getNeighbours(pi, eps);
9. end for
10. minPts ← getAverage(Ni)
11. C ← DBSCAN(eps, minPts);
12. return C;
Lines 4~6 calculate the parameter eps by iterative computation. Lines 7~9 generate the neighbor points for each point. Line 10 gets the parameter minPts. Line 11 executes DBSCAN using the parameters eps and minPts.
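The parameter estimation of Eqs. (7) to (10) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `adaptive_params` is hypothetical, the index `i` is passed in by the caller (the paper selects i = 7 from its own data), and a point is counted as a member of its own neighborhood. The DBSCAN step itself is not shown.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def manhattan(p: Point, q: Point) -> float:
    """Manhattan distance between two points, as in Eq. (8)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def adaptive_params(points: Sequence[Point], i: int) -> Tuple[float, float]:
    """Estimate (eps, minPts) from the data itself.

    eps is the mean of the i-th smallest distance over all rows of the
    sorted distance matrix (the Poisson maximum likelihood estimate of Eq. 9);
    minPts is the average neighbourhood size within eps (Eq. 10).
    """
    n = len(points)
    if not 1 <= i < n:
        raise ValueError("i must satisfy 1 <= i < number of points")
    # Each sorted row starts with 0.0, the distance of a point to itself.
    sorted_rows = [sorted(manhattan(p, q) for q in points) for p in points]
    eps = sum(row[i] for row in sorted_rows) / n
    # Assumption: the neighbourhood of a point includes the point itself.
    neighbourhood_sizes = [sum(1 for d in row if d <= eps) for row in sorted_rows]
    min_pts = sum(neighbourhood_sizes) / n
    return eps, min_pts


# Toy data: three nearby points and one outlier, using the nearest neighbour (i = 1).
eps, min_pts = adaptive_params([(0, 0), (1, 0), (0, 1), (10, 10)], i=1)
```

The resulting minPts is generally fractional; whether to round it before passing it to a DBSCAN implementation is left to the caller.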

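The preprocessing that feeds this clustering step, namely the velocity filter of Eq. (5) and the pick-up and drop-off detection of Eq. (6), can be sketched in the same style. The sketch assumes that positions are already projected to planar kilometres and that timestamps are in seconds; the `Record` type and the function names are hypothetical, and each record is compared with the last record that was kept.

```python
from dataclasses import dataclass
from typing import List, Tuple

THRESHOLD_KMH = 120.0  # velocity threshold stated for Eq. (5)


@dataclass
class Record:
    x_km: float            # assumption: position projected to kilometres
    y_km: float
    timestamp: float       # seconds
    passenger_state: int   # 0 = vacant, 1 = occupied


def velocity_kmh(a: Record, b: Record) -> float:
    """Manhattan distance divided by the time interval, in km/h."""
    hours = (b.timestamp - a.timestamp) / 3600.0
    if hours <= 0:
        return float("inf")  # treat non-increasing timestamps as abnormal
    return (abs(a.x_km - b.x_km) + abs(a.y_km - b.y_km)) / hours


def drop_abnormal(records: List[Record]) -> List[Record]:
    """Keep a record only if its velocity from the last kept record is plausible."""
    kept: List[Record] = []
    for record in records:
        if kept and velocity_kmh(kept[-1], record) > THRESHOLD_KMH:
            continue  # f(p_i, p_j) = 0: filter the record
        kept.append(record)
    return kept


def detect_events(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (pick-up points P, drop-off points D) as defined in Eq. (6)."""
    pickups: List[Record] = []
    dropoffs: List[Record] = []
    for prev, cur in zip(records, records[1:]):
        if prev.passenger_state == 0 and cur.passenger_state == 1:
            pickups.append(cur)
        elif prev.passenger_state == 1 and cur.passenger_state == 0:
            dropoffs.append(cur)
    return pickups, dropoffs


trace = [
    Record(0.0, 0.0, 0, 0),
    Record(0.5, 0.0, 30, 0),
    Record(50.0, 50.0, 60, 0),   # flipping point, far above 120 km/h
    Record(1.0, 0.0, 90, 1),     # passenger gets on
    Record(1.5, 0.0, 120, 1),
    Record(2.0, 0.0, 150, 0),    # passenger gets off
]
clean = drop_abnormal(trace)
pickups, dropoffs = detect_events(clean)
```

In this toy trace the third record is removed by the velocity filter, after which one pick-up and one drop-off point are detected.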
Figure 4. Clusters in different time segments: (a) 0:00 AM, (b) 06:00 AM, (c) 12:00 PM, (d) 18:00 PM

With the above definitions, the DBSCAN algorithm is performed with the parameters eps and minPts. Figure 4 shows the clustering results in different time segments; hotspots are drawn as white polygons and points as yellow circles on the electronic map.

4) Hotness Prediction based on EWMA
Time series prediction extends historical data: based on the development process and the direction of the trend reflected in the time sequence, it predicts the result in the next period of time. We use the Exponentially Weighted Moving Average (EWMA) method, a moving average that applies multiplying factors to give different weights to data at different positions in the sample window, with the weights decreasing exponentially.
Consider $H_k = \{H_{k,0}, H_{k,1}, \ldots, H_{k,t}\}$ to be a discrete time series for the number of passengers at a hotspot k. The goal is to build a model that predicts the score of hotspot k at time t+1. To do so, different weights are assigned to different periods of the hotness score, because the further back in time an observation lies, the less it affects the predicted score. Mathematically, the moving average is the convolution of the data points with a fixed weighting function. We can get the prediction for day n+1 by
$$Y_{n+1} = \frac{\sum_{i=1}^{n} Y_i X_i}{\sum_{i=1}^{n} X_i} \tag{11}$$
where $Y_i$ is the i-th day's observation and $Y_{n+1}$ is the day we want to predict. Note that $X_i$ for i = 1, 2, …, n is the weight of the i-th day while n is the size of the time window. In this way, the prediction value for day n+1 is given by
$$EWMA_{n+1} = \frac{p_1 + (1-\alpha) p_2 + (1-\alpha)^2 p_3 + \cdots + (1-\alpha)^{n-1} p_n}{1 + (1-\alpha) + (1-\alpha)^2 + \cdots + (1-\alpha)^{n-1}} \tag{12}$$
Here, $\alpha$ is the attenuation factor, which is set empirically as $\alpha = 2/(N+1)$, where N is the number of observation days. The degree of weighting is determined by the factor $\alpha$. For example, Figure 5 shows the values of $\alpha$ when N = 15.

Figure 5. The attenuation factor (y-axis: weight, from 0 to 1; series: EWMA, N=15)

According to the EWMA model, we can predict the hotness of each cluster. Intuitively, a cluster with more requests in the past is more valuable; however, the number of requests is affected by the size of the cluster at t and at past times. As a result, the density of each cluster $HS_{i,t}$ is defined as the number of points in the cluster $N_i$ divided by the size of the cluster at time t. As in Definition 2, we obtain $HS_{i,t}$ for each cluster $c_i$.
Figure 6 shows the hotspot distribution with hotness scores in Beijing. The result is aggregated on Baidu Map. The color of a circle indicates the score of the hotspot while the number in the circle represents the number of potential passengers predicted by the system.

Figure 6. Example of hotspots distribution in Beijing, China

B. Recommendation
Online recommendation aims to provide taxi drivers with the best places to cruise, where there is a high probability of getting a passenger. Here we propose the attractive score to evaluate this possibility for each driver.
Definition 4 (Attractive Score, AS): Given the HS, time t and distance d between taxi and hotspot, the attractive score of a hotspot becomes smaller as the distance increases. The attractive score of $c_i$ at time t is defined as
$$AS_{i,t} = \frac{HS_{i,t}}{d^2} \tag{13}$$
In this stage, we provide recommendations based on the proposed model according to the location and time of a taxi driver. Figure 1 outlines the steps of online recommendation, including the prediction of pick-up points, the calculation of the hotness score, the sorting of hotspots and visualization. Each hotspot contains a location and a hotness score; the higher the hotness score, the higher the value of the place.
Hotspots are urban areas in which passengers request a taxi with high probability. The activities in hotspots can characterize the spatial distribution of passengers' demands. Once the clusters are identified, the hotness score can be calculated. We rank the top-k hotspots based on the score, then the system returns the top-k places to the taxi driver. The notation is defined in Table III.

TABLE III. NOTIONS IN HOTNESS SCORE

| Notation | Description |
|---|---|
| $c_i$ | The cluster whose id is i |
| $HS_{i,t}$ | The hotness score of cluster i |
| $AS_{i,t}$ | The attractive score of cluster i |
| N | The number of points in cluster i |

With the notation listed in Table III, we describe the pseudo code of the top-k ranking in Algorithm 3. Besides the hotness score, the distance between taxi and hotspot should be considered. We rank the top-k places by the attractive score.

Algorithm 3 Top-k Places Generation(C, k)
Input: C, the result set of ADBSCAN
       k, the top-k value
Output: candidate, the set of top-k locations
Procedure:
1. i ← 0;
2. candidate, A ← ∅
3. if C ≠ ∅ then
4.   for ci in C
5.     ASi ← HSi / d²;
6.     A ← A ∪ ASi
7.   end for
8. end if
9. A = sort(A); //sort the list desc
10. while i < k and i < sizeof(A) do
11.   candidate ← candidate ∪ Ai;
12.   i ← sizeof(candidate);
13. end while
14. return candidate;
Lines 4~7 calculate the AS for each cluster ci and add it to the set A. Line 9 sorts the set A in descending order. Lines 10~12 generate the top-k candidate places.

IV. RESULTS AND DISCUSSION
A. DataSet
We evaluate our method using taxi trajectory data provided by CAR Inc. It was generated by about 3,760 taxis in the city of Beijing from July 1 to July 29, 2015.
As is well known, people travel differently on weekdays and weekends. As shown in Figure 7, this difference is significant. Therefore only weekday data can be used to predict passenger demand on weekdays. We can also observe that on weekdays the number of passengers around the morning rush hour (8 am) is significantly higher than on weekends. This matches the generally accepted assumption that people go to work during the morning rush hours. Likewise, the time slots from 4 pm to 7 pm correspond to the evening rush hour on workdays, when people go home. This means that people have different travel preferences at different times of the same day. Thus we further segment the day into 24 slots, so that the traffic conditions and the semantic meaning of people's travel are similar within the same time slot.

Figure 7. Different characteristics between weekday and weekend (series: weekday, weekend; x-axis: time of day from 0:00 to 20:00; y-axis: Number of Passengers from 0 to 2000)

B. Passenger Demand Prediction
1) Experimental Settings
We divide one day into 24 time segments; for every segment, ADBSCAN is performed so that we get a set of clusters for each period.
In order to evaluate the performance of our framework, we use the historical data from 7.3 to 7.13 to train the model and predict the hotspot areas on 7.14. As the size of the time window is set to 7, the attenuation factor $\alpha$ is 0.25, so we assign different weights to the different dates as follows:

TABLE IV. WEIGHT OF HISTORICAL DATA

| Date | 7.3 | 7.6 | 7.7 | 7.8 | 7.9 | 7.10 | 7.13 |
|---|---|---|---|---|---|---|---|
| α | 0.134 | 0.188 | 0.237 | 0.316 | 0.422 | 0.563 | 0.75 |

2) Comparing Methodology
In order to demonstrate the effectiveness of the clustering method, we conduct an in-depth comparison of our approach and consider the following four methods:
• ADBSCAN with EWMA method (AE)
• DBSCAN with EWMA method (DE)
• DBSCAN without EWMA method (D)
• ADBSCAN without EWMA method (A)
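The weighting of Table IV and the prediction of Eq. (12) can be sketched as follows. The sketch assumes that the attenuation factor is α = 2/(N+1), which gives the stated 0.25 for a window of 7, and that the most recent day receives the weight (1−α), as the 7.13 entry of Table IV suggests; the function names are hypothetical.

```python
from typing import List, Sequence


def attenuation_factor(window: int) -> float:
    """Assumed relation alpha = 2 / (N + 1); yields 0.25 for a window of 7."""
    return 2.0 / (window + 1)


def ewma_weights(window: int, alpha: float) -> List[float]:
    """Weights ordered from the oldest to the most recent day."""
    return [(1.0 - alpha) ** k for k in range(window, 0, -1)]


def ewma_predict(history: Sequence[float], alpha: float) -> float:
    """Weighted average of past observations, oldest first."""
    if not history:
        raise ValueError("history must not be empty")
    weights = ewma_weights(len(history), alpha)
    return sum(w * h for w, h in zip(weights, history)) / sum(weights)


alpha = attenuation_factor(7)
weights = ewma_weights(7, alpha)
```

Under these assumptions the computed weights agree with Table IV to within rounding, except for the 7.6 entry, where (1−α)^6 ≈ 0.178 against the listed 0.188. Because the weights are normalized by their sum, starting the powers at 1 instead of 0 does not change the prediction of Eq. (12).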

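The ranking of Algorithm 3 with the attractive score of Eq. (13) can be sketched as below. The paper does not state which distance is used for d at this stage, so the sketch assumes the Manhattan distance of Eq. (8) in coordinate units and clamps very small distances to avoid division by zero; `Hotspot` and `top_k_places` are hypothetical names.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Hotspot(NamedTuple):
    lon: float
    lat: float
    hotness: float  # HS_{i,t}


def top_k_places(
    hotspots: Sequence[Hotspot],
    taxi: Tuple[float, float],
    k: int,
    min_distance: float = 1e-6,
) -> List[Hotspot]:
    """Rank hotspots by AS = HS / d^2 and return the k best."""
    scored = []
    for spot in hotspots:
        # Assumption: Manhattan distance between the taxi and the core point.
        d = abs(spot.lon - taxi[0]) + abs(spot.lat - taxi[1])
        d = max(d, min_distance)  # guard against a taxi standing on the hotspot
        scored.append((spot.hotness / d ** 2, spot))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Slicing returns fewer than k items when fewer hotspots exist.
    return [spot for _, spot in scored[:k]]


a = Hotspot(1.0, 0.0, 0.5)   # AS = 0.5
b = Hotspot(2.0, 0.0, 0.9)   # AS = 0.225
c = Hotspot(0.5, 0.0, 0.2)   # AS = 0.8
best_two = top_k_places([a, b, c], taxi=(0.0, 0.0), k=2)
```

In the toy input the hottest spot `b` is ranked last because it is the farthest away, which illustrates how the squared distance dominates the score.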
3) Evaluation Metrics
We use three measures to evaluate the prediction approach: precision, recall and F-Measure. Precision is defined as the fraction of predicted records that are relevant. Recall is defined as the fraction of relevant records that are retrieved. F-Measure integrates precision and recall into a single, composite harmonic mean. Formally,
$$Precision = \frac{|\{correct\}|}{|\{all\}|} \tag{14}$$
$$Recall = \frac{|\{relevant\} \cap \{retrieved\}|}{|\{relevant\}|} \tag{15}$$
$$F\text{-}Measure = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{16}$$
4) Result and Discussion
Figure 8 shows the three measures for the four methods as time varies. Note that the three measures are relatively low from time 0 to 6. As the day goes on and more and more people go to work, the approach performs better.

Figure 8. Comparison between four methods in different time slots: (a) Precision on four methods, (b) Recall on four methods, (c) F-Measure on four methods (series: AE, DE, D, A; x-axis: time slot from 0 to 20; y-axis: 0% to 100%)

TABLE V. DETAIL OF THREE MEASURES ON FOUR METHODS

| Measures | AE | DE | D | A |
|---|---|---|---|---|
| Precision | 67.92% | 59.67% | 54.47% | 54.33% |
| Recall | 71.08% | 62.16% | 53.88% | 54.38% |
| F-Measure | 69.31% | 60.76% | 58.56% | 54.1% |

Table V details the hourly averages of the three measures for the four methods and shows that our approach has a 15.21% improvement in F-measure compared with method A for prediction (69.31% against 54.1%).

C. Recommendation Effectiveness
1) Comparing Methodology
In order to evaluate the effectiveness of the recommendation, we perform experiments on three methods: only distance considered, only attractive score considered, and a combination of distance and attractive score.
2) Evaluation Metrics
According to (13), the further away the hotspot is from the given location, the lower the attractive score it is assigned. We consider two measures to evaluate the effectiveness by comparing the prediction with real data: Number of Hit Points (NHP) and Hit Ratio (HR). If the taxi picks up a passenger within one kilometer of the recommended location, we count a successful hit. NHP is the total number of hits. Formally,
$$HR = \frac{|\{hit\}|}{|\{recommended\}|} \tag{17}$$
3) Result and Discussion
As shown in Figure 9, taking the attractive score into account achieves a much better performance than the method which only considers the distance, with a 13.89% improvement in HR.
By assigning different weights to the attractive score and the distance, we get the best performance when AS : DIS = 0.4 : 0.6, resulting in the following method for recommendation:

Figure 9. Comparison of different attractive score (AS) and distance (Dis) with time varying (legend: AS:Dis=0.4:0.6; x-axis: time slot from 0 to 20; left axis: 0% to 100%; right axis: 0 to 500)

V. RELATED WORK
Taxicab service falls into two general categories and research follows this, occasionally attempting to bridge them [10]. The first category is dispatching, where companies dispatch taxicabs to specific locations requested by customers. The second category is cruising, where the taxicab cruises along the streets looking for a passenger based on the driver's experience.
In the application of taxi demand analysis, Yuan et al. presented a bidirectional recommender for taxi drivers and people, using the knowledge of passengers' mobility patterns and taxi drivers' pick-up behaviors learned
from the GPS trajectories of taxicabs [11]. Shen et al. developed an analysis method based on a city's short-dated taxi GPS traces and provided recommendations to help taxi drivers cruise to find potential passengers with optimal routes [12]. Li et al. proposed an algorithm using taxi GPS traces to create a usage based on road segments [13]. Moreira-Matias et al. introduced a novel methodology for predicting the spatio-temporal distribution of taxi-passenger demand [14].
In other applications of taxi trajectories, researchers have been concerned with understanding the traffic flow movement and the corresponding benefits for drivers [15]. Zheng et al. carried out a series of studies based on GPS trajectories under GeoLife (Geography Life) [16], an application system that is centered on GPS data and displays it on an electronic map. Treating taxi service strategies as the crowd intelligence of massive numbers of taxi drivers, Zhang et al. sought to understand the service strategies of skilled taxi drivers based on a large-scale historical GPS database [17]. An exhaustive survey [18] reviews the work on mining the traces that tell us where passengers are picked up and dropped off, and classifies the existing work into several categories.
Different from all previous work, we first apply a spatio-temporal clustering method to extract hotspots from the taxi trajectories. Then, we combine the historical hotspots and a time series forecasting model to predict the demands of passengers in urban areas.

VI. CONCLUSION
This paper proposed a novel framework which combines time-series forecasting techniques and a spatio-temporal clustering method, using historical taxi trajectories to predict passengers' demands in urban areas. First, we detect the passenger demand, including location and time, from the GPS trajectories. Second, an adaptive prediction approach is proposed to identify the hotspots and predict the hotness of the passenger demands. Third, a recommender combining locations and the hotness prediction is generated for taxi drivers. Finally, a visual prototype system is developed to demonstrate the effectiveness of the presented framework. The experiments based on the GPS data generated by 3,760 taxis from CAR Inc. show that, compared with the original method, our framework gains a 15.21% improvement in average F-measure for prediction and a 79.6% hit ratio for recommendation.
The prediction of hotspots not only helps taxi drivers find passengers quickly, but also reduces traffic jams. In the future, we plan to extend our framework into a platform which combines traffic flow and road network, providing a scheduling service for the company.

ACKNOWLEDGMENT
This work is supported by the National Natural Science Foundation of China grant 61373035, 61502333, 61502334, 61572350, the Open Fund of Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, North China University of Technology, and the Tianjin Research Program of Application Foundation and Advanced Technology grant 14JCYBJC15600. Keman Huang is the corresponding author.

REFERENCES
[1] Murtagh, F. "A Survey of Recent Advances in Hierarchical Clustering Algorithms." Computer Journal 26.4(1983):354-359.
[2] Macqueen, J. "Some Methods for Classification and Analysis of MultiVariate Observations." In 5th Berkeley Symp. Math. Statist. Prob 1967:281-297.
[3] Ester, Martin, et al. "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise." Proceedings of the 2nd International Conference Knowledge Discovery and Data Mining 1996.
[4] Yuan, Jing, et al. "T-drive: driving directions based on taxi trajectories." Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems ACM, 2010:99-108.
[5] V. W. Chu, R. K. Wong, W. Liu, F. Chen and C. S. Perng, "Traffic Analysis as a Service via a Unified Model," Services Computing (SCC), 2014 IEEE International Conference on, Anchorage, AK, 2014, pp. 195-202, doi: 10.1109/SCC.2014.34
[6] S. Ma, Y. Zheng and O. Wolfson, "T-share: A large-scale dynamic taxi ridesharing service," Data Engineering (ICDE), 2013 IEEE 29th International Conference on, Brisbane, QLD, 2013, pp. 410-421, doi: 10.1109/ICDE.2013.6544843
[7] Z. Zhang, G. Wang, B. Cao and Y. Han, "Data Services for Carpooling Based on Large-Scale Traffic Data Analysis," Services Computing (SCC), 2015 IEEE International Conference on, New York, NY, 2015, pp. 672-679, doi: 10.1109/SCC.2015.96
[8] Ming-Kai Jiau, Shih-Chia Huang and Chih-Hsian Lin, "Optimizing the Carpool Service Problem with Genetic Algorithm in Service-Based Computing," Services Computing (SCC), 2013 IEEE International Conference on, Santa Clara, CA, 2013, pp. 478-485, doi: 10.1109/SCC.2013.56
[9] Zhou, Hongfang, and P. Wang. "Research on Adaptive Parameters Determination in DBSCAN Algorithm." Journal of Xian University of Technology (2012).
[10] Zheng, Yu, et al. "Understanding mobility based on GPS data." International Conference on Ubiquitous Computing ACM, 2008:312-321.
[11] Yuan, Jing, et al. "Where to find my next passenger." Proceedings of the 13th international conference on Ubiquitous computing ACM, 2011:109-118.
[12] Shen, Ying, L. Zhao, and J. Fan. "Analysis and Visualization for Hot Spot Based Route Recommendation Using Short-Dated Taxi GPS Traces." Information 6.2(2015):134-151
[13] Li, Qingquan, et al. "Hierarchical route planning based on taxi GPS-trajectories." Geoinformatics, 2009 17th International Conference on IEEE, 2009:1-5.
[14] Moreira-Matias, L., et al. "Predicting Taxi–Passenger Demand Using Streaming Data." IEEE Transactions on Intelligent Transportation Systems 14.3(2013):1393-1402.
[15] J. Yuan, Y. Zheng, X. Xie and G. Sun, "T-Drive: Enhancing Driving Directions with Taxi Drivers' Intelligence," in IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 1, pp. 220-232, Jan. 2013, doi: 10.1109/TKDE.2011.200
[16] Zheng, Yu, et al. "GeoLife: A Collaborative Social Networking Service among User, Location and Trajectory." Bulletin of the Technical Committee on Data Engineering 33.2(2010):32-39.
[17] Zhang, Daqing, B. Guo, and Z. Yu. "The Emergence of Social and Community Intelligence." Computer 44.7(2011):21-28.
[18] Castro, Pablo Samuel, et al. "From Taxi GPS Traces to Social and Community Dynamics: A Survey." ACM Computing Surveys 46.2(2014):1167-1182.
