2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC): Workshop

Daily Metro Origin-Destination Pattern Recognition Using Dimensionality Reduction and Clustering Methods

Chao Yang, Fenfan Yan, Xiangdong Xu*
School of Transportation Engineering, Key Laboratory of Road and Traffic Engineering of the Ministry of Education, Tongji University
Shanghai, 201804, China
[email protected], [email protected], [email protected]

This work was supported by the National Natural Science Foundation of China under project 71171147 and the Fundamental Research Funds for the Central Universities.

978-1-5386-1526-3/17/$31.00 ©2017 IEEE

Abstract—The widespread use of smart cards has led to unprecedented growth in the quantity of transit data. However, traditional methods still fail to fully recognize mobility patterns, and better data-mining approaches remain to be explored. To achieve reliable pattern recognition, principal component analysis and singular value decomposition are applied separately. On the dimensionality-reduced matrices, affinity propagation is selected as the clustering algorithm for recognizing demand patterns, and spectral clustering is introduced for comparison. Several clustering evaluation indices serve as objective references. The resulting representative clusters correspond to weekdays, weekends, holidays, and different months. The integration of dimensionality reduction and clustering offers a new way to understand daily mobility structure. For metro system operators, this study also provides information on traffic volume variation and its temporal distribution over the whole year. In addition, the procedure for handling the daily demand matrix can be applied in traffic planning, management, and operation.

Keywords—Affinity propagation, daily demand matrix, principal component analysis, spectral clustering

I. INTRODUCTION
OD matrices in public transportation preserve the original information on how users travel in space and time, so they are a key input to transportation system analysis and planning. Demand patterns are usually recognized by computational and statistical analysis of the OD matrix [1]. As a result, the importance of extracting and classifying OD matrices cannot be overemphasized. Weijermars and Berkum [2] discussed clustering procedures for trip demand profiles: in both directions, speed and flow parameters sampled by an automatic vehicle location system were aggregated into 15-minute intervals. The results clearly show that working days form a category distinct from weekends. Building on Weijermars and Berkum, Friedrich et al. later proposed the notion of characteristic traffic days, in which a traffic OD matrix is collected and analyzed on the basis of floating phone data. The study period is presorted into three types, and the data detected by mobile phone base stations on different calendar days are clustered to obtain a typical travel pattern and OD matrix [2].

However, traditional methods are currently being challenged because of their inefficiency in dealing with high-dimensional OD matrices. For instance, the smart card system of a metropolis may produce millions of records spread over hundreds of stations. Basic quantities such as traffic volumes, similarities, and distances have to be computed repeatedly in the original data space, so it is impractical to apply traditional statistical or pattern recognition methods directly to the original high-dimensional data. Dimensionality reduction is thus necessary to reduce redundancy and increase efficiency before clustering [3].

Two common dimensionality reduction methods, principal component analysis (PCA) and singular value decomposition (SVD) [4], are selected in this paper to avoid the curse of dimensionality. After linear transformation and reduction, the original OD matrix can be reduced to a manageable dimension. The main goal of dimensionality reduction is to capture the main features so that clustering analysis and pattern recognition can proceed efficiently. Affinity propagation (AP) is selected to cluster the dimensionality-reduced matrix and to recognize the modes of the daily demand matrix (DDM) [5]. Spectral clustering [6] is then applied as an alternative clustering method. Unlike the research by Mendes-Moreira et al. [7] and Khiari et al. [8], which groups days into different schedule types by travel time, the current analysis focuses on the inner relationships between OD pairs. In-depth comparisons are made between the dimensionality reduction methods and between spectral clustering and AP, which were seldom discussed in previous literature. Finally, the daily mobility structure of the metro DDM is analyzed and visualized on this basis. Although the clustering methods (AP and the spectral algorithm) are not new, the integration of dimensionality reduction and clustering has seldom been discussed. In this paper, the methodologies of dimensionality reduction and clustering are compared in detail and integrated systematically, building on our initial version in [9].

II. DATA DESCRIPTION
In this research, analysis and calculation are conducted on Shenzhen metro data from September 1st, 2011 to August 31st, 2012. Smart card usage constitutes 79.7% of the total in Shenzhen, so smart card records are the major and representative records of the metro system. The Shenzhen metro network consists of 5 lines and 118 stations. The passenger flow of Shenzhen Metro saw a year-on-year rise of 69.9% in 2012, reaching 780

million passengers per year.

Before data processing, preliminary data cleaning was conducted to ensure the completeness of OD trips, and days with missing data were excluded from the dataset. In particular, days in April and May were removed from the original dataset, and 290 valid days were retained to guarantee data quality.

When smart card (SC) users tap their cards on the metro turnstile, the fare-payment ticketing device automatically records all the information related to the trip. This makes it possible to restore the whole course of every individual's trips. The records contain important information such as boarding time, alighting time, card ID, and station ID, which together can track the trajectory of each individual. Ridership over the whole network and the total number of trips per day can then be calculated in the same way.

III. METHODOLOGY
As previously mentioned, the metro data are sampled from 290 valid days, so the dataset can be represented by a 290*118*117 matrix. Nevertheless, the 3-dimensional matrix is neither efficient to compute with nor easy to understand. In this research, the 3-dimensional matrix is simplified by joining each day's OD matrix laterally into a row vector: each 118*117 OD matrix, corresponding to the 118 metro stations in Shenzhen, is converted to a 1*13,806 matrix. The resulting 2-dimensional 290*13,806 matrix is used in the following discussion. Each day is represented by a row, while the ridership of each OD pair on the metro network is represented by a column. Here, we define this 290*13,806 matrix as the DDM. The DDM is a rich data source that enables demand pattern extraction and recognition over the whole network.

However, the DDM has a huge size of 13,806 columns, and all the columns (OD pairs) are related to each other to some extent. Because the columns contain overlapping information, the high-dimensional DDM may fail to expose the major features that characterize the demand pattern of each day. Besides, the 13,806-dimensional matrix may give rise to the curse of dimensionality, which hinders revealing daily characteristics in pattern recognition and data mining [10]. In this paper, PCA and SVD are chosen as the two major forms of dimensionality reduction that compress the number of matrix columns and thereby avoid the curse of dimensionality. The linear transformations of PCA and SVD have the advantage of conserving the major features of the original high-dimensional matrix [11]. After dimensionality reduction, we use affinity propagation to categorize the metro demand patterns of the DDM.

A. Dimensionality Reduction with Principal Component Analysis and Singular Value Decomposition
In PCA, the cumulative contribution rate of the first m principal components can be calculated as

$$\frac{\sum_{t=1}^{m} \lambda_t}{\sum_{t=1}^{n} \lambda_t},$$

where $\lambda_t$ $(t = 1, 2, \ldots, n)$ are the eigenvalues sorted in descending order, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$, and n is the number of columns of the DDM.

SVD factorizes a matrix M in the form $M = U\Sigma V^{T}$; for the DDM this reads:

$$M_{290\times 13806} = U_{290\times 290}\,\Sigma_{290\times 13806}\,V^{T}_{13806\times 13806} \qquad (1)$$

Specifically, M denotes the DDM, of size 290*13806. U is a 290*290 matrix whose columns are orthonormal eigenvectors of $MM^{T}$. Σ is a 290*13806 rectangular diagonal matrix whose diagonal entries are non-negative and are known as the singular values. V is a 13806*13806 matrix whose columns are orthonormal eigenvectors of $M^{T}M$.

How far the dimensionality should be reduced is a problem yet to be settled. The number of principal components is generally determined by one of the following three rules. First, major statistical programs have traditionally used a default setting named the Kaiser criterion, which keeps only factors with an eigenvalue greater than 1.0 [12]. Second, the scree plot is used to determine the optimal number of components: the principal components are listed in decreasing order of eigenvalue, and the point at which the curve begins to bend, forming an elbow toward a less steep decline, indicates the number of factors that should be retained. Third, if the cumulative variance explained by the first k components exceeds 85%, those k principal components are considered representative of the original data [13].

Fig. 1. Scree Plot.

Fig. 2. Cumulative Contribution Rate. (Cumulative contribution rate, in percent, plotted against the number of principal components for PCA and SVD; the 85% level is marked.)
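The DDM construction and the three component-count rules above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the PCA eigenvalues are obtained from the singular values of the centred DDM (so the 13,806*13,806 covariance matrix is never formed), and standardising each column before PCA, so that the Kaiser threshold of 1.0 applies to correlation-matrix eigenvalues, is an assumption the paper does not state. Treating the squared singular values of the uncentred DDM (the eigenvalues of $MM^{T}$) as the SVD counterpart is likewise assumed. The scree-plot rule is a visual judgment and is not coded.

```python
import numpy as np

def build_ddm(od_cube):
    """Join each day's OD matrix laterally into one row of the DDM.

    od_cube: array of shape (days, origins, destinations), e.g. (290, 118, 117).
    Returns an array of shape (days, origins * destinations), e.g. (290, 13806).
    """
    return od_cube.reshape(od_cube.shape[0], -1)

def pca_eigenvalues(ddm, standardize=True):
    """PCA eigenvalues in descending order, from the singular values of the centred DDM.

    Standardising the columns (an assumption) gives correlation-matrix eigenvalues,
    the scale on which the Kaiser threshold of 1.0 is usually applied.
    """
    x = ddm - ddm.mean(axis=0)
    if standardize:
        std = x.std(axis=0, ddof=1)
        std[std == 0] = 1.0                  # OD pairs with constant volume stay all-zero
        x = x / std
    s = np.linalg.svd(x, compute_uv=False)   # singular values, descending
    return s ** 2 / (ddm.shape[0] - 1)       # eigenvalues of x^T x / (days - 1)

def svd_eigenvalues(ddm):
    """Eigenvalues of M M^T, i.e. squared singular values of the uncentred DDM (assumed analogue)."""
    return np.linalg.svd(ddm, compute_uv=False) ** 2

def cumulative_contribution(eigvals):
    """Cumulative contribution rate sum_{t<=m} lambda_t / sum_t lambda_t for m = 1, 2, ..."""
    return np.cumsum(eigvals) / np.sum(eigvals)

def kaiser_count(eigvals, threshold=1.0):
    """Rule 1 (Kaiser criterion): number of eigenvalues greater than 1.0."""
    return int(np.sum(eigvals > threshold))

def components_for_rate(eigvals, rate=0.85):
    """Rule 3: smallest m whose cumulative contribution rate reaches `rate`."""
    return int(np.searchsorted(cumulative_contribution(eigvals), rate) + 1)
```

For a DDM held as a NumPy array `ddm`, `components_for_rate(pca_eigenvalues(ddm))` returns the smallest number of components reaching 85%, and `kaiser_count` gives the count under the first rule; whether these reproduce the paper's 4 and 175 depends on preprocessing details the paper does not report.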
In view of the current research, the first six eigenvalues of PCA and the first seven of SVD are greater than 1.0. However, an obvious disadvantage of the first rule is its arbitrariness (e.g., an eigenvalue of 1.01 is included whereas an eigenvalue of 0.99 is excluded). As shown in Fig. 1, following the core idea of the scree plot (the second rule), four principal components seem to be the optimal choice for both PCA and SVD. However, four components account for only 57% of the total variance in the case of SVD, which is far below 85% and may fail to preserve the key information. As a result, the third rule, based on the cumulative contribution rate, is applied in the rest of this research. As can be seen in Fig. 2, 4 and 175 are the best numbers of retained factors for PCA and SVD, respectively. For PCA, the second and third rules both select four as the optimal number. PCA is efficient in that a high cumulative contribution rate is achieved even at a low dimension. Since the contribution rate is then fixed at the same level for both methods, their results are comparable and interpretable.

For SVD, right-multiplying both sides of (1) by $V_{13806\times 175}$, the first 175 columns of V, gives

$$\tilde{M}_{290\times 175} = M_{290\times 13806}\,V_{13806\times 175} = U_{290\times 175}\,\Sigma_{175\times 175} \qquad (2)$$

where $U_{290\times 175}$ contains the first 175 columns of U and $\Sigma_{175\times 175}$ is the leading diagonal block of Σ. The dimensionality of M is thus reduced from 290*13806 to 290*175: the number of columns is compressed, and each row still represents a day. Because SVD does not change the internal relationships among metro stations, information at the macro level is still preserved.

For PCA, the four retained principal components reveal the inner structure of the data in the way that best depicts its variability along the major axes. Consequently, PCA compresses the number of columns of the original 290*13,806 matrix to 4. The retained matrix holds over 85% of the cumulative contribution rate, and the projection also makes metro demand patterns easier to recognize.

B. Clustering with Affinity Propagation
Dueck and Frey proposed a clustering method named "affinity propagation," which works on similarities between pairs of data points (here, the rows of the dimensionality-reduced DDM, i.e., days). The similarity s(i, k) indicates how appropriate the data point with index k is to serve as the "exemplar" (a center selected from among the data points) for data point i [14]. To minimize the sum of squared errors, each similarity is computed from the Euclidean distance: for data points $x_i$ and $x_k$, $s(i,k) = -\lVert x_i - x_k \rVert^{2}$. The availability matrix contains values a(i, k) that indicate how appropriate it is for $x_i$ to pick $x_k$ as its exemplar, and the responsibility r(i, k) indicates how appropriate it is for $x_k$ to be chosen as the cluster center of $x_i$. In each iteration, both availabilities and responsibilities are taken into account in determining the exemplars. The 290*175 and 290*4 dimensionality-reduced matrices obtained by SVD and PCA are then clustered with the affinity propagation algorithm.

Fig. 3. The outcomes of PCA displayed on the calendar. (Calendar panels for September, October, January, and June; dates with circles are public holidays, namely the Mid-Autumn Festival, National Day, New Year, Spring Festival, and Dragon Boat Festival; grid patterns distinguish the 1st to 13th categories.)

IV. OUTCOMES AND DISCUSSION
A. Results of Clustering Using Different Dimensionality Reduction Methods
The DDM after dimensionality reduction using PCA is shown in Fig. 3 (results for Sep., Oct., Jan., and Jun. are shown),
which is divided into 13 clusters.
The 290 valid days in the current research are clustered into 13 categories with the PCA method, each represented by a different grid pattern (excluded days are shown in white). Public holidays are labeled with red circles. They comprise the Mid-Autumn Festival (September 10th to 12th), National Day (October 1st to 7th), New Year (January 1st to 3rd), Spring Festival (January 22nd to 28th), and the Dragon Boat Festival (June 22nd to 24th). The main characteristics of the 13 categories are summarized in Table I.

TABLE I. MAIN CHARACTERISTICS CORRESPONDING TO 13 CATEGORIES
Category   Characteristics
1          Weekdays from Sept. to Nov. and some weekdays in Dec.
2          Weekends from Sept. to Dec.
3          National Day and Sundays from Oct. to Jan.
4          Some weekdays in Dec.
5          Weekdays in Jan.
6          Days before and after Spring Festival
7          Spring Festival
8          Weekdays from Feb. to Mar. (mostly Tuesdays and Thursdays)
9          Weekdays from Feb. to Mar. (mostly Mondays, Wednesdays and Fridays)
10         Sundays from Feb. to Aug.
11         Weekdays from Jun. to Aug. (mostly Tuesday to Thursday)
12         Weekdays from Jun. to Aug. (mostly Mondays and Fridays)
13         Saturdays from Feb. to Aug.

The first category contains September 1st-3rd, 6th-9th, 15th-16th, 20th-23rd, and 28th-30th; October 11th-15th, 18th-19th, and 25th-27th; November 2nd-3rd, 7th-8th, 10th, 14th-17th, 21st-24th, and 28th-30th; and December 1st, 6th-8th, 13th-14th, and 22nd. It consists mostly of weekdays from September to December, and this mode is recognized as a characteristic fall weekday. Shenzhen metro OD volume from June to August differs greatly from that in fall, and the clustering recognizes this difference automatically. The second category contains Saturdays and Sundays from September to December. Sundays from October to January are contained in the third cluster, while most Thursdays and Fridays from September to January are included in the fourth cluster; travel on Thursdays and Fridays has some distinguishing features in these months. The fifth category is mainly weekdays in January, and the sixth contains only days around the Spring Festival.

Apart from that, the seventh category is exclusively January 21st to 29th, in full accordance with the Spring Festival. Chinese New Year, or the Spring Festival, is the most important traditional Chinese holiday for family reunion. Shenzhen has a huge migrant population from every corner of China, so in the days around the festival, workers have a strong demand to go back home. Metro stations near major external transport hubs, including railway stations, bus stations, ferry terminals, and airports, are therefore expected to carry large volumes. The eighth category is mostly Tuesdays and Thursdays from February to March, and the ninth category covers Mondays, Wednesdays, and Fridays. Similar to the second category, the tenth covers Sundays, here from February to August. The eleventh and twelfth are weekdays in June, July, and August, and the thirteenth is Saturdays from February to August. The results also show that trip patterns within the same week or the same holiday are not necessarily the same.

Different from PCA, the DDM after dimensionality reduction using SVD is divided into 16 clusters, summarized in Table II. The SVD clustering outcomes are more specific and detailed than those of PCA and also capture the inner features of different patterns. The Spring Festival similarly falls into a distinct category, indicating that this holiday has a rather typical, identifiable demand pattern.

TABLE II. MAIN CHARACTERISTICS CORRESPONDING TO 16 CATEGORIES
Category   Characteristics
1          Sundays and Mondays from Sept. to Nov.
2          Weekdays from Sept. to Dec.
3          Fridays and Saturdays from Sept. to Jan.
4          Weekends from Sept. to Mar.
5          Sundays from Oct. to Jan.
6          Halloween Eve and Christmas Eve
7          Days before Spring Festival
8          Spring Festival
9          Two days before and after Spring Festival
10         Days after Spring Festival
11         Weekdays from Feb. to Mar.
12         Sundays from Feb. to Mar.
13         Weekdays in Jun.
14         Sundays from Jun. to Aug.
15         Saturdays from Jun. to Aug.
16         Weekdays from Jul. to Aug.

One surprising result, as can be seen from Table II, is that Halloween Eve and Christmas Eve are also uniquely clustered. These two Western festivals are enjoying growing popularity in Shenzhen, a young city. One distinguishing feature is that theme parks, shopping malls, and other recreational places have heavy metro traffic on the two eves, especially at night. Each cluster is a unique demand profile, and identifying such categories can guide the planning and overall management of Shenzhen Metro. Pre-arranged schemes and warnings can be worked out to cope with rapid changes in traffic flow.

The results for the DDM using PCA and SVD have many similarities, and their clustering quality can hardly be judged subjectively. On the whole, the clustering results under PCA and SVD both reveal the demand patterns of the metro system and can serve as a fundamental basis for operation and management. However, the dimensionality-reduced matrix of PCA is a 290*4 matrix, while that corresponding to SVD is a
290*175 matrix. PCA reduces the original DDM to a rather low dimension, and its clustering results still capture the major demand patterns. Although SVD achieves satisfactory results, it fails to reduce the high-dimensional matrix efficiently to a compact dimension. In other words, SVD works well at the expense of higher time complexity and is much slower than PCA when performing AP clustering.
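The comparison above involves three steps: reducing the DDM with SVD as in (2) or with PCA, and then clustering the reduced rows (days) with AP. A minimal NumPy sketch of these steps follows; it is an illustration, not the authors' implementation. The function names are hypothetical, and the damping factor of 0.5, the fixed iteration count, and the use of the median off-diagonal similarity as each point's preference are assumptions, since the paper does not report its AP parameters. The message updates follow the standard responsibility and availability rules of affinity propagation [5].

```python
import numpy as np

def svd_reduce(m, k):
    """Reduce M (days x OD pairs) to k columns as in (2): M V_k = U_k Sigma_k."""
    u, s, _ = np.linalg.svd(m, full_matrices=False)
    return u[:, :k] * s[:k]

def pca_reduce(m, k):
    """Scores of the mean-centred DDM on its first k principal axes."""
    x = m - m.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:k].T

def affinity_propagation(x, damping=0.5, max_iter=200, preference=None):
    """Cluster the rows of x by affinity propagation; returns an exemplar index per row."""
    n = x.shape[0]
    sq = np.sum(x ** 2, axis=1)
    s = -(sq[:, None] + sq[None, :] - 2.0 * x @ x.T)   # s(i, k) = -||x_i - x_k||^2
    off_diag = ~np.eye(n, dtype=bool)
    np.fill_diagonal(s, np.median(s[off_diag]) if preference is None else preference)
    r = np.zeros((n, n))   # responsibilities r(i, k)
    a = np.zeros((n, n))   # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(max_iter):
        # r(i, k) <- s(i, k) - max_{k' != k} [a(i, k') + s(i, k')]
        tmp = a + s
        best = np.argmax(tmp, axis=1)
        first = tmp[rows, best]
        tmp[rows, best] = -np.inf
        second = tmp.max(axis=1)
        r_new = s - first[:, None]
        r_new[rows, best] = s[rows, best] - second
        r = damping * r + (1 - damping) * r_new
        # a(i, k) <- min(0, r(k, k) + sum_{i' not in {i, k}} max(0, r(i', k)))
        # a(k, k) <- sum_{i' != k} max(0, r(i', k))
        rp = np.maximum(r, 0)
        np.fill_diagonal(rp, np.diag(r))
        col = rp.sum(axis=0)
        a_new = np.minimum(0, col[None, :] - rp)
        np.fill_diagonal(a_new, col - np.diag(r))
        a = damping * a + (1 - damping) * a_new
    exemplars = np.flatnonzero(np.diag(a) + np.diag(r) > 0)
    if exemplars.size == 0:
        return np.full(n, -1)                  # no exemplar emerged within max_iter
    labels = exemplars[np.argmax(s[:, exemplars], axis=1)]
    labels[exemplars] = exemplars              # each exemplar labels itself
    return labels
```

With `ddm` denoting the 290*13,806 DDM as a NumPy array, `affinity_propagation(pca_reduce(ddm, 4))` and `affinity_propagation(svd_reduce(ddm, 175))` would mirror the two AP variants compared above. In this sketch only the similarity computation depends on the number of retained columns (4 versus 175); each message-passing iteration costs O(n²) for n days either way.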
B. Comparisons Among Clustering Methods
AP alone may not be convincing enough as the better clustering method in this research, so other mainstream methods should also be selected and compared in detail.

Based on graph theory, spectral clustering performs dimensionality reduction before clustering by using the spectrum of the similarity matrix, which measures the similarity of the data and serves as the input. Clustering amounts to finding a partition of the graph such that edges between different groups have low weights while edges within the same cluster have high weights. Let G(V, E) be an undirected graph with vertex set $V = \{v_1, v_2, \ldots, v_n\}$ and edge set E. The spectral algorithm proceeds as follows. First, the number of clusters k is predetermined. The similarity matrix is defined as a symmetric matrix S, whose elements $s_{ij}$ measure the similarity between data pairs. The unnormalized graph Laplacian is defined as $L = D - W$, where W is the weight matrix and D is the diagonal degree matrix whose entries are the sums of the weighted adjacencies. Next, L and the first k generalized eigenvectors $u_1, \ldots, u_k$ (those with the k smallest eigenvalues) of the generalized eigenproblem $Lu = \lambda Du$ are computed, and the matrix $U \in \mathbb{R}^{n\times k}$ is formed by combining the column vectors $u_1, \ldots, u_k$. For $i = 1, \ldots, n$, let $y_i$ be the vector corresponding to the i-th row of U. Finally, the points $y_i$ are clustered with the k-means algorithm [6].

In the first place, the number of clusters must be specified. The best choice of k is often controversial and ambiguous, as it depends on the shape and distribution of the data points and on the desired clustering resolution. The optimal k should strike a balance between compressing the data through clustering and accurately assigning data points to appropriate clusters.

One simple rule of thumb sets the number of clusters to $k \approx \sqrt{n/2}$, with n being the number of data points [15]; for the 290 days here, this gives $\sqrt{145} \approx 12$. Information criterion approaches such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the deviance information criterion (DIC) are also popular [16]. Compared with these criteria, the average silhouette of the data is a relatively simple and efficient criterion for assessing the number of clusters. The silhouette measures how closely a data point is linked to its own cluster and how loosely it is matched to data in the neighboring cluster. The number of clusters corresponding to the largest average silhouette value is the optimum [17].

The silhouette criterion is defined as follows:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}},$$

where a(i) is the average dissimilarity (Euclidean distance) of observation i to all other points within the same cluster, and b(i) is the lowest average dissimilarity of i to any other cluster of which it is not a member. The average silhouette is used here to evaluate clustering and to estimate the best number of clusters.

Fig. 4. Determination of the best number of clusters.

As shown in Fig. 4, the largest average silhouette value is achieved when the number of clusters equals 11, so eleven clusters seem to be the most appropriate for spectral clustering. However, the Spring Festival and the first half of February are clustered into the same category, which does not agree with general perception. National Day is not recognized, and many Sundays and weekdays that have different features end up in the same category. Judging from accuracy and universality, AP clustering using PCA or SVD for dimensionality reduction seems to outperform spectral clustering in the current research.

C. Experiments Done Under Different Clustering Evaluation Indices
To evaluate the performance of clustering methods, the Davies-Bouldin index [18], Dunn's index, and the Calinski-Harabasz index [19] are all widely used. In this study, the Calinski-Harabasz index, the Davies-Bouldin index, and the silhouette index are selected to evaluate the clustering quality.

TABLE III. CLUSTERING EVALUATION INDICES OF DIFFERENT METHODS
Index               PCA+AP (13 categories)   SVD+AP (16 categories)   Spectral Analysis (11 categories)
Calinski-Harabasz   131.3417                 96.9219                  93.3874
Davies-Bouldin      1.5846                   1.3245                   1.819
Silhouette          0.2833                   0.3766                   0.2422
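The spectral procedure and the three evaluation indices described above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the Gaussian weight matrix W and its bandwidth `sigma` are assumed (the paper does not say how W is built), and k-means uses a deterministic farthest-point initialisation chosen here for reproducibility. The generalized eigenproblem $Lu = \lambda Du$ is solved through the equivalent symmetric problem $D^{-1/2} L D^{-1/2} v = \lambda v$ with $u = D^{-1/2} v$. The Davies-Bouldin scatter of a cluster is taken as the mean Euclidean distance of its points to the centroid.

```python
import numpy as np

def gaussian_weights(x, sigma=1.0):
    """Weight matrix W with w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) and w_ii = 0 (sigma assumed)."""
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=2)
    w = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    return w

def kmeans(y, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation (an assumption)."""
    centers = [y[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(y[np.argmax(d2)])     # next center: point farthest from chosen ones
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(np.sum((y[:, None, :] - centers[None, :, :]) ** 2, axis=2), axis=1)
        new = np.array([y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering(w, k):
    """L = D - W; first k generalised eigenvectors of L u = lambda D u; k-means on rows y_i of U."""
    deg = w.sum(axis=1)
    lap = np.diag(deg) - w
    d_is = 1.0 / np.sqrt(deg)
    # L u = lambda D u  <=>  D^{-1/2} L D^{-1/2} v = lambda v, with u = D^{-1/2} v
    _, vecs = np.linalg.eigh(d_is[:, None] * lap * d_is[None, :])   # ascending eigenvalues
    u = d_is[:, None] * vecs[:, :k]                                   # columns u_1 .. u_k
    return kmeans(u, k)

def silhouette(x, labels):
    """Average of s(i) = (b(i) - a(i)) / max{a(i), b(i)} with Euclidean dissimilarity."""
    dist = np.sqrt(np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=2))
    s = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue                             # singleton cluster: s(i) = 0
        a = dist[i, same].mean()
        b = min(dist[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

def davies_bouldin(x, labels):
    """Mean over clusters of max_j (scatter_i + scatter_j) / ||c_i - c_j||; lower is better."""
    ks = np.unique(labels)
    cent = np.array([x[labels == c].mean(axis=0) for c in ks])
    scat = np.array([np.linalg.norm(x[labels == c] - cent[j], axis=1).mean()
                     for j, c in enumerate(ks)])
    sep = np.linalg.norm(cent[:, None, :] - cent[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)                # exclude the i == j term
    return ((scat[:, None] + scat[None, :]) / sep).max(axis=1).mean()

def calinski_harabasz(x, labels):
    """(between-cluster SS / (k - 1)) / (within-cluster SS / (n - k)); higher is better."""
    ks = np.unique(labels)
    n, k, mu = len(x), len(ks), x.mean(axis=0)
    between = sum((labels == c).sum() * np.sum((x[labels == c].mean(axis=0) - mu) ** 2) for c in ks)
    within = sum(np.sum((x[labels == c] - x[labels == c].mean(axis=0)) ** 2) for c in ks)
    return (between / (k - 1)) / (within / (n - k))

def rule_of_thumb_k(n):
    """k ~ sqrt(n / 2) [15]; n = 290 days gives 12."""
    return int(round(np.sqrt(n / 2)))
```

To mimic the procedure behind Fig. 4, one could evaluate `silhouette(x, spectral_clustering(gaussian_weights(x, sigma), k))` over a range of k and keep the maximizer; `sigma` remains a free parameter that would need tuning for the reduced DDM.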
2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC): Workshop

Silhouette index are selected to evaluate the clustering quality. clustering in pattern recognition was seldom discussed in
A high silhouette value or Calinski-Harabasz value indicates a previous literatures. Besides, many existing methods in pattern
good clustering solution. In the contrast, the optimal clustering recognition are sophisticated and difficult to be followed. In this
solution is achieved with the minimum Davies-Bouldin index paper, the advantages of the combination of dimensionality
value. reduction and clustering are fully drawn, and different
algorithms corresponding to dimensionality reduction and
Different methods are compared when the optimal number clustering are compared in detail and integrated systematically.
of clusters is achieved. As shown in Table Ⅲ, all the three It also sheds light on forming a set of simple and applicable
indices show that Spectral analysis is inferior to AP clustering methodologies in processing smart card data to obtain trip
with the application of PCA and SVD. AP with PCA is better patterns.
than AP with SVD judging from the index of Calinski-Harabasz,
but is slightly worse than AP with SVD judging from the indices REFERENCES
of Davies-Bouldin and Silhouette. According to the value of [1] M. A. Munizaga and C. Palma, “Estimation of a disaggregate multimodal
these indices, the performance of AP with PCA and AP with public transport Origin–Destination matrix from passive smartcard data
SVD is almost on the same level. AP clustering with the from Santiago, Chile,” Transp. Res., Part C: Emerg. Technol., vol. 24, pp.
dimensionality reduction method of PCA is much faster than 9-18, Oct. 2013.
SVD and achieves a balance between accuracy of clustering and [2] W. Weijermars and E. V. Berkum, “Analyzing highway flow patterns using
efficiency of dimensionality reduction comprehensively cluster analysis,” in Proc. IEEE Intelligent Transportation Syst., Vienna,
Austria, 2005, pp. 308-313.
considered. As a result, AP clustering with the dimensionality
reduction method of PCA is more cost-effective in current [3] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by
locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323-2326,
study. Dec. 2000.
[4] C. Ding, X. He, H. Zha, and H. D. Simon, “Adaptive dimension reduction
V. CONCLUSIONS for clustering high dimensional data,” in Proc. IEEE Int. Conf. Data
Current research on daily demand recognition forms a set of Mining, Maebashi City, Japan, 2002, pp. 147-154.
procedures that can be selected accordingly by metro operators. [5] B. J. Frey and D. Dueck, “Clustering by passing messages between data
It provides a basic understanding of daily demand patterns like points,” Science, vol. 315, no. 5814, pp. 972-976, Feb. 2007.
how many people will travel from A to B on the basis of current [6] U. V. Luxburg, “A tutorial on spectral clustering,” Stat. Comput., vol. 17,
no. 4, pp. 395-416, Aug. 2007.
research. The demand patterns can be used as a basis for macro
traffic models and as supplementary information for operational [7] J. Mendes-Moreira, L. Moreira-Matias, J. Gama, and J. F. de Sousa,
“Validating the coverage of bus schedules: A machine learning approach,”
management. Whereas in the past, metro operators formulated Information Sciences, Vol. 293, pp. 299-313,2015.
schedules according to real time flow variation, which is [8] J. Khiari, L. Moreira-Matias, V. Cerqueira, and O. Cats, “Automated
subjective and may fail to obtain early warning information. setting of Bus schedule coverage using unsupervised machine learning,”
Days in the same cluster can be represented by a typical demand In Pacific-Asia Conference on Knowledge Discovery and Data Mining,
matrix (average value of OD matrices of days within the same Springer International Publishing, pp. 552-564, 2016.
cluster). With the help of typical demand matrix, more specific [9] C. Yang, F. F. Yan, and X. D. Xu, “Clustering Daily Metro
and targeted demand patterns can be understood and database Origin-Destination Matrix in Shenzhen China,” Appl. Mech. Mater., vol.
743, pp. 422-432, Mar. 2015.
can be built to formulate a long term reaction mechanism. In
[10] K. Fukunaga, Introduction to statistical pattern recognition. Academic
spring or fall, on working days or holidays, in downtown or press, 2013.
rural areas, metro operators can make instant plans accordingly
[11] S. Sun, C. Zhang, and G. Yu, “A Bayesian network approach to traffic flow
based on demand patterns. forecasting,” IEEE Trans. Intell. Transp. Syst., vol. 7, no. 1, pp. 124-132,
Mar. 2006.
Moreover, the daily public transit OD matrices may be
[12] S. Jiang, S. Wang, Z. Li, W. Guo, and X. Pei. “Fluctuation Similarity
averaged and retrieved to serve as inputs to other models. Daily Modeling for Traffic Flow Time Series: A Clustering Approach,” in Proc.
trip patterns along with historical OD volume can serve as the 18th IEEE Int. Conf. Intelligent Transportation Syst., Canary Islands, 2015,
input or complementary information in traffic volume pp. 848-853.
prediction and analysis of congestion patterns. For instance, hot [13] Y. Sun, N. Ye, and X. Xu, “EEG analysis of alcoholics and controls based
OD pairs or popular areas where traffic jams may occur can be on feature extraction,” in Proc. 8th IEEE Int. Conf. Signal Process, Beijing,
recognized and predicted from historical patterns using current China, 2006.
methods and real-time volume. In case of emergency in metro [14] D. Dueck and B. J. Frey, “Non-metric affinity propagation for
system, traffic flow records alone do not provide sufficient unsupervised image categorization,” in 11th IEEE Int. Conf. Computer
Vision, Rio De Janeiro, Brazil, 2007, pp. 1-8.
information to make a decision, so current work can serve as a
[15] K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis.
supplementary tool. Little is known about the real pattern in London: Academic press, 1979.
different stations in different time periods. And this research [16] C. Goutte, L. K. Hansen, M. G. Liptrot, and E. Rostrup, “Featureϋspace
opens a door to understanding the overall behavior of metro clustering for fMRI metaϋanalysis,” Hum. brain map., vol. 13, no. 3, pp.
travelers. 165-183, May 2001.
[17] M. S. Hossain and R. A. Angryk, “Gdclust: A graph-based document
Another contribution of this paper is related to how to clustering technique,” in Proc. 7th IEEE Int. Conf. Data Mining
process huge smart card OD data with high dimension, and to Workshops, Omaha, Nebraska, 2007, pp. 417-422.
divide data of different trip days into different groups which [18] D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE
were not obvious before. Dimensionality reduction or clustering Trans. Pattern Anal. Mach. Intell., vol. 2, pp. 224-227, Apr. 1979.
solely is well-known and widely used in trip pattern recognition. [19] T. Caliński and J. Harabasz, “A dendrite method for cluster analysis,”
However, the integration of dimensionality reduction and Commun. Stat.-Theor. M., vol. 3, no. 1, pp. 1-27, 1974.




978-1-5386-1526-3/17/$31.00 ©2017 IEEE 548
