A Generic and Flexible Geospatial Data Warehousing and Analy - 2019 - Procedia C
Available online at www.sciencedirect.com
ScienceDirect
www.elsevier.com/locate/procedia
Procedia Computer Science 155 (2019) 226–233
The 16th International Conference on Mobile Systems and Pervasive Computing (MobiSPC)
August 19-21, 2019, Halifax, Canada
A Generic and Flexible Geospatial Data Warehousing and Analysis Framework for Transportation Performance
Measurement in Smart Connected Cities
Patricio Vicuna a,b*, Sandeep Mudigonda b, Camille Kamga b
Abstract
The rapid rise of information and communication technology in various walks of life has helped the digitization of
human services, including transportation. The result of digitization is a vast amount of location and time data on
humans and goods, which in turn provides a valuable resource on transportation system performance for the managing
and operating agencies. A wide variety of transportation performance metrics (TPMs) that characterize, evaluate and
improve performance towards making the operating agencies, and thereby the cities, ‘smart’ must be presented
suitably to policy and decision makers. TPMs and data sources are of varying spatiotemporal extent and data
formats, necessitating interoperability. Estimating TPMs requires a generic data warehousing framework that handles
various datasets, preferably built in-house, so that agencies can sustainably move towards smart city goals and
meet federal data reporting mandates. In this study, such a flexible data analytics framework is demonstrated via
novel data ingestion and visualization outputs. Several use cases for evaluating the efficiency, reliability and
sustainability of transportation projects and systems are presented.
© 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the Conference Program Chairs.
Keywords: "GPS data; Transportation System Management; Geospatial Analysis; Data warehouse; Visualization"
1. Introduction
The digitization of services has brought massive changes to how services are consumed. Transportation is
one such service that has been revolutionized by integrated Information and Communication Technologies (ICT).
This revolution is being realized through faster access to transportation, enabled by connected vehicles that provide
real-time vehicle location. The vast amount of ‘Big Data’ generated through this automated vehicle location (AVL)
can give transportation system managers a better understanding of the current and anticipated state of the system.
This understanding can help manage resources better and improve the efficiency, safety, reliability and
sustainability of the transportation system. All these objectives fall under the purview of Transportation System
Management and Operations (TSMO) [3]. Effective TSMO through technology and data analytics can help realize the
goals of smart cities [10].
The state of the transportation system can be quantified through a wide range of transportation performance
metrics (TPMs), listed in Table 1 [3]. Each of these TPMs involves a different spatiotemporal resolution and
different data requirements when derived from AVL data, and each TPM has associated statistical measures such as
mean, variance, percentiles, minima and maxima. Furthermore, the AVL data and formats can themselves vary in
sampling and spatiotemporal extent. Additionally, under the current Fixing America's Surface Transportation Act, or
"FAST Act", the U.S. Federal Highway Administration has provided funding to state and local transportation agencies
for projects that will reduce traffic congestion and vehicular emissions. The FAST Act requires that agencies with a
transportation management area of more than one million in population representing a nonattainment or
maintenance area develop, and update biennially, a performance plan to achieve air quality and congestion reduction
targets. These agencies are required to develop procedures to report several of these metrics to USDOT.
Table 1. Transportation performance metrics (TPMs)
Metric | Definition | Spatial level | Use
Average Speed | Distance divided by travel time | Point / Link | To estimate fuel consumption, air quality impact; to evaluate alternatives for action and non-action projects
Average Delay | Delay over the length of a road segment | Link | To compare the different degrees of congestion
Average Queue | Number of vehicles by lane | Link | To identify operational problems; to identify hot spots
Average Density | Vehicles per lane-mile | Link | For interrupted flow to calculate Level of Service
Travel-Time Variance | Travel time variability | Path & network | To measure the travel time variability in traffic operations
Travel time reliability | Variability of travel times; percent of travelers who arrive at their destination within an acceptable time | Path & network | For the assessment of the benefits of traffic operations improvements
While TPMs provide direct insight for planning and operating smart and connected cities through TSMO, agencies
and departments of transportation (DOTs) need a flexible data analytics framework for generating different types
of metrics (point, link, path and network level) from similar datasets. Metrics such as speed, travel time, delay
and queues can be filtered by points, segments, zones and time of day, among other filters. Despite the ubiquitous
presence of smartphones and GPS-enabled devices and the associated ‘Big Data’, agencies and DOTs have limited
resources to tap these vast datasets for achieving smart city goals. In most cases, the agencies have to purchase
AVL datasets. Investing more resources in ready-made, off-the-shelf data analytics solutions could further
constrain resources. In-house development of data analytics solutions is a more sustainable alternative given the
rapidly dropping cost of computing infrastructure.
In this study, the development of a generic and flexible data analytics framework that can derive several TPMs
from different AVL data sources is demonstrated. Flexibility in data ingestion is achieved through a
geofence-based network segmentation approach that can be extended to multiple data types and sources.
The output format also provides ample flexibility to adapt visualization methodologies to present information
relevant to operating and managing personnel and decision makers. Three use cases are presented that show the
flexibility and utility of the framework towards achieving smart city goals.
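The geofence-based segmentation idea can be illustrated with a minimal sketch. The paper does not describe its implementation, so the following assumes geofences are simple polygons and uses a standard ray-casting point-in-polygon test; the names `point_in_polygon`, `assign_geofence` and `GEOFENCES`, and the rectangle coordinates, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_geofence(point: Point, geofences: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the first geofence containing the point, or None."""
    for name, polygon in geofences.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Hypothetical rectangular geofences standing in for two network segments.
GEOFENCES: Dict[str, List[Point]] = {
    "segment_A": [(-73.90, 40.80), (-73.88, 40.80), (-73.88, 40.82), (-73.90, 40.82)],
    "segment_B": [(-73.88, 40.80), (-73.86, 40.80), (-73.86, 40.82), (-73.88, 40.82)],
}

if __name__ == "__main__":
    print(assign_geofence((-73.89, 40.81), GEOFENCES))
```

Because the test only needs a latitude and longitude per record, the same assignment step could in principle be applied to any AVL source; a production system would more likely rely on the spatial functions of a GIS or spatial database.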
Different agencies and developers around the US have developed data analytics platforms for analyzing AVL and
sensor data. Only a limited number are described here due to limited space. The California Performance Measurement
System (PeMS) [9, 13] uses approximately 40,000 traffic detectors located across all major metropolitan areas.
PeMS also uses incident data, lane closure data, toll tag data, census traffic counts, vehicle classification,
weigh-in-motion data and road inventory. PeMS generates multiple performance measures such as vehicle hours
traveled (VHT), vehicle miles traveled (VMT), speeds, travel times, delays, incidents and bottlenecks, based on
congestion maps filtered by day and direction of travel.
The CATT Lab's [2] Regional Integrated Transportation Information System (RITIS) is an automated data sharing,
dissemination and archiving system that includes many performance measures, dashboards and visual analytics tools.
RITIS generates multiple performance measures such as a travel time comparison metric, travel time delta ranking,
tool metric VMT, trajectory/trip data analytics, OD matrix, top x movements, a delay severity metric
(normalized TT = median TT / speed limit TT) and a reliability metric (normalized IQR = (75th percentile TT - 25th
percentile TT) / speed limit TT).
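The two normalized metrics quoted above can be written out as a short sketch. It assumes that "speed limit TT" means the travel time over the segment at the posted speed limit, and it uses the inclusive quartile method from Python's standard library, which may differ from the percentile convention used by RITIS; the function names are hypothetical.

```python
from statistics import median, quantiles
from typing import Sequence


def normalized_travel_time(travel_times: Sequence[float], speed_limit_tt: float) -> float:
    """Delay severity: median travel time divided by the speed-limit travel time."""
    return median(travel_times) / speed_limit_tt


def normalized_iqr(travel_times: Sequence[float], speed_limit_tt: float) -> float:
    """Reliability: (75th percentile - 25th percentile) / speed-limit travel time."""
    # quantiles(n=4) returns the three quartile cut points [Q1, Q2, Q3].
    q1, _, q3 = quantiles(travel_times, n=4, method="inclusive")
    return (q3 - q1) / speed_limit_tt


if __name__ == "__main__":
    observed = [10.0, 12.0, 14.0, 16.0, 18.0]  # illustrative travel times in minutes
    print(normalized_travel_time(observed, 8.0))  # 14 / 8
    print(normalized_iqr(observed, 8.0))          # (16 - 12) / 8
```

A normalized travel time above 1 indicates that the median trip is slower than travel at the speed limit, while a larger normalized IQR indicates less consistent travel times.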
The Digital Roadway Interactive Visualization and Evaluation Network application (DRIVE Net) [12] is a tool for
Washington State Department of Transportation (WSDOT) operational data usage that relies on third-party AVL data.
Its key features are level of service based on the Highway Capacity Manual (HCM), travel time reliability per
corridor, multimodal analysis, safety performance and freeway elevation analysis. The travel time reliability module
allows analysts to examine variations in travel times by time of day, the effect of congestion and incidents on
travel rates, and travel time statistics.
The University at Albany Visualization and Informatics Lab (AVAIL) [1] set up a data analytics platform, the
National Performance Management Research Data Set (NPMRDS) Performance Measurement Tool Suite, with the
following metrics: speeds, average hours of delay, total hours of delay, Truck Travel Time Reliability (TTTR) Index,
Level of Travel Time Reliability (LOTTR) by interstate and non-interstate, Travel Time Index, Planning Time Index
and uncongested mileage. The following table shows the seven basic measures of effectiveness that are useful for
evaluating the traffic operations performance of highways.
In addition to analytic platforms developed by institutions, commercial software solutions are also available.
Streetlight Big Data for Mobility is a commercial solution that processes large sets of geospatial data points to
measure the interaction of pedestrians, bikes and vehicles, using navigation GPS and location-based services (LBS)
as data sources. This web application provides different kinds of analysis such as origin-destination (OD), segment
analysis and AADT estimation. The main performance metrics provided are speed in miles per hour, travel time in
seconds and the origin-destination Streetlight index (relative volume between origin and destination for a time
period) [11].
DB4IoT is a database engine purpose-built for the Internet of Moving Things, as well as a geospatial visualization
tool. DB4IoT uses GPS data from vehicular navigation devices as a data source and generates outputs such as
origin-destination-over-time maps, charts and graphs. DB4IoT can integrate datasets such as Bluetooth data,
weigh-in-motion data and automatic traffic recorder data. The main traffic performance metrics are speed, travel
time and the relative frequency of O-D pairs by time of day [3].
INRIX Performance Measures is a cloud-based analytics suite that uses INRIX traffic data. The main traffic
performance measures are travel time in minutes and speed in miles per hour, by location, dates, day or week [7].
In general, for GPS data, the granularity is one ping (latitude, longitude) about every 30, 60 or 120 seconds.
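With pings this sparse, speed between consecutive pings is often approximated from the great-circle distance and the time gap. The sketch below shows one such approximation under stated assumptions: each ping is a `(timestamp_s, latitude, longitude)` tuple, the Earth is treated as a sphere of mean radius 6,371 km, and the helper names are hypothetical. It is not taken from the paper.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (spherical assumption)
MPS_TO_MPH = 2.2369362921     # meters per second to miles per hour

Ping = Tuple[float, float, float]  # (timestamp_s, latitude_deg, longitude_deg)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def ping_speeds_mph(pings: List[Ping]) -> List[float]:
    """Approximate speed (mph) between consecutive pings of one device."""
    ordered = sorted(pings)  # sort by timestamp
    speeds = []
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(ordered, ordered[1:]):
        dt = t2 - t1
        if dt <= 0:
            continue  # skip duplicate timestamps to avoid division by zero
        speeds.append(haversine_m(lat1, lon1, lat2, lon2) / dt * MPS_TO_MPH)
    return speeds


if __name__ == "__main__":
    trip = [(0.0, 40.8000, -73.9000), (120.0, 40.8100, -73.9000)]
    print(ping_speeds_mph(trip))
```

The straight-line distance between two pings is a lower bound on the distance actually driven, so speeds estimated this way tend to be underestimates on curved or indirect routes, more so at 120-second than at 30-second ping intervals.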
3. Methodology
To estimate TPMs in different segments of a transportation network, an application that automates the data
extraction and analysis process is created using a set of technologies: geographical information system (GIS)
software, a high-level, interpreter-based, general-purpose programming language, a relational database management
system (RDBMS), and interactive visualization and business intelligence tools. TPMs are calculated in the five
stages illustrated in Fig. 1.
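As a rough illustration of how a relational warehouse can serve TPM queries, the sketch below builds a tiny star schema (one fact table, two dimension tables) and aggregates average speed by segment and hour. It uses Python's built-in `sqlite3` module purely as a stand-in for the relational database; the table names, column names and sample rows are hypothetical and are not the authors' schema.

```python
import sqlite3
from typing import List, Tuple


def build_warehouse() -> sqlite3.Connection:
    """Create an in-memory database with a minimal star schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_segment (segment_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, hour INTEGER NOT NULL);
        CREATE TABLE fact_ping (
            ping_id    INTEGER PRIMARY KEY,
            segment_id INTEGER NOT NULL REFERENCES dim_segment(segment_id),
            time_id    INTEGER NOT NULL REFERENCES dim_time(time_id),
            speed_mph  REAL NOT NULL
        );
        """
    )
    return conn


def load_sample(conn: sqlite3.Connection) -> None:
    """Load a few illustrative rows (not real data)."""
    conn.executemany("INSERT INTO dim_segment VALUES (?, ?)",
                     [(1, "segment_A"), (2, "segment_B")])
    conn.executemany("INSERT INTO dim_time VALUES (?, ?)", [(1, 8)])
    conn.executemany(
        "INSERT INTO fact_ping (segment_id, time_id, speed_mph) VALUES (?, ?, ?)",
        [(1, 1, 10.0), (1, 1, 12.0), (2, 1, 30.0)],
    )
    conn.commit()


def average_speed_by_segment_hour(conn: sqlite3.Connection) -> List[Tuple[str, int, float]]:
    """Average speed per segment and hour, joined through the dimension tables."""
    return conn.execute(
        """
        SELECT s.name, t.hour, AVG(f.speed_mph)
        FROM fact_ping AS f
        JOIN dim_segment AS s ON s.segment_id = f.segment_id
        JOIN dim_time AS t ON t.time_id = f.time_id
        GROUP BY s.name, t.hour
        ORDER BY s.name, t.hour
        """
    ).fetchall()


if __name__ == "__main__":
    connection = build_warehouse()
    load_sample(connection)
    for row in average_speed_by_segment_hour(connection):
        print(row)
```

Keeping measurements in a fact table and descriptive attributes in dimension tables lets the same ping records be sliced by segment, zone or time of day without reprocessing the raw data.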
Three use cases for the proposed data analytics framework are presented using an AVL dataset from INRIX® and a
truck AVL dataset. The data and metadata are described below and shown in Table 2:
INRIX® dataset: The INRIX® data for May 2017 included trip records for approximately 984,207 device IDs,
summing up to almost 204,680,565 waypoints. Connected devices in personal and commercial vehicles gather
latitude and longitude in the New York area. Table 2 shows the fields considered.
Truck AVL dataset: The truck AVL database has 45,057,172 records from 08/2013 to 09/2016, with a ping
every two minutes. The fleet size is 500 vehicles, and the AVL devices are installed in commercial vehicles
that operate in Hunts Point Market, Bronx, New York. Table 2 shows the fields considered.
Fig. 1. Flowchart to process TPMs. [Figure not reproduced. Recoverable labels: client side (Power BI, R, Python, Tableau, Visual Studio, Azure/AWS/GCP), visualization process; server side: connection, DOT model (fact tables, dimensions, geospatial databases), OLAP SSAS (multidimensional model, tabular model), ETL SSIS; geospatial analysis; data selection.]
50th (median) and 75th percentile, standard deviation based on 2-minute time interval per hour.
The chart shows the average speed every two minutes for the hour between 08:00 am and 09:00 am. The map
highlights the traffic analysis zone. The pie chart shows speed categories: low, low-moderate, high-moderate
and high speed. Fig. 2 highlights congestion in the northbound corridor, with an average speed of 10.73
mph on a regular weekday between Tuesday and Thursday.
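The two-minute averaging and the four speed categories described above could be computed along the following lines. This is a minimal sketch under stated assumptions: each record is a `(seconds_after_hour_start, speed_mph)` pair for a single hour, and the category thresholds in `SPEED_BREAKS` are hypothetical, since the paper does not state the cut-off values it used.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

BIN_SECONDS = 120                  # two-minute intervals, as in the use case
SPEED_BREAKS = (15.0, 30.0, 45.0)  # hypothetical category thresholds in mph
CATEGORIES = ("low", "low-moderate", "high-moderate", "high")


def two_minute_averages(records: Iterable[Tuple[float, float]]) -> Dict[int, float]:
    """Average speed per two-minute bin (bins 0..29) within one hour."""
    bins: Dict[int, List[float]] = defaultdict(list)
    for seconds, speed in records:
        if 0 <= seconds < 3600:  # ignore records outside the analysis hour
            bins[int(seconds // BIN_SECONDS)].append(speed)
    return {index: mean(speeds) for index, speeds in sorted(bins.items())}


def speed_category(speed_mph: float) -> str:
    """Map a speed to one of the four categories using SPEED_BREAKS."""
    for threshold, label in zip(SPEED_BREAKS, CATEGORIES):
        if speed_mph < threshold:
            return label
    return CATEGORIES[-1]


if __name__ == "__main__":
    sample = [(0.0, 10.0), (60.0, 12.0), (130.0, 20.0)]
    for index, avg in two_minute_averages(sample).items():
        print(index, avg, speed_category(avg))
```

Other per-bin statistics, such as percentiles or the standard deviation, can be computed from the same grouped lists before they are reduced to a mean.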
Table 2. INRIX and Truck AVL Datasets
INRIX® Fields | Description | Truck AVL Fields | Description
Trip id | A trip's unique identifier (2,733,310 records). | VIN | Vehicle Identification Number (500 commercial vehicles)
Waypoint sequence | The order of the waypoint | MsgTime | Timestamp
Time Stamp | The capture date and time of the waypoint in UTC. | TimeZone | GMT Greenwich Mean Time
Latitude | Decimal degree latitude coordinates of the waypoint. | GPSLat | Location latitude
Longitude | Decimal degree longitude coordinates of the waypoint. | GPSLong | Location longitude
Device-Id | A device's unique identifier | Ignition_Status | If the engine is on or off
 | | Speed | Vehicle speed
 | | Odometer | Distance traveled by a vehicle
 | | VehBus | Light or empty space
The three cases are dynamic visualizations that allow the agency to make policy decisions to improve traffic
conditions and safety. The analyst can also adapt the information to assess air quality changes due to policy
measures. A real use case is the Hunts Point Clean Truck Program (HPCTP), where the AVL data was geofenced to
demonstrate that truck engine upgrades have reduced NOx emissions in particular neighborhoods. This is a
significant sustainability and quality-of-life goal, particularly because the South Bronx neighborhood has
more than double the asthma rate of other neighborhoods in NYC (NYCDOT, 2015).
Technical specifications for software and hardware: The use cases above used the following software: Python
version 2.7, SQL Server 2016, Windows Server 2016 and Power BI. The hardware was a server with two Intel Xeon
Gold 6148 CPUs at 2.4 GHz, 896 GB RAM, a 276 GB SSD and a 7 TB HDD.
5. Conclusions
This study presents a generic data analytics framework for NYC to analyze different AVL datasets for generating
different kinds of TPMs. This framework provides a flexible approach to data ingestion via customized
geofence-based network segmentation to evaluate TPMs on different spatial scales with a variety of metrics and
visualizations.
Fig. 2. Traffic performance measures using INRIX dataset for Highways - Case 1.
Fig. 3. Traffic performance measures using INRIX dataset for local streets - Case 2.
Fig. 4. Traffic performance measures using AVL dataset for Regional Analysis - Case 3.
Through three use cases, various TPMs on roadways of different kinds are shown. A real-world success story is also
presented to show the utility of the data in demonstrating policy measures that enhance sustainability and quality
of life, both key smart city goals. Among the several uses of the generic TPM data analytics framework are the
identification of congestion and air quality hot spots and the before-after analysis of policy measures. The output
data format provides flexibility for various visualizations for suitable communication with policy makers and the
public.
As a part of future work, the analytics framework will be expanded to incorporate additional datasets from other
sources, such as crash data and construction data. A modern data warehousing and data analysis hardware/software
platform for developing automated and semi-automated data reports will be established to support NYC DOT
management, operations and planning. A comparison of our results with other available transportation analytic
systems will also be performed.
References