
An Open Source Tool to Extract Traffic Data from Google Maps: Limitations and Challenges


2021 International Symposium on Networks, Computers and Communications (ISNCC) | 978-1-6654-0304-7/21/$31.00 ©2021 IEEE | DOI: 10.1109/ISNCC52172.2021.9615680

Sifatul Mostafi, Khalid Elgazzar


Electrical, Computer & Software Engineering
Ontario Tech University
Oshawa, ON, Canada
{sifatul.mostafi, khalid.elgazzar}@ontariotechu.ca

Abstract—Road traffic modelling, analysis, and prediction require accurate, preprocessed spatiotemporal traffic data, including measurements such as traffic speed and count. Many existing and emerging surveillance systems are currently used to facilitate traffic data collection. Google Maps is a web mapping service that leverages GPS crowdsourcing to retrieve accurate traffic data, verified by both the research community and industry. Google Maps offers APIs that provide access to this data with a paid subscription. Google Maps also makes this traffic data publicly available through its web interface, but with limited features and in a form that requires further pre-processing. Existing tools that expose this publicly available traffic data through the Google Maps web interface either lack essential functionality or are proprietary. We have developed an open-source, web-based data scraper tool to extract and export available traffic data from the Google Maps web interface in multiple usable formats. The tool provides a user-friendly interface that enables users to visually mark the locations of interest and to flexibly determine the required periods for data collection. Performance evaluation shows that the tool can retrieve traffic data from Google Maps in linear time with no significant computational overhead. Limitations and challenges in developing such tools are also investigated.

Index Terms—Traffic data extraction, Google Maps

I. INTRODUCTION

With the increase in vehicles, passengers, roads and the number of trips per day, a large volume of traffic data is generated every day. These traffic data can be useful in planning optimal routes, designing traffic control and monitoring systems, implementing intelligent transportation systems, and more. Researchers have used different data collection techniques, with different technical aspects and operational characteristics, to collect and preprocess traffic data. Global Positioning System (GPS) based crowdsourcing technologies have been widely used to collect traffic data because they gather collective traffic information from the crowd through internet-enabled devices in real time, which has enormous potential as an alternative to traditional traffic data collection techniques [1]. Studies show that traffic data collection through crowdsourcing provides reliable travel time estimates with reasonable accuracy [2].

Google Maps uses a hybrid positioning system to collect user location as passive crowdsourced information for its platforms from mobile phones running the Android operating system. Mobile phone users all around the world support this crowdsourced hybrid geolocation system of Google Maps. Google Maps also integrates region-specific local information to validate information gathered from GPS crowdsourcing. Studies reveal that data retrieved from Google Maps provides reasonably accurate and feasible traffic data [3]–[5]. Thus, Google Maps traffic data has been widely used both in academia and industry in application domains such as traffic monitoring systems, vehicle routing, travel time prediction, and smart parking systems. The Google Maps Application Programming Interfaces (APIs) provide a variety of functionalities for accessing content from Google Maps, which has led to the exploration of different applications based on Google Maps APIs [6]. These APIs are not free and come with a paid subscription. Google Maps also provides traffic data with limited features through its publicly available web interface, but this data is not in a usable format and requires manual effort before it can be applied in research and development. An open-source and publicly available tool to retrieve and preprocess this publicly available traffic data from Google Maps would help the research community access and utilize the data in their research and development efforts.

Some open map service providers offer traffic data through free APIs, but these data are not accurate enough for use in research [7]. Many third-party tools are also available online that provide utilities to format traffic data from the Google Maps web interface. These online tools either lack functionality or do not have a free and publicly accessible option. Manual retrieval and pre-processing of this publicly available traffic data from the Google Maps web interface is very time-consuming, inefficient and error-prone. A pixel-positioning-based image processing technique has been proposed to extract traffic layer data from the Google Maps web interface, but it is computationally expensive and lacks usability [8].

In this work, we aim to develop a web-based open-source tool to acquire and preprocess publicly available traffic data efficiently from the Google Maps web interface. We design our tool to provide good usability and to offer multiple features that are either available in Google Maps or can be extracted from existing features. Lastly, we evaluate the performance of the tool for further optimization and list the limitations and challenges of developing such a tool.

The remainder of this paper is organized as follows. Section II provides a comprehensive review of Google Maps as an authenticated source of traffic data. In this section, we further investigate and identify the barriers to accessing traffic data due to the limitations and drawbacks of existing tools and APIs. Section III demonstrates the working methodology of the proposed tool, which includes a user interface, data extraction techniques and a data description. Section IV reports the performance analysis of the tool. Section V lists the limitations and challenges involved in developing such a tool. Lastly, Section VI provides the final remarks on the work.

II. BACKGROUND AND RELATED WORK

Google Maps is a web mapping service developed by Google, which is the dominant provider of transport information and innovation. Google Maps works as a search tool to provide location-based utilities [9]. Google Maps utilizes GPS data from the Google Maps application on smartphones through crowdsourcing [10], [11]. Google Maps also has access, through contracts, to local municipality data such as road-specific information, road types, road works, and speed limits. Google uses this data to design algorithms that constantly calibrate and fine-tune predicted travel time. As a result, the travel time predictions by Google Maps are more accurate than the travel times systematically recorded by Uber [12]. An O–D travel time matrix, an organized format of travel times between multiple origins and destinations used in many spatial analysis tasks, can be retrieved through the Google Maps API [13]. Google Maps is considered the most popular type of flexible transit service provider as it does not have to rely on geographic information. Thus, Google Maps Web technologies establish a smooth scheduling system to retrieve precise data of road networks from Google servers [14]. This includes the development of modern WebGIS applications, path planning, and induction of traffic congestion. Due to an enriched programming API, Google's Web map service is one of the most widely used mapping services [15]. It also offers the functionality to customize and configure selected maps in any web browser through graphical visualization [16].

Google Maps has been shown to provide high-quality real-time traffic data in an experiment where data obtained from an intelligent transportation system (ITS) were compared with Google Maps traffic data in Hong Kong. The outcomes demonstrate that the evaluated journey time was consistent on most routes throughout the entire day in both sources. The numerical differences in terms of statistical p-value measurements were also acceptable. Google Maps surpasses ITS in accessing real-time journey time data for location-based applications due to the high deployment and maintenance cost of ITS [3]. Researchers have also shown that big data retrieved from the Google Maps traffic API is a feasible data source for advanced research on a city road system, as these data capture the traffic situation at high spatial and temporal resolution. Although data provided by Google Maps may not reflect the whole traffic congestion, these data, with their high spatial resolution, result in only a very small deviation in finding traffic congestion patterns [4]. In another experiment, the traffic data provided by Google Maps were compared with a traffic dataset collected through sensors installed on different road segments in the city of Paris. Google Maps traffic data achieved an overall accuracy of 95.8% in fluid traffic situations [5]. Due to the credibility of Google Maps traffic data, these data have been used in a variety of traffic-related research including traffic visualization, monitoring and simulation, travel time prediction, congestion analysis, route planning, traffic light control, accident detection, and traffic impact analysis.

Google Maps provides the Distance Matrix API (footnote 1), which gives users and developers access to its traffic data, including travel time from source to destination. Accessing these APIs requires a paid subscription. OpenStreetMap (footnote 2) is an open-source alternative to Google Maps traffic data, but it is less efficient in terms of accessibility and geospatial accuracy [7]. Outscraper (footnote 3) is a web-based third-party tool to extract traffic data from Google Maps, and it is also a paid service. As a result, researchers have been looking for ways to extract publicly available traffic data from the client side of the Google Maps web interface without using external tools like Outscraper. Traffic layers in Google Maps are provided in the form of rendered images that show the state of traffic congestion on different road segments using four different colours, where each colour represents a specific traffic congestion level [5]. Caiza et al. [8] show how image processing techniques can be applied to extract congestion data from the Google Maps traffic layer by mapping pixel positions of the display to geographical coordinates.

A comparison of the Google Maps Distance Matrix API, Outscraper and our proposed tool in terms of some key measurements is listed in Table I. Unlike the Google Maps Distance Matrix API, the tool does not have an API to access data: it sends requests to the Google Maps server from the front end and performs data scraping once the desired data is available in the document object model (DOM). In Outscraper, a user has to work out and provide the address string separately. Our tool, on the other hand, provides an overlay on top of the Google Maps web interface that lets users select input parameters directly from Google Maps and interact with the tool in real time, which is more convenient and user friendly. The tool also serves the dataset in JSON, XML, CSV and XLSX format based on the user's preference, which is not available in the other options. The tool is not a standalone traffic data provider. Rather, it only automates the process of acquiring, pre-processing and formatting publicly available data in Google Maps. Any data that is not publicly accessible through the Google Maps web interface is beyond the scope of the tool.

TABLE I
A COMPARISON OF GOOGLE MAPS TRAFFIC DATA ACCESS TOOLS

Comparison Parameters | Google Maps Distance Matrix API | Outscraper | Proposed Tool
Required Input Parameters: Origin | 1. Address String; 2. Latitude/Longitude Coordinate; 3. Place ID; 4. Plus Code; 5. Encoded Polylines | 1. Address String | 1. Real time location search; 2. Latitude/Longitude Coordinate; 3. Select on Google Maps
Required Input Parameters: Destination | Same format as for the origin | Same format as for the origin | Same format as for the origin
Required Input Parameters: Distance Unit | 1. Metric; 2. Imperial | 1. Metric; 2. Imperial | 1. Metric
Required Input Parameters: Arrival Time | arrival_time | N/A | Select "Arrive by"
Required Input Parameters: Departure Time | departure_time | Select from UI | Select "Depart at"
Required Input Parameters: Traffic Model | 1. best guess; 2. pessimist; 3. optimist | N/A | N/A
Output File Format | JSON or XML | XLSX | CSV
Parsing Needed | Yes | Yes | Yes
Features: Datetime | Yes | Yes | Yes
Features: Start Location | Yes | Yes | Yes
Features: End Location | Yes | Yes | Yes
Features: Start Latitude/Longitude | Yes | Yes | Yes
Features: End Latitude/Longitude | Yes | Yes | Yes
Features: Mid Latitude/Longitude | Yes | Yes | Yes
Features: Maximum, Minimum and Average Duration | Yes (in terms of Best guess, Pessimist and Optimist Traffic Model) | Yes | Yes
Features: Distance | Yes | Yes | Yes
Features: Congestion Unit | Yes | Yes | Yes
Features: Time Zone | Yes | UTC | No
Features: Fare | Yes | No | No
Open Source | No | No | Yes

Footnotes:
1 https://developers.google.com/maps/documentation/distance-matrix/overview
2 https://www.openstreetmap.org/
3 https://outscraper.com/google-maps-traffic-extractor/

III. PROPOSED METHODOLOGY

A. Data Extraction

Data interfacing, acquisition, and pre-processing are the major steps of any data collection technique [17]. In our
proposed methodology, data interfacing involves the selection of input parameters and features. Data acquisition is performed by sending an HTTP request to the Google server and retrieving the data from the document object model. The data pre-processing part consists of data validation, feature extraction, data formatting and data storage. Each of these operations depends on its predecessors and cannot proceed until the previous operation is completed, as shown in Figure 1.

Fig. 1. Processing pipeline

An asynchronous callback function operates at the core of the system and integrates the data acquisition and pre-processing parts. To extract the observation of a time step, the function generates a DOM event that sends a data request to the Google server, and then waits until the data is returned by the server and becomes available in the DOM. The time lag between the moment the click event is fired and the moment the data is available in the DOM is not constant. To address this issue, the callback function runs after a certain interval, which is specified as a hyperparameter of the system and determined through performance evaluation. In the data validation phase, the system checks for duplicated data across multiple time steps. After data validation, the system extracts new features, formats the data in multiple formats and saves it.

Algorithm 1 presents the pseudo-code of the callback function. The function takes three input parameters n, r and t, where n is the number of days, r is the road segment and t is the starting time. The output dataset file is represented by c. Here, 1 ms is assigned to the variable callbackInterval as a hyperparameter. In Section IV, we present further analysis to find the optimal value of the callback interval.

Algorithm 1: Traffic data extraction of a road segment
Input: Number of days, n; Road segment, r; Time, t
Output: Dataset file, c

function extract(n, r, t)
    obs ← n * 4 * 24
    previousMin ← NULL
    lapStartTime ← CurrentTime()
    lapEndTime ← NULL
    lapTime ← NULL
    totalElapsedTime ← 0
    totalIteration ← 0
    obsCount ← 0
    callbackInterval ← 1
    while Interval(callbackInterval) is True do
        totalIteration++
        if ContentLoads() then
            currentMin ← GetMinute()
            if previousMin != currentMin then
                previousMin ← currentMin
                if obsCount ≥ obs then
                    Download(c)
                    ClearInterval()
                else
                    c[obsCount] ← Extract(r, t)
                    c[obsCount].append(NewLine)
                    lapEndTime ← CurrentTime()
                    lapTime ← lapEndTime - lapStartTime
                    totalElapsedTime ← totalElapsedTime + lapTime
                    lapStartTime ← lapEndTime
                    obsCount++
                    t ← t + 15
                end
            end
        end
    end
    return c

B. User Interface

The tool provides an interactive interface for users to provide input parameters and interact with the system.
1) The tool requires a user to select the starting point and endpoint of a road segment, the departure time and the departure date as input parameters.
2) The tool provides a user interface to specify the name of the dataset file, the number of days, and a list of checkboxes where each checkbox represents a feature of the dataset (Fig. 2(a)).
3) The tool validates the input parameters and reports errors using an alert box.
4) The tool shows the ongoing process and provides a success message once the data extraction process is completed (Fig. 2(b)).

Fig. 2. User interfaces of the tool: (a) asking for input parameters for data extraction; (b) success message once data extraction is completed

C. Data Description

The tool provides 12 data features with different data types, units and ranges, as shown in Table II. Among these features, Datetime, AvgDuration, and CongestionIndex are generated through feature extraction. The Datetime feature consists of all the temporal features: Day, Month, Year, Hour, Minute, and Meridiem. AvgDuration is the average of MinDuration and MaxDuration. The CongestionIndex feature represents the average speed on a road in kilometres per hour and is derived from the AvgDuration and Distance features.

Figure 3 shows the time series data of a road segment from Sunnyside station, Toronto, ON to Queen St E and River St, Toronto, ON. The data was collected for two consecutive days, from 12:00 am on the first of October to 11:45 pm on the second of October. The figure shows how the Congestion Index, which is the average speed in kilometres per hour, changes over the two days. A high average speed indicates low congestion and a low average speed indicates high congestion. The difference between two time steps is 15 minutes, as provided by Google Maps.

Fig. 3. Time series data

IV. PERFORMANCE EVALUATION

The performance of the tool mostly depends on how fast the requested data is loaded in the DOM element of the Google Maps web interface once the request is sent to the server. Regardless of how fast the data extraction process is, the tool has to wait until the requested data is retrieved and available on the Google Maps web interface. Nevertheless, the system can be further optimized so that it gathers data as soon as it is available without any significant computational overhead.
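To make the waiting behaviour concrete, the following minimal Python sketch simulates a fixed-interval polling callback on a virtual clock. It is an illustration under stated assumptions, not the tool's implementation (the tool runs in the browser against the DOM). The function name poll_until_loaded and the 250 ms load time are hypothetical. The callback wakes up every callbackInterval milliseconds and collects the observation at the first wake-up at or after the moment the data has loaded.

```python
def poll_until_loaded(lap_time_ms, callback_interval_ms):
    """Simulate a fixed-interval polling callback on a virtual clock.

    lap_time_ms          -- time until the requested data is loaded
                            (unknown in advance in the real tool)
    callback_interval_ms -- fixed polling period (the hyperparameter)

    Returns (iterations, overhead_ms): the number of wake-ups needed and the
    time the loaded data sat unread before the callback noticed it.
    """
    if lap_time_ms <= 0 or callback_interval_ms <= 0:
        raise ValueError("both durations must be positive")
    clock_ms = 0
    iterations = 0
    while True:
        clock_ms += callback_interval_ms   # sleep until the next tick
        iterations += 1
        if clock_ms >= lap_time_ms:        # data is available: collect it
            return iterations, clock_ms - lap_time_ms


if __name__ == "__main__":
    # Hypothetical load time of 250 ms, polled at four different intervals.
    for interval_ms in (1, 100, 250, 1000):
        iterations, overhead_ms = poll_until_loaded(250, interval_ms)
        print(f"interval={interval_ms} ms: "
              f"iterations={iterations}, overhead={overhead_ms} ms")
```

With the assumed 250 ms load time, a 1 ms interval needs 250 wake-ups but adds no waiting, whereas a 1000 ms interval needs a single wake-up but leaves the loaded data unread for 750 ms. This is the tension between iteration count and timing overhead that the rest of this section analyses.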

Authorized licensed use limited to: Institut Teknologi Bandung. Downloaded on July 21,2023 at 03:34:51 UTC from IEEE Xplore. Restrictions apply.
TABLE II
DATA DESCRIPTION

SL | Feature | Description | Unit | Range | Type | Pre-processing | Feature Extraction
1 | Datetime | The DateTime index of time series | | Any | DateTime | Yes | Yes
2 | StartLocation | The location of the starting point of a road segment with Postal Code | N/A | N/A | Text | No | No
3 | EndLocation | The location of the ending point of a road segment with Postal Code | N/A | N/A | Text | No | No
4 | StartLatitude | The latitude of the starting point of a road segment | DD | -90 to +90 | Numerical | No | No
5 | EndLatitude | The latitude of the ending point of a road segment | DD | -90 to +90 | Numerical | No | No
6 | StartLongitude | The longitude of the starting point of a road segment | DD | -180 to +180 | Numerical | No | No
7 | EndLongitude | The longitude of the ending point of a road segment | DD | -180 to +180 | Numerical | No | No
8 | MinDuration | The minimum duration it may take to cross a road segment | Km/h | 0 to ∞ | Numerical | Yes | No
9 | MaxDuration | The maximum duration it may take to cross a road segment | Km/h | 0 to ∞ | Numerical | Yes | No
10 | AvgDuration | The average of minimum and maximum duration to cross a road segment | Km/h | 0 to ∞ | Numerical | Yes | Yes
11 | Distance | The distance of the road segment in terms of kilometres | Km | 0 to ∞ | Numerical | Yes | No
12 | CongestionIndex | The traffic speed on a road segment | Km/h | 0 to ∞ | Numerical | Yes | Yes
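The three derived features of Table II (Datetime, AvgDuration and CongestionIndex) can be sketched in a few lines of Python. This is a minimal illustration rather than the tool's own code. It assumes that durations are available in minutes and distances in kilometres (the duration unit is an assumption made for this example), and the function names and the sample values of 10 km and 10 to 14 minutes are hypothetical.

```python
from datetime import datetime, timedelta


def datetime_index(start, days, step_minutes=15):
    """Build the Datetime index: one timestamp per time step."""
    observations = days * 24 * 60 // step_minutes   # 96 per day at 15 min
    return [start + timedelta(minutes=step_minutes * i)
            for i in range(observations)]


def avg_duration(min_duration, max_duration):
    """AvgDuration: the mean of MinDuration and MaxDuration."""
    return (min_duration + max_duration) / 2.0


def congestion_index(distance_km, avg_duration_min):
    """CongestionIndex: average speed in km/h from distance and duration.

    Assumes the duration is given in minutes (an assumption of this sketch).
    """
    if avg_duration_min <= 0:
        raise ValueError("duration must be positive")
    return distance_km * 60.0 / avg_duration_min


if __name__ == "__main__":
    # Two days of 15-minute steps starting at 12:00 am on 1 October 2020.
    index = datetime_index(datetime(2020, 10, 1), days=2)
    print(len(index), index[0], index[-1])
    # prints: 192 2020-10-01 00:00:00 2020-10-02 23:45:00

    # Hypothetical segment: 10 km, crossing time between 10 and 14 minutes.
    print(congestion_index(10.0, avg_duration(10.0, 14.0)))  # prints: 50.0
```

Two days at 15-minute steps give 2 * 4 * 24 = 192 timestamps, matching the obs ← n * 4 * 24 line of Algorithm 1, and a higher CongestionIndex value corresponds to a faster, less congested segment.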

Finding the right value for the callbackInterval hyperpa- The callback function can collect a data observation in each
rameter plays a crucial role in performance optimization as iteration as the data loading time is the same as the interval of
it modifies two important factors of the tool: the number of the callback function (1). There is no overhead as the callback
iteration a callback function needs to perform to extract data function takes only one iteration to collect a data observation
of a time step and any delay in the data extraction process (2). It is not possible to predetermine the exact value of
once the data is available. lapT ime and set the value of callbackInterval accordingly to
Thorough sensitivity analysis of data extraction process with avoid any overhead. Because lapT ime is a random variable
respect to different hyper-parameters shows that tuning the that varies between observations and the callbackInterval
hyper-parameters may affect the performance of the system. remains constant as it is set as a hyperparameter at the
beginning of the iteration.
A. Mathematical Modelling 2) callbackInterval < lapT ime: If the value of
The number of iterations that the callback function may callbackInterval becomes less than lapT ime then the call-
go through to data observation of a time step depends on back function iterates more than once. The callback func-
the length of the callback interval callbackInterval and tion continues iteration after every callbackInterval until it
lapT ime. lapT ime is the time duration to load the requested reaches the point when the requested data is loaded. Although
data once the click event is created as shown in Algorithm lapT ime is greater than callbakInterval, it is not necessarily
1. There is no way to know the exact duration of lapT ime a multiple of callbackInterval. The number of iteration of
for a specific observation in advance as it depends on many the callback function is determined by ceiling the fraction of
external factors like internet speed, network congestion, web lapT ime and callbackInterval as shown in Eq. (1). Eq. (2)
browser version as well as the specification of the machine on shows the general formula to calculate the overhead to collect
which the callback is executed. All these external factors are a data observation.
beyond the scope of the development of the tool. The lapT ime If the callbackInterval value is smaller than the value
values are ordinal as could be different for two different sets of of lapT ime, the overhead is lower and vice versa as per
observations. We can set the value of the only hyperparameter Eq. (4). Since the lapT ime values show high variance in
callbackInterval in such a way so that the tool can generate different observations, we need to keep callbackInterval as
an optimal performance for any value of lapT ime. small as possible so that the callback function can collect a
The callbackInterval can be equal, smaller or greater data observation as soon as it is loaded. Smaller values of
than the lapT ime. In the following subsections, we discuss callbackInterval results in an increasing number of iterations
the possible outcome in terms of iteration and overhead (3).
for different values of callbackIntervals in comparison with 3) callbackInterval > lapT ime: If the value of
lapT ime. callbackInterval is greater than lapT ime, the callback func-
1) callbackInterval = lapT ime: This is the ideal sce- tion iterate only once to collect a data observation (1). A
nario where the callbackInterval is equal to the lapT ime. data observation is already be loaded by the time the call-

Authorized licensed use limited to: Institut Teknologi Bandung. Downloaded on July 21,2023 at 03:34:51 UTC from IEEE Xplore. Restrictions apply.
back function completes an iteration and generates overhead the 192 observations and different callbackInterval values,
as shown in Eq. (2). The higher the difference between the cumulative number of iterations and elapsed time (in ms)
callbackInterval and lapT ime, the bigger the overhead and is calculated and then compared between these for different
vice versa, see Eq. (4). callbackInterval.
Eq. (5) can be derived from Eq. (3) and (4) which shows that Our experiments and results support the desired outcome
iteration is inversely proportional to overhead. Therefore, as hypothesized in the previous subsection. The experiment
an increment in iteration of a callback function to collect shows that the tool runs in a linear time with respect to the
a data observation always leads to a decrement in timing number of data observations and can be optimized by changing
overhead and vice versa. This conclusion emphasizes the the value of callbackInterval to the optimal point to avoid
trade-off between time complexity (here in terms of time any computational overhead.
overhead) and computational complexity (here in terms of
the number of iteration of the callback function). In the next C. Results
subsections, we illustrate the experimental setup and results to 1) Iteration with Different Callback Intervals: Figure 4
support these hypothesizes. shows the number of iterations cumulatively added with ob-
servations for different values of callbackInterval. The lower
1, if callbackInterval ≥ lapT ime the value of callbackInterval is, the steeper the line is as the


callback function went through more iterations to collect a
  
lapT ime

iteration = Ceiling , (1)

 callbackInterval data observation. On the other hand, the higher the value of
callbackInterval, the flatter the line, as the callback function

if callbackInterval < lapT ime
went through fewer iterations to collect a data observation.
0, if callbackInterval = lapT ime There are a few important aspects to notice in the graph.






 (callbackInterval ∗ iteration) − lapT ime, Changes of callbackInterval from 1 ms to 10 ms results
overhead = if callbackInterval < lapT ime (2) in a higher change of slope in between the corresponding



callbackInterval − lapT ime, lines than changes of callbackInterval from 500 ms to
1000 ms. The change of slope for lines corresponding to


if callbackInterval > lapT ime
different callbackInterval is not the same across different

lapT ime − callbackInterval, values of callbackInterval. Lines corresponding to lower
callbackInterval are comparatively more scattered than the




 if callbackInterval < lapT ime
iteration ∝ 1 (3) lines corresponding to higher callbackInterval. The higher
 , the value of callbackInterval, the lower the change in the


 callbackInterval − lapT ime

if callbackInterval > lapT ime cumulative number of iterations in between consecutive lines
and the lower the value of callbackInterval, the higher the
 1 change in between consecutive lines. The number of total itera-
 ,



 lapT ime − callbackInterval tions decreases by a significant amount after a certain threshold
overhead ∝ if callbackInterval < lapT ime (4) even if the value of callbackInterval keeps increasing.



callbackInterval − lapT ime, Sometimes a line may have a sudden rise in terms of

if callbackInterval > lapT ime the number of iterations within a very short number of
observations. For instance, the blue line corresponding to
1
overhead ∝ (5) the callbackInterval of 1 ms has one sudden spike around
iteration observation 135. While collecting data of the 135th obser-
B. Experimental Setup vation, the callback function was stuck either because data
We run our experiment in a machine with Intel(R) Core(TM) was not loading or there was a duplication in the data.
i3-4005U CPU that has 1.70 GHz and 8 GB DDR3 Random Therefore the callback function had to iterate more times as
Access Memory. The machine runs a 64-Bit Windows 10 the callbackInterval for this line is only 1 ms that results in
operating system with the latest version of google chrome a sudden rise in the corresponding line. Lines corresponding
installed at the time of the experiment. The road segment to comparatively lower callbackInterval also expose similar
starts from the junction of Conlin Road East and Simcoe characteristics. The sudden rise is not as high as compared to
Street North (Located at Oshawa, Ontario, Canada, L1H 7K4) the lines corresponding callbackInterval of smaller values. It
towards 1352-1340 Durham Regional Rd 2, Oshawa, ON. is not possible to determine how many sudden rises a line may
We collect time-series data of that road segment for two have in advance. There could be none, one or more sudden
consecutive days from 12:00 am on the 1st of October to 11:45 rises in any line representing any value of callbackInterval.
pm on the 2nd of October 2020. The final dataset contains a 2) Time Complexity with Different Values of
total of 192 observations for two consecutive days as each day callbackInterval: The same experiment is repeated to
contains 96 observations. We conduct the experiments 9 times measure the total elapsed time with a different number of
with 9 different callback intervals between 1 ms and 1000 ms. observations. The overhead time of the observations is
Rather than comparing iteration and overhead for each of cumulatively added and plotted the total elapsed time with

Authorized licensed use limited to: Institut Teknologi Bandung. Downloaded on July 21,2023 at 03:34:51 UTC from IEEE Xplore. Restrictions apply.
Fig. 4. Number of iteration with different values of callbackInterval Fig. 5. Time Complexity for Different Callback Intervals (Milliseconds)

different observations as shown in Figure 5. The higher the


value of callbackInterval, the steeper the line, as more
overhead timing are added on top of callbackInterval to
collect a data observation. On the other hand, the lower the
value of callbackInterval, the flatter the line. This behaviour
shows that the tool can run in a linear time complexity with
respect to the number of observations.
Figure 5 also shows characteristics similar to those of Figure 4. Changing callbackInterval from 1000 ms to 500 ms results in a smaller change of slope between the corresponding lines than changing callbackInterval from 10 ms to 1 ms. The slope changes between the lines are not the same across different values of callbackInterval. The lines corresponding to higher callbackInterval values are comparatively more scattered than the lines corresponding to lower callbackInterval values. The higher the value of callbackInterval, the lower the change in the cumulative elapsed time between consecutive lines, and vice versa. Total elapsed time decreases by a significant amount after a certain threshold even if we keep decreasing the value of callbackInterval.

3) Trade-off between the Number of Iterations and Elapsed Time: The objective of our analysis is to determine an equilibrium point where the value of the hyperparameter callbackInterval yields the desired outcome: less computation and less time. An increase in callbackInterval decreases the total number of iterations of the callback function but increases the total elapsed time, thus delaying the overall data collection process. Conversely, a decrease in callbackInterval decreases the total elapsed time but increases the total number of iterations of the callback function. Therefore, there is a trade-off between the cumulative number of iterations and the total elapsed time.

In Figure 6, we plot the total elapsed time vs. the total number of iterations for different values of callbackInterval. The data point representing a callbackInterval of 1 ms provides the highest number of cumulative iterations with the lowest elapsed time. On the other hand, the data point representing a callbackInterval of 1000 ms provides the lowest number of cumulative iterations with the highest elapsed time. The difference between callbackInterval values of 1000 ms and 100 ms is comparatively lower with respect to the total number of cumulative iterations than it is with respect to the total elapsed time. Moreover, the difference between callbackInterval values of 100 ms and 1 ms is comparatively higher for the total number of cumulative iterations than it is for the total elapsed time. As a result, we can consider a callbackInterval of 100 ms as an optimized equilibrium point for this dataset.

Fig. 6. Trade-off between Number of Iterations and Time Complexity (Milliseconds)

V. LIMITATIONS AND CHALLENGES

The tool does not have any access to the Google Maps Distance Matrix API. It depends solely on how fast and consistently the requested data of a time step is loaded on the Google

Maps web interface accessed through a web browser. Although the working procedure of the tool is technically correct, the feasibility of the tool needs to be addressed. Excessive use of the tool results in a high volume of frequent HTTP requests from the same IP address, which may raise security concerns. Frequent click events generated on the Google Maps web interface may provoke some ISPs to block the process on the assumption that it could be malicious activity. Limited usage of the tool from the same IP address might help, but this is not a feasible solution, as it largely depends on the internet service provider and on how Google handles a high volume of data requests via the Google Maps web interface.

Comparing the performance of the proposed tool with existing APIs and tools would require further investigation for different types of queries in order to show the real value of the proposed tool. It is very difficult to develop insightful quality or performance metrics for the proposed tool, as its performance is largely based on Google’s data and infrastructure quality.

The premise of this work is not to get access to Google Maps data for free through the Google Maps web interface. Our intention is not to encourage the use of this tool to circumvent paying for data, which would be an ethically and possibly legally questionable stance. Using these techniques to circumvent a paywall is almost certainly a violation of Google’s terms of service. Getting access to this data for free through the Google Maps website against the wishes of the data owner is ethically questionable. Traffic data with limited features is already publicly available on the Google Maps website, which is open to all users for free. We can get access to this information without violating Google’s terms of service. The tool helps a user to facilitate and automate the process of accessing this publicly available traffic data through web scraping and reverse-engineering, which is certainly legal.

VI. CONCLUSION

Google Maps collects real-time traffic data with high precision through GPS location-based crowdsourcing. The Google Maps Distance Matrix API provides access to this traffic data with paid subscriptions. Other map services and third-party online tools either provide less accurate traffic data, offer limited features, or are proprietary. To accelerate research in the traffic domain, we have developed a lightweight tool to automate the process of formatting and extracting publicly available traffic data from the Google Maps web interface by leveraging web scraping. Our developed tool is lightweight, fast, and open source. Performance evaluation of our tool shows that it is highly accurate and efficient and can be used on the latest version of any web browser. However, development and performance analysis of such a tool involves some key limitations and challenges, as it is based on web scraping and largely depends on the Google Maps web service. Our proposed approach does not violate any data copyrights, policies, or regulations of Google Maps, as it only automates the accumulation of publicly available traffic data from the Google Maps user-side web interface. We believe that this open-source tool accelerates research using Google Maps traffic data and encourages the research community to promote open data sharing for traffic-related research and developments.

REFERENCES

[1] A. Misra, A. Gooze, K. Watkins, M. Asad, and C. A. Le Dantec, “Crowdsourcing and Its Application to Transportation Data Collection and Management,” in Transportation Research Record, vol. 2414, no. 1, pp. 1–8, Jan. 2014.
[2] S. Tahmasseby, “Traffic Data: Bluetooth Sensors vs. Crowdsourcing—A Comparative Study to Calculate Travel Time Reliability in Calgary, Alberta, Canada,” in Journal of Traffic and Transportation Engineering, vol. 3, no. 2, Feb. 2015.
[3] Z. He, C.-Y. Chow, and J.-D. Zhang, “A Comparative Analysis of Journey Time from Google Maps and Intelligent Transport System in Hong Kong,” in IEEE 21st International Conference on High Performance Computing and Communications, Zhangjiajie, China, Aug. 2019, pp. 2610–2617.
[4] P. Baji, “Using Google Maps road traffic estimations to unfold spatial and temporal inequalities of urban road congestion: A pilot study from Budapest,” in Hungarian Geographical Bulletin, vol. 67, no. 1, pp. 61–74, Mar. 2018.
[5] H. Rezzouqi, I. Gryech, N. Sbihi, M. Ghogho, and H. Benbrahim, “Analyzing the Accuracy of Historical Average for Urban Traffic Forecasting Using Google Maps,” in Intelligent Systems and Applications, vol. 868, K. Arai, S. Kapoor, and R. Bhatia, Eds. Cham: Springer International Publishing, 2019, pp. 1145–1156.
[6] H. Li and L. Zhijian, “The study and implementation of mobile GPS navigation system based on Google Maps,” International Conference on Computer and Information Application, Tianjin, 2010, pp. 87–90.
[7] M. Haklay, “How Good is Volunteered Geographical Information? A Comparative Study of OpenStreetMap and Ordnance Survey Datasets,” in Environment and Planning B: Planning and Design, vol. 37, no. 4, pp. 682–703, Aug. 2010.
[8] L. J. Caiza, R. Alvarez, L. Urquiza-Aguiar, X. Calderón-Hinojosa, and A. Zambrano, “VTM: Vehicular Traffic Monitor via Images Processing of Google Maps,” in Proceedings of the 15th ACM International Symposium on Performance Evaluation of Wireless Ad Hoc, Sensor, & Ubiquitous Networks - PE-WASUN, Montreal, QC, Canada, 2018, pp. 40–46.
[9] H. Williams and D. Crawford, “Google’s Earth. Google Maps the Future of Traffic and Travel Information?,” in ITS International, vol. 18, no. 1, 2012, Accessed: Jan. 09, 2021. [Online]. Available: https://trid.trb.org/view/1136681.
[10] P. Tafidis et al., “Can Google Maps Popular Times Be an Alternative Source of Information to Estimate Traffic-Related Impacts?,” in Transportation Research Board 97th Annual Meeting, Transportation Research Board, 2018, Accessed: Jan. 09, 2021. [Online]. Available: https://trid.trb.org/view/1494705.
[11] S. Mishra, D. Bhattacharya, and A. Gupta, “Congestion Adaptive Traffic Light Control and Notification Architecture Using Google Maps APIs,” in Data, vol. 3, no. 4, p. 67, Dec. 2018.
[12] H. Wu, “Comparing Google Maps and Uber Movement Travel Time Data,” in Transport Findings, Nov. 2018.
[13] F. Wang and Y. Xu, “Estimating O–D travel time matrix by Google Maps API: implementation, advantages, and implications,” in Annals of GIS, vol. 17, no. 4, pp. 199–209, Dec. 2011.
[14] F. Qiu, W. Li, and C. An, “A Google Maps-Based Flex-Route Transit Scheduling System,” in COTA International Conference of Transportation Professionals, Changsha, China, Jun. 2014, pp. 247–257.
[15] T. Dimitrov Berov, “A Vehicle Routing Planning System for Goods Distribution in Urban Areas Using Google Maps and Genetic Algorithm,” in International Journal for Traffic and Transport Engineering, vol. 6, no. 2, pp. 159–167, Jun. 2016.
[16] P. Pokorný, “Determining Traffic Levels in Cities Using Google Maps,” in Fourth International Conference on Mathematics and Computers in Sciences and in Industry (MCSI), Corfu, 2017, pp. 144–147.
[17] T. Bellemans, B. De Schutter, and B. De Moor, “Data Acquisition, Interfacing and Pre-Processing of Highway Traffic Data,” in Telematics Automotive, Birmingham, United Kingdom, 2000, Accessed: Jan. 09, 2021. [Online]. Available: https://trid.trb.org/view/669866.
