A Dynamic Neural Network Model For Predicting Risk of Zika in Real Time
Abstract
Background: In 2015, the Zika virus spread from Brazil throughout the Americas, posing an unprecedented
challenge to the public health community. During the epidemic, international public health officials lacked
reliable predictions of the outbreak’s expected geographic scale and prevalence of cases, and were therefore
unable to plan and allocate surveillance resources in a timely and effective manner.
Methods: In this work, we present a dynamic neural network model to predict the geographic spread of
outbreaks in real time. The modeling framework is flexible in three main dimensions: (i) the risk indicator, i.e., case counts or incidence rate; (ii) the risk classification scheme, which defines the high-risk group based on a relative or absolute threshold; and (iii) the forecast window (1 to 12 weeks).
The proposed model can be applied dynamically throughout the course of an outbreak to identify the
regions expected to be at greatest risk in the future.
Results: The model is applied to the recent Zika epidemic in the Americas at a weekly temporal resolution
and country-level spatial resolution, using epidemiological data, passenger air travel volumes, vector habitat suitability, and socioeconomic and population data for all affected countries and territories in the Americas. Model performance is quantitatively evaluated based on predictive accuracy. We show that
the model can accurately predict the geographic expansion of Zika in the Americas with the overall average
accuracy remaining above 85% even for prediction windows of up to 12 weeks.
Conclusions: Sensitivity analysis showed that model performance is robust across a range of model features.
Critically, the model performed consistently well at various stages throughout the course of the outbreak,
indicating its potential value at any time during an epidemic. The predictive capability was superior for
shorter forecast windows and geographically isolated locations that are predominantly connected via air travel. The
highly flexible nature of the proposed modeling framework enables policy makers to develop and plan vector control
programs and case surveillance strategies which can be tailored to a range of objectives and resource constraints.
Keywords: Zika, Epidemic risk prediction, Dynamic neural network
In order to optimally allocate resources to suppress vector populations, it is critical to accurately anticipate the occurrence and arrival time of arboviral infections so as to detect local transmission [15]. Whereas prediction of dengue, the most common arboviral infection, has attracted wide attention from researchers employing statistical modeling and machine learning methods to guide vector control [16–21], global-scale, real-time, machine learning-based models do not yet exist for Zika virus [22–29]. Specifically for dengue, early warning systems for Thailand, Indonesia, Ecuador, and Pakistan have been introduced and are currently in use [30–34]. Further, in addition to conventional predictions based on epidemiological and meteorological data [20, 35, 36], more recent models have successfully incorporated search engine data [37, 38], land use [39], human mobility information [40, 41], spatial dynamics [42–44], and various combinations of the above [45] to improve predictions. Whereas local spread may be mediated by overland travel, continent-wide spread is mostly driven by air passenger travel between climatically synchronous regions [8, 46–51].

The aims of our work are to (1) present recurrent neural networks for time-ahead predictive modeling as a highly flexible tool for outbreak prediction and (2) implement and evaluate the model performance for the Zika epidemic in the Americas. Neural networks have previously been applied to epidemic risk forecasting, including dengue forecasting and risk classification [52–57], detection of mosquito presence [58], temporal modeling of the oviposition of the Aedes aegypti mosquito [59], Aedes larva identification [60], and epidemiologic time-series modeling through fusion of neural networks, fuzzy systems, and genetic algorithms [61]. Recently, Jiang et al. [62] performed a comparison of different machine learning models to map the probability of Zika epidemic outbreak using publicly available global Zika case data and other known covariates of transmission risk. Their study provides valuable insight into the potential role of machine learning models for understanding Zika transmission; however, it is static in nature, i.e., it does not account for time-series data or for human mobility, both of which are incorporated in our modeling framework.

Here, we apply a dynamic neural network model for N-week ahead prediction for the 2015–2016 Zika epidemic in the Americas. The model implemented in this work relies on multi-dimensional time-series data at the country (or territory) level, specifically epidemiological data, passenger air travel volumes, vector habitat suitability for the primary spreading vector Ae. aegypti, and socioeconomic and population data. The modeling framework is flexible in three main dimensions: (1) the preferred risk indicator can be chosen by the policy maker, e.g., we consider outbreak size and incidence rate as two primary indicators of risk for a region; (2) five risk classification schemes are defined, where each classification scheme varies in the (relative or absolute) threshold used to determine the set of countries deemed "high risk"; and (3) it can be applied for a range of forecast windows (1–12 weeks). Model performance and robustness are evaluated for various combinations of risk indicator, risk classification level, and forecast window. Thus, our work represents the first flexible neural network framework for epidemic risk forecasting that allows policy makers to evaluate and weigh the trade-off in prediction accuracy between forecast window and risk classification scheme. Given the availability of the necessary data, the modeling framework proposed here can be applied in real time to future outbreaks of Zika and other similar vector-borne diseases.

Materials and methods

Data
The model relies on socioeconomic, population, epidemiological, travel, and mosquito vector suitability data. All data are aggregated to the country level and provided for all countries and territories in the Americas at a weekly temporal resolution. Each data set and the corresponding processing are described in detail below and summarized in Table 1. All input data are available as Additional files 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11.

Epidemiological data
Weekly Zika case counts for each country and territory in the Americas were extracted from the Pan American Health Organization (PAHO) [63], as described in previous studies [48, 50] (data available: github.com/andersen-lab/Zika-cases-PAHO). The epidemiological weeks 1–78 are labeled herein as Epi weeks, corresponding to the dates 29 Jun 2015 to 19 Dec 2016, respectively. Although Zika cases in Brazil were reported as early as May 2015, no case data are available for 2015 from PAHO because the Brazil Ministry of Health did not declare Zika cases and the associated neurological and congenital syndrome to be notifiable conditions until 17 Feb 2016 [63]. The missing case counts from July to December 2015 for Brazil were estimated based on the positive correlation between Ae. aegypti abundance (described below) and reported case counts, as has been done previously [8, 50]. We used a smoothing spline [71] to estimate weekly case counts from the monthly reported counts. The weekly country-level case counts (Fig. 1a) were divided by the total population/100,000, as previously described [50], to compute weekly incidence rates (Fig. 1b).

Fig. 1 Weekly distribution of case and connectivity-risk variables. a Zika cases, b incidence rates, c case-weighted travel risk CR_j^t, and d incidence-weighted travel risk IR_j^t, for the top 10 ranked countries and territories in the Americas for each respective variable
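The preprocessing just described reduces to two small steps: interpolating monthly counts onto a weekly grid with a smoothing spline and normalizing by population. The following Matlab sketch illustrates those steps under stated assumptions; it is not the authors' code, the variable names and values (monthlyCases, population) are hypothetical, and csaps requires the Curve Fitting Toolbox.

```matlab
% Hypothetical inputs: monthly reported counts for one country and its population
monthlyCases = [0 0 5 40 310 820 1500 2100 1800 1200 600 250];  % 1 x 12, illustrative
population   = 2.1e7;                                            % illustrative value only

monthMid = (0.5:1:11.5) * (365.25/12/7);   % month midpoints expressed in weeks
weekGrid = 1:52;                           % weekly grid to interpolate onto

% Smoothing spline fit (csaps, Curve Fitting Toolbox); [] lets MATLAB choose the
% smoothing parameter. Clip negatives and rescale so the weekly series preserves
% the reported yearly total (one simple convention, not prescribed by the paper).
weeklyCases = csaps(monthMid, monthlyCases, [], weekGrid);
weeklyCases = max(weeklyCases, 0);
weeklyCases = weeklyCases * (sum(monthlyCases) / sum(weeklyCases));

% Weekly incidence rate per 100,000 population, as used in Fig. 1b
weeklyIncidence = weeklyCases / (population / 1e5);
```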
Travel data
Calibrated monthly passenger travel volumes for each airport-to-airport route in the world were provided by the International Air Transport Association (IATA) [64], as previously used in [50, 72]. The data include origin, destination, and stopover airport paths for 84% of global air traffic, covering over 240 airlines and 3400 airports. The airport-level travel was aggregated to a regional level to compute monthly movements between all countries and territories in the Americas. The incoming and outgoing travel volumes for each country and territory, originally available from IATA at a monthly temporal resolution, were curve fitted, again using the smoothing spline method [71], to obtain corresponding weekly volumes matching the temporal resolution of our model. In this study, travel data from 2015 were also used for 2016, as was done previously [50, 72, 73].
Mosquito suitability data
The monthly vector suitability data sets were based on habitat suitability for Ae. aegypti, the principal vector of Zika virus, previously used in [50] and initially estimated using original high-resolution maps [65], and were then enriched to account for seasonal variation in the geographical distribution of Ae. aegypti by using time-varying covariates such as temperature persistence, relative humidity, and precipitation, as well as static covariates such as urban versus rural areas. The monthly data were translated into weekly data using a smoothing spline [71].

Socioeconomic and human population data
A country's capacity to prevent or manage an outbreak depends on its ability to implement successful surveillance and vector control programs [74]. Due to a lack of global data quantifying vector control at the country level, we utilized alternative economic and health-related country indicators which have previously been revealed to be critical risk factors for Zika spread [50]. A country's economic development can be measured by the gross domestic product (GDP) per capita at purchasing power parity (PPP), in international dollars. GDP data for each country were collected from figures provided by the World Bank [67] and the US Bureau of Economic Analysis [68]. The number of physicians and the number of hospital beds per 10,000 people were used to indicate the availability of health infrastructure in each country. These figures for the USA and other regions in the Americas were obtained from the Centers for Disease Control and Prevention (CDC) [69], the WHO World Health Statistics report [75], and PAHO [76]. Finally, the human population densities (people per sq. km of land area) for each region were collected from the World Bank [70] and the US Bureau of Economic Analysis [68].

Connectivity-risk variables
In addition to the raw input variables, novel connectivity-risk variables are defined and computed for inclusion in the model. These variables are intended to capture the risk posed by potentially infected travelers arriving at a given destination at a given point in time and, in doing so, explicitly capture the dynamics and heterogeneity of the air-traffic network in combination with the real-time outbreak status. Two variables are chosen, hereafter referred to as the case-weighted travel risk and the incidence-weighted travel risk, as defined in Eqs. (1.a) and (1.b), respectively:

$CR_j^t = \sum_{i \neq j} C_i^t \, V_{i,j}^t \quad \forall t, \forall j$    (1.a)

$IR_j^t = \sum_{i \neq j} I_i^t \, V_{i,j}^t \quad \forall t, \forall j$    (1.b)

For each region j at time t, CR_j^t and IR_j^t are computed as the sum of products between the passenger volume traveling from origin i into destination j at time t (V_{i,j}^t) and the state of the outbreak at origin i at time t, namely the reported cases C_i^t or the reported incidence rate I_i^t. Each of these two variables is computed for all 53 countries and territories for each of the 78 epidemiological weeks. The two dynamic variables, CR_j^t and IR_j^t, are illustrated in Fig. 1c and d, below the raw case counts and incidence rates, respectively.
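As a concrete illustration of Eqs. (1.a) and (1.b), both connectivity-risk variables for one epidemiological week reduce to a single matrix-vector product. The sketch below uses hypothetical, randomly generated inputs (V, cases, population) purely to show the shape of the computation; it is not the authors' implementation.

```matlab
% Hypothetical inputs for one epidemiological week t:
%   V          - M x M passenger volumes, V(i,j) = travelers from origin i to destination j
%   cases      - M x 1 reported case counts C_i^t
%   incidence  - M x 1 reported incidence rates I_i^t
M = 53;                                   % countries/territories in the study
V = rand(M) * 1e4;                        % synthetic travel volumes
V(1:M+1:end) = 0;                         % zero the diagonal, enforcing i ~= j
cases      = randi([0 5000], M, 1);
population = 1e6 + rand(M,1) * 5e7;
incidence  = cases ./ (population / 1e5);

% Eq. (1.a): case-weighted travel risk, CR_j = sum_i C_i * V(i,j)
CR = V' * cases;          % M x 1, one value per destination j

% Eq. (1.b): incidence-weighted travel risk, IR_j = sum_i I_i * V(i,j)
IR = V' * incidence;
```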
Neural network model
The proposed prediction problem is highly nonlinear and complex; thus, a class of neural architectures based upon Nonlinear AutoRegressive models with eXogenous inputs (NARX), known as NARX neural networks [77–79], is employed herein due to its suitability for modeling a range of nonlinear systems [80]. NARX networks, compared with other recurrent neural network architectures, require limited feedback (i.e., feedback from the output neuron rather than from hidden states) and converge much faster with better generalization [80, 81]. The NARX framework was selected over simpler linear regression frameworks due to both the size and complexity of the set of input variables and the need for a nonlinear function approximation. Specifically, in addition to the epidemiological, environmental, and sociodemographic variables, there are hundreds of travel-related variables which may contribute to the risk prediction for each region. The NARX model can be formalized as follows [80]:

$y(t) = f\left(x(t), x(t-1), \ldots, x(t-d_x), y(t-1), \ldots, y(t-d_y)\right)$    (2)

where x(t) and y(t) denote, respectively, the input and the output (or target to be predicted) of the model at discrete time t, while d_x and d_y (with d_x ≥ 1, d_y ≥ 1, and d_x ≤ d_y) are the input and output delays, called memory orders (Fig. 2).

Fig. 2 Schematic of the NARX network with d_x input and d_y output delays. Each neuron produces a single output based on several real-valued inputs by forming a linear combination using its input weights and (possibly) passing the result through a nonlinear activation function: z = φ(Σ_{i=1}^{n} w_i u_i + b) = φ(w^T u + b), where w denotes the vector of weights, u the vector of inputs, b the bias, and φ a linear or nonlinear activation function (e.g., linear, sigmoid, or hyperbolic tangent [82])

In this work, a NARX model is implemented to provide N-step ahead prediction of a time series, as defined below:

$y_k(t+N) = f\left(x_1(t), x_1(t-1), \ldots, x_1(t-d_x), \ldots, x_M(t), x_M(t-1), \ldots, x_M(t-d_x), y_k(t), y_k(t-1), \ldots, y_k(t-d_y)\right)$    (3)

Here, y_k(t + N) is the risk classification predicted for the kth region N weeks ahead (of the present time t), which is estimated as a function of the inputs x_m(t) from all m = 1, 2, …, M regions for d_x previous weeks and of the previous risk classification states y_k(t) of region k for d_y previous weeks. The prediction model is applied at time t to predict for time t + N and therefore relies only on data available up until week t. That is, to predict outbreak risk for epidemiological week X, N weeks ahead, the model is trained and tested using data available up until week (X − N). For example, a 12-week ahead prediction for Epi week 40 is performed using data available up to week 28. The function f(·) is an unknown nonlinear mapping function that is approximated by a multilayer perceptron (MLP) to form the NARX recurrent neural network [78, 79]. In this work, the series-parallel NARX neural network architecture is implemented in Matlab R2018a (The MathWorks, Inc., Natick, MA, USA) [57].
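The series-parallel (open-loop) NARX architecture described above corresponds to the narxnet object in Matlab's Deep Learning Toolbox. The sketch below shows the general shape of such a setup using synthetic stand-in data; the feature count and variable names are hypothetical, and the two hidden neurons and four tapped delays mirror the settings reported later in this section rather than a prescription.

```matlab
% Synthetic stand-in data (hypothetical): R exogenous features over Tn weeks
R = 8;  Tn = 78;
X = con2seq(rand(R, Tn));                 % 1 x Tn cell array of R x 1 input vectors x(t)
T = con2seq(double(rand(1, Tn) > 0.7));   % binary risk labels y_k(t)

inputDelays    = 1:4;     % tapped delay lines on x(t)  (d_x = 4)
feedbackDelays = 1:4;     % tapped delay lines on y(t)  (d_y = 4)
hiddenNeurons  = 2;       % hidden layer size used in the paper

% Series-parallel (open-loop) NARX: the measured output, not the fed-back
% estimate, is supplied during training.
net = narxnet(inputDelays, feedbackDelays, hiddenNeurons);

% Shift the series to account for the delay lines, then train.
[Xs, Xi, Ai, Ts] = preparets(net, X, {}, T);
net  = train(net, Xs, Ts, Xi, Ai);
Yhat = net(Xs, Xi, Ai);                   % one-step-ahead outputs
```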
In the context of this work, the desired output, y_k(t + N), is a binary risk classifier, i.e., it classifies a region k as high or low risk at time t + N, for each region k, N weeks ahead (of t). The vector of input variables for region m at time t is x_m(t) and includes both static and dynamic variables.

We consider various relative (R) and absolute (A) thresholds to define the set of "high-risk" countries at any point in time. We define relative risk thresholds that range uniformly between 10 and 50%, where the 10% scheme classifies the 10% of countries reporting the highest number of cases (or highest incidence rate) during a given week as high risk and the other 90% as low risk, similar to [45]. The relative risk schemes are referred to herein as R = 0.1, R = 0.2, R = 0.3, R = 0.4, and R = 0.5. It is worth noting that, for a given percentile, e.g., R = 0.1, the relative risk thresholds are dynamic and vary week to week as a function of the scale of the epidemic, while the size of the high-risk group remains fixed over time, e.g., 10% of all countries. We also consider absolute thresholds, which rely on case incidence rates to define the high-risk group. Five absolute thresholds are selected based on the distribution of incidence values over all countries and the entire epidemic. Specifically, the 50th, 60th, 70th, 80th, and 90th percentiles were chosen and are referred to herein as A = 50, A = 60, A = 70, A = 80, and A = 90. These five thresholds correspond to weekly case incidence rates of 0.43, 1.47, 4.05, 9.5, and 32.35 per 100,000, respectively (see Additional file 12: Figure S1). In contrast to the relative risk scheme, under the absolute risk scheme for a given percentile, e.g., A = 90, the threshold remains fixed, but the size of the high- (and low-) risk group varies week to week based on the scale of the epidemic. The fluctuation in group size for each threshold is illustrated in Additional file 12: Figure S1 for each classification scheme, A = 50 to A = 90.

Critically, our prediction approach differs from [45] in that our model is trained to predict the risk level directly, rather than to predict the number of cases, which would then be post-processed into risk categories. The performance of the model is evaluated by comparing the estimated risk level (high or low) with the actual risk level for all locations at a specified time. The actual risk level is simply defined at each time period t during the outbreak by ranking the regions based on the number of reported case counts (or incidence rates) and grouping them into high- and low-risk groups according to the specified threshold and classification scheme.
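The ground-truth labels used for training and evaluation follow directly from the two threshold schemes just described. Below is a minimal sketch, assuming a hypothetical vector of weekly incidence rates for the M = 53 locations; under the relative scheme the top fraction R of locations is flagged as high risk, whereas under the absolute scheme any location above a fixed incidence value is.

```matlab
% Hypothetical weekly risk indicator for M locations (here: incidence rate)
M = 53;
indicator = rand(M,1) * 50;

% Relative scheme, e.g. R = 0.2: the top 20% of locations are labeled high risk
R = 0.2;
nHigh = round(R * M);
[~, order] = sort(indicator, 'descend');
isHighRelative = false(M,1);
isHighRelative(order(1:nHigh)) = true;

% Absolute scheme, e.g. A = 70: fixed threshold taken from the distribution of
% incidence over all countries and weeks (4.05 per 100,000 in the paper)
threshold = 4.05;
isHighAbsolute = indicator >= threshold;
```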
The static variables used in the model include GDP PPP, population density, the number of physicians, and the number of hospital beds for each region. The dynamic variables include mosquito vector suitability, outbreak status (both reported case counts and reported incidence rates), total incoming travel volume, total outgoing travel volume, and the two connectivity-risk variables defined in Eqs. (1.a) and (1.b), again for each region. Before being input to the NARX model, all data values are normalized to the range [0, 1].

A major contribution of this work is the flexible nature of the model, which allows policy makers to be more or less risk-averse in their planning and decision making. First, the risk indicator can be chosen by the modeler; in this work, we consider two regional risk indicators: (i) the number of reported cases and (ii) the incidence rate. Second, we consider a range of risk classification schemes, which define the set of high-risk countries based on either a relative or an absolute threshold that can be chosen at the discretion of the modeler, i.e., R = 0.1, 0.2, 0.3, 0.4, 0.5 and A = 90, 80, 70, 60, 50. Third, the forecast window, N, ranges over N = 1, 2, 4, 8, and 12 weeks. Subsequently, any combination of risk indicator, risk classification scheme, and forecast window can be modeled.

In the initial settings of the series-parallel NARX neural network, various numbers of hidden layer neurons and tapped delay lines (Eq. 2) were explored for training and testing of the model. Sensitivity analysis revealed a minimal difference in the performance of the model under the different settings. Therefore, for all experiments presented in this work, the numbers of hidden layer neurons and tapped delay lines are kept constant at two and four, respectively.

To train and test the model, the actual risk classification for each region at each week during the epidemic, y_k(t), was used. For each model run, e.g., a specified risk indicator, risk classification scheme, and forecasting window, the input and target vectors are randomly divided into three sets:

1. Seventy percent for training, to tune model parameters by minimizing the mean square error between the outputs and targets
2. Fifteen percent for validation, to measure network generalization and to prevent overfitting, by halting training when generalization stops improving (i.e., when the mean square error of the validation samples starts increasing)
3. Fifteen percent for testing, to provide an independent measure of network performance during and after training
The performance of the model is measured using two metrics: (1) prediction accuracy (ACC) and (2) receiver operating characteristic (ROC) curves. Prediction accuracy is defined as ACC = (TP + TN)/(TP + FP + TN + FN), where true positive (TP) is the number of high-risk locations correctly predicted as high risk, false negative (FN) is the number of high-risk locations incorrectly predicted as low risk, true negative (TN) is the number of low-risk locations correctly predicted as low risk, and false positive (FP) is the number of low-risk locations incorrectly predicted as high risk. The second performance metric, the ROC curve [83], explores the effects on TP and FP as the position of an arbitrary decision threshold is varied; in the context of this prediction problem, this threshold distinguishes low- from high-risk locations. The ROC curve can be summarized by a single number, the area under the ROC curve (AUC); an AUC approaching one indicates a more accurate detection method. In addition to quantifying model performance using these two metrics, we evaluate the robustness of the predictions by comparing the ACC across multiple runs that vary in their selection of testing and training sets (resulting from the randomized sampling).
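Both metrics are straightforward to compute once the predicted and actual labels (and the underlying network scores) are available for a run. A minimal sketch with hypothetical inputs follows; perfcurve, used here for the ROC AUC, is part of the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical outputs of one model run for M = 53 locations
yTrue  = rand(53,1) > 0.7;                  % actual high-risk labels
yScore = 0.6*yTrue + 0.4*rand(53,1);        % network outputs in [0, 1]
yPred  = yScore >= 0.5;                     % arbitrary decision threshold

TP = sum( yPred &  yTrue);   FP = sum( yPred & ~yTrue);
TN = sum(~yPred & ~yTrue);   FN = sum(~yPred &  yTrue);
ACC = (TP + TN) / (TP + FP + TN + FN);

% ROC curve and AUC obtained by sweeping the decision threshold
[fpr, tpr, ~, AUC] = perfcurve(yTrue, yScore, true);
```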
Results
The model outcome reveals the set of locations expected to be at high risk at a specified date in the future, i.e., N weeks ahead of when the prediction is made. We apply the model for all epidemiological weeks throughout the epidemic and evaluate performance under each combination of (i) risk indicator, (ii) classification scheme, and (iii) forecast window. For each model run, both ACC and ROC AUC are computed.

Model performance
Figures 3 and 4 exemplify the output of the proposed model. Figure 3 illustrates the model predictions at a country level for a 4-week prediction window, specifically for Epi week 40, i.e., using data available up until week 36. Figure 3a illustrates the actual risk percentile each country is assigned to in week 40, based on reported case counts. The remaining panels of Fig. 3 reveal the risk level (high or low) predicted for each country under the five relative risk classification schemes, namely (b) R = 0.1, (c) R = 0.2, (d) R = 0.3, (e) R = 0.4, and (f) R = 0.5, and whether or not the prediction was correct. For panels (b)–(f), green indicates a correctly predicted low-risk country (TN), light gray indicates an incorrectly predicted high-risk country (FP), dark gray indicates an incorrectly predicted low-risk country (FN), and the remaining color indicates a correctly predicted high-risk country (TP). The inset highlights the results for the Caribbean islands. The figure also presents the average ACC over all regions and the ACC for just the Caribbean region (grouped similarly to [10]) for each classification scheme.

Fig. 3 Country prediction accuracy by relative risk level. Panel a illustrates the actual relative risk level assigned to each country at Epi week 40 for a fixed forecast window, N = 4. Panels b–f each correspond to a different classification scheme, specifically b R = 0.1, c R = 0.2, d R = 0.3, e R = 0.4, and f R = 0.5. The inset shown by the small rectangle highlights the actual and predicted risk in the Caribbean islands. For panels b–f, green indicates a correctly predicted low-risk country, light gray indicates an incorrectly predicted high-risk country, and dark gray indicates an incorrectly predicted low-risk country. The risk indicator used is case counts
Figure 4 illustrates the model predictions at a country level for varying prediction windows and a fixed classification scheme of R = 0.2, again for Epi week 40. Figure 4a illustrates the actual risk classification (high or low) each country is assigned to in Epi week 40, based on reported case counts. The remaining panels of Fig. 4 reveal the risk level (high or low) predicted for each country under the five forecasting windows, specifically (b) N = 1, (c) N = 2, (d) N = 4, (e) N = 8, and (f) N = 12, and whether or not the prediction was correct. For panels (b)–(f), red indicates a correctly predicted high-risk country (TP), green indicates a correctly predicted low-risk country (TN), light gray indicates an incorrectly predicted high-risk country (FP), and dark gray indicates an incorrectly predicted low-risk country (FN). The inset highlights the results for the Caribbean islands. Similar to Fig. 3, for each forecast window, the reported ACC is averaged both over all regions and for just the Caribbean.

Fig. 4 Country prediction accuracy by forecast window. Panel a illustrates the actual relative risk level assigned to each country at Epi week 40 for a fixed classification scheme, R = 0.2. Panels b–f each correspond to a different forecast window, specifically b N = 1, c N = 2, d N = 4, e N = 8, and f N = 12. The inset shown by the small rectangle highlights the actual and predicted risk in the Caribbean islands. For panels b–f, red indicates a correctly predicted high-risk country and green indicates a correctly predicted low-risk country. Light gray indicates an incorrectly predicted high-risk country, and dark gray indicates an incorrectly predicted low-risk country. The risk indicator used is case counts

The model's performance and sensitivity to the complete range of input parameters are summarized in Additional file 13: Table S2. ACC is presented for each combination of risk indicator (case count and incidence rate), classification scheme (i.e., R = 0.1, 0.2, 0.3, 0.4, 0.5 and A = 90, 80, 70, 60, 50), and forecast window (i.e., N = 1, 2, 4, 8, and 12), for selected Epi weeks throughout the epidemic. ROC AUC (averaged over all locations and all Epi weeks) is computed for all combinations of risk indicator (case count and incidence rate), classification scheme (i.e., R = 0.1, 0.2, 0.3, 0.4, 0.5 and A = 90, 80, 70, 60, 50), and forecast window (i.e., N = 1, 2, 4, 8, and 12).
Figures 5 and 6 illustrate trends in the model performance as a function of classification scheme and forecast window, aggregated over space and time. Specifically, Fig. 5 reveals the model performance (ACC, averaged over all locations and all Epi weeks) for each combination of risk classification scheme (i.e., R = 0.1, 0.2, 0.3, 0.4, and 0.5) and forecast window (i.e., N = 1, 2, 4, 8, and 12).

Fig. 5 Aggregate model performance measured by ACC (averaged over all locations and all weeks) for all combinations of relative risk classification scheme (i.e., R = 0.1, 0.2, 0.3, 0.4, and 0.5) and forecast window (i.e., N = 1, 2, 4, 8, and 12), where the risk indicator is case counts

The aggregated ROC curves (averaged over all locations and all epidemiological weeks) for R = 0.4 are presented in Fig. 6 and reveal the (expected) increase in model accuracy as the forecast window is reduced. The ROC AUC results are consistent with the ACC results presented in Fig. 5, highlighting the superior performance of the 1- and 2-week ahead prediction capability of the model. The ROC AUC value remains above 0.91 for N = 1, 2 and above 0.83 for N = 4, both indicating high predictive accuracy of the model. The ROC curves for the other relative risk classification schemes are presented in Additional file 14: Figure S2.

Fig. 6 Aggregate model performance measured by ROC AUC (averaged over all locations and all weeks) for a fixed relative risk classification scheme, R = 0.4, and forecast windows N = 1, 2, 4, 8, and 12, where the risk indicator is case counts
Global and regional analysis
We further explore the model's performance at a regional level by dividing the countries and territories in the Americas into three groups, namely the Caribbean, South America, and Central America, as in [10], and compare these with the global performance, i.e., all countries. For each group, the average performance of the model in terms of ACC was evaluated and presented for each combination of risk indicator (case count and incidence rate), classification scheme (i.e., R = 0.1, 0.2, 0.3, 0.4, 0.5 and A = 90, 80, 70, 60, 50), and forecast window (i.e., N = 1, 2, 4, 8, and 12), aggregated over the entire epidemic period (Table 2).

Model robustness
Figure 7a and b show how the ACC varies over 10 independent runs of the model. This sensitivity analysis was conducted for all combinations of risk indicator, relative risk classification scheme, and selected epidemiological weeks (i.e., week number/starting date: 30/18 Jan 2016, 40/28 Mar 2016, 50/6 Jun 2016, 60/15 Aug 2016, and 70/24 Oct 2016). This time period represents a highly complex period of the outbreak, with country-level rankings fluctuating substantially, as evidenced in Fig. 1. Due to computation time, the sensitivity analysis was evaluated only for the 4-week forecast window. The size of the error bars illustrates the robustness of the proposed modeling framework.

Fig. 7 Model performance and robustness. ACC is averaged over all locations for selected epidemiological weeks when the risk indicator is a case counts and b incidence rate, for a fixed forecast window (N = 4). The error bars represent the variability in expected ACC across ten runs for each combination

NARX feature selection
While the NARX framework does not provide assigned weights for each input feature as output, sensitivity analysis can be conducted to help identify the key predictive features. We tested the performance of the NARX framework under three different combinations of input features, with the particular objective of quantifying the role of travel data in our outbreak prediction model. We considered (i) a simple "baseline" model using only case count and incidence data; (ii) an expanded baseline model that includes case and incidence data and all non-travel-related variables; and (iii) the proposed model, which includes all features listed in Table 1. The results comparing the performance of these three models, with the detailed list of input features for each, are provided in Additional file 15: Table S1. The results reveal the case-related data (regional case counts and incidence rates) to be the dominant explanatory variables for predicting outbreak risk in a region, as would be expected. The inclusion of the non-travel-related variables (regional suitability, regional GDP, regional physicians, regional hospital beds, regional population density) is not shown to improve predictive capability over the baseline model and, in fact, sometimes performs worse than the baseline model. In contrast, the inclusion of travel data (weekly case-weighted travel risk, weekly incidence-weighted travel risk, weekly incoming travel volume, weekly outgoing travel volume) is revealed to improve the predictive capability, especially for the shorter prediction windows, with a higher ROC AUC for a majority (20 of 25) of the scenarios tested. These results support the inclusion of the dynamic travel-related variables, which substantially increase the complexity of the model (inputs) and thus justify the use of the NARX framework selected.
Discussion
Our model uses a range of environmental, sociodemographic, and dynamic travel data to predict the spread of Zika in the Americas and the potential for local transmission. It therefore expands on previous work by jointly considering static and dynamic aspects of Zika virus transmission that were previously treated in isolation [48, 67, 84]. Overall, the proposed model is shown to be accurate and robust, especially for shorter prediction windows and higher risk thresholds. As would be expected, the performance of the proposed model decreases as the prediction window increases, because of the inherent uncertainty in outbreak evolution over long periods of time. Specifically, the model is almost 80% accurate for 4-week ahead predictions for all classification schemes and almost 90% accurate for all 2-week ahead prediction scenarios, i.e., the correct risk category of 9 out of 10 locations can always be predicted, indicating strong performance. When the objective is to identify the top 10% of at-risk regions, the average accuracy of the model remains above 87% for predictions up to 12 weeks in advance. Generally, the model performance is shown to decrease as the risk threshold is reduced, e.g., as the size of the high-risk group is increased, representing a more risk-averse policy. The decrease in performance is likely due to the increased size and fluctuation of the high-risk country set over time for lower thresholds. For example, for the absolute risk threshold of A = 50, the number of countries classified as high risk fluctuates between 1 and 34 throughout the course of the epidemic, compared with A = 90, where the set only ranges from 0 to 12 (see Additional file 12: Figure S1). These results reveal the trade-off between the desired forecast window and the precision of the high-risk group. The quantifiable trade-off between the two model inputs (classification scheme and forecast window) can be useful for policies which may vary in their planning objectives.

The results in Figs. 3 and 4, as well as Table 2, reveal a similar trend at the regional level as was seen at the global level, with a decrease in predictive accuracy as the forecast window increases in length and as the high-risk group increases in size. As shown in Fig. 3, the ACC remains above 90% for R < 0.3, indicating superior model performance. For example, at Epi week 40, with R = 0.3 and N = 4 (using outbreak data and other model variables up to Epi week 36), there were 16 total regions classified as high risk, of which the model correctly identified 13. Furthermore, of the 16 high-risk regions, 8 were in the Caribbean (i.e., Aruba, Curacao, Dominican Republic, Guadeloupe, Haiti, Jamaica, Martinique, and Puerto Rico), of which the model correctly identified 7. Aruba (in the Caribbean), Honduras, and Panama were the only regions incorrectly predicted as low risk in this scenario; accurately classifying low-risk regions is also important (and ensures the model is not overly risk-averse). For the same scenario, i.e., Epi week 40, R = 0.3, and N = 4, all 18 low-risk Caribbean locations and 17 of the 19 low-risk non-Caribbean locations were accurately classified by the model. Paraguay and Suriname were the only regions incorrectly predicted as high risk. These results are consistent with the high reported accuracy of the model, i.e., an overall ACC of 90.15% and a Caribbean ACC of 96.15%.

Figure 4 reveals that the performance of the model, expectedly, deteriorates as the forecast window increases; however, the average accuracy remains above 80% for predictions up to 8 weeks ahead and well above 90% for up to 4 weeks ahead. The prediction accuracy for the Caribbean slightly lags the average performance in the Americas. Specifically, for R = 0.2, 5 of the 11 Caribbean regions were designated as high-risk locations at Epi week 40, i.e., the Dominican Republic, Guadeloupe, Jamaica, Martinique, and Puerto Rico. For a 1-week prediction window, N = 1, the model was able to correctly predict 3 of the high-risk regions (i.e., Jamaica, Martinique, Puerto Rico); for N = 2, it correctly identified two (i.e., Martinique, Puerto Rico); and for N = 4, it again correctly identified three (i.e., Guadeloupe, Martinique, Puerto Rico). However, the model did not correctly predict any high-risk locations in the Caribbean at the N = 8 and N = 12 window lengths. This error is due to the low and sporadic reporting of Zika cases in the region around week 30 and the high variability of the outbreak over the 8- and 12-week period. Similar prediction capability is illustrated for R = 0.5 (not shown in the figure), in which case, out of the 13 Caribbean high-risk locations, the model correctly identifies all locations at N = 1, 2, and 4; 10 of the 13 locations at N = 8; and only 1 of the 13 at N = 12.

When comparing performance across regions (see Table 2), the results reveal that predictive accuracy is best for the Caribbean region, while predictions for Central America were consistently the worst; the discrepancy in performance between these groups increases as the forecast window increases. The difference in performance across regions can be attributed to the high spatial heterogeneity of the outbreak patterns, the relative ability of air travel to accurately capture connectivity between locations, and errors in case reporting that may vary by region. For example, the Caribbean, which consists of more than twice as many locations as any other group, first reported cases around week 25 and remained affected throughout the epidemic. In contrast, Central America experienced a slow start to the outbreak (at least according to case reports), with two exceptions, namely Honduras and El Salvador. The large number of affected regions in the Caribbean, with more reported cases distributed over a longer time period, contributed to the training of the model, thus improving the predictive capability for these regions. Additionally, the geographically isolated nature of the Caribbean islands enables air travel to more accurately capture incoming travel risk, unlike countries in Central and South America, where individuals can also move between countries using alternative modes of travel, which are not accounted for in this study. These factors combined explain the higher predictive accuracy of the model for the Caribbean region and, importantly, help to identify the critical features and types of settings under which this model is expected to perform best.
Finally, the robustness of the model predictions is illustrated by the short error bars in Fig. 7. The model is also demonstrated to perform consistently throughout the course of the epidemic, with the exception of week 30, at which time there was limited information available to train the model, e.g., the outbreak was not yet reported in a majority of the affected countries. Comparing Fig. 7a and b reveals relatively similar performance for both risk indicators, and Additional file 13: Table S2 demonstrates the model's flexibility and adaptability with respect to both the risk scheme chosen, i.e., relative or absolute, and the metric used to classify outbreak risk, i.e., the number of cases or the incidence rate in a region.

Limitations
There are several limitations in this work. The underlying data on case reporting vary by country and may not represent the true transmission patterns [85]. However, the framework presented is flexible enough to account for these biases, and we anticipate this will only improve as the data become more robust. Additionally, 2015 travel data were used in place of 2016 data, as has been done previously [50, 65, 66], which may not be fully representative of travel behavior. Furthermore, air travel is the only mode of travel accounted for; thus, additional person movements between country pairs that share land borders are unaccounted for, and as a result, the model likely underestimates the risk posed to some regions. This limitation may partially explain the increased model performance for the geographically isolated Caribbean islands, which represent a large proportion of ZIKV-affected regions. This study does not account for mosquito species other than Ae. aegypti, such as Ae. albopictus, which can also spread ZIKV; however, Ae. aegypti is known to be the primary spreading vector and responsible for the majority of the ZIKV epidemic in the Americas [66]. Additionally, alternative non-vector-borne mechanisms of transmission are ignored. Lastly, due to the lack of spatial resolution of the case reports, we were limited to making country-to-country spread estimates. Our work neglects the vast heterogeneity in mosquito presence, particularly in countries like Brazil. We do, however, appreciate that there is considerable spatial variation within countries that will bias our estimates (i.e., northern vs. southern Brazil) and that this may influence the weekly covariates used in this study. We again hypothesize that models will improve as the spatial resolution of the available data increases.

Conclusions
We have introduced a flexible, predictive modeling framework to forecast outbreak risk in real time that can be scaled and readily applied in future outbreaks. The model was applied to the Zika epidemic in the Americas at a weekly temporal resolution and country-level spatial resolution, using a combination of population, socioeconomic, epidemiological, travel pattern, and vector suitability data. The model performance was evaluated for various risk classification schemes, forecast windows, and risk indicators and was illustrated to be accurate and robust across a broad range of these features. First, the model is more accurate for shorter prediction windows and restrictive risk classification schemes. Second, regional analysis reveals superior predictive accuracy for the Caribbean, suggesting the model to be best suited to geographically isolated locations that are predominantly connected via air travel. Predicting the spread to areas that are relatively isolated has previously been shown to be difficult due to the stochastic nature of infectious disease spread [86]. Third, the model performed consistently well at various stages throughout the course of the outbreak, indicating its potential value at the early stages of an epidemic. The model performance was not evaluated against simpler alternative statistical models such as linear regression, as this was not the aim of this work. We do, however, encourage rigorous model comparisons in future work. The outcomes from the model can be used to better guide outbreak resource allocation decisions and can be easily adapted to model other vector-borne epidemics.

Additional files
Additional file 1: Data (cases). Country- or territory-level weekly Zika cases. (XLSX 30 kb)
Additional file 2: Data (incidence). Country- or territory-level weekly Zika incidence rates. (XLSX 40 kb)
Additional file 3: Data (incoming_travel). Country- or territory-level weekly incoming travel volume. (XLSX 68 kb)
Additional file 4: Data (outgoing_travel). Country- or territory-level weekly outgoing travel volume. (XLSX 68 kb)
Additional file 5: Data (suitability). Country- or territory-level weekly Aedes vector suitability. (XLSX 68 kb)
Additional file 6: Data (gdp). Country- or territory-level GDP per capita. (XLSX 9 kb)
Additional file 7: Data (physicians). Country- or territory-level physicians per 1000 people. (XLSX 9 kb)
Additional file 8: Data (beds). Country- or territory-level beds per 1000 people. (XLSX 9 kb)
Additional file 9: Data (pop_density). Country- or territory-level population densities (people per sq. km of land area). (XLSX 10 kb)
Additional file 10: Data (case_weighted_travel_risk). Country- or territory-level weekly case-weighted travel risk. (XLSX 66 kb)
References
21. Shi Y, Liu X, Kok SY, Rajarethinam J, Liang S, Yap G, et al. Three-month real-time dengue forecast models: an early warning system for outbreak alerts and policy decision support in Singapore. Environ Health Perspect. 2016;124(9):1369–75.
22. Teng Y, Bi D, Xie G, Jin Y, Huang Y, Lin B, et al. Dynamic forecasting of Zika epidemics using Google trends. PLoS One. 2017;12(1):e0165085.
23. Althouse BM, Ng YY, Cummings DAT. Prediction of dengue incidence using search query surveillance. PLoS Negl Trop Dis. 2011;5(8):e1258.
24. Morsy S, Dang TN, Kamel MG, Zayan AH, Makram OM, Elhady M, et al. Prediction of Zika-confirmed cases in Brazil and Colombia using Google Trends. Epidemiol Infect. 2018;146(13):1625–7.
25. Kraemer MUG, Faria NR, Reiner RC Jr, Golding N, Nikolay B, Stasse S, et al. Spread of yellow fever virus outbreak in Angola and the Democratic Republic of the Congo 2015-16: a modelling study. Lancet Infect Dis. 2017;17(3):330–8.
26. Zhang Q, Sun K, Chinazzi M, Pastore YPA, Dean NE, Rojas DP, et al. Spread of Zika virus in the Americas. Proc Natl Acad Sci U S A. 2017;114(22):E4334–E43.
27. Ahmadi S, Bempong N-E, De Santis O, Sheath D, Flahault A. The role of digital technologies in tackling the Zika outbreak: a scoping review. J Public Health Emerg. 2018;2(20):1–15.
28. Majumder MS, Santillana M, Mekaru SR, McGinnis DP, Khan K, Brownstein JS. Utilizing nontraditional data sources for near real-time estimation of transmission dynamics during the 2015-2016 Colombian Zika virus disease outbreak. JMIR Public Health Surveill. 2016;2(1):e30.
29. Beltr JD, Boscor A, WPd S, Massoni T, Kostkova P. ZIKA: a new system to empower health workers and local communities to improve surveillance protocols by E-learning and to forecast Zika virus in real time in Brazil. In: Proceedings of the 2018 International Conference on Digital Health, vol. 3194683. Lyon: ACM; 2018. p. 90–4.
30. Cortes F, Turchi Martelli CM, Arraes de Alencar Ximenes R, Montarroyos UR, Siqueira Junior JB, Goncalves Cruz O, et al. Time series analysis of dengue surveillance data in two Brazilian cities. Acta Trop. 2018;182:190–7.
31. Abdur Rehman N, Kalyanaraman S, Ahmad T, Pervaiz F, Saif U, Subramanian L. Fine-grained dengue forecasting using telephone triage services. Sci Adv. 2016;2(7):e1501215.
32. Lowe R, Stewart-Ibarra AM, Petrova D, Garcia-Diez M, Borbor-Cordova MJ, Mejia R, et al. Climate services for health: predicting the evolution of the 2016 dengue season in Machala, Ecuador. Lancet Planet Health. 2017;1(4):e142–e51.
33. Ramadona AL, Lazuardi L, Hii YL, Holmner A, Kusnanto H, Rocklov J. Prediction of dengue outbreaks based on disease surveillance and meteorological data. PLoS One. 2016;11(3):e0152688.
34. Lauer SA, Sakrejda K, Ray EL, Keegan LT, Bi Q, Suangtho P, et al. Prospective forecasts of annual dengue hemorrhagic fever incidence in Thailand, 2010-2014. Proc Natl Acad Sci U S A. 2018;115(10):E2175–E82.
35. Baquero OS, Santana LMR, Chiaravalloti-Neto F. Dengue forecasting in Sao Paulo city with generalized additive models, artificial neural networks and seasonal autoregressive integrated moving average models. PLoS One. 2018;13(4):e0195065.
36. Sirisena P, Noordeen F, Kurukulasuriya H, Romesh TA, Fernando L. Effect of climatic factors and population density on the distribution of dengue in Sri Lanka: a GIS based evaluation for prediction of outbreaks. PLoS One. 2017;12(1):e0166806.
37. Anggraeni W, Aristiani L. Using Google Trend data in forecasting number of dengue fever cases with ARIMAX method case study: Surabaya, Indonesia. In: 2016 International Conference on Information & Communication Technology and Systems (ICTS); 2016. 12–12 Oct. 2016.
38. Marques-Toledo CA, Degener CM, Vinhal L, Coelho G, Meira W, Codeco CT, et al. Dengue prediction by the web: tweets are a useful tool for estimating and forecasting dengue at country and city level. PLoS Negl Trop Dis. 2017;11(7):e0005729.
39. Cheong YL, Leitão PJ, Lakes T. Assessment of land use factors associated with dengue cases in Malaysia using boosted regression trees. Spat Spatiotemporal Epidemiol. 2014;10:75–84.
40. Wesolowski A, Qureshi T, Boni MF, Sundsoy PR, Johansson MA, Rasheed SB, et al. Impact of human mobility on the emergence of dengue epidemics in Pakistan. Proc Natl Acad Sci U S A. 2015;112(38):11887–92.
41. Zhu G, Liu J, Tan Q, Shi B. Inferring the spatio-temporal patterns of dengue transmission from surveillance data in Guangzhou, China. PLoS Negl Trop Dis. 2016;10(4):e0004633.
42. Zhu G, Xiao J, Zhang B, Liu T, Lin H, Li X, et al. The spatiotemporal transmission of dengue and its driving mechanism: a case study on the 2014 dengue outbreak in Guangdong, China. Sci Total Environ. 2018;622–623:252–9.
43. Liu K, Zhu Y, Xia Y, Zhang Y, Huang X, Huang J, et al. Dynamic spatiotemporal analysis of indigenous dengue fever at street-level in Guangzhou city, China. PLoS Negl Trop Dis. 2018;12(3):e0006318.
44. Li Q, Cao W, Ren H, Ji Z, Jiang H. Spatiotemporal responses of dengue fever transmission to the road network in an urban area. Acta Trop. 2018;183:8–13.
45. Chen Y, Ong JHY, Rajarethinam J, Yap G, Ng LC, Cook AR. Neighbourhood level real-time forecasting of dengue cases in tropical urban Singapore. BMC Med. 2018;16(1):129.
46. Gardner L, Sarkar S. A global airport-based risk model for the spread of dengue infection via the air transport network. PLoS One. 2013;8(8):e72129.
47. Gardner L, Fajardo D, Waller ST, Wang O, Sarkar S. A predictive spatial model to quantify the risk of air-travel-associated dengue importation into the United States and Europe. J Trop Med. 2012;2012:103679.
48. Grubaugh ND, Ladner JT, Kraemer MUG, Dudas G, Tan AL, Gangavarapu K, et al. Genomic epidemiology reveals multiple introductions of Zika virus into the United States. Nature. 2017;546:401.
49. Wilder-Smith A, Gubler DJ. Geographic expansion of dengue: the impact of international travel. Med Clin North Am. 2008;92(6):1377–90.
50. Gardner LM, Bota A, Gangavarapu K, Kraemer MUG, Grubaugh ND. Inferring the risk factors behind the geographical spread and transmission of Zika in the Americas. PLoS Negl Trop Dis. 2018;12(1):e0006194.
51. Tatem AJ, Hay SI. Climatic similarity and biological exchange in the worldwide airline transportation network. Proc R Soc B Biol Sci. 2007;274(1617):1489.
52. Siriyasatien P, Phumee A, Ongruk P, Jampachaisri K, Kesorn K. Analysis of significant factors for dengue fever incidence prediction. BMC Bioinformatics. 2016;17(1):166.
53. Nishanthi PHM, Perera AAI, Wijekoon HP. Prediction of dengue outbreaks in Sri Lanka using artificial neural networks. Int J Comput Appl. 2014;101(15):1–5.
54. Aburas HM, Cetiner BG, Sari M. Dengue confirmed-cases prediction: a neural network model. Expert Syst Appl. 2010;37(6):4256–60.
55. Baquero OS, Santana LMR, Chiaravalloti-Neto F. Dengue forecasting in São Paulo city with generalized additive models, artificial neural networks and seasonal autoregressive integrated moving average models. PLoS One. 2018;13(4):e0195065.
56. Faisal T, Taib MN, Ibrahim F. Neural network diagnostic system for dengue patients risk classification. J Med Syst. 2012;36(2):661–76.
57. Laureano-Rosario EA, Duncan PA, Mendez-Lazaro AP, Garcia-Rejon EJ, Gomez-Carro S, Farfan-Ale J, et al. Application of artificial neural networks for dengue fever outbreak predictions in the northwest coast of Yucatan, Mexico and San Juan, Puerto Rico. Trop Med Infect Dis. 2018;3(1):5.
58. Kiskin IOB, Windebank T, Zilli D, Sinka M, Willis K, Roberts S. Mosquito detection with neural networks: the buzz of deep learning. arXiv:1705.05180.
59. Scavuzzo JM, Trucco FC, Tauro CB, German A, Espinosa M, Abril M. Modeling the temporal pattern of dengue, Chicungunya and Zika vector using satellite data and neural networks. In: 2017 XVII Workshop on Information Processing and Control (RPIC); 2017. 20–22 Sept. 2017.
60. Sanchez-Ortiz A, Fierro-Radilla A, Arista-Jalife A, Cedillo-Hernandez M, Nakano-Miyatake M, Robles-Camarillo D, et al. Mosquito larva classification method based on convolutional neural networks. In: 2017 International Conference on Electronics, Communications and Computers (CONIELECOMP); 2017. 22–24 Feb. 2017.
61. Nguyen T, Khosravi A, Creighton D, Nahavandi S. Epidemiological dynamics modeling by fusion of soft computing techniques. In: The 2013 International Joint Conference on Neural Networks (IJCNN); 2013. 4–9 Aug. 2013.
62. Jiang D, Hao M, Ding F, Fu J, Li M. Mapping the transmission risk of Zika virus using machine learning models. Acta Trop. 2018;185:391–9.
63. Wahba G. Spline models for observational data: Society for Industrial and Applied Mathematics; 1990. p. 177.
64. PAHO. Countries and territories with autochthonous transmission in the Americas reported in 2015-2017. Washington DC: World Health Organization, Pan American Health Organization; 2017. Available from: http://www.paho.org/hq/index.php?option=com_content&view=article&id=11603&Itemid=41696&lang=en
65. Gardner L, Chen N, Sarkar S. Vector status of Aedes species determines geographical risk of autochthonous Zika virus establishment. PLoS Negl Trop Dis. 2017;11(3):e0005487.
66. Gardner LM, Chen N, Sarkar S. Global risk of Zika virus depends critically on
vector status of Aedes albopictus. Lancet Infect Dis. 2016;16(5):522–3.
67. Kraemer MU, Sinka ME, Duda KA, Mylne AQ, Shearer FM, Barker CM, et al.
The global distribution of the arbovirus vectors Aedes aegypti and Ae.
albopictus. Elife. 2015;4:e08347.
68. Theze J, Li T, du Plessis L, Bouquet J, Kraemer MUG, Somasekar S, et al.
Genomic epidemiology reconstructs the introduction and spread of Zika virus
in Central America and Mexico. Cell Host Microbe. 2018;23(6):855–64 e7.
69. WorldBank. International Comparison Program database. GDP per capita,
PPP 2016. Available from: https://data.worldbank.org/indicator/NY.GDP.PCAP.
PP.CD.
70. U.S. Bureau of Economic Analysis. GDP by State. Available from: https://
www.bea.gov/data/gdp/gdp-state.
71. U.S. Department of Health and Human Services. Health, United States. 2015.
Available from: https://www.cdc.gov/nchs/data/hus/hus15.pdf.
72. World Health Organization (WHO). WHO World Health Statistics. Available
from: http://www.who.int/gho/publications/world_health_statistics/2015/
en/.
73. World Health Organization (WHO)/Pan American Health Organization (PAHO). PLISA Health Information Platform for the Americas. 2017. Available from: http://www.paho.org/data/index.php/en/.
74. World Bank Open Data. Population density (people per sq. km of land area).
2016. Available from: http://data.worldbank.org/indicator/EN.POP.DNST.
75. International Air Transport Association (IATA). Passenger Intelligence Services (PaxIS). Available from: http://www.iata.org/services/statistics/intelligence/paxis/Pages/index.aspx.
76. Pigott D, Deshpande A, Letourneau I, Morozoff C, Reiner R Jr, Kraemer M, et
al. Local, national, and regional viral haemorrhagic fever pandemic potential
in Africa: a multistage analysis. Lancet. 2017;390(10113):2662–72.
77. Leontaritis IJ, Billings SA. Input-output parametric models for non-linear systems
part I: deterministic non-linear systems. Int J Control. 1985;41(2):303–28.
78. Narendra KS, Parthasarathy K. Identification and control of dynamical
systems using neural networks. IEEE Trans Neural Netw. 1990;1(1):4–27.
79. Chen S, Billings SA, Grant PM. Non-linear system identification using neural
networks. Int J Control. 1990;51(6):1191–214.
80. Siegelmann HT, Horne BG, Giles CL. Computational capabilities of recurrent NARX
neural networks. IEEE Trans Syst Man Cybern B Cybern. 1997;27(2):208–15.
81. Lin T, Horne BG, Tino P, Giles CL. Learning long-term dependencies is not as difficult with NARX recurrent neural networks. College Park: University of Maryland; 1995. p. 23.
82. Boussaada Z, Curea O, Remaci A, Camblong H, Mrabet Bellaaj N. A nonlinear
autoregressive exogenous (NARX) neural network model for the prediction
of the daily direct solar radiation. Energies. 2018;11(3):620.
83. Fawcett T. ROC graphs: notes and practical considerations for researchers.
Mach Learn. 2004;31:1–38.
84. Bogoch II, Brady OJ, Kraemer MUG, German M, Creatore MI, Kulkarni MA, et
al. Anticipating the international spread of Zika virus from Brazil. Lancet.
2016;387(10016):335–6.
85. Faria NR, Quick J, Claro IM, Thézé J, de Jesus JG, Giovanetti M, et al.
Establishment and cryptic transmission of Zika virus in Brazil and the
Americas. Nature. 2017;546:406.
86. Brockmann D, Helbing D. The hidden geometry of complex, network-driven
contagion phenomena. Science. 2013;342:1337–42.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.