Modelling COVID-19 Outbreak: Segmented Regression To Assess Lockdown Effectiveness
3 authors, including:
Vito Muggeo
Università degli Studi di Palermo
All content following this page was uploaded by Vito Muggeo on 16 April 2020.
Abstract
We discuss a statistical framework to monitor the COVID-19 epidemic outbreak.
More specifically, we use segmented regression to quantify the deceleration of
epidemic spreading that is likely due to the effectiveness of lockdown policies. We present
R code to analyze daily time series of COVID-19 total cases in some countries.
1 Introduction
The SARS-CoV-2 virus emerged at the end of 2019 in China, and within a few weeks it
spread to other countries worldwide, including Europe and the USA. At the time of writing, the
infectious disease caused by the new virus, named COVID-19, is considered
a global threat.
Governments in almost all countries hit by COVID-19 have adopted containment policies
aimed at slowing down the spread of the new virus. Among these measures, social distancing is
considered the most effective. Typically, social distancing policies were at first
limited to closing schools and universities, and soon afterwards people were required to
stay at home, leading to the so-called lockdown. While an effect of the lockdown on the spread
of a contagious disease could be somewhat expected, it is of interest to verify and to
quantify its effectiveness in terms of the well-known measures that are usually
taken to monitor the progression of an epidemic: growth rates and doubling times.
We discuss a statistical framework to provide some answers to the questions: ‘what is the
benefit of staying at home?’ and ‘what is the impact of the lockdown on the spread of the virus?’ In
other words, we aim to quantify the epidemic slowdown that is probably ascribable to lockdown
effectiveness. In the next sections we discuss the relevant statistical framework, without
going into methodological details, and provide some worked examples.
2 Methods
As is well known, epidemics follow a logistic-growth trend, namely exponential growth
at early stages and a log-type pattern afterwards; see Muggeo & Porcu (2020) for a gentle
introduction in the COVID-19 context. Our focus is on the early stages, when lockdown
measures are set to slow down the spread of the epidemic. Let $Y_t$ be the cumulative number
of reported infected cases at day $t = 1, 2, \ldots, n$. The exponential growth of the epidemic
leads us to consider the regression equation for the expected value $E[Y_t]$

$$\log E[Y_t] = \beta_0 + \beta_1 t \qquad (1)$$

where $\exp\{\beta_1\} - 1$ is the average growth rate between two consecutive days. If
lockdown measures are set up, we expect some drop in the growth rate, and as the lockdown
continues, further decelerations should be observed; namely, the exponential trend could
be split into two or more regimes with decreasing growth rates. Thus model (1) is extended
to a segmented regression with $K$ breakpoints, denoted model (2) in what follows (Muggeo, 2003).

∗ Dipartimento di Scienze Econom, Az. e Statistiche, Università di Palermo, Italy.
† Dipartimento di Scienze Politiche e Sociali, Università di Cagliari, Italy.
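For reference, the segmented extension of model (1) can be sketched in the usual parameterization of segmented regression (Muggeo, 2003). The symbols $\delta_k$ and $\psi_k$ are an assumption about notation, with $K$ breakpoints $\psi_1 < \dots < \psi_K$ and $(x)_+ = \max(x, 0)$:

```latex
\log E[Y_t] = \beta_0 + \beta_1 t + \sum_{k=1}^{K} \delta_k \,(t - \psi_k)_+ \qquad (2)
```

Under this form the slope in the first regime is $\beta_1$ and the slope after the $k$-th breakpoint is $\beta_1 + \delta_1 + \dots + \delta_k$, so negative $\delta_k$ correspond to decelerations. For a regime with slope $b$, the growth rate is $\exp\{b\} - 1$ and the doubling time is $\log 2 / b$.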
> args(segmented19)
function(y, npsi, rule=">=0", oseg, fix.ppsi=FALSE, origin, format, ...)
We detail the arguments below, while the other functions, the print and the plot methods,
are illustrated in the next section via a worked example.
• y: the cumulative outcome counts (total cases typically);
• npsi: scalar or vector meaning the number of breakpoints to assess. If scalar, one
segmented model is fitted, otherwise several models are fitted and the returned fit
is selected by BIC;
• rule: a character string selecting the counts to be analyzed. E.g. ">=100"
drops the (first) observations with values smaller than 100;
• oseg: an optional segmented fit. If provided, its estimated breakpoints are supplied
as fixed or starting values (depending on fix.ppsi) in the new segmented fit;
• fix.ppsi: if oseg is provided, fix.ppsi=TRUE means that the breakpoints in oseg
are kept fixed in the new segmented fit, otherwise they are assumed as starting
values;
• origin, format: optional, the origin date of the time series and the relevant date format.
If supplied, the plots and results will also be expressed in terms of dates (rather than
simple numerical values 1, 2, . . . ).
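As a rough illustration of what the rule argument does, the following base-R sketch applies a threshold string to a toy cumulative series. The helper apply_rule() and the toy counts are hypothetical; how segmented19() handles rule internally is not shown in the text, so this is only an assumption about its effect.

```r
# Hypothetical helper (not the authors' code): keep the series from the first
# day on which the count satisfies the rule, e.g. ">=100".
apply_rule <- function(y, rule = ">=0") {
  ok <- eval(parse(text = paste("y", rule)))  # logical vector, e.g. y >= 100
  if (!any(ok)) return(y[0])                  # no observation satisfies the rule
  y[which(ok)[1]:length(y)]                   # drop the leading low counts
}

y_toy <- c(3, 12, 47, 99, 100, 180, 420, 910)  # made-up cumulative counts
y_kept <- apply_rule(y_toy, ">=100")
print(y_kept)
```

With cumulative (non-decreasing) counts, dropping the leading observations and dropping all observations below the threshold give the same result.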
3 Application
To illustrate, we use data downloaded from the European Centre for Disease Prevention
and Control (ECDC) website. The data are available from https://www.ecdc.europa.eu, and
we load them into R via the standard read.csv() function. We assume the data frame is named
d.ecdc. The function take.y() is a simple utility that extracts the cumulative
total cases vector for a specified country from the ECDC database. The function also
returns, as an attribute of the vector, the origin date, which is useful for displaying and
plotting results. For instance, the total cases vector for the UK is obtained via
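The source of take.y() is not listed here, so the sketch below only illustrates the kind of processing described: filter one country, order by date, cumulate the daily counts, and attach the first date as an attribute. The function name take_y_sketch(), the column names (dateRep, cases, countriesAndTerritories) and the toy data are assumptions made for illustration.

```r
# Toy stand-in for the ECDC data frame; column names are an assumption about
# the file layout, and the counts are made up.
d_toy <- data.frame(
  dateRep = c("03/03/2020", "01/03/2020", "02/03/2020", "01/03/2020"),
  cases = c(30, 10, 20, 5),
  countriesAndTerritories = c("United_Kingdom", "United_Kingdom",
                              "United_Kingdom", "Italy"),
  stringsAsFactors = FALSE
)

# Hypothetical re-implementation of a take.y()-like utility.
take_y_sketch <- function(d, country) {
  d <- d[d$countriesAndTerritories == country, ]
  dates <- as.Date(d$dateRep, format = "%d/%m/%Y")
  ord <- order(dates)                  # oldest day first
  y <- cumsum(d$cases[ord])            # daily counts -> cumulative totals
  attr(y, "origin") <- dates[ord][1]   # first date, kept as an attribute
  y
}

y_uk <- take_y_sketch(d_toy, "United_Kingdom")
print(y_uk)
```

Keeping the origin as an attribute means the vector can be passed around as plain numbers while the date information remains available for labelling.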
> y<-take.y(d.ecdc, "United_Kingdom")
Since we do not know the number of breakpoints, we fit several segmented models
(2) (with K = 1, . . . , 6) and let the BIC select the best one
> oUK<-segmented19(y, npsi=1:6, rule=">100")
Running ... 4 breakpoints selected by BIC
where the specification rule=">100" rules out the first observations with low
values, i.e. not larger than one hundred. The function actually tries to fit models with all
the specified numbers of breakpoints (1 to 6 in the above line), but if the data do not support that many
breakpoints, estimation may be problematic and fail to converge. If this is the case,
the corresponding model is not returned. The BIC values are stored in the bic vector of
the object,
> oUK$bic
1 2 3 4 6
2115.0422 946.2029 644.3881 615.1100 652.3153
where we note that the model with 5 breakpoints has not been fitted due to non-convergence,
as discussed above. Typically, non-convergence suggests that the model being
fitted is overparameterized, and therefore it can be discarded.
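The selection step can be reproduced by hand from the BIC values printed above: the model with the smallest BIC is retained. This is a minimal sketch in base R; the vector is typed in manually rather than taken from the fitted object.

```r
# BIC values copied from the printed oUK$bic; the names are the numbers of
# breakpoints of the models that converged (5 is missing).
bic <- c("1" = 2115.0422, "2" = 946.2029, "3" = 644.3881,
         "4" = 615.1100, "6" = 652.3153)

best_npsi <- as.integer(names(bic)[which.min(bic)])  # smallest BIC wins
print(best_npsi)
```

Indexing by name rather than by position matters here, because a non-converged model leaves a gap in the sequence of breakpoint numbers.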
The print method returns some basic information, including the percent growth rates
$\hat{r}_k$ and doubling times $\hat{d}_k$, along with the breakpoints where the trend has been estimated
to change.
> oUK
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@ Epidemic modelling via segmented regression @@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Number of regimes: 5
% Growth Rates: 26.5 19.9 14.7 11.5 8.6
Doubling Times: 2.95 3.81 5.07 6.36 8.38
Breakpoints: 15.18 24 29.5 32
(dates): 03-21 03-30 04-05 04-07
The fit object, oUK in the example above, is of class ‘segmented’, so the usual functions,
including summary.segmented() and plot.segmented(), can be used to get details. For
instance, the growth rate estimates and confidence intervals can be obtained via the
function slope(.., APC=TRUE), where APC=TRUE returns the percent changes (and
confidence intervals) rather than the simple slopes. The doubling times, which are a
simple reparametrization of the slopes, may be obtained via
> log(2)/slope(oUK)[[1]][,c(1,5,4)] #columns for estimates, and CI limits
Est. CI(95%).u CI(95%).l
slope1 2.947430 2.885829 3.011719
slope2 3.811642 3.735837 3.890588
...
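The link between growth rates and doubling times can also be checked by hand. The sketch below starts from the rounded percent growth rates printed for the UK, converts them back to log-scale slopes, and applies log(2)/slope; it does not use the fitted object, so small discrepancies from the printed doubling times are due to rounding of the rates.

```r
# Percent growth rates copied from the printed fit for the UK.
growth_pct <- c(26.5, 19.9, 14.7, 11.5, 8.6)

beta <- log(1 + growth_pct / 100)  # slopes on the log scale
doubling <- log(2) / beta          # days needed for the total cases to double
print(round(doubling, 2))
```

Because the doubling time is a decreasing function of the slope, the upper confidence limit of a slope maps to the lower limit of the corresponding doubling time, and vice versa.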
Finally, we use the plot method (i.e. plot.segmented19()) to display the fitted piecewise
exponential lines along with the observations, if needed, and possibly the percent
growth rates as a legend. Assuming the fitted objects have been obtained using the same line
of code as for the oUK object, Figure 1 reports the fitted curves for some countries, as obtained
by the following code
> plot(oIta, main="Italy") ##default..
> plot(oIran, main="Iran", col=2:3) ##lines with 2 alternating colors
> plot(oUS, main="US", grey=TRUE, pcol=1, pcex=.8) ##lines in grey scale..
> plot(oUK, main="UK", col=4, logs=TRUE, pcol=2) ##lines with a single color
> plot(oGe, main="Germany", col=c(1,2,7)) ##lines with 3 alternating colors
>
> oSpain1$origin=NULL ##erase the origin not to show dates on the axis
> plot(oSpain1, pcex=0, col=2, leg=FALSE, prev.trend=FALSE, psi.int=FALSE,
+ main="Spain (red) and France (blue)")
> plot(oFra, add=TRUE, pcex=0, prev.trend=FALSE, col=4)
Some arguments are rather intuitive, including col for the line colors, pcol and pcex for the
color and size (cex) of the circles (observations), and logs=TRUE to plot the fitted lines on the
log scale (see the UK plot). prev.trend=TRUE and psi.int=TRUE draw the dashed
lines of the hypothetical (past) trends and the tick lines at the bottom that mark the
sub-intervals identified by the estimated breakpoints. Note that the bottom right plot provides
an example of multiple curve fitting suitable for comparisons (in this case the Spain and France
trends).
[Figure 1: fitted curves for Italy, Iran, US, UK, Germany, and Spain (red) and France (blue).
Vertical axis: no. of total cases; horizontal axis: dates (month-day), or days for the Spain and
France panel. The legends report the percent growth rates, with intervals in parentheses.]
We conclude this brief report by summarizing the piecewise trends of different
countries by means of the average annual per cent change, i.e. an average of the slopes
weighted by the corresponding interval widths (Muggeo, 2010). To this aim, we use the
function aapc() available in the package segmented, and report results just for Italy.
Thus the piecewise linear trend for Italy could be summarized by an average growth
rate of 15.4% ($= e^{0.143} - 1$) over the considered period. Similar estimates could be obtained
and contrasted for the other countries.
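To make the weighted average concrete, the sketch below computes it by hand from the printed UK fit rather than via aapc(). The series length n_days is an assumption made only for this illustration, so the resulting figure is not an estimate reported by the authors. The last line checks the arithmetic of the value quoted for Italy.

```r
# Growth rates and breakpoints copied from the printed UK fit; the end of the
# observation window (n_days) is an assumption made only for this sketch.
growth_pct <- c(26.5, 19.9, 14.7, 11.5, 8.6)
psi <- c(15.18, 24, 29.5, 32)
n_days <- 38                                  # hypothetical series length

slopes <- log(1 + growth_pct / 100)           # slopes on the log scale
widths <- diff(c(1, psi, n_days))             # width of each regime
avg_slope <- sum(widths * slopes) / sum(widths)
avg_growth_pct <- 100 * (exp(avg_slope) - 1)  # back to a percent growth rate
print(round(avg_growth_pct, 1))

# Arithmetic behind the figure quoted in the text for Italy.
italy_growth_pct <- 100 * (exp(0.143) - 1)
print(round(italy_growth_pct, 1))
```

Averaging on the log scale and exponentiating at the end keeps the summary consistent with the way the regime-specific growth rates are defined.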
References
Muggeo, V.M.R. (2003). Estimating regression models with unknown break-points.
Statistics in Medicine 22, 3055–3071.
Muggeo, V.M.R. (2008). Segmented: an R package to fit regression models with broken-line
relationships. R News 8, 20–25.
Muggeo, V.M.R. (2010). Comment on “estimating average annual per cent change in
trend analysis” by Clegg et al., Stat Med 2009. Statistics in Medicine 29, 1958–1960.