
Modelling COVID-19 Outbreak: Segmented Regression To Assess Lockdown Effectiveness

- The document discusses using segmented regression to analyze daily COVID-19 case data and quantify the impact of lockdown policies on slowing the spread of the virus.
- Segmented regression allows modeling changes in the growth rate over time, with each segment representing a different growth regime. This can identify breakpoints where growth rates decreased, likely due to lockdown measures.
- Code is provided to fit segmented regression models to COVID-19 case count data from various countries and estimate growth rates, doubling times, and breakpoints corresponding to changes in growth trends.

Uploaded by Debabrata Saha
Copyright © All Rights Reserved

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/340664370

Modelling COVID-19 outbreak: segmented regression to assess lockdown effectiveness

Technical Report · April 2020


DOI: 10.13140/RG.2.2.32798.28485


3 authors, including:

Vito Muggeo
Università degli Studi di Palermo


All content following this page was uploaded by Vito Muggeo on 16 April 2020.



Modelling COVID-19 outbreak: segmented regression to assess lockdown effectiveness
Vito M.R. Muggeo∗, Gianluca Sottile∗, Mariano Porcu†
April 16, 2020

Abstract
We discuss a statistical framework to monitor the COVID-19 epidemic outbreak. More specifically, we use segmented regression to quantify the deceleration of the epidemic spread, likely due to the effectiveness of lockdown policies. We present R code to analyze daily time series of COVID-19 total cases in several countries.

1 Introduction
The SARS-CoV-2 virus emerged in China at the end of 2019 and within a few weeks had spread to other countries worldwide, including Europe and the USA. At the time of writing, the infectious disease caused by the new virus, named COVID-19, is considered a global threat.
Governments in almost all countries hit by COVID-19 have adopted containment policies aimed at slowing down the spread of the new virus. Among these measures, social distancing is considered the most effective. Typically, social distancing policies were initially limited to closing schools and universities; soon afterwards people were required to stay at home, leading to the so-called lockdown. While some effect of a lockdown on the spread of a contagious disease is to be expected, it is of interest to demonstrate and quantify its impact on two well-known measures usually used to monitor the progression of an epidemic: growth rates and doubling times.
We discuss a statistical framework to provide some answers to the questions: ‘What is the benefit of staying at home? What is the impact of the lockdown on the spread of the virus?’ In other words, we aim to quantify the epidemic slowdown, which is probably ascribable to the effectiveness of the lockdown. In the next sections we discuss the relevant statistical framework, without going into methodological details, and provide some worked examples.

2 Methods
As is well known, epidemics follow a logistic-growth trend, namely exponential growth at early stages and a log-type pattern afterwards; see Muggeo & Porcu (2020) for a gentle introduction in the COVID-19 context. Our focus is on the early phase, when lockdown measures are introduced to slow down the spread of the epidemic. Let $Y_t$ be the cumulative number of reported infected cases at day $t = 1, 2, \ldots, n$. The exponential growth of the epidemic leads us to consider the following regression equation for the expected value $E[Y_t]$:

$$\log E[Y_t] = \beta_0 + \beta_1 t \qquad (1)$$

where $\exp\{\beta_1\} - 1$ is the average growth rate between two consecutive days. Once lockdown measures are set up, we expect some drop in the growth rate, and as the lockdown continues, further decelerations should be observed; namely, the exponential trend could be split into two or more regimes with decreasing growth rates. Thus model (1) is extended to (Muggeo, 2003)

$$\log E[Y_t] = \beta_0 + \beta_1 t + \delta_1 (t - \psi_1)_+ + \cdots + \delta_K (t - \psi_K)_+ \qquad (2)$$

∗ Dipartimento di Scienze Econom, Az. e Statistiche, Università di Palermo, Italy.
† Dipartimento di Scienze Politiche e Sociali, Università di Cagliari, Italy.


In model (2), $(t - \psi_k)_+ = \max(t - \psi_k, 0)$ and $\psi_1 < \cdots < \psi_K$ are the breakpoints. We assume there are $K + 1$ regimes with slopes $\beta_1$, $\beta_2 = \beta_1 + \delta_1$, ..., and $\beta_{K+1} = \beta_1 + \sum_{k=1}^{K} \delta_k$, from which the growth rates can be computed, i.e. $r_k = \exp\{\beta_k\} - 1$ for each regime $k = 1, 2, \ldots, K + 1$. Along with the growth rates, the number of days required to double the number of cases is an easy-to-understand parameter which is often reported by epidemiologists and health experts. These so-called doubling times are obtained by means of a simple reparameterization of the slopes, namely $d_k = \log(2)/\beta_k$, again for each regime $k$. Poisson likelihoods or quasi-likelihoods are maximized to obtain estimates of all model parameters, including the breakpoints. When several segmented models have been fitted, the BIC (Bayesian Information Criterion) can be used to select the one that best fits the data, namely the model with the appropriate number of breakpoints. The R package segmented (Muggeo, 2008) can be used to fit the segmented regression model (2). However, to simplify estimation and visualization of the results, some wrapper functions have been written. At the time of writing, the functions can be downloaded from
https://www.researchgate.net/publication/340664259
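To make the estimation step concrete without the wrapper functions, the following base-R sketch fits model (2) with a single breakpoint to simulated data. It is a simplified stand-in, not the algorithm used by the segmented package (Muggeo, 2003): here the breakpoint is chosen by a plain grid search over the profile Poisson log-likelihood, using glm(). The simulated parameter values, the helper fit_one_break() and the grid limits are illustrative assumptions. The slopes are accumulated as $\beta_2 = \beta_1 + \delta_1$ and converted into growth rates $r_k$ and doubling times $d_k$, and the BIC of model (1) is compared with that of the one-breakpoint model.

```r
# Sketch: model (2) with K = 1. The breakpoint is found by grid search over
# the profile log-likelihood (an assumption; segmented uses another algorithm).
set.seed(1)
n    <- 40
day  <- 1:n
b0   <- log(100)   # hypothetical intercept (log expected count at day 0)
b1   <- 0.25       # hypothetical slope of the first regime
d1   <- -0.15      # hypothetical change in slope after the breakpoint
psi0 <- 18         # hypothetical true breakpoint (day)
mu   <- exp(b0 + b1 * day + d1 * pmax(day - psi0, 0))
y    <- rpois(n, mu)  # independent Poisson counts (a simplification:
                      # real cumulative totals are non-decreasing)

# Poisson log-linear fit of model (2) for a fixed breakpoint value.
fit_one_break <- function(y, day, psi) {
  u <- pmax(day - psi, 0)             # the (t - psi)_+ term
  glm(y ~ day + u, family = poisson)
}

# Profile the log-likelihood over candidate breakpoints; keep the best one.
grid   <- seq(5, n - 5, by = 0.25)
ll     <- sapply(grid, function(p) as.numeric(logLik(fit_one_break(y, day, p))))
psihat <- grid[which.max(ll)]
fit1   <- fit_one_break(y, day, psihat)

# Regime slopes: beta_1 and beta_2 = beta_1 + delta_1.
b      <- coef(fit1)
slopes <- cumsum(c(b["day"], b["u"]))
growth <- exp(slopes) - 1             # r_k, average daily growth rate
dtime  <- log(2) / slopes             # d_k, doubling time in days

# BIC of model (1) versus the one-breakpoint model; the estimated
# breakpoint is counted as one extra parameter.
fit0 <- glm(y ~ day, family = poisson)
bic0 <- BIC(fit0)
bic1 <- -2 * as.numeric(logLik(fit1)) + (length(b) + 1) * log(n)

print(round(c(psi = psihat, growth = growth, doubling = dtime), 3))
print(c(BIC0 = bic0, BIC1 = bic1))
```

With counts of this size the grid search should land close to the simulated breakpoint and the BIC should clearly favour the segmented model; on real data the segmented package also provides standard errors for the breakpoints, which this sketch does not attempt.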
The fitting function is segmented19(), whose name obviously refers to the COVID-19 epidemic. Its signature is:

> args(segmented19)
function(y, npsi, rule=">=0", oseg, fix.ppsi=FALSE, origin, format, ...)

We detail the arguments below; the other functions, namely the print and plot methods, are illustrated in the next section via a worked example.
• y: the cumulative outcome counts (typically total cases);
• npsi: a scalar or vector giving the number(s) of breakpoints to assess. If a scalar, a single segmented model is fitted; otherwise several models are fitted and the returned fit is selected by BIC;
• rule: a character string used to keep only counts satisfying the specified condition, e.g. ">=100" to drop the (first) observations with values smaller than 100;
• oseg: an optional segmented fit. If provided, its estimated breakpoints are supplied as fixed or starting values (depending on fix.ppsi) in the new segmented fit;
• fix.ppsi: if oseg is provided, fix.ppsi=TRUE means that the breakpoints in oseg are kept fixed in the new segmented fit; otherwise they are used as starting values;
• origin, format: optional, the origin date of the time series and its format. If supplied, plots and results will also be expressed in terms of dates (rather than simple numerical values 1, 2, ...).
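The rule argument can be read as "drop the leading observations until the condition is first met". A minimal sketch of that behaviour follows; apply_rule() is a hypothetical helper written for illustration, not the code inside segmented19(), and it assumes rules made of a comparison operator followed by a number, as in ">=100".

```r
# Hypothetical helper illustrating the 'rule' argument: drop the leading
# observations until the condition (e.g. ">=100") is first satisfied.
apply_rule <- function(y, rule = ">=0") {
  op  <- regmatches(rule, regexpr("^[<>=!]+", rule))  # operator, e.g. ">="
  thr <- as.numeric(sub("^[<>=!]+", "", rule))        # threshold, e.g. 100
  ok  <- match.fun(op)(y, thr)                        # elementwise comparison
  if (!any(ok)) return(y[0])                          # nothing satisfies the rule
  y[which(ok)[1]:length(y)]                           # keep from first match on
}

apply_rule(c(3, 40, 95, 130, 260, 410), ">=100")
# [1] 130 260 410
```

Because cumulative counts are non-decreasing, dropping only the leading observations is, in practice, the same as filtering out every value that fails the condition.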

3 Application
To illustrate, we use data downloaded from the European Centre for Disease Prevention and Control (ECDC) website. Data are available from https://www.ecdc.europa.eu, and we load them into R via the standard read.csv() function. We assume the data frame is named d.ecdc. The function take.y() is a simple utility function that extracts the cumulative total cases vector for a specified country from the ECDC database. The function also returns, as an attribute of the vector, the origin date, which is useful for displaying and plotting results. For instance, the total cases vector for the UK is obtained via
> y<-take.y(d.ecdc, "United_Kingdom")
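The extraction step can be sketched as follows. take_y_sketch() is a hypothetical stand-in for take.y(), and the column names dateRep (day/month/year), cases (daily new cases) and countriesAndTerritories are assumptions about the layout of the ECDC file that should be checked against the downloaded data; the toy data frame is made up.

```r
# Hypothetical take.y()-like helper. Assumed ECDC columns: dateRep (dd/mm/yyyy),
# cases (daily new cases), countriesAndTerritories (country name).
take_y_sketch <- function(d, country) {
  s     <- d[d$countriesAndTerritories == country, ]
  dates <- as.Date(s$dateRep, format = "%d/%m/%Y")
  o     <- order(dates)                 # rows may be in reverse chronological order
  y     <- cumsum(s$cases[o])           # daily new cases -> cumulative totals
  attr(y, "origin") <- min(dates)       # first date of the series
  y
}

toy <- data.frame(
  dateRep = c("03/03/2020", "02/03/2020", "01/03/2020", "01/03/2020"),
  cases   = c(30, 20, 10, 5),
  countriesAndTerritories = c("Utopia", "Utopia", "Utopia", "Elsewhere")
)
y_toy <- take_y_sketch(toy, "Utopia")
```

For the toy data the function returns the cumulative totals 10, 30, 60 with origin date 2020-03-01.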

Since we do not know the number of breakpoints, we fit several segmented models (2) (with $K = 1, \ldots, 6$) and let the BIC select the best one:
> oUK<-segmented19(y, npsi=1:6, rule=">100")
Running ... 4 breakpoints selected by BIC

where the specification rule=">100" rules out the first observations with low values, i.e. 100 or fewer. In fact, the function tries to fit models with each specified number of breakpoints (1 to 6 in the line above), but if the data do not support many breakpoints, estimation can be problematic and fail to converge. In this case, the corresponding model is not returned. The BIC values are stored in the bic component of the object:
> oUK$bic
1 2 3 4 6
2115.0422 946.2029 644.3881 615.1100 652.3153

where we note that the model with 5 breakpoints has not been fitted due to non-convergence issues, as discussed above. Typically, non-convergence suggests that the model being fitted is overparameterized, and therefore it can be discarded.
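The fit-all-then-select logic described above, in which non-converged models are simply left out, can be sketched as below. select_by_bic() and toy_fitter() are hypothetical helpers: the toy fitter just returns the BIC values printed for oUK and raises an error for K = 5 to mimic a non-converged fit.

```r
# Hypothetical helper: fit one model per candidate number of breakpoints,
# drop the fits that fail, and return the label of the lowest BIC.
select_by_bic <- function(fitter, ks) {
  fits <- lapply(ks, function(k) tryCatch(fitter(k), error = function(e) NULL))
  names(fits) <- ks
  fits <- Filter(Negate(is.null), fits)   # failed fits are not returned
  bic  <- vapply(fits, function(f) f$bic, numeric(1))
  list(bic = bic, best = names(which.min(bic)))
}

# Toy fitter reproducing the BIC values printed above; K = 5 "fails".
bic_uk <- c(`1` = 2115.0422, `2` = 946.2029, `3` = 644.3881,
            `4` = 615.1100, `6` = 652.3153)
toy_fitter <- function(k) {
  if (k == 5) stop("no convergence")
  list(bic = unname(bic_uk[as.character(k)]))
}

res <- select_by_bic(toy_fitter, 1:6)
res$best
# [1] "4"
```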
The print method returns some basic information, including the percent growth rates $\hat r_k$ and doubling times $\hat d_k$, along with the breakpoints where the trend is estimated to change:
> oUK
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@ Epidemic modelling via segmented regression @@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Number of regimes: 5
% Growth Rates: 26.5 19.9 14.7 11.5 8.6
Doubling Times: 2.95 3.81 5.07 6.36 8.38
Breakpoints: 15.18 24 29.5 32
(dates): 03-21 03-30 04-05 04-07
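As a quick consistency check of $d_k = \log(2)/\beta_k$ with $\beta_k = \log(1 + r_k)$, the doubling times can be recomputed from the printed percent growth rates (the vector below is copied from the output above):

```r
# Doubling times implied by the printed % growth rates: d = log(2) / log(1 + r/100)
r <- c(26.5, 19.9, 14.7, 11.5, 8.6)
d <- log(2) / log(1 + r / 100)
round(d, 2)
```

The results agree with the printed doubling times to within a few hundredths of a day; the small discrepancies come from the rounding of the printed growth rates.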

The fit object, oUK in the example above, is of class ‘segmented’, so the usual functions, including summary.segmented() and plot.segmented(), can be used to get details. For instance, the estimated growth rates and their confidence intervals can be obtained via slope(..., APC=TRUE), where APC=TRUE returns the percent changes (and confidence intervals) rather than the simple slopes. The doubling times, which are a simple reparameterization of the slopes, may be obtained via
> log(2)/slope(oUK)[[1]][,c(1,5,4)] #estimates and CI limits (CI columns swapped, as log(2)/x reverses their order)
Est. CI(95%).u CI(95%).l
slope1 2.947430 2.885829 3.011719
slope2 3.811642 3.735837 3.890588
...
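The column order c(1,5,4) above follows from the fact that $d = \log(2)/\beta$ is a decreasing function of the slope: the upper confidence limit of a slope maps to the lower limit of the doubling time, and vice versa. The small helper below (doubling_ci(), hypothetical, with made-up input values) makes this explicit; it assumes a positive slope whose confidence interval excludes zero.

```r
# Map a slope estimate and its CI to a doubling time and its CI.
# log(2)/x is decreasing, so the slope limits swap roles.
doubling_ci <- function(est, lower, upper) {
  c(est = log(2) / est, lower = log(2) / upper, upper = log(2) / lower)
}

ci <- doubling_ci(est = 0.2352, lower = 0.2301, upper = 0.2402)
round(ci, 3)
```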

Finally, we use the plot method (i.e. plot.segmented19()) to display the fitted piecewise exponential lines along with, if needed, the observations and possibly the percent growth rates as a legend. Assuming the fit objects have been obtained with the same code as for the oUK object, Figure 1 reports the fitted curves for some countries, as obtained by the following code:
> plot(oIta, main="Italy") ##default..
> plot(oIran, main="Iran", col=2:3) ##lines with 2 alternating colors
> plot(oUS, main="US", grey=TRUE, pcol=1, pcex=.8) ##lines in grey scale..
> plot(oUK, main="UK", col=4, logs=TRUE, pcol=2) ##lines with a single color
> plot(oGe, main="Germany", col=c(1,2,7)) ##lines with 3 alternating colors
>
> oSpain1$origin=NULL ##erase the origin not to show dates on the axis
> plot(oSpain1, pcex=0, col=2, leg=FALSE, prev.trend=FALSE, psi.int=FALSE,
+ main="Spain (red) and France (blue)")
> plot(oFra, add=TRUE, pcex=0, prev.trend=FALSE, col=4)

Some arguments are rather intuitive, including col for the line colors, pcol and pcex for the color and cex (size) of the circles (observations), and logs=TRUE to plot the fitted lines on the log scale (see the UK plot). prev.trend=TRUE and psi.int=TRUE draw, respectively, the dashed lines of hypothetical (past) trends and the tick lines at the bottom that distinguish the sub-intervals identified by the estimated breakpoints. Note that the bottom-right plot provides an example of multiple curve fitting suitable for comparisons (in this case, the Spain and France trends).
[Figure 1 panels: Italy, Iran, US, UK (y-axis: log(no. of total cases)), Germany, and Spain (red) and France (blue). Y-axis: no. of total cases; x-axis: date (MM-DD), or days for the Spain and France panel. Legends report the estimated percent growth rates with confidence intervals.]

Figure 1: Output of the plot.segmented19() function: observed time series (circles) and fitted piecewise trends for some countries.

We conclude this brief report by summarizing the piecewise trends of different countries by means of the average annual percent change, i.e. an average of the slopes weighted by the corresponding interval widths (Muggeo, 2010); since the time unit here is the day, it is in fact an average daily change. To this aim, we use the function aapc() available in the package segmented, and report results for Italy only:

> aapc(oIta, wrong.se = FALSE)


Est. St.Err CI(95%).l CI(95%).u
0.1432644197 0.0008906379 0.1415188015 0.1450100379

Thus the piecewise linear trend (on the log scale) for Italy can be summarized by an average growth rate of 15.4% (= $e^{0.143} - 1$) over the considered period. Similar estimates can be obtained and compared for the other countries.
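The averaging described above can be sketched in a few lines: each regime's slope is weighted by the length of its sub-interval, and the weighted mean $\bar\beta$ is mapped back to a percent change via $100(e^{\bar\beta} - 1)$. aapc_sketch() and its inputs are hypothetical; unlike aapc() in segmented, it returns no standard errors or confidence intervals.

```r
# Sketch of the AAPC idea: width-weighted mean of the regime slopes,
# then conversion of the mean slope to a percent change.
aapc_sketch <- function(slopes, psi, from, to) {
  cuts   <- c(from, sort(psi), to)
  widths <- diff(cuts)                     # length of each regime
  avg    <- sum(slopes * widths) / sum(widths)
  c(avg_slope = avg, pct_change = 100 * (exp(avg) - 1))
}

a <- aapc_sketch(slopes = c(0.30, 0.10), psi = 11, from = 1, to = 21)
a
```

Applied to the Italian estimate reported above, 100 * (exp(0.1432644) - 1) gives about 15.4, matching the value in the text.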

References
Muggeo, V.M.R. (2003). Estimating regression models with unknown break-points.
Statistics in Medicine 22, 3055–3071.

Muggeo, V.M.R. (2008). Segmented: an R package to fit regression models with broken-line relationships. R News 8, 20–25.

Muggeo, V.M.R. (2010). Comment on “estimating average annual per cent change in
trend analysis” by Clegg et al., Stat Med 2009. Statistics in Medicine 29, 1958–1960.

Muggeo, V.M.R. & Porcu, M. (2020). La curva dei contagiati da covid-19: la ricerca del punto di svolta [The curve of COVID-19 infections: the search for the turning point]. URL https://www.neodemos.info/articoli/la-curva-dei-contagiati-da-covid-19-la-ricerca-del-punto-di-svolta/.
