An implementation of the time-series Susceptible-Infected-Recovered (TSIR) model using a number of different fitting options for infectious disease time series data. The manuscript based on this package can be found here < https://doi:10.1371/0185528>. The method implemented here is described by Finkenstadt and Grenfell (2000)
This R package fits and runs the TSIR model using a number of different options, both Bayesian and frequentist.
You can currently install tsiR via devtools (although the package will soon be put on CRAN) which will install all dependencies, except rjags which must be installed manually (https://cran.r-project.org/web/packages/rjags/index.html).
Load the package and dependencies kernlab, ggplot2, reshape2, and grid, The first dependency is for the Gaussian Process regression, while the others are for plotting.
We include 20 data sets form the UK each with 20 years of measles data (biweekly data, i.e. infectious period of two weeks (
IP = 2) in the
We require that the data must be a
data.frame with column names 'time','cases', 'births', and 'pop'. You can load these four vectors into the function
example <- tsiRdata(time = , cases = , births = , pop = , IP = )) where
IP= designates the generation time to interpolate the data on. Data can be plotted using
plotcases. If using any of the plot functions, the time column of the data.frame must be recognized by ggplot2.
A table of the main functions can be found below. Individual options can be found via
?function_name in the R console.
||Using simple regression: reconstructs susceptibles and estimates parameters|
||Using the London data: walks one through the fitting options of
||Using simple regression: optimizes the threshold parameter for sparse data|
||Using MCMC: reconstructions susceptibles and estimates parameters|
||Using MCMC: reconstructions susceptibles, estimates parameters, and runs the simulation|
||Plots just the cases data|
||Plots the data versus the simulations|
||Plots the (interpolated) cases, births, and population time series|
||Plots the fitted regressions, reporting, susceptible reconstruction, estimated parameters, and the fit diagonstics|
||Using simple regression: reconstructions suceptibles, estimates parameters, and runs the simulation|
||Runs the simulation taking in the output from
||Interpolates cases, births, and pop vectors to the generation time of the disease|
||Complete biweekly (IP=2) data sets from twenty UK cities|
In the fitting functions, a number of options such as fixing alpha (
alpha = 0.97, for example) and Sbar (
sbar = 0.025, for example) are provided. Additionally, contact can be seasonal, driven by the school term calendar, or a single non time-varying parameter (
seasonality = "standard", "schoolterm" or "none") respectively. These functions work for data with any fixed generation time, although in the examples we will use biweekly data
(IP = 2).
Several examples follow, although they do not completely exhaust the fitting options.
After loading the dependencies, load and plot the data:
namesLondonMeas <- twentymeasheadplotdata
The default settings for
runtsir (and all functions) can be accessed through
?runtsir in the R console. Based on measles, everything is written in modulo two weeks, however this can be changed by setting
IP= to the number of weeks between each time step. The minimal necessary input for a biweekly data set is thus simply
data =. Here we run a simple example using the default options: cumulative cases on the x axis, a Gaussian regression (default) between cumulative cases and births, estimating both
alpha, estimating a 26 (52/IP) point contact parameter, and we run the forward simulation completely forward. We specify the option to draw the next time step from a negative binomial distribution and we specificy a Gaussian family (default) with an identity link (default) function in the GLM. Any family and link can be inputted, however, the options are essentially
quasipoisson, poisson, gaussian where
poisson/ quassipoisson take
family=gaussian takes either a
Options for the GLM family
Additionally, we specify to do 100 simulations. Other regression types can be specified using
regtype= where the options are
lm, lowess, loess, spline for a linear regression, a lowess, a loess, and a spline regression with 2.5 degrees freedom.
LondonRes <- runtsir
We plot the full diagnostic using the
Walking through the eight plots: the first gives the regression model
Yhat between cumulative births and cases, the second gives the reporting rate over time (for
regtype = 'lm' this is a constant), the third gives the residuals and then the reconstructed susceptible dynamics, the fourth is the profiled average susceptibility, the fifth is the repeating seasonal contact parameter with 95% confidence / credible intervals, the sixth is the predicted-observed regression, and the final two are the forward predictions (red) and the data (blue). The top forward prediction shows each of the
nsim individually, while the bottom shows the mean with 95% confidence intervals.
We can run this same model in a Bayesian framework with schooltime forcing. We can specify almost all the same functions as the
runtsir functions, but also number of iterations, chains, burnins, adapts, etc. This example takes about 30-45 minutes to run. Run time scales with the number of parameters, thus
seasonality = 'standard' will take longer. We can plot the traces and posteriors.
requireparms <- mcmcestparsnamesplot
We can now enter these parameters into the
simulatetsir function. Here we will use a Poisson distribution to draw the next generation and do twenty simulations. Note that this model gives similar results, and 95% credible intervals are constructed for the parameter estimates themselves.
sim <- simulatetsirplotres
Alternatively, we could run the parameter estimation and forward simulations all in one, by calling the
Often times however, the population size is not large enough to support a persistent epidemic like in London. These epidemics are often sporadic and may or may not be spatially coupled. An example of this is the city of Mold.
MoldMeas <- twentymeasplotdata
Note that while there are some small case counts, there are essentially only outbreaks every year or two. These epidemics cannot be predicted fully forward, but rather must be done epidemic-ahead (Caudron et al 2015). In order to do this, a threshold parameter must be set which indicts when a new epidemic has been sparked. This threshold parameter can be set arbitrarily, i.e.
runtsir(data = MoldMeas, epidemics = 'break', threshold = 10) for example. Alternatively, the threshold can be fit by maximizing Rsquared predictions using the
maxthreshold function. Epidemic-ahead predictions can then be accomplished using the
epidemics = break command in the
First, parameter estimates must be acquired using either
mcmcestpars. While not necessary, here we fix
alpha = 0.97. The average number of susceptibles could be fixed the same way, i.e.
parms <- estparstau <- maxthresholdMoldRes <- simulatetsirplotres
Note that all of these examples used biweekly data,
IP = 2. If the data is aggregated by a different number of weeks, you must specificy this globally and consistently.
Please email me at adbecker at princeton dot edu with any issues or suggestions. If you find the package useful, please cite us. The bibtex file can be called by
citation('tsiR') in the R console.