Compute, Handle, Plot and Model Incidence of Dated Events

Provides functions and classes to compute, handle and visualise incidence from dated events for a defined time interval. Dates can be provided in various standard formats. The class 'incidence' is used to store computed incidence and can be easily manipulated, subsetted, and plotted. In addition, log-linear models can be fitted to 'incidence' objects using 'fit'. This package is part of the RECON (<http://www.repidemicsconsortium.org/>) toolkit for outbreak analysis.




To install the current stable, CRAN version of the package, type:

install.packages("incidence")

To benefit from the latest features and bug fixes, install the development version of the package from GitHub using:

devtools::install_github("reconhub/incidence")

Note that this requires the devtools package to be installed.


What does it do?

The main features of the package include:

  • incidence: compute incidence from dates in various formats; any fixed time interval can be used; the returned object is an instance of the (S3) class incidence.

  • plot: this method (see ?plot.incidence for details) plots incidence objects, and can also add predictions of the model(s) contained in an incidence_fit object (or a list of such objects).

  • fit: fit one or two exponential models (i.e. linear regression on log-incidence) to an incidence object; two models are calibrated only if a date is provided to split the time series in two (argument split); this is typically useful to model the two phases of an outbreak, exponential growth and decrease; each model returned is an instance of the (S3) class incidence_fit, which contains various useful information (e.g. growth rate r, doubling/halving time, predictions and confidence intervals).

  • fit_optim_split: finds the optimal date to split the time series in two, typically around the peak of the epidemic.

  • [: lower-level subsetting of incidence objects, which lets you specify which dates and groups to retain; uses a syntax similar to matrices, i.e. x[i, j], where x is the incidence object, i a subset of dates, and j a subset of groups.

  • subset: subset an incidence object by specifying a time window.

  • pool: pool incidence from different groups into one global incidence time series.

  • as.data.frame: converts an incidence object into a data.frame containing dates and incidence values.


Resources


An overview of incidence is provided in the worked example below. More detailed tutorials are distributed as vignettes with the package:

vignette(package = "incidence")
#> Vignettes not found

To open these, type:

vignette("overview", package="incidence")
vignette("customize_plot", package="incidence")
vignette("incidence_class", package="incidence")



The package website is available at http://www.repidemicsconsortium.org/incidence/



Bug reports and feature requests should be posted on GitHub using the issue system. All other questions should be posted on the RECON forum:
http://www.repidemicsconsortium.org/forum/



A quick overview

The following worked example provides a brief overview of the package's functionalities. See the vignettes section for more detailed tutorials.

This example uses the simulated Ebola Virus Disease (EVD) outbreak from the package outbreaks. We will compute incidence for various time steps, calibrate two exponential models around the peak of the epidemic, and analyse the results.

First, we load the data:

library(outbreaks)
library(ggplot2)
library(incidence)
 
dat <- ebola_sim$linelist$date_of_onset
class(dat)
#> [1] "Date"
head(dat)
#> [1] "2014-04-07" "2014-04-15" "2014-04-21" "2014-04-27" "2014-04-26"
#> [6] "2014-04-25"

We compute the weekly incidence:

i.7 <- incidence(dat, interval = 7)
i.7
#> <incidence object>
#> [5888 cases from days 2014-04-07 to 2015-04-27]
#> [5888 cases from ISO weeks 2014-W15 to 2015-W18]
#> 
#> $counts: matrix with 56 rows and 1 columns
#> $n: 5888 cases in total
#> $dates: 56 dates marking the left-side of bins
#> $interval: 7 days
#> $timespan: 386 days
plot(i.7)
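
To make the binning step concrete, here is a minimal base-R sketch of how dates can be counted in fixed 7-day bins. It only illustrates the idea and is not the package's implementation; the toy dates and the names onset, bin_index and toy_incidence are made up for this example.

```r
# Toy onset dates (made up for illustration)
onset <- as.Date(c("2014-04-07", "2014-04-09", "2014-04-15",
                   "2014-04-21", "2014-04-22", "2014-04-27"))

interval   <- 7            # bin width in days
first_date <- min(onset)

# Index of the bin each date falls into (0 = first bin)
bin_index <- as.integer(onset - first_date) %/% interval

# Left-hand date of every bin, including bins with no cases
n_bins    <- max(bin_index) + 1
bin_dates <- first_date + interval * (seq_len(n_bins) - 1)

# Count cases per bin; fixing the factor levels keeps empty bins at zero
counts <- as.vector(table(factor(bin_index, levels = seq_len(n_bins) - 1)))

toy_incidence <- data.frame(dates = bin_dates, counts = counts)
toy_incidence
```

Each row of toy_incidence pairs the left-side date of a bin with the number of cases in that bin, which mirrors the $dates and $counts elements printed above.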

incidence can also compute incidence by specified groups using the groups argument. For instance, we can compute the weekly incidence by gender:

i.7.sex <- incidence(dat, interval = 7, groups = ebola_sim$linelist$gender)
i.7.sex
#> <incidence object>
#> [5888 cases from days 2014-04-07 to 2015-04-27]
#> [5888 cases from ISO weeks 2014-W15 to 2015-W18]
#> [2 groups: f, m]
#> 
#> $counts: matrix with 56 rows and 2 columns
#> $n: 5888 cases in total
#> $dates: 56 dates marking the left-side of bins
#> $interval: 7 days
#> $timespan: 386 days
plot(i.7.sex, stack = TRUE, border = "grey")

incidence objects can be manipulated easily. The [ operator implements subsetting of dates (first argument) and groups (second argument). For instance, to keep only the first 20 weeks of the epidemic:

i.7[1:20]
#> <incidence object>
#> [797 cases from days 2014-04-07 to 2014-08-18]
#> [797 cases from ISO weeks 2014-W15 to 2014-W34]
#> 
#> $counts: matrix with 20 rows and 1 columns
#> $n: 797 cases in total
#> $dates: 20 dates marking the left-side of bins
#> $interval: 7 days
#> $timespan: 134 days
plot(i.7[1:20])

Some temporal subsetting can be even simpler using subset, which retains data within a specified time window:

i.tail <- subset(i.7, from = as.Date("2015-01-01"))
i.tail
#> <incidence object>
#> [1156 cases from days 2015-01-05 to 2015-04-27]
#> [1156 cases from ISO weeks 2015-W02 to 2015-W18]
#> 
#> $counts: matrix with 17 rows and 1 columns
#> $n: 1156 cases in total
#> $dates: 17 dates marking the left-side of bins
#> $interval: 7 days
#> $timespan: 113 days
plot(i.tail, border = "white")
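
The idea of a time window can be sketched in base R on a plain data.frame of bin dates and counts. This is an illustration under the assumption that a bin is kept when its left-side date falls inside the window, which is consistent with the output above (the first retained bin starts on 2015-01-05 for from = 2015-01-01); the toy values are made up.

```r
# Toy weekly incidence: left-side bin dates and case counts (made up)
toy <- data.frame(
  dates  = as.Date("2015-01-05") + 7 * 0:5,
  counts = c(12, 9, 7, 4, 2, 1)
)

# Time window of interest
from <- as.Date("2015-01-15")
to   <- as.Date("2015-02-05")

# Keep only bins whose left-side date lies within [from, to]
toy_window <- toy[toy$dates >= from & toy$dates <= to, ]
toy_window
```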

Subsetting groups can also matter. For instance, let's try and visualise the incidence based on onset of symptoms by outcome:

i.7.outcome <- incidence(dat, 7, groups = ebola_sim$linelist$outcome)
i.7.outcome
#> <incidence object>
#> [5888 cases from days 2014-04-07 to 2015-04-27]
#> [5888 cases from ISO weeks 2014-W15 to 2015-W18]
#> [3 groups: Death, NA, Recover]
#> 
#> $counts: matrix with 56 rows and 3 columns
#> $n: 5888 cases in total
#> $dates: 56 dates marking the left-side of bins
#> $interval: 7 days
#> $timespan: 386 days
plot(i.7.outcome, stack = TRUE, border = "grey")

Groups can also be collapsed into a single time series using pool:

i.pooled <- pool(i.7.outcome)
i.pooled
#> <incidence object>
#> [5888 cases from days 2014-04-07 to 2015-04-27]
#> [5888 cases from ISO weeks 2014-W15 to 2015-W18]
#> 
#> $counts: matrix with 56 rows and 1 columns
#> $n: 5888 cases in total
#> $dates: 56 dates marking the left-side of bins
#> $interval: 7 days
#> $timespan: 386 days
identical(i.7$counts, i.pooled$counts)
#> [1] TRUE
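
Since $counts is a matrix with one row per bin and one column per group, both the x[i, j] syntax and pooling have a direct base-R analogue. The sketch below uses a made-up count matrix with two hypothetical groups, f and m; it illustrates the logic only and does not call the package.

```r
# Toy count matrix: rows are weekly bins, columns are groups (made up)
counts <- matrix(
  c(3, 2,
    5, 4,
    1, 6),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("2014-04-07", "2014-04-14", "2014-04-21"), c("f", "m"))
)

# Matrix-style subsetting: first two bins (i), group "f" only (j)
first_two_f <- counts[1:2, "f", drop = FALSE]

# Pooling: sum over groups within each bin, giving a one-column matrix
pooled <- matrix(rowSums(counts), ncol = 1,
                 dimnames = list(rownames(counts), NULL))
pooled
```

Pooling preserves the total number of cases, so sum(pooled) equals sum(counts).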

Incidence data, excluding zeros, can be modelled using log-linear regression of the form:

log(y) = r × t + b

where y is the incidence, r is the growth rate, t is the number of days since a specific point in time (typically the start of the outbreak), and b is the intercept.
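
The regression above can be reproduced with base R's lm on a toy series. This is a minimal sketch on made-up, noise-free weekly counts generated with a known growth rate of 0.03 per day; the names toy, model, r and b are chosen for the example.

```r
# Toy weekly counts growing exponentially at 0.03 per day (made up)
day <- seq(0, 63, by = 7)
toy <- data.frame(day = day, y = round(100 * exp(0.03 * day)))

# Zeros cannot be log-transformed, so they are excluded before fitting
toy <- toy[toy$y > 0, ]

# log(y) = r * t + b, fitted by ordinary least squares
model <- lm(log(y) ~ day, data = toy)

b <- unname(coef(model)["(Intercept)"])  # intercept
r <- unname(coef(model)["day"])          # estimated daily growth rate
c(r = r, b = b)
```

The estimate of r should be close to the 0.03 used to generate the counts, and b close to log(100).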

Such a model can be fitted to any incidence object using fit. Of course, a single log-linear model is not sufficient for modelling our time series, as there is clearly a growing and a decreasing phase. As a start, we can calibrate a model on the first 20 weeks of the epidemic:

plot(i.7[1:20])

early.fit <- fit(i.7[1:20])
early.fit
#> <incidence_fit object>
#> 
#> $lm: regression of log-incidence over time
#> 
#> $info: list containing the following items:
#>   $r (daily growth rate):
#> [1] 0.03175771
#> 
#>   $r.conf (confidence interval):
#>           2.5 %     97.5 %
#> [1,] 0.02596229 0.03755314
#> 
#>   $doubling (doubling time in days):
#> [1] 21.8261
#> 
#>   $doubling.conf (confidence interval):
#>         2.5 %   97.5 %
#> [1,] 18.45777 26.69823
#> 
#>   $pred: data.frame of incidence predictions (20 rows, 5 columns)
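
The doubling time printed above is consistent with the usual relationship for exponential growth, doubling time = log(2) / r. The following arithmetic check uses the values from the output; treating this as the formula behind the printed numbers is an assumption, although the results agree to the displayed precision.

```r
# Values copied from the printed incidence_fit object
r      <- 0.03175771
r_conf <- c(0.02596229, 0.03755314)

# Time needed for incidence to double under exponential growth
doubling <- log(2) / r

# A larger growth rate gives a shorter doubling time, so the bounds swap
doubling_conf <- rev(log(2) / r_conf)

round(doubling, 4)
round(doubling_conf, 4)
```

The same relationship applied to the absolute value of a negative r gives a halving time.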

The resulting objects can be plotted, in which case the prediction and its confidence interval are displayed:

plot(early.fit)

However, a better way to display these predictions is adding them to the incidence plot using the argument fit:

plot(i.7[1:20], fit = early.fit)

In this case, we would ideally like to fit two models, before and after the peak of the epidemic. This is possible using the following approach, in which the best possible splitting date (i.e. the one maximizing the average fit of both models) is determined automatically:

best.fit <- fit_optim_split(i.7)
best.fit
#> $df
#>         dates   mean.R2
#> 1  2014-08-04 0.7650406
#> 2  2014-08-11 0.8203351
#> 3  2014-08-18 0.8598316
#> 4  2014-08-25 0.8882682
#> 5  2014-09-01 0.9120857
#> 6  2014-09-08 0.9246023
#> 7  2014-09-15 0.9338797
#> 8  2014-09-22 0.9339813
#> 9  2014-09-29 0.9333246
#> 10 2014-10-06 0.9291131
#> 11 2014-10-13 0.9232523
#> 12 2014-10-20 0.9160439
#> 13 2014-10-27 0.9071665
#> 
#> $split
#> [1] "2014-09-22"
#> 
#> $fit
#> $fit$before
#> <incidence_fit object>
#> 
#> $lm: regression of log-incidence over time
#> 
#> $info: list containing the following items:
#>   $r (daily growth rate):
#> [1] 0.02982209
#> 
#>   $r.conf (confidence interval):
#>           2.5 %     97.5 %
#> [1,] 0.02608945 0.03355474
#> 
#>   $doubling (doubling time in days):
#> [1] 23.24274
#> 
#>   $doubling.conf (confidence interval):
#>         2.5 %  97.5 %
#> [1,] 20.65721 26.5681
#> 
#>   $pred: data.frame of incidence predictions (25 rows, 5 columns)
#> 
#> $fit$after
#> <incidence_fit object>
#> 
#> $lm: regression of log-incidence over time
#> 
#> $info: list containing the following items:
#>   $r (daily growth rate):
#> [1] -0.01016191
#> 
#>   $r.conf (confidence interval):
#>            2.5 %       97.5 %
#> [1,] -0.01102526 -0.009298561
#> 
#>   $halving (halving time in days):
#> [1] 68.21031
#> 
#>   $halving.conf (confidence interval):
#>         2.5 %   97.5 %
#> [1,] 62.86899 74.54349
#> 
#>   $pred: data.frame of incidence predictions (32 rows, 5 columns)
#> 
#> 
#> $plot

plot(i.7, fit = best.fit$fit)
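
The search for a splitting date can be sketched in base R: for each candidate split, fit one log-linear model before and one after it, and keep the split with the highest mean R². This is a simplified illustration on made-up counts, not the code of fit_optim_split; in particular, including the split bin in both models and the range of candidate splits are assumptions of the sketch.

```r
# Toy epidemic: growth at 0.03/day up to bin 11, then decline at 0.01/day
n    <- 20
day  <- 7 * (seq_len(n) - 1)
peak <- 11
log_y <- ifelse(seq_len(n) <= peak,
                log(100) + 0.03 * day,
                log(100) + 0.03 * day[peak] - 0.01 * (day - day[peak]))
y <- round(exp(log_y))

# R-squared of a log-linear model fitted to the bins in idx
r_squared <- function(idx) {
  d <- data.frame(day = day[idx], y = y[idx])
  summary(lm(log(y) ~ day, data = d))$r.squared
}

# Mean R-squared of the two models for a given split bin
mean_r2 <- function(split) {
  mean(c(r_squared(1:split), r_squared(split:n)))
}

# Try every split that leaves at least four bins on each side
candidates <- 4:(n - 3)
scores     <- vapply(candidates, mean_r2, numeric(1))
best_split <- candidates[which.max(scores)]

data.frame(split_day = day[candidates], mean_R2 = scores)
day[best_split]
```

On this toy series the best split falls on the bin where growth turns into decline, in the same way that the $df table above peaks at the selected $split date.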



Contributors (in alphabetical order):

See details of contributions on:
https://github.com/reconhub/incidence/graphs/contributors

Contributions are welcome via pull requests.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Maintainer: Thibaut Jombart

News

incidence 1.5.4 (2019-01-15)

BUG FIX

MISC

incidence 1.5.3 (2018-12-07)

BUG FIX

MISC

  • demo("incidence-demo", package = "incidence") has been updated to show use of custom colors.

incidence 1.5.2 (2018-11-30)

BUG FIX

  • print.incidence() will now print isoweeks even if the $interval element is "week".

MISC

  • subset.incidence() will now give a more informative error message when the user specifies a group that does not exist.
  • demo('incidence-demo', package = 'incidence') now shows plotting with show_cases = TRUE.
  • In the case where a date is accidentally mis-typed, leading to a gross mis-calculation of the date range (e.g. 2018 is mis-typed as 3018), a warning will be issued. The default threshold is set at 18262 days (50 years), but the user can define their own threshold by setting the incidence.max.days option.
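
As an illustration of the last item, the threshold could be changed with base R's options(). The option name incidence.max.days comes from the entry above; the value used here (about 100 years) is an arbitrary example.

```r
# Raise the warning threshold to roughly 100 years (arbitrary example value)
old <- options(incidence.max.days = 36525)

# Read the option back to confirm it is set
threshold <- getOption("incidence.max.days")
threshold

# Restore the previous settings once done
options(old)
```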

incidence 1.5.1 (2018-11-14)

BUG FIX

  • Two bugs regarding the ordering of groups when the user specifies a factor/column order have been fixed. This affects plot.incidence(), incidence(), and as.data.frame.incidence(). For details, see https://github.com/reconhub/incidence/issues/79

incidence 1.5.0 (2018-11-01)

NEW FUNCTIONS

  • group_names() allows the user to retrieve and set the group names.
  • get_timespan() returns the $timespan element.
  • get_n() returns the $n element.
  • dim(), nrow(), and ncol() are now available for incidence objects, returning the dimensions of the number of bins and the number of groups.

NEW FEATURES

DOCUMENTATION UPDATES

  • An example of EPIET-style bars for small data sets has been added to the plot customisation vignette by @jakobschumacher. See https://github.com/reconhub/incidence/pull/68 for details.
  • The incidence class vignette has been updated to use the available accessors.

BUG FIX

  • estimate_peak() no longer fails with integer dates
  • incidence() no longer fails when providing both group information and a first_date or last_date parameter that is inside the bounds of the observed dates. Thanks to @mfaber for reporting this bug. See https://github.com/reconhub/incidence/issues/70 for details.

MISC

  • code has been spread out into a more logical file structure where the internal_checks.R file has been split into the relative components.
  • A message is now printed if missing observations are present when creating the incidence object.

incidence 1.4.1 (2018-08-24)

BEHAVIORAL CHANGES

  • The $lm field of the incidence_fit class is now named $model to clearly indicate that this can contain any model.

NEW FEATURES

  • incidence() will now accept text-based intervals that are valid date intervals: day, week, month, quarter, and year.

  • incidence() now verifies that all user-supplied arguments are accurate and spelled correctly.

  • fit_optim_split() now gains a separate_split argument that will determine the optimal split separately for groups.

  • A new class, incidence_fit_list, has been implemented to store and summarise incidence_fit objects within a nested list. This is the class returned in the $fit element of fit_optim_split().
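
Text-based intervals such as month differ from numeric ones in that bins do not all span the same number of days. The base-R sketch below shows this with cut() on made-up dates; it illustrates calendar binning in general and is not how incidence() is implemented.

```r
# Toy onset dates spread over three calendar months (made up)
onset <- as.Date(c("2014-01-15", "2014-01-31", "2014-02-01", "2014-03-10"))

# Calendar-month bins; each level is the first day of a month
month_bin  <- cut(onset, breaks = "month")
bin_starts <- as.Date(levels(month_bin))
counts     <- as.vector(table(month_bin))

# Length of each bin in days: months are not a fixed number of days
bin_lengths <- as.integer(diff(c(bin_starts, as.Date("2014-04-01"))))

data.frame(dates = bin_starts, counts = counts, days_in_bin = bin_lengths)
```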

NEW FUNCTIONS

  • bootstrap() will bootstrap epicurves stored as incidence objects.

  • find_peak() identifies the peak date of an incidence object.

  • estimate_peak() uses bootstrap to estimate the peak time of a partially observed outbreak.

  • get_interval() will return the numeric interval or several intervals in the case of intervals that can't be represented in a fixed number of days (e.g. months).

  • get_dates() returns the dates or counts of days on the right, center, or left of the interval.

  • get_counts() returns the matrix of case counts for each date.

  • get_fit() returns a list of incidence_fit objects from an incidence_fit_list object.

  • get_info() returns information stored in the $info element of an incidence_fit/incidence_fit_list object.
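
The general idea of estimating a peak by bootstrap can be sketched in base R: resample the case dates with replacement, recompute the epicurve, and record where its maximum falls. This is a conceptual sketch on made-up data; the resampling scheme and the helper find_peak_day are assumptions of the example and do not reproduce the package's bootstrap() or estimate_peak().

```r
set.seed(1)

# Toy data: one onset day per case, already aligned to weekly bins (made up)
onset_day <- c(rep(0, 2), rep(7, 5), rep(14, 9), rep(21, 6), rep(28, 3))

# Hypothetical helper: day of the bin with the most cases (first one if tied)
find_peak_day <- function(days) {
  tab <- table(days)
  as.numeric(names(tab)[which.max(tab)])
}

observed_peak <- find_peak_day(onset_day)

# Bootstrap: resample cases with replacement and recompute the peak each time
boot_peaks <- replicate(200, find_peak_day(sample(onset_day, replace = TRUE)))

observed_peak
table(boot_peaks)
```

The spread of boot_peaks gives a rough sense of how uncertain the peak date is given the number of observed cases.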

DOCUMENTATION

  • The new vignette incidence_fit_class instructs the user on how incidence_fit and incidence_fit_list objects are created and accessed.

DEPRECATED

  • In the incidence() function, the iso_week parameter is deprecated in favor of standard, a more general way of indicating that the interval should start at the beginning of a valid date timeframe.

BUG FIXES

  • The $timespan item in the incidence object from Dates was not type-stable and would change if subsetted. A re-working of the incidence constructor fixed this issue.

  • Misspelled or unrecognized parameters passed to incidence() will now cause an error instead of being silently ignored.

  • Plotting for POSIXct data has been fixed.

incidence 1.3.1 (2018-06-11)

BUG FIXES

  • tweak of the plotting of incidence object to avoid conflicts with additional geoms such as geom_ribbon, now used in projections::add_projections.

incidence 1.3.0 (2018-06-01)

BUG FIXES

  • fixed issue caused by new version of ggplot2

NEW FEATURES

  • the argument n_breaks has been added to plot.incidence, to specify the ideal number of breaks for the date legends; will work with ggplot2 > 2.2.1

  • added the internal function make_iso_weeks_breaks to generate dates and labels for date x-axis legends using ISO weeks

  • added a function add_incidence_fit, which can be used for adding fits to epicurves in a piping-friendly way

  • added a function cumulate, which computes cumulative incidence and returns an incidence object
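
Cumulative incidence is the running total of cases over successive bins. A base-R sketch of the computation on a made-up count matrix with two hypothetical groups is given below; it illustrates the arithmetic and is not the code of cumulate.

```r
# Toy weekly counts for two groups (made up)
counts <- matrix(c(2, 1,
                   1, 0,
                   3, 4,
                   0, 2),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(NULL, c("f", "m")))

# Running total within each group (column)
cumulative <- apply(counts, 2, cumsum)
cumulative
```

The last row of cumulative holds the total number of cases per group.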

incidence 1.2.1 (2017-10-19)

BUG FIXES

  • fixed issues in testing incidence plots by employing vdiffr package.

incidence 1.2.0 (2017-04-03)

NEW FEATURES

  • new generic as.incidence, to create incidence objects from already computed incidences. Methods for: matrix, data.frame, numeric vectors

  • better processing of input dates, including: automatic conversion from characters, issuing errors for factors, and silently converting numeric vectors which are essentially integers (issuing a warning otherwise)

  • new vignette on conversions to and from incidence objects

  • new tests

BUG FIXES

  • fixed issues caused by variables which changed names in some datasets of the outbreaks package, used in the documentation

  • disabled by default the isoweeks in incidence; this part of the code will break with changes made in the devel version of ggplot2, which is now required by plotly

incidence 1.1.2 (2017-03-24)

BUG FIXES

  • it is now possible to subset an incidence object based on Date dates using numeric values, which are interpreted as number of intervals since the first date (origin = 1)

  • NAs are no longer removed from the input dates, as it would cause mismatches with grouping factors.

incidence 1.1.1 (2017-02-15)

BUG FIXES

  • adapting to new names of datasets in outbreaks: ebola.sim -> ebola_sim and ebola.sim.clean -> ebola_sim_clean

incidence 1.1.0 (2016-12-13)

NEW FEATURES

  • add an argument iso_week to incidence.Date() and incidence.POSIXt() to support ISO week-based incidence when computing weekly incidence.

  • add an argument labels_iso_week to plot.incidence() to label x axis tick marks with ISO weeks when plotting ISO week-based weekly incidence.


incidence 1.0.1 (2016-11-23)

NEW FEATURES

  • The README.Rmd / README.md now contains information about various websites for incidence as well as guidelines for posting questions on the RECON forum.

  • incidence now has a dedicated website http://www.repidemicsconsortium.org/incidence/ generated with pkgdown

MINOR IMPROVEMENTS

  • Vignette titles are now correctly displayed on CRAN (they previously read 'Vignette title').

incidence 1.0.0 (2016-11-03)

First release of the incidence package on CRAN!


1.7.0 by Zhian N. Kamvar, 6 months ago


http://www.repidemicsconsortium.org/incidence/


Report a bug at http://github.com/reconhub/incidence/issues


Browse source code at https://github.com/cran/incidence


Authors: Thibaut Jombart [aut] , Zhian N. Kamvar [aut, cre] , Rich FitzJohn [aut] , Jun Cai [ctb] , Sangeeta Bhatia [ctb] , Jakob Schumacher [ctb] , Juliet R.C. Pulliam [ctb]




MIT + file LICENSE license


Imports ggplot2, aweek

Suggests magrittr, outbreaks, testthat, vdiffr, covr, knitr, rmarkdown, scales, cowplot


Imported by EpiEstim, projections.

Suggested by earlyR, epitrix, outbreaks.

