Estimating Disease Prevalence from Registry Data

Estimates disease prevalence for a given index date using existing registry data extended with Monte Carlo simulations, following the method of Crouch et al. (2014).


rprev estimates disease prevalence at a specified index date from incomplete registry data. It is designed for situations where point prevalence must be estimated from registry data, but the registry has not been running long enough to count all prevalent cases directly. Monte Carlo simulation is used to generate incident cases for the years in which incidence data is unavailable, and their survival to the index date is then estimated.
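The core idea can be illustrated with a short Python sketch. It is not rprev's implementation, since the package is written in R. It rests on several assumptions: a constant yearly incidence rate, exponentially distributed survival with a known rate, time measured in years, and no covariates or bootstrapping. `simulate_unobserved_prevalent` and `estimate_prevalence` are hypothetical names.

```python
import math
import random


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Draw a Poisson(lam) count with Knuth's multiplication method (fine for modest lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_unobserved_prevalent(
    yearly_rate: float,
    survival_rate: float,
    unobserved_years: int,
    registry_start: float,
    index_date: float,
    n_sims: int = 1000,
    seed: int = 1,
) -> float:
    """Mean number of simulated cases diagnosed in the `unobserved_years`
    before `registry_start` that are still alive at `index_date`."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_sims):
        for year in range(unobserved_years):
            year_start = registry_start - (year + 1)
            # Homogeneous Poisson incidence: Poisson count, uniform times within the year
            for _ in range(poisson_draw(yearly_rate, rng)):
                diagnosis = year_start + rng.random()
                survival = rng.expovariate(survival_rate)  # assumed exponential survival
                if diagnosis + survival > index_date:
                    total += 1
    return total / n_sims


def estimate_prevalence(counted_alive: int, **sim_args) -> float:
    """Cases counted from the registry plus the simulated contribution of earlier years."""
    return counted_alive + simulate_unobserved_prevalent(**sim_args)
```

With zero unobserved years the simulated contribution is zero. This matches the changelog note that prevalence skips the simulation when the registry already covers enough years.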

Prevalence arises from two independent stochastic processes: disease incidence and survival. By default, incidence is modelled as a homogeneous Poisson process and survival with a standard parametric distribution, although both models can be user-specified for further control. See the user_guide vignette for more details about the implementation, and the original publication for details of the algorithm, available at
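The default incidence assumption can be illustrated by fitting and simulating a homogeneous Poisson process. This Python sketch is not the package's code. It assumes the rate is estimated by maximum likelihood as the number of diagnoses divided by the length of the observation window, and it generates arrival times from exponential inter-arrival gaps. `estimate_rate` and `simulate_arrivals` are hypothetical names.

```python
import random
from typing import List, Sequence


def estimate_rate(diagnosis_times: Sequence[float], start: float, end: float) -> float:
    """Maximum-likelihood rate of a homogeneous Poisson process: events / exposure time."""
    n = sum(1 for t in diagnosis_times if start <= t < end)
    return n / (end - start)


def simulate_arrivals(rate: float, start: float, end: float, rng: random.Random) -> List[float]:
    """Arrival times on [start, end) built from exponential inter-arrival gaps."""
    times: List[float] = []
    t = start + rng.expovariate(rate)
    while t < end:
        times.append(t)
        t += rng.expovariate(rate)
    return times
```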


To install from CRAN, simply use install.packages('rprev'), while the latest development version can be installed from GitHub using devtools::install_github("stulacy/rprev-dev").


rprev 1.0.2

Hotfix to address the new sample() implementation forthcoming in R 3.6.0. The resulting warning is currently suppressed, and the unit tests will be updated once these changes are in stable R.

rprev 1.0.1

Minor documentation fixes, the main one being a correction to the name of the Diagnostics vignette.

rprev 1.0.0

Major overhaul to the API with non-backwards-compatible changes. The primary change is that both the incidence and survival models are now specifiable, in contrast to previous versions, which forced a homogeneous Poisson process incidence model and a Weibull survival model using age and sex as covariates. These models are retained as defaults, but the user can provide custom objects for both processes, as documented in the User Guide.
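User-specifiable incidence and survival models can be pictured as two small interfaces that the simulation depends on. The Python below is illustrative only. The `IncidenceModel`/`SurvivalModel` protocols, their method names, and `count_survivors` are hypothetical. They do not reflect the object requirements rprev places on custom models, which are described in the User Guide.

```python
import random
from typing import List, Protocol


class IncidenceModel(Protocol):
    def draw_incident_times(self, start: float, end: float, rng: random.Random) -> List[float]: ...


class SurvivalModel(Protocol):
    def draw_survival_time(self, rng: random.Random) -> float: ...


class PoissonIncidence:
    """Incidence as a homogeneous Poisson process with a constant yearly rate."""

    def __init__(self, rate: float) -> None:
        self.rate = rate

    def draw_incident_times(self, start: float, end: float, rng: random.Random) -> List[float]:
        times: List[float] = []
        t = start + rng.expovariate(self.rate)
        while t < end:
            times.append(t)
            t += rng.expovariate(self.rate)
        return times


class WeibullSurvival:
    """Parametric survival; Python's weibullvariate takes (scale, shape)."""

    def __init__(self, shape: float, scale: float) -> None:
        self.shape = shape
        self.scale = scale

    def draw_survival_time(self, rng: random.Random) -> float:
        return rng.weibullvariate(self.scale, self.shape)


def count_survivors(incidence: IncidenceModel, survival: SurvivalModel,
                    start: float, end: float, index_date: float,
                    rng: random.Random) -> int:
    """One realisation: incident cases on [start, end) still alive at index_date."""
    cases = incidence.draw_incident_times(start, end, rng)
    return sum(1 for t in cases if t + survival.draw_survival_time(rng) > index_date)
```

Because `count_survivors` only calls the two methods, a different survival distribution can be swapped in without changing the simulation loop.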

Several small utility functions, mostly relating to diagnostics, have been removed to condense the API.

See the User Guide vignette for examples of the new parameterisation of prevalence and general documentation.

rprev 0.2.3

Renamed raw_incidence to yearly_incidence

This function has been renamed to better describe what it does, and reparameterised so that the user specifies the end date of the time interval of interest instead. raw_incidence is still included, but it raises a deprecation warning and suggests using yearly_incidence.

Renamed determine_registry_years to determine_yearly_limits

The original function name did not describe what the function does (providing the yearly end points of a specific time interval), so it has been renamed to better reflect its purpose. determine_yearly_limits has a slightly different argument list from determine_registry_years, allowing the closing date of the interval to be specified rather than the opening date.
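The yearly end points and per-year incidence counts described in this section can be sketched in Python. This is a hypothetical illustration, not the R functions themselves. `yearly_limits` and `yearly_counts` are made-up names. Intervals are assumed to be half-open `[start, end)`, and 29 February is mapped to 28 February when shifting into a non-leap year.

```python
from datetime import date
from typing import List, Sequence


def shift_years(d: date, years: int) -> date:
    """Move a date by whole calendar years, mapping 29 Feb to 28 Feb in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def yearly_limits(end_date: date, num_years: int) -> List[date]:
    """Boundaries of num_years one-year intervals closing at end_date, oldest first."""
    return [shift_years(end_date, -k) for k in range(num_years, -1, -1)]


def yearly_counts(diagnosis_dates: Sequence[date], end_date: date, num_years: int) -> List[int]:
    """Number of diagnoses falling in each interval [limits[i], limits[i + 1])."""
    limits = yearly_limits(end_date, num_years)
    counts = [0] * num_years
    for d in diagnosis_dates:
        for i in range(num_years):
            if limits[i] <= d < limits[i + 1]:
                counts[i] += 1
                break
    return counts
```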


  • Plot methods now return ggplot objects, allowing for easier manual tweaking
  • prevalence no longer runs the simulation when there is more registry data available than needed to estimate N-year prevalence
  • prevalence no longer requires a population size as an argument. Absolute prevalence is always calculated, with relative rates provided if population size is specified
  • user_manual: now links to the specific webpage from which the ONS data set is obtained, and has improved formatting
  • summary.prevalence correctly displays posterior age distributions of simulated cases and now displays the prevalence estimates themselves
  • unit tests updated to reflect the above changes
  • vignette updated to reflect the above changes
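The relationship between absolute prevalence and the relative rate reported when a population size is supplied can be sketched as follows. The normal-approximation confidence interval and the function name `per_population` are assumptions for illustration. rprev's own interval calculation may differ.

```python
import math
from typing import Optional, Tuple

RelativeRate = Tuple[float, float, float]  # (rate, lower, upper) per `proportion` people


def per_population(count: float, population: Optional[int],
                   proportion: int = 100_000,
                   z: float = 1.96) -> Tuple[float, Optional[RelativeRate]]:
    """Absolute prevalence is always returned; the relative rate only when population is given."""
    if population is None:
        return count, None
    p = count / population
    se = math.sqrt(p * (1 - p) / population)  # binomial normal approximation (assumed)
    lower = max(0.0, p - z * se) * proportion
    upper = (p + z * se) * proportion
    return count, (p * proportion, lower, upper)
```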

rprev 0.2.2

Bug hotfix.

rprev 0.2.1

The posterior age distribution, returned from prevalence in the simulated object, is now stored as a nested list rather than a matrix. The first level of the list corresponds to sex (if applicable), the second to the year of simulated cases, and the third to the bootstrap sample. Each element at the third level is a vector holding the ages of the simulated cases from that sex, year, and bootstrap sample that are still contributing to prevalence at the index date.
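The nested layout can be mirrored in Python to show how it is navigated. In R this is a nested list indexed from 1 with `[[`. The sketch below instead uses a dict keyed by hypothetical sex labels and 0-based lists for years and bootstrap samples. The ages are invented for illustration.

```python
from typing import Dict, List

# sex label -> simulated year -> bootstrap sample -> ages still prevalent at the index date
AgeDistribution = Dict[str, List[List[List[float]]]]


def pooled_ages(posterior: AgeDistribution, sex: str) -> List[float]:
    """Flatten every year and bootstrap sample for one sex into a single list of ages."""
    return [age for year in posterior[sex] for sample in year for age in sample]


def mean_prevalent_per_sample(posterior: AgeDistribution, sex: str, year: int) -> float:
    """Average number of prevalent simulated cases for one sex and year across samples."""
    samples = posterior[sex][year]
    return sum(len(sample) for sample in samples) / len(samples)


# Invented example: two simulated years, two bootstrap samples each
posterior: AgeDistribution = {
    "M": [[[61.0, 70.5], [64.2]], [[58.9], [72.1, 66.0, 69.3]]],
    "F": [[[55.0], []], [[80.1], [77.7]]],
}
```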

rprev 0.2.0

Minor bug fixes and a slight change to the parameterisation of prevalence:

  • In prevalence, prevalence_counted, and prevalence_simulated, the user specifies the index date at which to estimate prevalence, rather than having it inferred from the data
  • max_yearly_incidence has been removed as a parameter from both prevalence and prevalence_simulated as it can be calculated from the supplied data
  • prevalence per 100K estimates now have the confidence intervals the correct way around
  • unit tests for prevalence functions no longer rely on cached results, which has reduced the size of the source code from 25MB to 2MB.
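Counting prevalence at a user-specified index date, rather than one inferred from the data, can be sketched as below. Three things here are assumptions made for this example. The record format and the function name `counted_prevalence` are hypothetical. The rule that a missing death date means the patient is alive at the index date is also assumed.

```python
from datetime import date
from typing import Optional, Sequence, Tuple

# (diagnosis date, death date or None if not known to have died)
Record = Tuple[date, Optional[date]]


def counted_prevalence(records: Sequence[Record], index_date: date, num_years: int) -> int:
    """Cases diagnosed in the num_years up to index_date who are alive on index_date."""
    try:
        start = index_date.replace(year=index_date.year - num_years)
    except ValueError:  # index_date is 29 February
        start = index_date.replace(year=index_date.year - num_years, day=28)
    return sum(
        1
        for diagnosis, death in records
        if start <= diagnosis <= index_date and (death is None or death > index_date)
    )
```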

rprev 0.1.0

First release of the package, with all the features needed to provide estimates of point prevalence. Issues we would like to address in future releases are:

  • Allow incidence processes other than homogeneous Poisson
  • Enable more flexibility in survival modelling, rather than Weibull regression with linear covariate effects
  • Allow for the inclusion of more covariates in both the survival modelling, and the marking of the incidence process



1.0.5 by Stuart Lacy, 7 months ago


Authors: Stuart Lacy [cre, aut], Simon Crouch [aut], Stephanie Lax [aut]


GPL-2 license

Imports data.table, dplyr, ggplot2, lazyeval, lubridate, magrittr, tidyr

Depends on survival

Suggests flexsurv, flexsurvcure, knitr, rmarkdown, rms, testthat, covr
