Last updated on 2021-02-21
by John Blischak, Alison Hill
The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood and verified. Packages in R for this purpose can be split into groups for: literate programming, pipeline toolkits, package reproducibility, project workflows, code/data formatting tools, format converters, and object caching.
The maintainers gratefully acknowledge
Ben Marwick, and
for their useful feedback and contributions.
If you would like to recommend a package to be included in this
task view, please open a
The primary way that R facilitates reproducible research is using a document that is a combination of content and data analysis code. The Sweave function (in the base R utils package) and the knitr package can be used to blend the subject matter and R code so that a single document defines the content and the analysis. The brew and R.rsp packages contain alternative approaches to embedding R code into various markups.
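As a rough illustration of the idea, the sketch below weaves a very small document with the base R Sweave function. The file contents, the chunk label `meanchunk`, and the temporary file paths are made up for this example; only `utils::Sweave()` and its `output` and `quiet` options come from R itself.

```r
# Write a tiny Sweave source file (LaTeX with embedded R) to a temporary path.
# The document text and chunk label are illustrative assumptions.
rnw <- tempfile(fileext = ".Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<meanchunk, echo=TRUE>>=",
  "x <- c(1, 2, 3, 4)",
  "mean(x)",
  "@",
  "The mean is \\Sexpr{mean(x)}.",
  "\\end{document}"
), rnw)

# Weave it: the code chunk is run and its output is written into the .tex file,
# and the inline \Sexpr{} expression is replaced by its value.
tex <- tempfile(fileext = ".tex")
utils::Sweave(rnw, output = tex, quiet = TRUE)

# Inspect the generated LaTeX source.
tex_lines <- readLines(tex)
```

Because the number in the prose is computed by `\Sexpr{}` rather than typed by hand, changing the data and re-weaving updates the text and the analysis together.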
The resources for literate programming are best organized by the document type/markup language:
Sweave and knitr can process LaTeX files. lazyWeave can create LaTeX documents from scratch.
The knitr and rmarkdown packages (along with pandoc) can be used to create slides using the LaTeX beamer class.
Object Conversion Functions:
summary tables/statistics: Hmisc, NMOF, papeR, quantreg, rapport, reporttools, sparktex, tables, xtable, ztable
tables/cross-tabulations: Hmisc, lazyWeave, knitLatex, knitr, reporttools, ztable
graphics: animation, Hmisc, grDevices:::pictex, sparktex, tikzDevice
statistical models/methods: apsrtable, memisc, quantreg, rms, stargazer, suRtex, texreg, xtable, ztable
bibtex: bibtex and RefManageR
others: latex2exp converts LaTeX equations to plotmath expressions.
Hmisc contains a function to correctly escape special characters. Standardized exams can be created using the exams package.
The knitr package can process HTML files directly.
Sweave can also work with HTML by way of the R2HTML package. lazyWeave can create HTML format documents from scratch.
For HTML slides, a combination of the knitr and rmarkdown packages (along with pandoc) can be used to create slides using reveal.js, Slidy, or remark.js (from the
Object Conversion Functions:
summary tables/statistics: stargazer
tables/cross-tabulations: DT, flextable, formattable, htmlTable, HTMLUtils, hwriter, knitr, lazyWeave, SortableHTMLTables, texreg, ztable
statistical models/methods: rapport, stargazer, xtable
others: knitcitations, RefManageR
Miscellaneous Tools: htmltools has various tools for working with HTML. tufterhandout creates Tufte-style handouts.
The knitr package can process markdown files without assistance. The packages markdown and rmarkdown have general tools for working with documents in this format. lazyWeave can create markdown format documents from scratch.
Also, the ascii package can write R objects to the AsciiDoc format.
Object Conversion Functions:
Miscellaneous Tools: tufterhandout creates Tufte-style handouts. kfigr allows for figure indexing in markdown documents.
The officer (formerly ReporteRs and before that R2DOCX) package can create docx and pptx files.
R2wd (Windows only) can also create Word documents from scratch and R2PPT (also Windows only) can create PowerPoint slides. The rtf package does the same for Rich Text Format documents.
The openxlsx package creates xlsx files.
The readODS package can read and write Open Document Spreadsheets.
Object Conversion Functions:
Pipeline toolkits help maintain and verify reproducibility. They synchronize computational output with the underlying code and data, and they tell the user when everything is up to date. In other words, they provide concrete evidence that results are re-creatable from the starting materials, and the data analysis project does not need to rerun from scratch. The targets package is such a pipeline toolkit. It is similar to GNU Make, but it is R-focused.
- drake: A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date.
- flowr: This framework allows you to design and implement complex pipelines, and deploy them on your institution's computing cluster.
- repo: A data manager meant to avoid manual storage/retrieval of data to/from the file system.
- targets: As a pipeline toolkit for statistics and data science in R, the 'targets' package brings together function-oriented programming and 'Make'-like declarative workflows.
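The core behaviour these toolkits share, skipping a step whose inputs have not changed, can be sketched in base R. This is not the API of drake or targets: `build_if_stale()`, the `.md5` stamp file, and the `summarise_file()` step are hypothetical names invented for this illustration, and real toolkits also track dependencies on code and on other targets.

```r
# Hypothetical helper: rebuild `output` from `input` only when the MD5 checksum
# of the input file differs from the one recorded at the last build.
build_if_stale <- function(input, output, build, stamp = paste0(output, ".md5")) {
  current <- unname(tools::md5sum(input))
  previous <- if (file.exists(stamp)) readLines(stamp, n = 1) else NA_character_
  if (file.exists(output) && identical(current, previous)) {
    return("skipped")            # output is already up to date
  }
  build(input, output)           # run the expensive step
  writeLines(current, stamp)     # remember which input produced the output
  "built"
}

# Illustrative pipeline step: summarise a CSV file and store the result.
summarise_file <- function(input, output) {
  saveRDS(colMeans(read.csv(input)), output)
}

input <- tempfile(fileext = ".csv")
output <- tempfile(fileext = ".rds")
write.csv(data.frame(x = 1:3), input, row.names = FALSE)

first <- build_if_stale(input, output, summarise_file)   # no output yet
second <- build_if_stale(input, output, summarise_file)  # input unchanged

write.csv(data.frame(x = 4:6), input, row.names = FALSE) # the data change
third <- build_if_stale(input, output, summarise_file)
```

In this sketch the second call does no work, while the third call rebuilds because the checksum of the input no longer matches the stamp.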
R also has tools for ensuring that specific package versions can be required for analyses. checkpoint, groundhog, rbundler, packrat and renv install packages required for a project to a local archive as they existed at a specified point in time. This allows specific package versions to be maintained over time and across different users.
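The underlying idea of recording package versions alongside a project can be sketched with base R alone. The helpers `record_versions()` and `check_versions()` and the CSV "lockfile" are assumptions made for this illustration; unlike the packages above, this sketch only detects version drift and does not install or restore anything.

```r
# Hypothetical helper: record the installed version of each named package.
record_versions <- function(pkgs) {
  versions <- vapply(
    pkgs,
    function(p) as.character(utils::packageVersion(p)),
    character(1)
  )
  data.frame(package = pkgs, version = unname(versions), stringsAsFactors = FALSE)
}

# Hypothetical helper: return the packages whose installed version no longer
# matches the recorded one.
check_versions <- function(lock) {
  installed <- vapply(
    lock$package,
    function(p) as.character(utils::packageVersion(p)),
    character(1)
  )
  lock$package[unname(installed) != lock$version]
}

# Record versions for two base packages and store them next to the analysis.
lock <- record_versions(c("stats", "utils"))
lockfile <- tempfile(fileext = ".csv")
write.csv(lock, lockfile, row.names = FALSE)

# Later, or on another machine: compare the record with what is installed.
mismatched <- check_versions(read.csv(lockfile, colClasses = "character"))
```

An empty `mismatched` vector means the current library still matches the record; a dedicated tool is still needed to reinstall the recorded versions when it does not.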
The miniCRAN and switchr packages facilitate the creation of local CRAN-like repositories and their simultaneous operation.
liftr can containerize an R Markdown document with Docker, given additional metadata.
Successfully completing a data analysis project often requires much more than statistics and visualizations.
Efficiently managing the code, data, and results as the project matures helps reduce stress and errors.
The following "workflow" packages assist the R programmer by managing project infrastructure and/or facilitating a reproducible workflow.
Workflow utility packages provide single-use functions to implement project infrastructure or solve a specific problem.
As a typical example, usethis::use_git() initializes a Git repository, ignores common R files, and commits all project files.
- cabinets: Creates project-specific directory and file templates that are written to a .Rprofile file.
- here: Constructs paths to your project's files.
- prodigenr: Creates a project directory structure, along with typical files for that project.
- RepoGenerator: Generates a project and repo for easy initialization of a GitHub repo for R workshops.
- rrtools (GitHub only): Instructions, templates, and functions for making a basic compendium suitable for doing reproducible research with R.
- starters (GitHub only): Setting up R project directories for teaching, presenting, analysis, and package development can be a pain. starters shortcuts this by creating folder structures and setting good defaults for you.
- usethis: Automate package and project setup tasks that are otherwise performed manually.
Workflow framework packages provide an organized directory structure and helper functions to assist during the development of the project.
As a typical example, ProjectTemplate::create.project() creates an organized setup with many subdirectories, and ProjectTemplate::run.project() executes each R script that is saved in the
- exreport: Analysis of experimental results and automatic report generation in both interactive HTML and LaTeX.
- madrat: Provides a framework intended to improve reproducibility and transparency in data processing. It provides functionality such as automatic metadata creation and management, rudimentary quality management, data caching, workflow management and data aggregation.
- makeProject: This package creates an empty framework of files and directories for the "Load, Clean, Func, Do" structure described by Josh Reich.
- orderly: Order, create and store reports from R.
- projects: Provides a project infrastructure with a focus on manuscript creation.
- ProjectTemplate: Provides functions to automatically build a directory structure for a new R project. Using this structure, 'ProjectTemplate' automates data loading, preprocessing, library importing and unit testing.
- reportfactory: Provides an infrastructure for handling multiple R Markdown reports, including automated curation and time-stamping of outputs, parameterisation and provision of helper functions to manage dependencies.
- represtools: Automates the creation of an analysis directory structure and workflow. R Markdown skeletons encapsulate typical analytic workflow steps, and functions create modules that can pass data from one step to the next.
- RSuite: Supports safe and reproducible solution development in R. It helps with environment separation per project, dependency management, local package creation and preparing deployment packs for your solutions.
- tinyProject: Creates useful files and folders for data analysis projects and provides functions to manage data, scripts and output files.
- worcs: Create reproducible and transparent research projects in 'R'. This package is based on the Workflow for Open Reproducible Code in Science (WORCS), a step-by-step procedure based on best practices for Open Science.
- workflowr: Provides a workflow for your analysis projects by combining literate programming ('knitr' and 'rmarkdown') and version control ('Git', via 'git2r') to generate a website containing time-stamped, versioned, and documented results.
- zoon: Reproducible and remixable species distribution modelling.
formatR, highlight, and highr can be used to color and/or format R code.
Packages humanFormat, lubridate, prettyunits, and rprintf have functions to better format data.
pander can be used for rendering R objects into Pandoc's markdown. knitr has the function pandoc that can call an installed version of Pandoc to convert documents between formats such as Markdown, HTML, LaTeX, PDF and Word. tth facilitates TeX to HTML/MathML conversions.
Object Caching Packages
With Sweave and knitr, it can be advantageous to cache the results of time-consuming code chunks if the document will be re-processed (e.g. during debugging). knitr facilitates object caching and the Bioconductor package weaver can be used with
Packages that facilitate caching/archiving outside of literate programming include R.cache, archivist, storr, and trackr.
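The basic pattern behind object caching can be sketched in base R with `saveRDS()` and `readRDS()`. The `cached()` helper and the `slow_square()` function are hypothetical and exist only for this illustration; the packages above add features such as automatic cache keys and invalidation that this sketch lacks.

```r
# Hypothetical helper: return the stored result if `path` exists; otherwise
# evaluate `expr`, store the value on disk, and return it. Thanks to lazy
# evaluation, `expr` is never evaluated when the cache file is present.
cached <- function(path, expr) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  value <- expr
  saveRDS(value, path)
  value
}

# Stand-in for a time-consuming computation; `calls` counts how often it runs.
calls <- 0
slow_square <- function(x) {
  calls <<- calls + 1
  x^2
}

cache_file <- tempfile(fileext = ".rds")
a <- cached(cache_file, slow_square(4))  # computed and written to disk
b <- cached(cache_file, slow_square(4))  # read back from disk, not recomputed
```

Note that this cache is keyed only by the file path, so it must be deleted by hand when the code or data behind the result change.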
- GitHub repository for editing this task view
- Sweave: Dynamic Generation of Statistical Reports Using Literate Data Analysis
- knitr: Elegant, flexible and fast dynamic report generation with R
- Bioconductor package: weaver
- Wikipedia: Literate Programming
- Harrell: Reproducible Research (Biostatistics for Biomedical Research)
- Koenker, Zeileis: On Reproducible Econometric Research
- Peng: Reproducible Research and Biostatistics
- Rossini, Leisch: Literate Statistical Practice
- Baggerly, Coombes: Deriving Chemosensitivity from Cell Lines: Forensic Bioinformatics and Reproducible Research in High-Throughput Biology
- Leisch: Sweave, Part I: Mixing R and LaTeX
- Leisch: Sweave, Part II: Package Vignettes
- Betebenner: Using Control Structures with Sweave
- Garbade, Burgard: Using R/Sweave in Everyday Clinical Practice
- Gorjanc: Using Sweave with LyX
- Lecoutre: The R2HTML Package
- List of pipeline toolkits
- Computational Environments and Reproducibility
- Bryan: Project-oriented workflow
- rOpenSci: Reproducibility in Science
- Temple Lang, Gentleman: Statistical Analyses and Reproducible Research
- Marwick, Boettiger, Mullen: Packaging Data Analytical Work Reproducibly Using R (and Friends)
- Xie: Write An R Package Using Literate Programming Techniques