Last updated on 2020-12-16 by John Blischak, Alison Hill
The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood, and verified. Packages in R for this purpose can be split into groups for: literate programming, pipeline toolkits, package reproducibility, project workflows, code/data formatting tools, format converters, and object caching.
The maintainers gratefully acknowledge Anna Krystalli, Max Kuhn, Will Landau, Ben Marwick, and Daniel Nüst for their useful feedback and contributions.
If you would like to recommend a package to be included in this task view, please open a GitHub Issue to discuss.
The primary way that R facilitates reproducible research is using a document that is a combination of content and data analysis code. The Sweave function (in the base R utils package) and the knitr package can be used to blend the subject matter and R code so that a single document defines the content and the analysis. The brew and R.rsp packages contain alternative approaches to embedding R code into various markups.
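As a minimal sketch of this idea using only base R, the following writes a small Sweave document to a temporary directory and weaves it into a LaTeX file. The file name and the document contents are invented for illustration, and no LaTeX installation is needed to produce the .tex output.

```r
# Work in a fresh temporary directory so no project files are touched.
out_dir <- tempfile("sweave-demo")
dir.create(out_dir)

# A tiny Sweave (.Rnw) document: LaTeX text plus one inline expression
# and one code chunk. The contents are purely illustrative.
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of 1 to 10 is \\Sexpr{mean(1:10)}.",
  "<<summary-chunk, echo=TRUE>>=",
  "x <- 1:10",
  "summary(x)",
  "@",
  "\\end{document}"
), file.path(out_dir, "example.Rnw"))

# Sweave writes its output to the working directory, so switch temporarily.
old_wd <- setwd(out_dir)
utils::Sweave("example.Rnw", quiet = TRUE)
setwd(old_wd)

# The woven file contains the text with the R results filled in.
tex_lines <- readLines(file.path(out_dir, "example.tex"))
print(grep("mean of 1 to 10", tex_lines, value = TRUE))
```

The resulting example.tex can then be compiled with any LaTeX toolchain; knitr follows the same weave-then-compile pattern with its own chunk syntax and options.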
The resources for literate programming are best organized by the document type/markup language:
Both Sweave and knitr can process LaTeX files. lazyWeave can create LaTeX documents from scratch.
The knitr and rmarkdown packages (along with pandoc) can be used to create slides using the LaTeX beamer class.
Object Conversion Functions:
grDevices:::pictex, sparktex, tikzDevice
Miscellaneous Tools
The knitr package can process HTML files directly. Sweave can also work with HTML by way of the R2HTML package. lazyWeave can create HTML format documents from scratch.
For HTML slides, a combination of the knitr and rmarkdown packages (along with pandoc) can be used to create slides using ioslides, reveal.js, Slidy, or remark.js (from the xaringan package).
Object Conversion Functions:
Miscellaneous Tools: htmltools has various tools for working with HTML. tufterhandout creates Tufte-style handouts.
The knitr package can process markdown files without assistance. The packages markdown and rmarkdown have general tools for working with documents in this format. lazyWeave can create markdown format documents from scratch. Also, the ascii package can write R objects to the AsciiDoc format.
Object Conversion Functions:
Miscellaneous Tools: tufterhandout creates Tufte-style handouts. kfigr allows for figure indexing in markdown documents.
The officer (formerly ReporteRs and before that R2DOCX) package can create docx and pptx files. R2wd (Windows only) can also create Word documents from scratch and R2PPT (also Windows only) can create PowerPoint slides. The rtf package does the same for Rich Text Format documents. The openxlsx package creates xlsx files. The readODS package can read and write Open Document Spreadsheets.
Object Conversion Functions:
Pipeline toolkits help maintain and verify reproducibility. They synchronize computational output with the underlying code and data, and they tell the user when everything is up to date. In other words, they provide concrete evidence that results are re-creatable from the starting materials, and the data analysis project does not need to rerun from scratch. The drake package is such a pipeline toolkit. It is similar to GNU Make, but it is R-focused.
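The core idea, rebuilding a target only when its inputs have changed, can be sketched in base R. This is not how drake is implemented or used: build_if_stale() and summarise_data() are hypothetical helpers written for this illustration, and a real pipeline toolkit also tracks code changes and dependencies between many targets.

```r
# Hypothetical helper: rebuild `output` only when the MD5 hash of `input`
# differs from the hash recorded at the last build.
build_if_stale <- function(input, output, build) {
  stamp <- paste0(output, ".md5")
  current <- unname(tools::md5sum(input))
  up_to_date <- file.exists(output) && file.exists(stamp) &&
    identical(readLines(stamp), current)
  if (!up_to_date) {
    build(input, output)
    writeLines(current, stamp)
  }
  invisible(!up_to_date)  # TRUE if the target was (re)built
}

# Illustrative project in a temporary directory.
dir <- tempfile("pipeline")
dir.create(dir)
input <- file.path(dir, "data.csv")
output <- file.path(dir, "summary.txt")
write.csv(data.frame(x = 1:5), input, row.names = FALSE)

# Hypothetical build step: write the mean of column x to the output file.
summarise_data <- function(input, output) {
  writeLines(as.character(mean(read.csv(input)$x)), output)
}

first <- build_if_stale(input, output, summarise_data)   # builds
second <- build_if_stale(input, output, summarise_data)  # skipped, up to date
write.csv(data.frame(x = 1:6), input, row.names = FALSE) # the data changes
third <- build_if_stale(input, output, summarise_data)   # rebuilds
```

Hashing file contents rather than comparing modification times means that merely touching or re-saving an identical file does not trigger a rebuild.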
R also has tools for ensuring that specific package versions can be required for analyses. checkpoint, groundhog, rbundler, packrat and renv install packages required for a project to a local archive as they existed at a specified point in time. This allows specific package versions to be maintained over time and across users. The miniCRAN and switchr packages facilitate the creation of local CRAN-like repositories and their simultaneous operation. liftr can containerize an R Markdown document using Docker by providing additional metadata.
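Independently of those packages, a script can at least assert the versions it was written against. The following base R sketch is a much weaker guarantee than installing pinned versions: require_version() is a hypothetical helper, and the package name and version numbers are chosen only for illustration.

```r
# Hypothetical helper: stop early if an installed package is older than
# the version the analysis was developed with.
require_version <- function(pkg, minimum) {
  installed <- utils::packageVersion(pkg)
  if (installed < minimum) {
    stop(sprintf("Package '%s' is version %s but at least %s is required.",
                 pkg, format(installed), minimum))
  }
  invisible(installed)
}

# 'stats' ships with R, so this check works on any installation.
stats_version <- require_version("stats", "3.0.0")

# Recording the full session alongside the results documents what was used.
session_record <- utils::capture.output(utils::sessionInfo())
```

A check like this only detects a mismatch; tools such as those named above are what actually restore the required versions.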
Successfully completing a data analysis project often requires much more than statistics and visualizations. Efficiently managing the code, data, and results as the project matures helps reduce stress and errors. The following "workflow" packages assist the R programmer by managing project infrastructure and/or facilitating a reproducible workflow.
Workflow utility packages provide single-use functions to implement project infrastructure or solve a specific problem.
As a typical example, usethis::use_git() initializes a Git repository, ignores common R files, and commits all project files.
Workflow framework packages provide an organized directory structure and helper functions to assist during the development of the project.
As a typical example, ProjectTemplate::create.project() creates an organized setup with many subdirectories, and ProjectTemplate::run.project() executes each R script that is saved in the src/ subdirectory.
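The convention of running every script in src/ can be illustrated with a small base R sketch. This is not ProjectTemplate's implementation: run_scripts() is a hypothetical helper, and the project layout and script contents are made up for the example.

```r
# Create an illustrative project with a src/ subdirectory holding two scripts.
project <- tempfile("project")
dir.create(file.path(project, "src"), recursive = TRUE)
writeLines("result_a <- 1 + 1",
           file.path(project, "src", "01-first.R"))
writeLines("result_b <- result_a * 10",
           file.path(project, "src", "02-second.R"))

# Hypothetical helper: source each .R file in src/ in alphabetical order,
# evaluating all of them in one shared environment.
run_scripts <- function(project, envir = new.env()) {
  scripts <- sort(list.files(file.path(project, "src"),
                             pattern = "\\.R$", full.names = TRUE))
  for (script in scripts) {
    source(script, local = envir)
  }
  envir
}

env <- run_scripts(project)
print(env$result_b)
```

Numeric prefixes in the file names make the execution order explicit, which matters here because the second script depends on an object created by the first.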
formatR, highlight, and highr can be used to color and/or format R code.
Packages humanFormat, lubridate, prettyunits, and rprintf have functions to better format data.
pander can be used for rendering R objects into Pandoc's markdown. knitr has the function pandoc that can call an installed version of Pandoc to convert documents between formats such as Markdown, HTML, LaTeX, PDF and Word. tth facilitates TeX to HTML/MathML conversions.
When using Sweave and knitr, it can be advantageous to cache the results of time-consuming code chunks if the document will be re-processed (e.g., during debugging). knitr facilitates object caching, and the Bioconductor package weaver can be used with Sweave.
Non-literate programming packages that facilitate caching/archiving are R.cache, archivist, storr, and trackr.
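The basic mechanism behind object caching can be sketched in base R with saveRDS() and readRDS(). The cached() helper and slow_mean() below are hypothetical and are not the API of any package named above; real caching tools also invalidate the cache when the code or its inputs change.

```r
# Hypothetical helper: return the value stored at `path` if it exists,
# otherwise evaluate `expr`, store the result, and return it.
cached <- function(path, expr) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  value <- expr  # lazy evaluation: `expr` is only computed on a cache miss
  saveRDS(value, path)
  value
}

# Illustrative slow computation that counts how often it actually runs.
calls <- 0
slow_mean <- function() {
  calls <<- calls + 1
  Sys.sleep(0.1)
  mean(1:100)
}

cache_file <- tempfile(fileext = ".rds")
a <- cached(cache_file, slow_mean())  # computed and written to disk
b <- cached(cache_file, slow_mean())  # read back from disk, not recomputed
```

Because R evaluates function arguments lazily, the expensive call is skipped entirely on the second use; deleting the cache file forces a fresh computation.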
Michael Malecki (9 years ago): apsrtable model-output formatter for social science
Przemyslaw Biecek (20 days ago): Tools for Storing, Restoring and Searching for R Objects
Hong Ooi (4 months ago): Install Packages from Snapshots on the Checkpoint Server for Reproducibility
William Michael Landau (23 days ago): A Pipeline Toolkit for Reproducible Computation at Scale
Uri Simonsohn (a month ago): Reproducible Scripts via Version-Specific Package Loading
Markus Loecher, Berlin School of Economics and Law (BSEL) (6 years ago): Facilitates Automated HTML Report Creation
Michael C Koohafkan (6 years ago): Integrated Code Chunk Anchoring and Referencing for R Markdown Documents
Jan Philipp Dietrich (a year ago): May All Data be Reproducible and Transparent (MADRaT) *
Noah Silverman (9 years ago): Creates an empty package framework for the LCFD format
Martin Elff (3 months ago): Management of Survey Data and Presentation of Analysis Results
Andrie de Vries (5 months ago): Create a Mini Version of CRAN Containing Only Selected Packages
David Gohel (a year ago): Chart Generation for 'Microsoft Word' and 'Microsoft PowerPoint' Documents
David Gohel (2 months ago): Manipulation of Microsoft Word and PowerPoint Documents
Kevin Ushey (2 years ago): A Dependency Management System for Projects and their R Package Dependencies
Kenton White (17 days ago): Automates the Creation of New Statistical Analysis Projects
Henrik Bengtsson (a year ago): Fast and Light-Weight Caching (Memoization) of Objects and Results to Speed Up Computations
Wayne Jones (9 years ago): Simple R Interface to Microsoft PowerPoint using rcom or RDCOMClient.
Yoni Ben-Meshulam (7 years ago): Rbundler manages an application's dependencies systematically and repeatedly.
Mathew W. McLean (3 months ago): Straightforward 'BibTeX' and 'BibLaTeX' Bibliography Management
Jared P. Lander (3 years ago): Generates a Project and Repo for Easy Initialization of a Workshop
Walerian Sokolowski (2 years ago): Supports Developing, Building and Deploying R Solution
John Myles White (9 years ago): Turns a data frame into an HTML file containing a sortable table.
Marek Hlavac (3 years ago): Well-Formatted Regression and Summary Statistics Tables
Gabriel Becker (10 months ago): Installing, Managing, and Switching Between Distinct Sets of Installed Packages
Philip Leifeld (8 months ago): Conversion of R Regression Output to LaTeX or HTML Tables
Francois Guillem (2 years ago): A Lightweight Template for Data Analysis Projects
Gabriel Becker (a year ago): Semantic Annotation and Discoverability System for R-Based Artifacts
John Blischak (10 months ago): A Framework for Reproducible and Collaborative Data Science
Tom August (a year ago): Reproducible, Accessible & Shareable Species Distribution Modelling