Task view: Reproducible Research

Last updated on 2019-11-11 by John Blischak, Alison Hill

The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood and verified. Packages in R for this purpose can be split into groups for: literate programming, pipeline toolkits, package reproducibility, project workflows, code/data formatting tools, format convertors, and object caching.

The maintainers gratefully acknowledge Anna Krystalli, Max Kuhn, Will Landau, Ben Marwick, and Daniel Nüst for their useful feedback and contributions.

Literate Programming

The primary way that R facilitates reproducible research is through a document that combines content and data analysis code. The Sweave function (in the base R utils package) and the knitr package can be used to blend the subject matter and R code so that a single document defines the content and the analysis. The brew and R.rsp packages contain alternative approaches to embedding R code into various markups.
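As a minimal sketch of this idea, the following uses only the Sweave function from the utils package. The file name report.Rnw, the chunk label, and the scratch directory are illustrative assumptions, not part of any package convention.

```r
# Create a scratch directory so the demo leaves no files in the project.
work_dir <- tempfile("sweave-demo-")
dir.create(work_dir)

# A minimal Sweave source: LaTeX text with one inline expression and one chunk.
rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of the integers 1 to 10 is \\Sexpr{mean(1:10)}.",
  "<<fivenum, echo=TRUE>>=",
  "summary(1:10)",
  "@",
  "\\end{document}"
)
writeLines(rnw_lines, file.path(work_dir, "report.Rnw"))

# Sweave() writes report.tex into the current working directory,
# so switch there temporarily and restore the old directory afterwards.
old_wd <- setwd(work_dir)
utils::Sweave("report.Rnw", quiet = TRUE)
setwd(old_wd)

tex_lines <- readLines(file.path(work_dir, "report.tex"))
# The \Sexpr{} call has been replaced by its evaluated value.
print(grep("mean of the integers", tex_lines, value = TRUE))
```

The resulting report.tex holds both the prose and the evaluated code, so rerunning Sweave on changed data regenerates every number in the document.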

The resources for literate programming are best organized by the document type/markup language:

LaTeX

Both Sweave and knitr can process LaTeX files. lazyWeave can create LaTeX documents from scratch.

Object Conversion Functions:

Miscellaneous Tools

  • Hmisc contains a function to correctly escape special characters. resumer creates resumes. Standardized exams can be created using the exams package.

HTML

The knitr package can process HTML files directly. Sweave can also work with HTML by way of the R2HTML package. lazyWeave can create HTML format documents from scratch.

Object Conversion Functions:

Miscellaneous Tools: htmltools has various tools for working with HTML. tufterhandout creates Tufte-style handouts.

Markdown

The knitr package can process markdown files without assistance. The packages markdown and rmarkdown have general tools for working with documents in this format. lazyWeave can create markdown format documents from scratch.

Object Conversion Functions:

Miscellaneous Tools: tufterhandout creates Tufte-style handouts. kfigr allows for figure indexing in markdown documents.

OpenDocument Format (ODF)

Object Conversion Functions:

  • statistical models/methods: rapport

Microsoft Formats

R2wd (Windows only) can create Word documents from scratch, and R2PPT (also Windows only) can create PowerPoint slides. The rtf package does the same for Rich Text Format documents.

Pipeline Toolkits

Pipeline toolkits help maintain and verify reproducibility. They synchronize computational output with the underlying code and data, and they tell the user when everything is up to date. In other words, they provide concrete evidence that results are re-creatable from the starting materials, and the data analysis project does not need to rerun from scratch. The drake package is such a pipeline toolkit. It is similar to GNU Make, but it is R-focused.

  • drake: A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date.
  • repo: A data manager meant to avoid manual storage/retrieval of data to/from the file system.
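The core idea behind such toolkits, rebuilding a result only when its inputs have changed, can be sketched in base R. This is an illustration under stated assumptions, not how drake is implemented: the helper build_if_outdated, the checksum stamp file, and the demo file names are all hypothetical.

```r
# Hypothetical helper, not part of drake: rebuild `target_file` only when the
# MD5 checksum of `input_file` differs from the one recorded at the last build.
build_if_outdated <- function(input_file, target_file, build) {
  stamp_file <- paste0(target_file, ".md5")
  current_hash <- unname(tools::md5sum(input_file))
  up_to_date <- file.exists(target_file) &&
    file.exists(stamp_file) &&
    identical(readLines(stamp_file), current_hash)
  if (up_to_date) {
    return("skipped")
  }
  build(input_file, target_file)
  writeLines(current_hash, stamp_file)
  "built"
}

# Demo in a scratch directory: the target is the column means of a CSV file.
work_dir <- tempfile("pipeline-demo-")
dir.create(work_dir)
input <- file.path(work_dir, "data.csv")
target <- file.path(work_dir, "means.rds")
write.csv(data.frame(x = 1:3, y = 4:6), input, row.names = FALSE)

summarise_csv <- function(input_file, target_file) {
  saveRDS(colMeans(read.csv(input_file)), target_file)
}

first <- build_if_outdated(input, target, summarise_csv)   # no target yet
second <- build_if_outdated(input, target, summarise_csv)  # nothing changed
write.csv(data.frame(x = 10:12, y = 4:6), input, row.names = FALSE)
third <- build_if_outdated(input, target, summarise_csv)   # data changed
print(c(first, second, third))
```

A real pipeline toolkit tracks whole dependency graphs, including the code itself, rather than a single input file per target.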

Package Reproducibility

R also has tools for ensuring that specific package versions can be required for analyses. checkpoint, rbundler, packrat and renv install packages required for a project to a local archive as they existed at a specified point in time. This allows specific package versions to be maintained over time and across different users. The miniCRAN package facilitates the creation of local CRAN-like repositories.
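The underlying idea, recording exact package versions and checking them later, can be sketched with base R alone. The helpers record_versions and check_versions and the .rds lock file are hypothetical names for illustration; the packages listed above go further and also install the recorded versions into a project-local library.

```r
# Hypothetical helpers illustrating the idea; they only record and compare
# version strings and never install anything.
record_versions <- function(packages) {
  vapply(packages, function(p) as.character(utils::packageVersion(p)),
         character(1))
}

check_versions <- function(recorded) {
  installed <- record_versions(names(recorded))
  changed <- names(recorded)[installed != recorded]
  if (length(changed) > 0) {
    stop("Version mismatch for: ", paste(changed, collapse = ", "))
  }
  invisible(TRUE)
}

# Record the versions of a few packages that ship with R.
lock_file <- tempfile(fileext = ".rds")
saveRDS(record_versions(c("stats", "utils", "tools")), lock_file)

# Later, or on another machine: fail early if the environment has drifted.
check_versions(readRDS(lock_file))
```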

Project Workflows

Successfully completing a data analysis project often requires much more than statistics and visualizations. Efficiently managing the code, data, and results as the project matures helps reduce stress and errors. The following "workflow" packages assist the R programmer by managing project infrastructure and/or facilitating a reproducible workflow.

Workflow utility packages provide single-use functions to implement project infrastructure or solve a specific problem. As a typical example, usethis::use_git() initializes a Git repository, ignores common R files, and commits all project files.

  • cabinets: Creates project specific directory and file templates that are written to a .Rprofile file.
  • here: Constructs paths to your project's files.
  • prodigenr: Create a project directory structure, along with typical files for that project.
  • RepoGenerator: Generates a project and repo for easy initialization of a GitHub repo for R workshops.
  • rrtools (GitHub only): Instructions, templates, and functions for making a basic compendium suitable for doing reproducible research with R.
  • starters (GitHub only): Setting up R project directories for teaching, presenting, analysis, and package development can be a pain. starters shortcuts this by creating folder structures and setting good defaults for you.
  • usethis: Automate package and project setup tasks that are otherwise performed manually.

Workflow framework packages provide an organized directory structure and helper functions to assist during the development of the project. As a typical example, ProjectTemplate::create.project() creates an organized setup with many subdirectories, and ProjectTemplate::run.project() executes each R script that is saved in the src/ subdirectory.

  • adapr: Tracks reading and writing within R scripts that are organized into a directed acyclic graph.
  • DataPackageR: A framework to help construct R data packages in a reproducible manner.
  • exreport: Analysis of experimental results and automatic report generation in both interactive HTML and LaTeX.
  • madrat: Provides a framework intended to improve reproducibility and transparency in data processing. It provides functionality such as automatic metadata creation and management, rudimentary quality management, data caching, workflow management and data aggregation.
  • makeProject: This package creates an empty framework of files and directories for the "Load, Clean, Func, Do" structure described by Josh Reich.
  • orderly: Order, create and store reports from R.
  • projects: Provides a project infrastructure with a focus on manuscript creation.
  • ProjectTemplate: Provides functions to automatically build a directory structure for a new R project. Using this structure, 'ProjectTemplate' automates data loading, preprocessing, library importing and unit testing.
  • reports: Assists in writing reports and presentations by providing a framework that brings together existing R, LaTeX/.docx and Pandoc tools.
  • represtools: Automates the creation of an analysis directory structure and workflow. It provides R Markdown skeletons that encapsulate typical analytic workflow steps, and functions that create modules which may pass data from one step to another.
  • RSuite: Supports safe and reproducible solutions development in R. It will help you with environment separation per project, dependency management, local packages creation and preparing deployment packs for your solutions.
  • tinyProject: Creates useful files and folders for data analysis projects and provides functions to manage data, scripts and output files.
  • workflowr: Provides a workflow for your analysis projects by combining literate programming ('knitr' and 'rmarkdown') and version control ('Git', via 'git2r') to generate a website containing time-stamped, versioned, and documented results.
  • zoon: Reproducible and remixable species distribution modelling.
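To make the notion of a project skeleton concrete, here is a base-R sketch of the "Load, Clean, Func, Do" layout mentioned above. The helper create_lcfd_project, the script names, and the main.R driver are assumptions made for illustration and are not the makeProject API.

```r
# Hypothetical helper, not the makeProject API: create an empty project
# skeleton with one script per "Load, Clean, Func, Do" step.
create_lcfd_project <- function(path) {
  dir.create(path, recursive = TRUE)
  scripts <- file.path(path, c("load.R", "clean.R", "func.R", "do.R"))
  file.create(scripts)
  # A driver script that runs the steps in order (file name is an assumption).
  writeLines(sprintf('source("%s")', basename(scripts)),
             file.path(path, "main.R"))
  invisible(path)
}

project_dir <- create_lcfd_project(tempfile("lcfd-demo-"))
print(list.files(project_dir))
```

The framework packages listed above add helper functions on top of such a layout, for example to load data or run the scripts automatically.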

Formatting Tools

formatR, highlight, and highr can be used to color and/or format R code.

Packages humanFormat, lubridate, prettyunits, and rprintf have functions to better format data.

Format Convertors

pander can be used for rendering R objects into Pandoc's markdown. knitr has the function pandoc that can call an installed version of Pandoc to convert documents between formats such as Markdown, HTML, LaTeX, PDF and Word. tth facilitates TeX to HTML/MathML conversions.

Object Caching Packages

When using Sweave and knitr it can be advantageous to cache the results of time-consuming code chunks if the document will be re-processed (e.g., during debugging). knitr facilitates object caching and the Bioconductor package weaver can be used with Sweave.

Packages that facilitate caching/archiving outside of literate programming are R.cache, archivist, and storr.
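Within knitr, caching is switched on per chunk with the chunk option cache=TRUE. Outside of documents, the general pattern can be sketched in base R as follows. The helper cached and the demo function slow_mean are hypothetical and do not reflect the API of R.cache, archivist, or storr.

```r
# Hypothetical helper: compute a value once, store it as an .rds file,
# and read it back on later calls instead of recomputing.
cached <- function(cache_file, compute) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  value <- compute()
  saveRDS(value, cache_file)
  value
}

cache_file <- tempfile(fileext = ".rds")
n_runs <- 0
slow_mean <- function() {
  n_runs <<- n_runs + 1   # count how often the expensive step really runs
  mean(seq_len(1e6))
}

first <- cached(cache_file, slow_mean)   # computes and writes the cache
second <- cached(cache_file, slow_mean)  # reads the cached value instead
print(c(first = first, second = second, runs = n_runs))
```

This sketch never invalidates the cache; the dedicated packages key stored objects on their inputs so that stale results are not reused.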

Packages

adapr — 2.0.0

Implementation of an Accountable Data Analysis Process

animation — 2.6

A Gallery of Animations in Statistics and Utilities to Create Animations

apsrtable — 0.8-8

apsrtable model-output formatter for social science

archivist — 2.3.4

Tools for Storing, Restoring and Searching for R Objects

bibtex — 0.4.2.2

Bibtex Parser

brew — 1.0-6

Templating Framework for Report Generation

cabinets — 0.3.1

Project Specific Workspace Organization Templates

checkpoint — 0.4.8

Install Packages from Snapshots on the Checkpoint Server for Reproducibility

DataPackageR — 0.15.7

Construct Reproducible Analytic Data Sets as R Packages

drake — 7.9.0

A Pipeline Toolkit for Reproducible Computation at Scale

DT — 0.11

A Wrapper of the JavaScript Library 'DataTables'

exams — 2.3-4

Automatic Generation of Exams in R

exreport — 0.4.1

Fast, Reliable and Elegant Reproducible Research

formatR — 1.7

Format R Code Automatically

formattable — 0.2.0.1

Create 'Formattable' Data Structures

here — 0.1

A Simpler Way to Find Your Files

highlight — 0.5.0

Syntax Highlighter

highr — 0.8

Syntax Highlighting for R Source Code

htmlTable — 1.13.3

Advanced Tables for Markdown/HTML

htmltools — 0.4.0

Tools for HTML

HTMLUtils — 0.1.7

Facilitates Automated HTML Report Creation

humanFormat — 1.0

Human-friendly formatting functions

hwriter — 1.3.2

HTML Writer - Outputs R objects in HTML format

Hmisc — 4.3-0

Harrell Miscellaneous

prodigenr — 0.5.0

Research Project Directory Generator

latex2exp — 0.4.0

Use LaTeX Expressions in Plots

lazyWeave — 3.0.2

LaTeX Wrappers for R Users

lubridate — 1.7.4

Make Dealing with Dates a Little Easier

kfigr — 1.2

Integrated Code Chunk Anchoring and Referencing for R Markdown Documents

knitcitations — 1.0.10

Citations for 'Knitr' Markdown Files

knitLatex — 0.9.0

'Knitr' Helpers - Mostly Tables

knitr — 1.27

A General-Purpose Package for Dynamic Report Generation in R

madrat — 1.64.5

May All Data be Reproducible and Transparent (MADRaT) *

makeProject — 1.0

Creates an empty package framework for the LCFD format

markdown — 1.1

Render Markdown with the C Library 'Sundown'

memisc — 0.99.21

Management of Survey Data and Presentation of Analysis Results

miniCRAN — 0.2.12

Create a Mini Version of CRAN Containing Only Selected Packages

NMOF — 2.0-1

Numerical Methods and Optimization in Finance

orderly — 1.0.4

Lightweight Reproducible Reporting

packrat — 0.5.0

A Dependency Management System for Projects and their R Package Dependencies

pander — 0.6.3

An R 'Pandoc' Writer

papeR — 1.0-4

A Toolbox for Writing Pretty Papers and Reports

prettyunits — 1.1.1

Pretty, Human Readable Formatting of Quantities

projects — 2.0.0

A Project Infrastructure for Researchers

ProjectTemplate — 0.9.0

Automates the Creation of New Statistical Analysis Projects

quantreg — 5.54

Quantile Regression

R.cache — 0.14.0

Fast and Light-Weight Caching (Memoization) of Objects and Results to Speed Up Computations

R.rsp — 0.43.2

Dynamic Generation of Scientific Reports

R2HTML — 2.3.2

HTML Exportation for R Objects

R2PPT — 2.1

Simple R Interface to Microsoft PowerPoint using rcom or RDCOMClient.

R2wd — 1.5

Write MS-Word documents from R

rapport — 1.0

A Report Templating System

rbundler — 0.3.7

Rbundler manages an application's dependencies systematically and repeatedly.

RefManageR — 1.2.12

Straightforward 'BibTeX' and 'BibLaTeX' Bibliography Management

renv — 0.9.2

Project Environments

repo — 2.1.4

A Data-Centered Data Flow Manager

RepoGenerator — 0.0.1

Generates a Project and Repo for Easy Initialization of a Workshop

reports — 0.1.4

Assist the Workflow of Writing Academic Articles and Other Reports

reporttools — 1.1.2

Generate LaTeX Tables of Descriptive Statistics

represtools — 0.1.3

Reproducible Research Tools

resumer — 0.0.3

Build Resumes with R

rmarkdown — 2.1

Dynamic Documents for R

rms — 5.1-4

Regression Modeling Strategies

rprintf — 0.2.1

Adaptive Builder for Formatted Strings

RSuite — 0.37-253

Supports Developing, Building and Deploying R Solution

rtf — 0.4-14

Rich Text Format (RTF) Output

SortableHTMLTables — 0.1-3

Turns a data frame into an HTML file containing a sortable table.

sparktex — 0.1

Generate LaTeX sparklines in R

stargazer — 5.2.2

Well-Formatted Regression and Summary Statistics Tables

storr — 1.2.1

Simple Key Value Stores

suRtex — 0.9

LaTeX descriptive statistic reporting for survey data

tables — 0.8.8

Formula-Driven Table Generation

texreg — 1.36.23

Conversion of R Regression Output to LaTeX or HTML Tables

tikzDevice — 0.12.3

R Graphics Output in LaTeX Format

tinyProject — 0.6.1

A Lightweight Template for Data Analysis Projects

tth — 4.3-2-1

TeX to HTML/MathML Translators tth/ttm

tufterhandout — 1.2.1

Tufte-style html document format for rmarkdown

usethis — 1.5.1

Automate Package and Project Setup

workflowr — 1.6.0

A Framework for Reproducible and Collaborative Data Science

xtable — 1.8-4

Export Tables to LaTeX or HTML

zoon — 0.6.4

Reproducible, Accessible & Shareable Species Distribution Modelling

ztable — 0.2.0

Zebra-Striped Tables in LaTeX and HTML Formats

