Resampled Data Frames

Create data frames with virtual groups that can be used with 'dplyr' to efficiently compute resampled statistics, generate the data for hypothetical outcome plots, and fit multiple models on resampled variations of the original data.


Introduction

The goal of strapgod is to create virtual groups on top of a tibble or grouped_df as a way of resampling the original data frame. You can then efficiently perform various dplyr operations on this resampled_df, such as summarise(), do(), and group_map(), to easily compute bootstrapped and resampled statistics.
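To make the idea of a virtual group concrete, here is a minimal base R sketch. It assumes a resampled data frame can be thought of as the original rows plus one integer vector of row indices per resample. The helper virtual_bootstraps() is hypothetical; it is not part of strapgod's API and is not a description of its internals.

```r
# Hypothetical helper (not strapgod's API): build `times` bootstrap resamples
# as row-index vectors. The data frame itself is never copied; each
# "virtual group" is just the set of row numbers drawn with replacement.
virtual_bootstraps <- function(data, times) {
  n <- nrow(data)
  lapply(seq_len(times), function(i) sample.int(n, size = n, replace = TRUE))
}

set.seed(123)
idx <- virtual_bootstraps(iris, 10)

length(idx)   # one index vector per bootstrap (10)
lengths(idx)  # each resample has as many rows as the original (150)
nrow(iris)    # the stored data is unchanged (150)

# A resample is only materialized when it is actually needed.
first_resample <- iris[idx[[1]], ]
```

Storing ten integer vectors instead of ten copies of the data is what keeps this cheap, which is the same motivation the package gives for virtual groups.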

Installation

You can install the released version of strapgod from CRAN with:

install.packages("strapgod")

Install the development version from GitHub with:

devtools::install_github("DavisVaughan/strapgod")

Learning about strapgod

If you aren’t already on the pkgdown site, I would encourage starting there. These two vignettes explain how to work with resampled tibbles:

  • vignette("virtual-bootstraps", "strapgod")

  • vignette("dplyr-support", "strapgod")

Example

Create resampled data frames with bootstrapify() or samplify(). Notice that the result is grouped by the virtual column .bootstrap, and that there are still only 150 rows even though we bootstrapped the dataset 10 times.

library(strapgod)
library(dplyr)
set.seed(123)
 
bootstrapify(iris, 10)
#> # A tibble: 150 x 5
#> # Groups:   .bootstrap [10]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 140 more rows
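The two constructors differ in how rows are drawn. As a rough base R sketch, assuming bootstrapify() draws n rows with replacement and samplify() draws a user-chosen number of rows (by default without replacement; check ?samplify for the actual arguments), the two index schemes look like this. The size of 20 is an arbitrary illustration.

```r
set.seed(123)
n <- nrow(iris)

# Bootstrap-style draw: n row indices sampled with replacement,
# so some rows repeat and others are left out.
boot_idx <- sample.int(n, size = n, replace = TRUE)

# Subsample-style draw (assumed samplify() default): a smaller number of
# distinct rows, sampled without replacement.
sub_idx <- sample.int(n, size = 20, replace = FALSE)

length(boot_idx)              # 150
anyDuplicated(sub_idx) == 0   # TRUE: no row appears twice
length(unique(boot_idx)) < n  # almost surely TRUE for a bootstrap draw
```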

You can feed a resampled_df into summarise() or group_map() to perform efficient bootstrapped computations.

iris %>%
  bootstrapify(10) %>%
  summarise(per_strap_mean = mean(Petal.Width))
#> # A tibble: 10 x 2
#>    .bootstrap per_strap_mean
#>         <int>          <dbl>
#>  1          1           1.20
#>  2          2           1.22
#>  3          3           1.23
#>  4          4           1.13
#>  5          5           1.20
#>  6          6           1.15
#>  7          7           1.18
#>  8          8           1.13
#>  9          9           1.31
#> 10         10           1.19
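For intuition about what summarise() computes here, the following base R sketch reproduces the pattern by hand: draw row indices, then evaluate the statistic on each resample. It is an illustration under the index-vector assumption above, not strapgod's implementation, and its numbers will not match the table above because it consumes random numbers differently. The second part fits one model per resample, the kind of work group_map() is typically used for.

```r
# Hand-rolled per-bootstrap summary (illustrative, not strapgod internals).
set.seed(123)
n <- nrow(iris)
idx <- lapply(1:10, function(i) sample.int(n, n, replace = TRUE))

# One mean of Petal.Width per bootstrap, like summarise() per .bootstrap.
per_strap_mean <- vapply(idx, function(rows) mean(iris$Petal.Width[rows]), numeric(1))
result <- data.frame(.bootstrap = seq_along(idx), per_strap_mean = per_strap_mean)

# One linear model per bootstrap; keep the slope of Petal.Length.
slopes <- vapply(idx, function(rows) {
  fit <- lm(Petal.Width ~ Petal.Length, data = iris[rows, ])
  unname(coef(fit)[["Petal.Length"]])
}, numeric(1))

result
slopes
```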

The original data can be grouped as well, and the bootstraps will be created for each group.

iris %>%
  group_by(Species) %>%
  bootstrapify(10) %>%
  summarise(per_strap_per_species_mean = mean(Petal.Width))
#> # A tibble: 30 x 3
#> # Groups:   Species [3]
#>    Species .bootstrap per_strap_per_species_mean
#>    <fct>        <int>                      <dbl>
#>  1 setosa           1                      0.25 
#>  2 setosa           2                      0.246
#>  3 setosa           3                      0.24 
#>  4 setosa           4                      0.238
#>  5 setosa           5                      0.252
#>  6 setosa           6                      0.274
#>  7 setosa           7                      0.238
#>  8 setosa           8                      0.258
#>  9 setosa           9                      0.252
#> 10 setosa          10                      0.256
#> # … with 20 more rows
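Bootstrapping within groups amounts to a stratified resample: rows are drawn separately inside each Species, so every resample keeps the original group sizes. Here is a minimal base R sketch of that idea, assuming the same index-vector view as before; it is not strapgod's code, and the means will differ from the output above.

```r
# Stratified bootstrap sketch: resample rows within each Species.
set.seed(123)
rows_by_species <- split(seq_len(nrow(iris)), iris$Species)

grouped_means <- do.call(rbind, lapply(names(rows_by_species), function(sp) {
  rows <- rows_by_species[[sp]]
  means <- vapply(1:10, function(b) {
    # Index into `rows` rather than calling sample(rows, ...), which
    # behaves differently when `rows` has length 1.
    drawn <- rows[sample.int(length(rows), length(rows), replace = TRUE)]
    mean(iris$Petal.Width[drawn])
  }, numeric(1))
  data.frame(Species = sp, .bootstrap = 1:10, per_strap_per_species_mean = means)
}))

dim(grouped_means)  # 30 rows (3 species x 10 bootstraps), 3 columns
```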

Plotting bootstrapped results

A fun use of strapgod is quickly creating bootstrapped visualizations, such as hypothetical outcome plots.

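set.seed(123)
library(ggplot2)
 
# Without bootstrap: a single smoothed curve through the original data
mtcars %>%
  ggplot(aes(hp, mpg)) + 
  geom_smooth(se = FALSE) +
  ylim(0, 40)

# With bootstrap: collect() turns the 10 virtual resamples into real rows,
# and group = .bootstrap draws one smoothed curve per resample
mtcars %>%
  bootstrapify(10) %>%
  collect() %>%
  ggplot(aes(hp, mpg, group = .bootstrap)) + 
  geom_smooth(se = FALSE) +
  ylim(0, 40)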
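The plot needs the resamples as real rows with an explicit .bootstrap id, which is the long shape that aes(group = .bootstrap) relies on. As a base R sketch of that materialization step (an assumption about the resulting shape, not a reimplementation of strapgod's collect()), the stacking looks like this:

```r
# Stack each bootstrap's rows into one long data frame with an id column.
set.seed(123)
n <- nrow(mtcars)
stacked <- do.call(rbind, lapply(1:10, function(b) {
  rows <- sample.int(n, n, replace = TRUE)
  cbind(mtcars[rows, c("hp", "mpg")], .bootstrap = b)
}))
rownames(stacked) <- NULL

nrow(stacked)              # 320 = 32 rows x 10 bootstraps
table(stacked$.bootstrap)  # 32 rows in each bootstrap
```

Materializing is what makes the data ten times larger, which is why it is worth deferring until a step such as plotting actually needs every row.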

In the wild

  • Claus Wilke has used strapgod to power some pieces of his ungeviz package for visualizing uncertainty.

  • You can watch Claus’s rstudio::conf 2019 talk to see ungeviz and strapgod in action.

News

strapgod 0.0.1

  • Added a NEWS.md file to track changes to the package.


0.0.1 by Davis Vaughan, a month ago


https://github.com/DavisVaughan/strapgod


Report a bug at https://github.com/DavisVaughan/strapgod/issues


Browse source code at https://github.com/cran/strapgod


Authors: Davis Vaughan [aut, cre]




GPL-3 license


Imports dplyr, tidyr, rlang, tibble

Suggests broom, testthat, covr, knitr, rmarkdown

