'dplyr'-Like Syntax for Summary Statistics of Survey Data

Use piping, verbs like 'group_by' and 'summarize', and other 'dplyr' inspired syntactic style when calculating summary statistics on survey data using functions from the 'survey' package.

srvyr brings parts of dplyr's syntax to survey analysis, using the survey package.

srvyr focuses on calculating summary statistics from survey data, such as the mean, total or quantile. It allows for the use of many dplyr verbs, such as summarize, group_by, and mutate, the convenience of pipe-able functions, lazyeval's style of non-standard evaluation and more consistent return types than the survey package.

You can try it out:

# devtools::install_github("gergness/srvyr")

To create a tbl_svy object (the core concept behind the srvyr package), call as_survey_design() with the bare column names of the variables you would otherwise pass to survey::svydesign().

library(srvyr)
data(api, package = "survey")  # provides the apistrat example data

dstrata <- apistrat %>%
   as_survey_design(strata = stype, weights = pw)
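For comparison, the same stratified design can be written with both interfaces. This is a minimal sketch, assuming the apistrat example data from the survey package; the object names dstrata_svy, mean_svy and mean_srvyr are illustrative only.

```r
library(survey)
library(srvyr)
data(api, package = "survey")  # provides apistrat, a stratified sample of schools

# survey interface: design variables are passed as one-sided formulas
dstrata_svy <- svydesign(ids = ~1, strata = ~stype, weights = ~pw, data = apistrat)

# srvyr interface: the same design, written with bare column names
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Both objects describe the same design, so the point estimates should agree
mean_svy <- svymean(~api00, dstrata_svy)
mean_srvyr <- dstrata %>% summarise(api00 = survey_mean(api00))
```

The srvyr result comes back as a data frame with one column for the estimate and one for its standard error, which is what makes it convenient to pipe into further dplyr steps.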

Now many of the dplyr verbs are available.

  • Use mutate() if you want to add or modify a variable.

    dstrata <- dstrata %>%
      mutate(api_diff = api00 - api99)

  • summarise() calculates summary statistics such as mean, total, quantile or ratio.

    dstrata %>%
      summarise(api_diff = survey_mean(api_diff, vartype = "ci"))

  • Use group_by() if you want to summarise by groups.

    dstrata %>%
      group_by(stype) %>%
      summarise(api_diff = survey_mean(api_diff, vartype = "ci"))
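The three verbs above can also be chained into a single pipeline. The sketch below assumes the apistrat data from the survey package; the result name by_type and the extra enroll_total column are illustrative additions, not part of the original example.

```r
library(survey)
library(srvyr)
data(api, package = "survey")  # provides apistrat

by_type <- apistrat %>%
  as_survey_design(strata = stype, weights = pw) %>%
  mutate(api_diff = api00 - api99) %>%   # add a derived variable
  group_by(stype) %>%                    # one row per school type
  summarise(
    api_diff = survey_mean(api_diff, vartype = "ci"),    # mean with confidence interval
    enroll_total = survey_total(enroll, vartype = "se")  # total with standard error
  )

by_type
```

Each statistic contributes its own columns to the result: the estimate takes the name given in summarise(), and the uncertainty columns are named after it with a suffix.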

You can still use functions from the survey package if you'd like to:

survey::svyglm(api99 ~ stype, design = dstrata)
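This works because a tbl_svy is still a survey design object underneath. A short sketch, assuming the apistrat data from the survey package (the names is_design and fit are illustrative):

```r
library(survey)
library(srvyr)
data(api, package = "survey")  # provides apistrat

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# A tbl_svy keeps the survey package's design classes ...
is_design <- inherits(dstrata, "survey.design")

# ... so modelling functions from survey accept it directly
fit <- svyglm(api99 ~ stype, design = dstrata)
coef(fit)
```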

If you'd like to contribute, please let me know! I started this as a way to learn about R package development, so you'll have to bear with me as I learn, but I would appreciate bug reports, pull requests or other suggestions!



srvyr 0.2.0

  • Added support for database-backed surveys, using dplyr's handling of DBI. Because of problems interacting with the survey package, twophase designs do not work.

srvyr 0.1.2

  • Fixed a problem with confidence levels not being passed into quantiles.

  • Added deff parameter to survey_mean(), survey_total() and survey_median(), and a df parameter to those functions and survey_quantile() / survey_median().

  • summarize() and mutate() now match dplyr's behavior when arguments aren't named (using dplyr::auto_name()).
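As an illustration of the deff and df parameters mentioned above, the following sketch requests a design effect for a mean and supplies the degrees of freedom explicitly. It assumes the apistrat data from the survey package; taking df from survey::degf() is one possible choice rather than a requirement, and the names design_df and api_mean are illustrative.

```r
library(survey)
library(srvyr)
data(api, package = "survey")  # provides apistrat

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Degrees of freedom of the design, computed by the survey package
design_df <- degf(dstrata)

# deff = TRUE adds a design effect column next to the estimate;
# df is used when computing the confidence interval
api_mean <- dstrata %>%
  summarise(api00 = survey_mean(api00, vartype = "ci", deff = TRUE, df = design_df))

api_mean
```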

srvyr 0.1.1

  • New function cascade summarizes groups, and cascades to create summary statistics of groups of groups.

  • Fixed a bug for confidence intervals for survey_total() on groups.

  • Fixed some issues with the upcoming version of dplyr.
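A small sketch of cascade(), assuming the apistrat data from the survey package (the result name by_type_and_all is illustrative). With a single grouping variable, the output holds one row per group plus one extra row for the estimate across all groups.

```r
library(survey)
library(srvyr)
data(api, package = "survey")  # provides apistrat

by_type_and_all <- apistrat %>%
  as_survey_design(strata = stype, weights = pw) %>%
  group_by(stype) %>%
  cascade(api99 = survey_mean(api99))  # per-group means, then the overall mean

by_type_and_all
```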



0.2.0 by Greg Freedman Ellis, 5 months ago


Report a bug at https://github.com/gergness/srvyr/issues

Browse source code at https://github.com/cran/srvyr

Authors: Greg Freedman Ellis [aut, cre]


Task views: Official Statistics & Survey Methodology

GPL-2 | GPL-3 license

Imports dplyr, lazyeval, magrittr, survey, tibble

Suggests ggplot2, knitr, Matrix, rmarkdown, pander, RSQLite, MonetDBLite, survival, testthat
