Trust, but Verify

Declarative template-based framework for verifying that objects meet structural requirements, and auto-composing error messages when they do not.


Project Status: WIP - Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

Trust, but Verify

Easily

When you write functions that operate on S3 or unclassed objects you can either trust that your inputs will be structured as expected, or tediously check that they are.

vetr takes the tedium out of structure verification so that you can trust, but verify. It lets you express structural requirements declaratively with templates, and it auto-generates human-friendly error messages as needed.

Quickly

vetr is written in C to minimize overhead from parameter checks in your functions. It has no dependencies.

Declarative Checks with Templates

Templates

Declare a template that an object should conform to, and let vetr take care of the rest:

library(vetr)
tpl <- numeric(1L)
vet(tpl, 1:3)
## [1] "`length(1:3)` should be 1 (is 3)"
vet(tpl, "hello")
## [1] "`\"hello\"` should be type \"numeric\" (is \"character\")"
vet(tpl, 42)
## [1] TRUE

The template concept is based on vapply, but generalizes to all S3 objects and adds some special features to facilitate comparison. For example, zero length templates match any length:

tpl <- integer()
vet(tpl, 1L:3L)
## [1] TRUE
vet(tpl, 1L)
## [1] TRUE

And for convenience short (<= 100 length) integer-like numerics are considered integer:

tpl <- integer(1L)
vet(tpl, 1)       # this is a numeric, not an integer
## [1] TRUE
vet(tpl, 1.0001)
## [1] "`1.0001` should be type \"integer-like\" (is \"double\")"

vetr can compare recursive objects such as lists or data frames:

tpl.iris <- iris[0, ]      # 0 row DF matches any number of rows in object
iris.fake <- iris
levels(iris.fake$Species)[3] <- "sibirica"   # tweak levels
 
vet(tpl.iris, iris)
## [1] TRUE
vet(tpl.iris, iris.fake)
## [1] "`levels(iris.fake$Species)[3]` should be \"virginica\" (is \"sibirica\")"

From our declared template iris[0, ], vetr infers all the required checks. In this case, vet(iris[0, ], iris.fake, stop=TRUE) is equivalent to:

stopifnot_iris <- function(x) {
  stopifnot(
    is.data.frame(x),
    is.list(x),
    length(x) == length(iris),
    identical(lapply(x, class), lapply(iris, class)),
    is.integer(attr(x, 'row.names')),
    identical(names(x), names(iris)),
    identical(typeof(x$Species), "integer"),
    identical(levels(x$Species), levels(iris$Species))
  )
}
stopifnot_iris(iris.fake)
## Error in stopifnot_iris(iris.fake): identical(levels(x$Species), levels(iris$Species)) is not TRUE

vetr saved us the typing, as well as the time and thought needed to work out what has to be compared.
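As a rough sketch of how the stop = TRUE form might be used in a script, the failure can be caught like any other R error. The tryCatch wrapper and the variable names res and msg are illustrative choices, not something prescribed by vetr, and the snippet assumes the package is installed:

```r
library(vetr)

tpl.iris <- iris[0, ]                        # zero-row template: any row count
iris.fake <- iris
levels(iris.fake$Species)[3] <- "sibirica"   # break one factor level

# With the default stop = FALSE, vet returns TRUE on success or a
# character vector describing the failure
res <- vet(tpl.iris, iris.fake)

# With stop = TRUE a failed check signals an error instead; we capture
# its message here so the script can keep running
msg <- tryCatch(
  {
    vet(tpl.iris, iris.fake, stop = TRUE)
    "passed"
  },
  error = function(e) conditionMessage(e)
)
msg
```

Returning the message is handy for interactive exploration, while stop = TRUE is closer to what stopifnot does inside a function.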

You could just as easily have created templates for nested lists, or data frames in lists. Templates are compared to objects with the alike function. For a thorough description of templates and how they work see the alike vignette. For template examples see example(alike).
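A minimal sketch of a nested template follows. The element names id, tags, and data are hypothetical and chosen only for illustration; the snippet assumes vetr is installed:

```r
library(vetr)

# Hypothetical template: a list holding a scalar integer, a character
# vector of any length, and a data frame structured like iris
tpl <- list(id = integer(1L), tags = character(), data = iris[0, ])

good <- list(id = 1L, tags = c("a", "b"), data = iris[1:10, ])
bad  <- list(id = 1L, tags = 1:2, data = iris[1:10, ])   # tags has wrong type

res.good <- vet(tpl, good)   # passes
res.bad  <- vet(tpl, bad)    # character vector describing the mismatch

# alike performs the underlying comparison and can be called directly;
# a zero-extent dimension in a matrix template is treated as a wildcard
alike(tpl, good)
alike(matrix(integer(), ncol = 3), matrix(1:12, nrow = 4))
```

The template is written with the same constructors used to build the real object, which is what keeps the check declarative.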

Auto-Generated Error Messages

Let's revisit the error message:

vet(tpl.iris, iris.fake)
## [1] "`levels(iris.fake$Species)[3]` should be \"virginica\" (is \"sibirica\")"

It tells us:

  • The reason for the failure
  • What structure would be acceptable instead
  • The location of failure: levels(iris.fake$Species)[3]

vetr does what it can to reduce the time from error to resolution. The location of failure is generated such that you can easily copy it in part or full to the R prompt for further examination.

Vetting Expressions

You can combine templates with && / ||:

vet(numeric(1L) || NULL, NULL)
## [1] TRUE
vet(numeric(1L) || NULL, 42)
## [1] TRUE
vet(numeric(1L) || NULL, "foo")
## [1] "`\"foo\"` should be `NULL`, or type \"numeric\" (is \"character\")"

Templates only check structure. When you need to check values use . to refer to the object:

vet(numeric(1L) && . > 0, -42)  # strictly positive scalar numeric
## [1] "`-42 > 0` is not TRUE (FALSE)"
vet(numeric(1L) && . > 0, 42)
## [1] TRUE

You can compose vetting expressions as language objects and combine them:

scalar.num.pos <- quote(numeric(1L) && . > 0)
foo.or.bar <- quote(character(1L) && . %in% c('foo', 'bar'))
vet.exp <- quote(scalar.num.pos || foo.or.bar)
 
vet(vet.exp, 42)
## [1] TRUE
vet(vet.exp, "foo")
## [1] TRUE
vet(vet.exp, "baz")
## [1] "At least one of these should pass:"                         
## [2] "  - `\"baz\" %in% c(\"foo\", \"bar\")` is not TRUE (FALSE)" 
## [3] "  - `\"baz\"` should be type \"numeric\" (is \"character\")"

all_bw is available for value range checks (~10x faster than isTRUE(all(. >= x & . <= y)) for large vectors):

vet(all_bw(., 0, 1), runif(5) + 1)
## [1] "`all_bw(runif(5) + 1, 0, 1)` is not TRUE (is chr: \"`1.465853` at index 1 not in `[0,1]`\")"
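For comparison, here is a small sketch that puts all_bw next to a base R formulation of the same range check. The vector x and the result names are made up for illustration, and the bounds argument is used as documented in ?all_bw (inclusive "[]" by default):

```r
library(vetr)

x <- c(0, 0.25, 1)

# Base R formulation of "no NAs and every value within [0, 1]"
base.ok <- !anyNA(x) && all(x >= 0 & x <= 1)

# all_bw formulation; it returns TRUE on success
bw.ok <- isTRUE(all_bw(x, 0, 1))

# With exclusive bounds, 0 and 1 fall outside the interval, so all_bw
# returns a character description of the first offending value
bw.open <- all_bw(x, 0, 1, bounds = "()")
bw.open
```

Because all_bw returns a descriptive string rather than FALSE on failure, vet can fold that string into its error message, as in the output above.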

There are a number of predefined vetting tokens you can use in your vetting expressions such as:

vet(NUM.POS, -runif(5))    # positive numeric; see `?vet_token` for others
## [1] "`-runif(5)` should contain only positive values, but has negatives"
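You can also define tokens of your own with vet_token. The sketch below is modeled on the example in ?vet_token; the token name SQR and its message are illustrative, and the message should contain a single "%s" placeholder for the deparsed object:

```r
library(vetr)

# Hypothetical custom token: the object must have as many rows as columns.
# The "%s" in the message is replaced by the expression being vetted.
SQR <- vet_token(nrow(.) == ncol(.), "%s should be square")

res.sq  <- vet(SQR, matrix(1:4, nrow = 2))   # 2 x 2, square
res.rec <- vet(SQR, matrix(1:3))             # 3 x 1, not square

# Custom tokens combine with other vetting expressions via && and ||
res.both <- vet(is.matrix(.) && SQR, matrix(1:9, nrow = 3))
```

Attaching the message to the token means every function that reuses SQR reports failures in the same wording.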

Vetting expressions are designed to be intuitive to use, but their implementation is complex. We recommend you look at example(vet) for usage ideas, or at the "Non Standard Evaluation" section of the vignette for the gory details.

vetr in Functions

If you are vetting function inputs, you can use the vetr function, which works just like vet except that it is streamlined for use within functions:

fun <- function(x, y) {
  vetr(numeric(1L), logical(1L))
  TRUE   # do work...
}
fun(1:2, "foo")
## Error in fun(x = 1:2, y = "foo"): For argument `x`, `length(1:2)` should be 1 (is 2)
fun(1, "foo")
## Error in fun(x = 1, y = "foo"): For argument `y`, `"foo"` should be type "logical" (is "character")

vetr automatically matches the vetting expressions to the corresponding arguments and fetches the argument values from the function environment.
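Vetting expressions can also be passed by name, which leaves the remaining arguments unchecked. The function below is a minimal sketch: scale_by and its arguments are made up for illustration, and it assumes vetr matches its arguments to the enclosing function's formals in the manner of match.call:

```r
library(vetr)

scale_by <- function(x, factor, label = "scaled") {
  # x: numeric vector of any length; factor: strictly positive scalar.
  # label is deliberately left unvetted.
  vetr(x = numeric(), factor = numeric(1L) && . > 0)
  x * factor
}

ok <- scale_by(c(1, 2, 3), 2)

# A failed check is signaled as an error from scale_by, so it can be
# handled with the usual condition machinery
err <- tryCatch(
  scale_by(c(1, 2, 3), -1),
  error = function(e) conditionMessage(e)
)
err
```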

See the vignette for additional details on how the vetr function works.

Additional Documentation

Development Status

vetr is still in development, although most of the features are considered mature. The most likely areas of change are the treatment of function and language templates (e.g. alike(sum, max)) and more flexible treatment of list templates (e.g. in the future, lists may be allowed to differ in length so long as every named element in the template exists in the object).

Installation

install.packages('vetr')

Or for the development version:

# install.packages('devtools')
devtools::install_github('brodieg/vetr@development')

Alternatives

There are many alternatives available to vetr. We do a survey of the following in our parameter validation functions review:

The following packages also perform related tasks, although we do not review them:

  • valaddin v0.1.0 by Eugene Ha, a framework for augmenting existing functions with validation contracts. Currently the package is undergoing a major overhaul so we will add it to the comparison once the new release (v0.3.0) is out.
  • ensurer v1.1 by Stefan M. Bache, a framework for flexibly creating and combining validation contracts. The development version adds an experimental method for creating type safe functions, but it is not published to CRAN so we do not test it here.
  • validate by Mark van der Loo and Edwin de Jonge, with a primary focus on validating data in data frames and similar data structures.
  • assertr by Tony Fischetti, also focused on data validation in data frames and similar structures.
  • types by Jim Hester, which implements but does not enforce type hinting.
  • argufy by Gábor Csárdi, which implements parameter validation via roxygen tags (not released to CRAN).

Acknowledgments

Thank you to:

About the Author

Brodie Gaslam is a hobbyist programmer based on the US East Coast.

News

0.2.7

  • Fix new rchk warnings.
  • Set RNGversion() due to changes to sampling mechanism.

0.2.6

  • #96 Fix r-devel test failures that started with r75024.
  • #94 Properly credit vapply for template concept.

0.2.5

  • Address CRAN warnings about packages used in tests not in suggests.

0.2.4

  • As per #93, ensure that attribute comparisons are always done in the same order. We now sort the attribute lists prior to comparison. This may result in slightly different output than previously as which attribute is declared incorrect or missing may change as a result of the sort since the first such attribute is reported. Additionally, there is now more explicit handling of missing attributes so the error reporting for them will be slightly different.
  • Fix memory problems reported by valgrind.

0.2.3

  • #92 vetr evaluated expressions in wrong environment.
  • #89 Zero length vetting token results pass; this is to align with all(logical(0)) and consequently stopifnot.
  • #88 Extra space in deparsed vetted language.

0.2.2

  • Test errors on Solaris.

0.2.1

  • Fix Solaris compilation issue.
  • Fix new rchk warnings.
  • Change R dependency to 3.3.2 to avoid problems with CRAN osx R-devel build.

0.2.0

  • #48: Implement all_bw, a more efficient version of !anyNA(.) && all(. < x) && all(. > y).
  • #65 #51: Check expressions that return character vectors will have part of the first element of that vector included in the error message.
  • #69: Vetting expressions that use the symbol of the object being vetted are no longer valid. This avoids confusion caused by intended standard tokens being treated as template tokens because they use the object symbol instead of . to refer to the object.
  • #64: Rewrite result handling for multi token expressions to avoid unnecessary slow downs
  • #43: Fix rchk, rcnst, UBSAN, valgrind (ht @kalibera).
  • #76: Standardize defined terms (e.g. Standard vs Template Tokens)
  • #77: Replace SIZE_T_MAX with SIZE_MAX for portability
  • #70: Feedback from Richie Cotton and Michel Lang re: comparison "vignette"
  • #45: Cleanup error messages for objects that should be NULL.
  • #73: Cleaner protection stack handling
  • #56: Over-aggressive detection of infinite recursion in symbol substitution
  • #81: Remove test that attached attribute to symbol (illegal in R-devel now).
  • #59: Add a CONTRIBUTING.md
  • Assorted typos (@franknarf1, @DasonK)

0.1.0

Initial release.

0.0.2

Finalizing initial release.

  • #40: Removed suggests dependencies to ggplot, microbenchmark, and valaddin to improve travis build time.
  • Internal: formatting strings longer than nchar.max no longer allowed
  • #39: type_alike return values structured like alike, doc fixes.
  • #38: Run with valgrind
  • #36: Fix INTEGER C bug
  • #34: allow substitution of . symbol when part of ...
  • #33: prevent infinite recursion with recursive symbol substitution
  • #30: allow specification of substitution / matching / evaluation environment.
  • #28: expose alike and vetr setting control.
  • #24: clarify use of vet_token.
  • #18: better documentation for NSE.
  • #11: segfault when validating language objects.

0.2.7 by Brodie Gaslam, 3 months ago


https://github.com/brodieG/vetr


Report a bug at https://github.com/brodieG/vetr/issues


Browse source code at https://github.com/cran/vetr


Authors: Brodie Gaslam [aut, cre], Paxdiablo [cph] (Hash table implementation in src/pfhash.h), R Core Team [cph] (Used/adapted several code snippets from R sources, see src/misc-alike.c and src/valname.c)


GPL (>= 2) license


Suggests knitr, rmarkdown, unitizer, methods
