Data Validation Infrastructure

Declare data validation rules and data quality indicators; confront data with them and analyze or visualize the results. The package supports rules that are per-field, in-record, cross-record or cross-dataset. Rules can be automatically analyzed for rule type and connectivity. See also Van der Loo and De Jonge (2018) , chapter 6.


version 0.2.6

  • New methods 'all' and 'any' for 'validation' objects
  • New 'plot' method for 'validator' objects.
  • New 'plot' method for 'validation' objects.
  • New '' methods for 'validatorComparison' and 'cellComparison' objects.
  • New 'plot' method for 'validatorComparison' and 'cellComparison' objects.
  • New 'barplot' method for 'validatorComparison' and 'cellComparison' objects.
  • Improved 'plot'/'barplot' method for objects of class 'validator'.
  • Bugfixes in 'cells' and 'compare'.
  • The row order of output in 'cells' has changed to be more consistent with row order in the output of 'compare'.
  • The row 'new_missing' is renamed 'removed' in 'cellComparison' objects.
  • Improved cross-references in reference manual

version 0.2.5

  • Documentation updates (Thanks to Matthias Gomolka)
  • Removed deprecated functions 'number_missing', 'fraction_missing', 'row_missing', 'number_unique', 'any_dunplicated', 'any_missing'.
  • Bugfix: the 'call' object in a 'confrontation' object was created incorrectly. This only affects printing.

version 0.2.4

  • New set membership operator "%vin%", behaving consistently with "=="
  • Set membership operator %in% is now replaced with %vin% (#80).
  • Added [[<- and [<- methods for validator objects.
  • Validation rules can now be endowed with user-defined metadata, see ?meta (#59).
  • fixed (harmless) compiler warning (thanks to Brian Ripley)
  • bugfix: empty [] selection on expressionset crashed (thanks to Anne Petersen)
  • bugfix: NA in validating expression would crash parser (thanks to Masafumi Okada)

version 0.2.0

  • new object of class 'indicator'
  • validation rules now support if-then-else syntax.
  • validaton rules can now contain embedded if-statements, e.g. A | (if (B) C)
  • expressionset objects (validator, indicator) can be combined using '+'
  • results of getter functions (e.g. 'labels') are now named.
  • less verbose printng of options for expressionset objects (validator, indicator)
  • expressionset objects (validator, indicator) can now be coerced to
  • confrontation objects (validation, indication) can now be coerced to data.frame.
  • safer expression manipulation under the hood.
  • breaking: 'rule' column now called 'name' in summary of 'confrontation' objects.
  • bugfix: character indexing with multiple elements of expressionset objects would crash.
  • bugfix: proper recycling for replace methods in expressionset.

version 0.1.8

  • Added loggers for the 'lumberjack' package: lbj_cells and lbj_rules
  • Tolerance for checking (non-strict) linear inequalities is now controlled by voptions(lin.ineq.opts).
  • Macro assignments through := are no longer put in brackets.

version 0.1.7

  • The missingess counters are now only internally documented and will be deprecated (the introduction of the '.' made them more or less obsolete).
  • Package now 'depends' on methods to allow dispatch on objects inhereting from data.frame
  • bugfix: The assignment created() <- was broken. #65, thanks to Andrew R Gibson.
  • bugfix: Simple equallity checks on character data behaved unexpectedly. #67, thanks to Kevin Kuo.
  • Native routines now registered as required by updated CRAN policy.

version 0.1.5

  • The '.' is now used to reference the validated data set as whole.
  • Small change in output of 'compare' to match the table in van den Broek et al. (2013)
  • registered native routines as now recommended by CRAN

version 0.1.4

  • 'confront' now emits a warining when variable name conflicts with name of a reference data set
  • Deprecated 'validate_reset', in favour of the shorter 'reset' (use 'validate::reset' in case of ambiguity)
  • Deprecated 'validate_options' in favour of the shorter 'voptions'
  • New option na.value with default value NA, controlling the output when a rule evaluates to NA.
  • Added rules from the ESSnet on validation (deliverable 17) to automated tests.
  • added 'grepl' to allowed validation syntax (suggested by Dusan Sovic)
  • exported a few functions w/ keywords internal for extensibility
  • Bugfix: blocks sometimes reported wrong nr of blocks (in case of a single connected block.)
  • Bugfix: macro expansion failed when macros were reused in other macros.
  • Bugfix: certain nonlinear relations were recognized as linear
  • Bugfix: rules that use (anonymous) function definitions raised error when printed.

version 0.1.3

  • initial release

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.


1.0.1 by Mark van der Loo, 2 months ago

Report a bug at

Browse source code at

Authors: Mark van der Loo [cre, aut] , Edwin de Jonge [aut] , Paul Hsieh [ctb]

Documentation:   PDF Manual  

Task views: Official Statistics & Survey Methodology

GPL-3 license

Imports stats, graphics, grid, settings, yaml

Depends on methods

Suggests tinytest, knitr, bookdown, lumberjack

Imported by dcmodify, deductive, iNZightTools, rspa.

Depended on by errorlocate, validatetools.

See at CRAN