'rquery' for 'data.table'

Implements the 'rquery' piped Codd-style query algebra using 'data.table'. This allows for a high-speed in-memory implementation of Codd-style data manipulation tools.

rqdatatable is an implementation of the rquery piped Codd-style relational algebra hosted on data.table. rquery allows the expression of complex transformations as a series of relational operators, and rqdatatable implements those operators using data.table.

For example, scoring a logistic regression model (which requires grouping, ordering, and ranking) is organized as follows. For more on this example, please see "Let’s Have Some Sympathy For The Part-time R User".

library("rqdatatable")

# data example
dL <- build_frame(
   "subjectID", "surveyCategory"     , "assessmentTotal" |
   1          , "withdrawal behavior", 5                 |
   1          , "positive re-framing", 2                 |
   2          , "withdrawal behavior", 3                 |
   2          , "positive re-framing", 4                 )

scale <- 0.237

# example rquery pipeline
rquery_pipeline <- local_td(dL) %.>%
  extend_nse(.,
             probability :=
               exp(assessmentTotal * scale)) %.>%
  normalize_cols(.,
                 "probability",
                 partitionby = 'subjectID') %.>%
  pick_top_k(.,
             k = 1,
             partitionby = 'subjectID',
             orderby = c('probability', 'surveyCategory'),
             reverse = c('probability', 'surveyCategory')) %.>%
  rename_columns(., c('diagnosis' = 'surveyCategory')) %.>%
  select_columns(., c('subjectID',
                      'diagnosis',
                      'probability')) %.>%
  orderby(., cols = 'subjectID')

We can show the expanded form of the query tree.

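cat(format(rquery_pipeline))
## table('dL'; 
##   subjectID,
##   surveyCategory,
##   assessmentTotal) %.>%
##  extend(.,
##   probability := exp(assessmentTotal * 0.237)) %.>%
##  extend(.,
##   probability := probability / sum(probability),
##   p= subjectID) %.>%
##  extend(.,
##   row_number := row_number(),
##   p= subjectID,
##   o= "probability" DESC, "surveyCategory" DESC) %.>%
##  select_rows(.,
##    row_number <= 1) %.>%
##  rename(.,
##   c('diagnosis' = 'surveyCategory')) %.>%
##  select_columns(.,
##    subjectID, diagnosis, probability) %.>%
##  orderby(., subjectID)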

And execute it using data.table.

ex_data_table(rquery_pipeline)
##    subjectID           diagnosis probability
## 1:         1 withdrawal behavior   0.6706221
## 2:         2 positive re-framing   0.5589742
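
For intuition, the same calculation can be written by hand in data.table. This is only a sketch of what the relational steps compute (extend, partitioned normalization, top-1 pick, rename, select, order); it is an assumption for illustration and not necessarily the code rqdatatable generates internally. The example data is rebuilt with data.table() so the block needs nothing but data.table.

```r
library("data.table")

# Same example data as above, built directly as a data.table.
dL <- data.table(
  subjectID = c(1, 1, 2, 2),
  surveyCategory = c("withdrawal behavior", "positive re-framing",
                     "withdrawal behavior", "positive re-framing"),
  assessmentTotal = c(5, 2, 3, 4))
scale <- 0.237

res <- copy(dL)  # work on a copy so dL is left unchanged

# extend: unnormalized score per row
res[, probability := exp(assessmentTotal * scale)]

# extend with a partition: normalize scores within each subject
res[, probability := probability / sum(probability), by = subjectID]

# order by subject, then descending probability and category,
# and keep the first row per subject (the top-1 pick)
setorderv(res, c("subjectID", "probability", "surveyCategory"),
          order = c(1L, -1L, -1L))
res <- res[, .SD[1], by = subjectID]

# rename, select columns, and order the result
setnames(res, "surveyCategory", "diagnosis")
res <- res[, .(subjectID, diagnosis, probability)]
setorderv(res, "subjectID")

print(res)
```

The printed result should agree with the ex_data_table() output above, up to display rounding.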

One can also apply the pipeline to new tables.

build_frame(
   "subjectID", "surveyCategory"     , "assessmentTotal" |
   7          , "withdrawal behavior", 5                 |
   7          , "positive re-framing", 20                ) %.>%
  rquery_pipeline
##    subjectID           diagnosis probability
## 1:         7 positive re-framing   0.9722128

Initial benchmarking of rqdatatable is very favorable (notes here).

Note: rqdatatable has an "immediate mode" that allows direct application of pipeline stages without pre-assembling the pipeline. "Immediate mode" is a convenience for ad-hoc analyses and has some negative performance impact, so we encourage users to build pipelines for most work. Some notes on the issue can be found here.
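
A small sketch of the difference, assuming rqdatatable and wrapr are installed and that extend_nse() and select_rows_nse() are available under those names in the installed rquery version. The data frame d and the column y are made up for illustration.

```r
library("rqdatatable")
library("wrapr")  # provides the %.>% pipe and := used below

d <- data.frame(x = c(1, 2, 3))

# Pre-assembled pipeline: built once from a table description, then executed.
ops <- local_td(d) %.>%
  extend_nse(., y := x + 1) %.>%
  select_rows_nse(., y > 2)
res_pipeline <- ex_data_table(ops, tables = list(d = d))

# Immediate mode: the same stages applied directly to the data.frame.
res_immediate <- d %.>%
  extend_nse(., y := x + 1) %.>%
  select_rows_nse(., y > 2)

print(res_pipeline)
print(res_immediate)
```

Both forms should return the same rows. The pre-assembled ops object can be reused on other tables with the same columns, which is the usage the note above recommends.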

rqdatatable is a fairly complete implementation of rquery. The main difference is that the rqdatatable versions of sql_node() and theta_join() work by round-tripping through a database handle specified by the rquery.rquery_db_executor option, so they are not very desirable implementations.
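
As a hedged sketch, the option might be set as follows. This assumes DBI and RSQLite are installed (both are listed under Suggests) and that an in-memory SQLite connection is an acceptable handle; the list(db = ...) shape follows the pattern used in the rquery documentation and may differ between versions.

```r
library("rqdatatable")

# Assumption: an in-memory SQLite database serves as the round-trip handle.
db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Register the handle, keeping the previous option value for later restoration.
old_options <- options(list("rquery.rquery_db_executor" = list(db = db)))

# Pipelines containing sql_node() or theta_join() stages can now
# round-trip those stages through `db` when executed.

# Restore the previous setting and close the connection when finished.
options(old_options)
DBI::dbDisconnect(db)
```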

To install rqdatatable please use install.packages("rqdatatable").


rqdatatable 1.1.1 2018/09/20

  • alternate data.table implementation path.
  • force parent.frame.

rqdatatable 1.0.0 2018/09/10

  • allow no group columns project.
  • work on ordering in extend.

rqdatatable 0.1.4 2018/08/18

  • More tests.
  • Work on result print-visibility.

rqdatatable 0.1.3 2018/07/28

  • Fix full join print glitch.
  • data.table implementation of theta-join.
  • Documentation fixes.

rqdatatable 0.1.2 2018/07/08

  • Adapt to instant execution path.
  • Don't expect %>>%.
  • Documentation improvements.

rqdatatable 0.1.1 2018/06/26

  • Don't use isFALSE() (new to R 3.5.0).
  • Update install instructions.
  • Improve regexps.

rqdatatable 0.1.0 2018/06/18

  • First CRAN release.


1.1.3 by John Mount, 5 days ago

https://github.com/WinVector/rqdatatable/, https://winvector.github.io/rqdatatable/

Report a bug at https://github.com/WinVector/rqdatatable/issues

Browse source code at https://github.com/cran/rqdatatable

Authors: John Mount [aut, cre] , Win-Vector LLC [cph]

GPL-3 license

Imports wrapr, data.table, methods

Depends on rquery

Suggests knitr, rmarkdown, DBI, RSQLite, parallel, RUnit

Suggested by cdata, rquery, vtreat.