Utilities for Generating SQL Queries from Model Objects

Provides utilities for generating SQL queries (particularly CREATE TABLE statements) from R model objects. The most important use case is generating SQL to score a generalized linear model or related model represented as an R object, in which case the package handles parsing formula operators and including the model's response function.


CRAN_Status_Badge Build Status Coverage Status Downloads License

Utilities for scoring GLMs and related models in SQL. Use the create_statement and select_statement functions to generate scoring queries from model objects. The most important use case is for very large scoring datasets, especially those which can't fit into memory or would require too much network or storage I/O if scored the usual way in R.

The SQL-generating functions in sqlscore handle various formula operators, and also take care of wrapping the model's linear predictor in the appropriate link function. If needed, you can also specify a custom link function.

Usage:

The SQL-generating functions create_statement and select_statement do what their names suggest and generate CREATE TABLE and SELECT statements for model scoring.

If, for example, you have a database table of iris measurements, you can model the sepal length and generate predictions as follows:

mod <- glm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
           data=datasets::iris)

create_statement(mod, src_table="iris", dest_table="iris_scores", pk=c("id"))
#> Error: `con` must not be NULL

To get a SELECT statement that's not wrapped in a CREATE TABLE (so that, e.g., you can add your own database-specific pieces of SQL), use select_statement:

mod <- glm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
           data=datasets::iris)

select_statement(mod, src_table="iris", pk=c("id"))
#> Error: `con` must not be NULL

Helper functions include linpred(), which generates an R call object representing the linear predictor, and score_expression, an S3 generic that handles wrapping the linear predictor in the response function.

Supported model types:

Specific packages and models that are known to work include: glm and lm from package:stats, cv.glmnet from package:glmnet, glmboost from package:mboost, and bayesglm from package:arm.

Default S3 methods are for objects structured like those of class "glm", so models not listed here may work if they resemble those objects, but are not guaranteed to.

Installation:

Install the released version from CRAN:

install.packages('sqlscore')

Install the dev version from github:

install.packages("devtools")
devtools::install_github("wwbrannon/sqlscore")

News

sqlscore 0.1.4

This version includes:

  • Compatibility with the new version 1.4.0 of dbplyr. Documentation also clarifies the possibility of passing dbplyr::simulate_* objects for offline generation of different SQL dialects.

sqlscore 0.1.3

This version includes:

  • Improved tests, including fixes needed after an upgrade to glmnet.

sqlscore 0.1.2

This version includes:

  • Support for the recent refactor of dplyr into dplyr and dbplyr.
  • Support for the new Binomial_glm type of object in mboost.

sqlscore 0.1.1

This version includes:

  • Support for cauchit links in glm objects.
  • Support for non-Gaussian families in glmboost, specifically: binomial (logit and probit), Poisson and gamma.

sqlscore 0.1.0

First release.

This version includes:

  • Functions to generate CREATE TABLE and SELECT statements from model objects;
  • Functions for generating unevaluated R expressions from model objects that correspond to
    • the model's linear predictor
    • the model's final prediction expression (the response function of the linear predictor)
  • Support for built-in glm and lm objects, as well as
    • bayesglm from package:arm
    • cv.glmnet from package:glmnet
    • glmboost from package:mboost (only Gaussian models) Except for glmboost, all link functions that can be represented in SQL are supported for all packages.
  • Using a custom link function by name. This is useful if, e.g., your database provides probit or tobit functions.
  • Support for various formula features (in particular :, I() and model.matrix-style factor expansion).

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.

install.packages("sqlscore")

0.1.4 by William Brannon, 5 months ago


https://github.com/wwbrannon/sqlscore/


Report a bug at https://github.com/wwbrannon/sqlscore/issues


Browse source code at https://github.com/cran/sqlscore


Authors: William Brannon [aut, cre]


Documentation:   PDF Manual  


MIT + file LICENSE license


Imports dbplyr

Suggests testthat, arm, glmnet, mboost, covr


See at CRAN