A Fast, Easy-to-Use Database Interface

Provides select, insert, update, upsert, and delete database operations. Supports 'PostgreSQL', 'MySQL', 'SQLite', and more, and plays nicely with the 'DBI' package.


🔥 A fast, easy-to-use database library for R

  • Intuitive functions
  • High performance batch operations
  • Safe inserts, updates, and deletes without writing SQL
  • Upserts!!
  • Great date and time support
  • Works well with auto-incrementing primary keys
  • Built on top of DBI

Designed for both research and production environments

Supports Postgres, MySQL, SQLite, and more


Installation

Install dbx

install.packages("dbx")

And follow the instructions for your database

To install with Jetpack, use:

jetpack::add("dbx")

Postgres

Install the R package

install.packages("RPostgres")

And use:

library(dbx)
 
db <- dbxConnect(adapter="postgres", dbname="mydb")

You can also pass user, password, host, port, and url.
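
For example, a sketch with placeholder credentials (all values here are illustrative):

db <- dbxConnect(
  adapter="postgres",
  dbname="mydb",
  host="localhost",
  port=5432,
  user="myuser",
  password="mypass"
)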

MySQL

Install the R package

install.packages("RMySQL")

And use:

library(dbx)
 
db <- dbxConnect(adapter="mysql", dbname="mydb")

You can also pass user, password, host, port, and url.
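
The url option can bundle these settings into a single string; a sketch with placeholder credentials (parsing the URL requires the urltools package, covered under Database Credentials below):

db <- dbxConnect(url="mysql://myuser:mypass@localhost:3306/mydb")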

Works with RMariaDB as well
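
If you'd rather be explicit about the driver, a sketch passing the RMariaDB driver object as the adapter, in the same style as the odbc examples below:

db <- dbxConnect(adapter=RMariaDB::MariaDB(), dbname="mydb")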

SQLite

Install the R package

install.packages("RSQLite")

And use:

library(dbx)
 
db <- dbxConnect(adapter="sqlite", dbname=":memory:")

SQL Server

Install the R package

install.packages("odbc")

And use:

library(dbx)
 
db <- dbxConnect(adapter=odbc::odbc(), database="mydb")

You can also pass uid, pwd, server, and port.
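
For example, a sketch with placeholder values; the driver name is an assumption and depends on the ODBC driver installed on your system:

db <- dbxConnect(
  adapter=odbc::odbc(),
  driver="ODBC Driver 17 for SQL Server",
  server="localhost",
  database="mydb",
  uid="myuser",
  pwd="mypass"
)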

Redshift

For Redshift, follow the Postgres instructions.
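
For example, a sketch with a placeholder cluster endpoint (5439 is Redshift's default port):

db <- dbxConnect(
  adapter="postgres",
  host="mycluster.example.us-east-1.redshift.amazonaws.com",
  port=5439,
  dbname="mydb",
  user="myuser",
  password="mypass"
)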

Others

Install the appropriate R package and use:

db <- dbxConnect(adapter=odbc::odbc(), database="mydb")

Operations

Select

Create a data frame of records from a SQL query

records <- dbxSelect(db, "SELECT * FROM forecasts")

Pass parameters

dbxSelect(db, "SELECT * FROM forecasts WHERE period = ? AND temperature > ?", params=list("hour", 27))

Parameters can also be vectors

dbxSelect(db, "SELECT * FROM forecasts WHERE id IN (?)", params=list(1:3))

Insert

Insert records

table <- "forecasts"
records <- data.frame(temperature=c(32, 25))
dbxInsert(db, table, records)

If you use auto-incrementing ids in Postgres, you can get the ids of newly inserted rows by passing the column name:

dbxInsert(db, table, records, returning=c("id"))

Update

Update records

records <- data.frame(id=c(1, 2), temperature=c(16, 13))
dbxUpdate(db, table, records, where_cols=c("id"))

Use where_cols to specify the columns used for lookup. Other columns are written to the table.

Updates are batched when possible, but often need to be run as multiple queries. For better performance, use upsert when you can, since it can always be run as a single query. Turn on logging to see the difference.

Upsert

Atomically insert records if they don’t exist, otherwise update them

records <- data.frame(id=c(2, 3), temperature=c(20, 25))
dbxUpsert(db, table, records, where_cols=c("id"))

Use where_cols to specify the columns used for lookup. There must be a unique index on them, or an error will be thrown.
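
If the table doesn’t have one yet, you can create it with a plain SQL statement; a sketch using a hypothetical index name:

dbxExecute(db, "CREATE UNIQUE INDEX index_forecasts_on_id ON forecasts (id)")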

Only available for PostgreSQL 9.5+, MySQL 5.5+, and SQLite 3.24+

To skip existing rows instead of updating them, use:

dbxUpsert(db, table, records, where_cols=c("id"), skip_existing=TRUE)

If you use auto-incrementing ids in Postgres, you can get the ids of newly upserted rows by passing the column name:

dbxUpsert(db, table, records, where_cols=c("id"), returning=c("id"))

Delete

Delete specific records

bad_records <- data.frame(id=c(1, 2))
dbxDelete(db, table, where=bad_records)

Delete all records (uses TRUNCATE when possible for performance)

dbxDelete(db, table)

Execute

Execute a statement

dbxExecute(db, "UPDATE forecasts SET temperature = temperature + 1")

Pass parameters

dbxExecute(db, "UPDATE forecasts SET temperature = ? WHERE id IN (?)", params=list(27, 1:3))

Logging

Log all SQL queries with:

options(dbx_logging=TRUE)

Customize logging by passing a function

logQuery <- function(sql) {
  # your logging code
}
 
options(dbx_logging=logQuery)

Database Credentials

Environment variables are a convenient way to store database credentials. This keeps them outside your source control. It’s also how platforms like Heroku store them.

Create an .Renviron file in your home directory with:

DATABASE_URL=postgres://user:pass@hostname/dbname

Install urltools:

install.packages("urltools")

And use:

db <- dbxConnect()

If you have multiple databases, use a different variable name, and:

db <- dbxConnect(url=Sys.getenv("OTHER_DATABASE_URL"))

Batching

By default, operations are performed in a single statement or transaction. This is better for performance and prevents partial writes on failures. However, when working with large data frames on production systems, it can be better to break writes into batches. Use the batch_size option to do this.

dbxInsert(db, table, records, batch_size=1000)
dbxUpdate(db, table, records, where_cols, batch_size=1000)
dbxUpsert(db, table, records, where_cols, batch_size=1000)
dbxDelete(db, table, where, batch_size=1000)

Query Comments

Add comments to queries to make it easier to see where time-consuming queries are coming from.

options(dbx_comment=TRUE)

The comment will be appended to queries, like:

SELECT * FROM users /*script:forecast.R*/

Set a custom comment with:

options(dbx_comment="hi")

Transactions

To perform multiple operations in a single transaction, use:

DBI::dbWithTransaction(db, {
  dbxInsert(db, ...)
  dbxDelete(db, ...)
})

Data Type Notes

Dates & Times

Dates are returned as Date objects and times as POSIXct objects. Times are stored in the database in UTC and converted to your local time zone when retrieved.
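
For example, a sketch assuming a forecasts table with a created_at timestamp column:

records <- data.frame(created_at=as.POSIXct("2018-01-01 12:00:00", tz="America/New_York"))
dbxInsert(db, "forecasts", records)
# stored as 2018-01-01 17:00:00 UTC, read back in your local time zone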

Times without dates are returned as character vectors since R has no built-in support for this type. If you use hms, you can convert columns with:

records$column <- hms::as.hms(records$column)

SQLite does not have support for TIME columns, so we recommend storing as VARCHAR.

JSON

JSON and JSONB columns are returned as character vectors. You can use jsonlite to parse them with:

records$column <- lapply(records$column, jsonlite::fromJSON)

SQLite does not have support for JSON columns, so we recommend storing as TEXT.

Binary Data

BLOB and BYTEA columns are returned as raw vectors.

Data Type Limitations

Dates & Times

RSQLite does not currently provide enough info to automatically typecast dates and times. You can manually typecast date columns with:

records$column <- as.Date(records$column)

And time columns with:

records$column <- as.POSIXct(records$column, tz="Etc/UTC")
attr(records$column, "tzone") <- Sys.timezone()

Booleans

RMariaDB and RSQLite do not currently provide enough info to automatically typecast booleans. You can manually typecast with:

records$column <- records$column != 0

JSON

RMariaDB does not currently support JSON.
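
One possible workaround, offered as an untested sketch, is to cast the column to text in SQL so the driver returns a character vector:

records <- dbxSelect(db, "SELECT CAST(column AS CHAR) AS column FROM table")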

Binary Data

RMySQL can write BLOB columns, but can’t retrieve them directly. To work around this, use:

records <- dbxSelect(db, "SELECT HEX(column) AS column FROM table")
 
hexToRaw <- function(x) {
  # split the hex string into individual characters
  y <- strsplit(x, "")[[1]]
  # pair adjacent characters into two-character hex bytes
  z <- paste0(y[c(TRUE, FALSE)], y[c(FALSE, TRUE)])
  # convert each hex byte to raw
  as.raw(as.hexmode(z))
}
 
records$column <- lapply(records$column, hexToRaw)

Bigint

BIGINT columns are returned as numeric vectors. The numeric type in R loses precision above 2^53. Some libraries (RPostgres, RMariaDB, RSQLite, ODBC) support returning bit64::integer64 vectors instead.

dbxConnect(bigint="integer64")

Connection Pooling

Install the pool package

install.packages("pool")

Create a pool

library(pool)
 
factory <- function() {
  dbxConnect(adapter="postgres", ...)
}
 
pool <- poolCreate(factory, maxSize=5)

Run queries

conn <- poolCheckout(pool)
 
tryCatch({ 
  dbxSelect(conn, "SELECT * FROM forecasts")
}, finally={ 
  poolReturn(conn)
})

In the future, dbx commands may work directly with pools.

Security

When connecting to a database over a network you don’t fully trust, make sure your connection is secure.

With Postgres, use:

db <- dbxConnect(adapter="postgres", sslmode="verify-full", sslrootcert="ca.pem")

With RMariaDB, use:

db <- dbxConnect(adapter="mysql", ssl.ca="ca.pem")

Please let us know if you have a way that works with RMySQL.

Variables

Set session variables with:

db <- dbxConnect(variables=list(search_path="archive"))

Timeouts

Set a statement timeout with:

# Postgres
db <- dbxConnect(variables=list(statement_timeout=1000)) # ms
 
# MySQL 5.7.8+
db <- dbxConnect(variables=list(max_execution_time=1000)) # ms
 
# MariaDB 10.1.1+
db <- dbxConnect(variables=list(max_statement_time=1)) # sec

With Postgres, set a connect timeout with:

db <- dbxConnect(connect_timeout=3) # sec

Reference

To close a connection, use:

dbxDisconnect(db)

All connections are simply DBI connections, so you can use them with DBI functions as well.

dbGetInfo(db)

Upgrading

0.2.0

Version 0.2.0 brings a number of fixes and improvements to data types.

However, there are a few breaking changes to be aware of:

  • The dbxInsert and dbxUpsert functions no longer return a data frame by default. For MySQL and SQLite, the data frame was just the records argument. For Postgres, if you use auto-incrementing primary keys, the data frame contained ids of the newly inserted/upserted records. To get the ids, pass the name of the column as the returning argument:

    dbxInsert(db, table, records, returning=c("id"))
  • timestamp without time zone columns in Postgres are now stored in UTC instead of local time by default. This does not affect timestamp with time zone columns. To keep the previous behavior, use:

    dbxConnect(adapter="postgres", storage_tz=Sys.timezone(), ...)

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development and testing:

git clone https://github.com/ankane/dbx.git
cd dbx
 
# create Postgres database 
createdb dbx_test
 
# create MySQL database 
mysql -u root -e "CREATE DATABASE dbx_test"

In R, do:

install.packages("devtools")
devtools::install_deps(dependencies=TRUE)
devtools::test()

News

dbx 0.2.3

  • Added parameters for dbxSelect
  • Added skip_existing option to dbxUpsert
  • Fixed issue with batch_size option skipping partial batches

dbx 0.2.2

  • Added statement duration to logging
  • Added logging for dbxUpdate transaction
  • Added variables to dbxConnect
  • Added connect_timeout, sslcert, sslkey, and sslcrl for RPostgreSQL
  • Prefer dbx_logging over dbx_verbose
  • Improved support for ODBC
  • Fixed error with dbxUpsert

dbx 0.2.1

  • Added sslmode and sslrootcert for RPostgreSQL
  • Fixed error with dplyr check

dbx 0.2.0

  • Added support for fractional seconds
  • Fixed error with updates
  • Fixed times for RSQLite
  • Fixed typecasting for dates, times, and booleans with RMySQL
  • Fixed error when writing binary objects with RPostgres and RPostgreSQL
  • Fixed error when writing dates with RMySQL and RMariaDB
  • Fixed error when writing booleans with RPostgreSQL
  • Fixed error when writing hms objects

Breaking

  • The dbxInsert and dbxUpsert functions no longer return a data frame by default. For MySQL and SQLite, the data frame was just the records argument. For Postgres, if you use auto-incrementing primary keys, the data frame contained ids of the newly inserted/upserted records. To get the ids, pass the name of the column as the returning argument:

    dbxInsert(db, table, records, returning=c("id"))
  • timestamp without time zone columns in Postgres are now stored in UTC instead of local time by default. This does not affect timestamp with time zone columns. To keep the previous behavior, use:

    dbxConnect(adapter="postgres", storage_tz=Sys.timezone(), ...)

dbx 0.1.0

  • First release
