R Database Interface

A database interface definition for communication between R and relational database management systems. All classes in this package are virtual and need to be extended by the various R/DBMS implementations.


The DBI package defines a common interface between R and database management systems (DBMS). The interface defines a small set of classes and methods similar in spirit to Perl's DBI, Java's JDBC, Python's DB-API, and Microsoft's ODBC. These classes and methods define what operations are possible and how they are performed:

  • connect/disconnect to the DBMS
  • create and execute statements in the DBMS
  • extract results/output from statements
  • error/exception handling
  • information (meta-data) from database objects
  • transaction management (optional)

DBI separates the connectivity to the DBMS into a "front-end" and a "back-end". Applications use only the exposed "front-end" API. The facilities that communicate with specific DBMSs (SQLite, MySQL, PostgreSQL, MonetDB, etc.) are provided by "drivers" (other packages) that get invoked automatically through S4 methods.
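
As a minimal sketch of this separation, the helper below talks only to the front-end generics, so in principle it should work with any backend that implements them. The name count_rows() is hypothetical (it is not part of DBI), and RSQLite is assumed to be installed as the driver for the demonstration.

```r
library(DBI)

# Hypothetical helper: it uses only DBI front-end generics, so it does not
# need to know which driver package created the connection.
count_rows <- function(con, table) {
  sql <- paste0("SELECT COUNT(*) AS n FROM ", dbQuoteIdentifier(con, table))
  dbGetQuery(con, sql)$n
}

# Assumption: RSQLite is available; any other DBI driver could be swapped in.
con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(con, "mtcars", mtcars)
n <- count_rows(con, "mtcars")
print(n)
dbDisconnect(con)
```

Only the dbConnect() call names a driver; the rest of the code is dispatched to the backend's methods through S4.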

The following example illustrates some of the DBI capabilities:

library(DBI)
# Create an ephemeral in-memory RSQLite database
con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")

# The database starts empty; copy a data frame into a new table
dbListTables(con)
dbWriteTable(con, "mtcars", mtcars)
dbListTables(con)

# Inspect the column names and read the whole table back
dbListFields(con, "mtcars")
dbReadTable(con, "mtcars")

# You can fetch all results:
res <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4")
dbFetch(res)
dbClearResult(res)

# Or a chunk at a time
res <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4")
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 5)
  print(nrow(chunk))
}
dbClearResult(res)

# Close the connection when done
dbDisconnect(con)

To install DBI:

  • Get the released version from CRAN: install.packages("DBI")
  • Get the development version from GitHub: devtools::install_github("rstats-db/DBI")

Discussions associated with DBI and related database packages take place on R-SIG-DB.

There are four main DBI classes, three of which are extended by individual database backends:

  • DBIObject: a common base class for all DBI classes.

  • DBIDriver: a base class representing overall DBMS properties. Typically, generator functions such as RSQLite::SQLite(), RPostgreSQL::PostgreSQL(), RMySQL::MySQL() etc. instantiate the driver objects.

  • DBIConnection: represents a connection to a specific database.

  • DBIResult: the result of a DBMS query or statement.

All classes are virtual: they cannot be instantiated directly and instead must be subclassed.
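
The following sketch shows what "virtual" means in practice. The classes ToyDriver and ToyConnection are hypothetical stand-ins for what a real backend package would define; they implement no methods and exist only for illustration.

```r
library(DBI)
library(methods)

# The DBI base classes are virtual, so new() refuses to instantiate them.
print(isVirtualClass("DBIConnection"))
failed <- inherits(try(new("DBIConnection"), silent = TRUE), "try-error")

# Hypothetical backend classes: a driver package subclasses the DBI classes.
setClass("ToyDriver", contains = "DBIDriver")
setClass("ToyConnection", contains = "DBIConnection")

# The subclass is concrete, and its instances are still DBI objects.
drv <- new("ToyDriver")
print(is(drv, "DBIDriver"))
print(is(drv, "DBIObject"))
```

A real backend would go on to define methods such as dbConnect() for its driver class; that part is omitted here.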

The following history of DBI was contributed by David James, the driving force behind the development of DBI, and many of the packages that implement it.

The idea/work of interfacing S (originally S3 and S4) to RDBMS goes back to the mid- and late 1990's in Bell Labs. The first toy interface I did was to implement John Chambers' early concept of "Data Management in S" (1991). The implementation followed that interface pretty closely and immediately showed some of the limitations when dealing with very large databases; if my memory serves me, the issue was the instance-based nature of the language back then, e.g., if you attached an RDBMS to the search() path and then needed to resolve a symbol "foo", you effectively had to bring all the objects in the database to check their mode/class, i.e., the instance object had the metadata in itself as attributes. The experiment showed that the S3 implementation of "data management" was not really suitable to large external RDBMS (probably it was never intended to do that anyway). (Note however, that since then, John and Duncan Temple Lang generalized the data management in S4 a lot, including Duncan's implementation in his RObjectTables package where he considered a lot of synchronization/caching issues relevant to DBI and, more generally, to most external interfaces).

Back then we were working very closely with Lucent's microelectronics manufacturing --- our colleagues there had huge Oracle (mostly) databases that we needed to constantly query via SQL*Plus. My colleague Jake Luciani was developing advanced applications in C and SQL, and the two of us came up with the first implementation of S3 directly connecting with Oracle. What I remember is that the Linux PRO*C pre-compiler (that embedded SQL in C code) was very buggy --- we spent a lot of time looking for workarounds and tricks until we got the C interface running. At the time, other projects within Bell Labs began using MySQL, and we moved to MySQL (with the help of Doug Bates' student Saikat DebRoy, then a summer intern) with no intentions of looking back at the very difficult Oracle interface. It was at this time that I moved all the code from S3 methods to S4 classes and methods and began reaching out to the S/R community for suggestions, ideas, etc. All (most) of this work was on Bell Labs versions of S3 and S4, but I made sure it worked with S-Plus. At some point around 2000 (I don't remember exactly when), I ported all the code to R, regressing to S3 methods, and later on (once S4 classes and methods were available in R) I re-implemented everything back to S4 classes and methods in R (a painful back-and-forth). It was at this point that I decided to drop S-Plus altogether. Around that time, I came across a very early implementation of SQLite and I was quite interested and thought it was a very nice RDBMS that could be used for all kinds of experimentation, etc., so it was pretty easy to implement on top of the DBI.

Within the R community, there were quite a number of people that showed interest in defining a common interface to databases, but only a few folks actually provided code/suggestions/etc. (Tim Keitt was most active with the dbi/PostgreSQL packages --- he also was considering what he called "proxy" objects, which was reminiscent of what Duncan had been doing). Kurt Hornik, Vincent Carey, Robert Gentleman, and others provided suggestions/comments/support for the DBI definition. By around 2003, the DBI was more or less implemented as it is today.

I'm sure I'll forget some (most should be in the THANKS sections of the various packages), but the names that come to my mind at this moment are Jake Luciani (ROracle), Don MacQueen and other early ROracle users (super helpful), Doug Bates and his student Saikat DebRoy for RMySQL, Fei Chen (at the time a student of Prof. Ripley) also contributed to RMySQL, Tim Keitt (working on an early S3 interface to PostgreSQL), Torsten Hothorn (worked with mSQL and also MySQL), Prof. Ripley working/extending the RODBC package, in addition to John Chambers and Duncan Temple Lang who provided very important comments and suggestions.

Actually, the real impetus behind the DBI was always to do distributed statistical computing --- not to provide yet another import/export mechanism --- and this perspective was driven by John and Duncan's vision and work on inter-system computing, COM, CORBA, etc. I'm not sure many of us really appreciated (even now) the full extent of those ideas and concepts. Just like in other languages (C's ODBC, Java's JDBC, Perl's DBI/DBD, Python's DBAPI), R/S DBI was meant to unify the interfacing to RDBMS so that R/S applications could be developed on top of the DBI and not be hard coded to any one relational database. The interface I tried to follow the closest was Python's DBAPI --- I haven't worked on this topic for a while, but I still feel Python's DBAPI is the cleanest and most relevant for the S language.

News

DBI 0.5-1 (2016-09-09)

  • Documentation and example updates.

DBI 0.5 (2016-08-11, CRAN release)

  • Interface changes

    • dbDataType() maps character values to "TEXT" by default (#102).
    • The default implementation of dbQuoteString() doesn't call encodeString() anymore: Neither SQLite nor Postgres understand e.g. \n in a string literal, and all of SQLite, Postgres, and MySQL accept an embedded newline (#121).
  • Interface enhancements

    • New dbSendStatement() generic, forwards to dbSendQuery() by default (#20, #132).
    • New dbExecute(), calls dbSendStatement() by default (#109, @bborgesr).
    • New dbWithTransaction() that calls dbBegin() and dbCommit(), and dbRollback() on failure (#110, @bborgesr).
    • New dbBreak() function which allows aborting from within dbWithTransaction() (#115, #133).
    • Export dbFetch() and dbQuoteString() methods.
  • Documentation improvements:

    • One example per function (except functions scheduled for deprecation) (#67).
    • Consistent layout and identifier naming.
    • Better documentation of generics by adding links to the class and related generics in the "See also" section under "Other DBI... generics" (#130). S4 documentation is directed to a hidden page to unclutter documentation index (#59).
    • Fix two minor vignette typos (#124, @mdsumner).
    • Add package documentation.
    • Remove misleading parts in dbConnect() documentation (#118).
    • Remove misleading link in dbDataType() documentation.
    • Remove full stop from documentation titles.
    • New help topic "DBIspec" that contains the full DBI specification (currently work in progress) (#129).
    • HTML documentation generated by staticdocs is now uploaded to http://rstats-db.github.io/DBI for each build of the "production" branch (#131).
    • Further minor changes and fixes.
  • Internal

    • Use contains argument instead of representation() to denote base classes (#93).
    • Remove redundant declaration of transaction methods (#110, @bborgesr).
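
The dbExecute() and dbWithTransaction() additions listed for this release can be tried out with a sketch like the following. The accounts table and its values are made up for illustration, and RSQLite is assumed to be installed.

```r
library(DBI)

# Assumption: RSQLite is available as the backend for this demonstration.
con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(con, "accounts", data.frame(id = 1:2, balance = c(100, 50)))

# dbExecute() runs a statement and returns the number of rows affected.
rows <- dbExecute(con, "UPDATE accounts SET balance = balance + 1 WHERE id = 1")

# Both updates inside the block are committed together.
dbWithTransaction(con, {
  dbExecute(con, "UPDATE accounts SET balance = balance - 10 WHERE id = 1")
  dbExecute(con, "UPDATE accounts SET balance = balance + 10 WHERE id = 2")
})

# An error inside the block rolls the transaction back before it propagates.
result <- tryCatch(
  dbWithTransaction(con, {
    dbExecute(con, "UPDATE accounts SET balance = 0")
    stop("simulated failure")
  }),
  error = function(e) "rolled back"
)

balances <- dbGetQuery(con, "SELECT balance FROM accounts ORDER BY id")$balance
print(balances)
dbDisconnect(con)
```

The failed block leaves the balances from the earlier, committed updates untouched.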

DBI 0.4-1 (2016-05-07, CRAN release)

  • The default show() implementations silently ignore all errors. Some DBI drivers (e.g., RPostgreSQL) might fail to implement dbIsValid() or the other methods used.

DBI 0.4 (2016-04-30)

  • New package maintainer: Kirill Müller.

  • dbGetInfo() gains a default method that extracts the information from dbGetStatement(), dbGetRowsAffected(), dbHasCompleted(), and dbGetRowCount(). This means that most drivers should no longer need to implement dbGetInfo() (which may be deprecated anyway at some point) (#55).

  • dbDataType() and dbQuoteString() are now properly exported.

  • The default implementation for dbDataType() (powered by dbiDataType()) now also supports difftime and AsIs objects and lists of raw (#70).

  • Default dbGetQuery() method now always calls dbFetch(), in a tryCatch() block.

  • New generic dbBind() for binding values to a parameterised query.

  • DBI gains a number of SQL generation functions. These make it easier to write backends by implementing common operations that are slightly tricky to do absolutely correctly.

    • sqlCreateTable() and sqlAppendTable() create tables from a data frame and insert rows into an existing table. These will power most implementations of dbWriteTable(). sqlAppendTable() is useful for databases that support parameterised queries.

    • sqlRownamesToColumn() and sqlColumnToRownames() provide a standard way of translating row names to and from the database.

    • sqlInterpolate() and sqlParseVariables() allow databases without native parameterised queries to use parameterised queries to avoid SQL injection attacks.

    • sqlData() is a new generic that converts a data frame into a data frame suitable for sending to the database. This is used to (e.g.) ensure all character vectors are encoded as UTF-8, or to convert R variable types (like factor) to types supported by the database.

    • sqlParseVariablesImpl() is now implemented purely in R, with full test coverage (#83, @hannesmuehleisen).

  • dbiCheckCompliance() has been removed, the functionality is now available in the DBItest package (#80).

  • Added default show() methods for driver, connection and results.

  • New concrete ANSIConnection class and ANSI() function to generate a dummy ANSI compliant connection useful for testing.

  • Default dbQuoteString() and dbQuoteIdentifier() methods now use encodeString() so that special characters like \n are correctly escaped. dbQuoteString() converts NA to (unquoted) NULL.

  • The initial DBI proposal and DBI version 1 specification are now included as a vignette. These are there mostly for historical interest.

  • The new DBItest package is described in the vignette.

  • Deprecated print.list.pairs().

  • Removed unused dbi_dep().
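
The quoting and SQL generation helpers introduced in this release can be explored without a real database by using the dummy ANSI() connection. This is a rough sketch: the table and column names are invented, and the SQL that an actual backend produces may differ from the ANSI defaults.

```r
library(DBI)

# ANSI() returns a dummy ANSI-compliant connection; no database is needed.
con <- ANSI()

# Quoting doubles embedded quote characters instead of trusting the input.
quoted_string <- dbQuoteString(con, "O'Reilly")
quoted_ident <- dbQuoteIdentifier(con, "my table")
print(quoted_string)
print(quoted_ident)

# sqlInterpolate() substitutes ?name placeholders with safely quoted values.
safe_sql <- sqlInterpolate(
  con,
  "SELECT * FROM students WHERE name = ?name",
  name = "Robert'); DROP TABLE students;--"
)
print(safe_sql)

# sqlCreateTable() builds a CREATE TABLE statement from field names and types.
create_sql <- sqlCreateTable(con, "measurements", c(id = "integer", label = "text"))
print(create_sql)
```

None of these calls executes anything; they only return SQL text that a backend could then send with dbExecute() or dbGetQuery().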

Version 0.3.1

  • Actually export dbIsValid() :/

  • dbGetQuery() uses dbFetch() in the default implementation.

Version 0.3.0

  • dbIsValid() returns a logical value describing whether a connection or result set (or other object) is still valid. (#12).

  • dbQuoteString() and dbQuoteIdentifier() to implement database specific quoting mechanisms.

  • dbFetch() added as alias to fetch() to provide consistent name. Implementers should define methods for both fetch() and dbFetch() until fetch() is deprecated in 2015. For now, the default method for dbFetch() calls fetch().

  • dbBegin() begins a transaction (#17). If not supported, DB specific methods should throw an error (as should dbCommit() and dbRollback()).

  • dbGetStatement(), dbGetRowsAffected(), dbHasCompleted(), and dbGetRowCount() gain default methods that extract the appropriate elements from dbGetInfo(). This means that most drivers should no longer need to implement these methods (#13).

  • dbGetQuery() gains a default method for DBIConnection which uses dbSendQuery(), fetch() and dbClearResult().

  • The following functions are soft-deprecated. They are going away, and developers who use the DBI should begin preparing. The formal deprecation process will begin in July 2015, when these functions will emit warnings on use.

    • fetch() is replaced by dbFetch().

    • make.db.names(), isSQLKeyword() and SQLKeywords(): a black list based approach is fundamentally flawed; instead quote strings and identifiers with dbQuoteIdentifier() and dbQuoteString().

  • dbGetDBIVersion() is deprecated since it's now just a thin wrapper around packageVersion("DBI").

  • dbSetDataMappings() (#9) and dbCallProc() (#7) are deprecated as no implementations were ever provided.

  • dbiCheckCompliance() makes it easier for implementors to check that their package is in compliance with the DBI specification.

  • All examples now use the RSQLite package so that you can easily try out the code samples (#4).

  • dbDriver() gains a more effective search mechanism that doesn't rely on packages being loaded (#1).

  • DBI has been converted to use roxygen2 for documentation, and now most functions have their own documentation files. I would love your feedback on how we could make the documentation better!
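
As a small illustration of the dbGetQuery() default method and dbIsValid() described in this release, the sketch below compares the one-step and multi-step ways of running a query. It assumes RSQLite is installed and uses dbFetch() rather than the soft-deprecated fetch().

```r
library(DBI)

# Assumption: RSQLite is available as the backend for this demonstration.
con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(con, "mtcars", mtcars)

sql <- "SELECT mpg, cyl FROM mtcars WHERE cyl = 4"

# One step: send the query, fetch all rows and clear the result.
one_step <- dbGetQuery(con, sql)

# The same work spelled out with the lower-level generics.
res <- dbSendQuery(con, sql)
three_steps <- dbFetch(res)
dbClearResult(res)

same <- isTRUE(all.equal(one_step, three_steps))
print(same)

# dbIsValid() reports whether the connection is still usable.
valid_before <- dbIsValid(con)
dbDisconnect(con)
valid_after <- dbIsValid(con)
print(c(valid_before, valid_after))
```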

Version 0.2-7

  • Trivial changes (updated package fields, daj)

Version 0.2-6

  • Removed deprecated \synopsis in some Rd files (thanks to Prof. Ripley)

Version 0.2-5

  • Code cleanups contributed by Matthias Burger: avoid partial argument name matching and use TRUE/FALSE, not T/F.

  • Change behavior of make.db.names.default to quote SQL keywords if allow.keywords is FALSE. Previously, SQL keywords would be name mangled with underscores and a digit. Now they are quoted using '"'.

Version 0.2-4

  • Changed license from GPL to LGPL

  • Fixed a trivial typo in documentation

Version 0.1-10

  • Fixed documentation typos.

Version 0.1-9

  • Trivial changes.

Version 0.1-8

  • A trivial change due to package.description() being deprecated in 1.9.0.

Version 0.1-7

  • Had to do a substantial re-formatting of the documentation due to incompatibilities introduced in 1.8.0 S4 method documentation. The contents were not changed (modulo fixing a few typos). Thanks to Kurt Hornik and John Chambers for their help.

Version 0.1-6

  • Trivial documentation changes (for R CMD check's sake)

Version 0.1-5

  • Removed duplicated setGeneric("dbSetDataMappings")

Version 0.1-4

  • Removed the "valueClass" from some generic functions, namely, dbListConnections, dbListResults, dbGetException, dbGetQuery, and dbGetInfo. The reason is that methods for these generics could potentially return different classes of objects (e.g., the call dbGetInfo(res) could return a list of name-value pairs, while dbGetInfo(res, "statement") could be a character vector).

  • Added 00Index to inst/doc

  • Added dbGetDBIVersion() (simple wrapper to package.description).

Version 0.1-3

  • ??? Minor changes?

Version 0.1-2

  • An implementation based on version 4 classes and methods.
  • Incorporated (mostly Tim Keitt's) comments.

0.7 by Kirill Müller, 4 months ago


http://rstats-db.github.io/DBI


Report a bug at https://github.com/rstats-db/DBI/issues


Browse source code at https://github.com/cran/DBI


Authors: R Special Interest Group on Databases (R-SIG-DB) [aut], Hadley Wickham [aut], Kirill Müller [aut, cre]


LGPL (>= 2) license


Depends on methods

Suggests blob, covr, hms, knitr, magrittr, rprojroot, rmarkdown, RSQLite, testthat, xml2


Imported by BETS, BIEN, BatchExperiments, BatchJobs, CITAN, D3GB, DBItest, DMwR2, InterpretMSSpectrum, MetaIntegrator, MonetDB.R, MonetDBLite, RMariaDB, RODBCDBI, RObsDat, RPresto, RSQLServer, RSQLite, SEERaBomb, TSMySQL, TSPostgreSQL, TSSQLite, TScompare, TSdbi, TSfame, TSmisc, TSodbc, TSsdmx, TSsql, UPMASK, acc, afmToolkit, anchoredDistr, archivist, bdlp, bigrquery, bikedata, chunked, civis, cranlike, dartR, dbfaker, dbplyr, dexter, emuR, etl, gcbd, genomicper, ggraptR, healthcareai, implyr, imputeMulti, liteq, macleish, marmap, mdsr, odbc, pleiades, pool, postGIStools, redcapAPI, refGenome, replyr, sejmRP, sf, smnet, snplist, sparklyr, sparkwarc, sqldf, sqliter, taxizedb, tcpl, tigreBrowserWriter, timeseriesdb, twitteR, vmsbase.

Depended on by ODB, RJDBC, RMySQL, ROracle, RPostgreSQL, RQDA, RRedshiftSQL, RecordLinkage, bibliospec, biglm, datamap, filehashSQLite, gmDatabase, ora, poplite, rpostgis, rpostgisLT, sergeant, sqlutils.

Suggested by ETLUtils, PivotalR, ProjectTemplate, TSdata, WhopGenome, aroma.affymetrix, convey, cytominer, dplyr, knitr, mitools, oai, oce, pitchRx, quantmod, storr, stream, survey, toxboot.

