A database interface definition for communication between R and relational database management systems. All classes in this package are virtual and need to be extended by the various R/DBMS implementations.
The DBI package defines a common interface between R and database management systems (DBMS). The interface defines a small set of classes and methods similar in spirit to Perl's DBI, Java's JDBC, Python's DB-API, and Microsoft's ODBC. These classes and methods define what operations are possible and how they are performed.
DBI separates the connectivity to the DBMS into a "front-end" and a "back-end". Applications use only the exposed "front-end" API. The facilities that communicate with specific DBMSs (SQLite, MySQL, PostgreSQL, MonetDB, etc.) are provided by "drivers" (other packages) that get invoked automatically through S4 methods.
The following example illustrates some of the DBI capabilities:
library(DBI)
# Create an ephemeral in-memory RSQLite database
con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")

dbListTables(con)
dbWriteTable(con, "mtcars", mtcars)
dbListTables(con)

dbListFields(con, "mtcars")
dbReadTable(con, "mtcars")

# You can fetch all results:
res <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4")
dbFetch(res)
dbClearResult(res)

# Or a chunk at a time
res <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4")
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 5)
  print(nrow(chunk))
}
dbClearResult(res)

dbDisconnect(con)
To install DBI:
# Released version from CRAN
install.packages("DBI")
# Development version from GitHub
devtools::install_github("r-dbi/DBI")
Discussions associated with DBI and related database packages take place on R-SIG-DB. The website Databases using R describes the tools and best practices in this ecosystem.
There are four main DBI classes, three of which are extended by individual database backends:
DBIObject: a common base class for all DBI classes.
DBIDriver: a base class representing overall DBMS properties. Typically generator functions instantiate the driver objects, like RSQLite::SQLite(), RPostgreSQL::PostgreSQL(), RMySQL::MySQL() etc.
DBIConnection: represents a connection to a specific database.
DBIResult: the result of a DBMS query or statement.
All classes are virtual: they cannot be instantiated directly and instead must be subclassed.
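As a minimal sketch of what subclassing looks like, the following defines backend classes on top of the DBI base classes. The "Demo" class names are hypothetical and chosen only for illustration; a real backend would also need to implement the DBI generics for them.

```r
library(DBI)
library(methods)

# Hypothetical backend classes; the "Demo" names are illustrative only.
setClass("DemoDriver", contains = "DBIDriver")
setClass("DemoConnection", contains = "DBIConnection")
setClass("DemoResult", contains = "DBIResult")

# A virtual class cannot be instantiated directly, so this call fails
# and we capture the error message instead.
virtual_error <- tryCatch(
  new("DBIDriver"),
  error = function(e) conditionMessage(e)
)

# A concrete subclass can be instantiated, and it inherits from the
# common base class.
drv <- new("DemoDriver")
is(drv, "DBIObject")
```

Backends usually wrap the `new()` call in a generator function, which is what users pass to dbConnect().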
The following history of DBI was contributed by David James, the driving force behind the development of DBI, and many of the packages that implement it.
The idea/work of interfacing S (originally S3 and S4) to RDBMS goes back to the mid- and late 1990s in Bell Labs. The first toy interface I did was to implement John Chambers' early concept of "Data Management in S" (1991). The implementation followed that interface pretty closely and immediately showed some of the limitations when dealing with very large databases; if my memory serves me, the issue was the instance-based nature of the language back then, e.g., if you attached an RDBMS to the search() path and then needed to resolve a symbol "foo", you effectively had to bring all the objects in the database to check their mode/class, i.e., the instance object had the metadata in itself as attributes. The experiment showed that the S3 implementation of "data management" was not really suitable to large external RDBMS (probably it was never intended to do that anyway). (Note however, that since then, John and Duncan Temple Lang generalized the data management in S4 a lot, including Duncan's implementation in his RObjectTables package where he considered a lot of synchronization/caching issues relevant to DBI and, more generally, to most external interfaces).
Back then we were working very closely with Lucent's microelectronics manufacturing --- our colleagues there had huge Oracle (mostly) databases that we needed to constantly query via SQLPlus. My colleague Jake Luciani was developing advanced applications in C and SQL, and the two of us came up with the first implementation of S3 directly connecting with Oracle. What I remember is that the Linux PROC pre-compiler (that embedded SQL in C code) was very buggy --- we spent a lot of time looking for workarounds and tricks until we got the C interface running. At the time, other projects within Bell Labs began using MySQL, and we moved to MySQL (with the help of Doug Bates' student Saikat DebRoy, then a summer intern) with no intentions of looking back at the very difficult Oracle interface. It was at this time that I moved all the code from S3 methods to S4 classes and methods and began reaching out to the S/R community for suggestions, ideas, etc. All (most) of this work was on Bell Labs versions of S3 and S4, but I made sure it worked with S-Plus. At some point around 2000 (I don't remember exactly when), I ported all the code to R regressing to S3 methods, and later on (once S4 classes and methods were available in R) I re-implemented everything back to S4 classes and methods in R (a painful back-and-forth). It was at this point that I decided to drop S-Plus altogether. Around that time, I came across a very early implementation of SQLite and I was quite interested and thought it was a very nice RDBMS that could be used for all kinds of experimentation, etc., so it was pretty easy to implement on top of the DBI.
Within the R community, there were quite a number of people that showed interest in defining a common interface to databases, but only a few folks actually provided code/suggestions/etc. (Tim Keitt was most active with the dbi/PostgreSQL packages --- he also was considering what he called "proxy" objects, which was reminiscent of what Duncan had been doing). Kurt Hornik, Vincent Carey, Robert Gentleman, and others provided suggestions/comments/support for the DBI definition. By around 2003, the DBI was more or less implemented as it is today.
I'm sure I'll forget some (most should be in the THANKS sections of the various packages), but the names that come to my mind at this moment are Jake Luciani (ROracle), Don MacQueen and other early ROracle users (super helpful), Doug Bates and his student Saikat DebRoy for RMySQL, Fei Chen (at the time a student of Prof. Ripley) also contributed to RMySQL, Tim Keitt (working on an early S3 interface to PostgreSQL), Torsten Hothorn (worked with mSQL and also MySQL), Prof. Ripley working/extending the RODBC package, in addition to John Chambers and Duncan Temple Lang who provided very important comments and suggestions.
Actually, the real impetus behind the DBI was always to do distributed statistical computing --- not to provide a yet-another import/export mechanism --- and this perspective was driven by John and Duncan's vision and work on inter-system computing, COM, CORBA, etc. I'm not sure many of us really appreciated (even now) the full extent of those ideas and concepts. Just like in other languages (C's ODBC, Java's JDBC, Perl's DBI/DBD, Python dbapi), R/S DBI was meant to unify the interfacing to RDBMS so that R/S applications could be developed on top of the DBI and not be hard coded to any one relational database. The interface I tried to follow the closest was Python's DBAPI --- I haven't worked on this topic for a while, but I still feel Python's DBAPI is the cleanest and most relevant for the S language.
dbAppendTable() that by default calls sqlAppendTableTemplate() and then dbExecute() with a param argument, without support for row.names argument (#74).
dbCreateTable() that by default calls sqlCreateTable() and then dbExecute(), without support for row.names argument (#74).
dbCanConnect() generic with default implementation (#87).
dbIsReadOnly() generic with default implementation (#190, @anhqle).
sqlAppendTable() now accepts lists for the values argument, to support lists of SQL objects in R 3.1.
dbListFields(DBIConnection, Id), this relies on dbQuoteIdentifier(DBIConnection, Id) (#75).
dbGetQuery(), dbSendQuery(), dbExecute() and dbSendStatement().
dbColumnInfo() method is now fully specified (#75).
dbListFields() method is now fully specified (#75).
value argument to secondary dbWriteTable() call (#737, @jimhester).
Id class now uses <Id> and not <Table> when printing.
dbUnquoteIdentifier() implementation now complies to the spec.
SQL() now strips the names from the output if the names argument is unset.
dbReadTable(), dbWriteTable(), dbExistsTable(), dbRemoveTable(), and dbListFields() generics now specialize over the first two arguments to support implementations with the Id S4 class as type for the second argument. Some packages may need to update their documentation to satisfy R CMD check again.
Id(), new generics dbListObjects() and dbUnquoteIdentifier(), methods for Id that call dbQuoteIdentifier() and then forward (#220).
dbQuoteLiteral() generic. The default implementation uses switchpatch to avoid dispatch ambiguities, and forwards to dbQuoteString() for character vectors. Backends may override methods that also dispatch on the second argument, but in this case also an override for the "SQL" class is necessary (#172).
dbQuoteIdentifier() and dbQuoteLiteral() preserve names, default implementation of dbQuoteString() strips names (#173).
dbQuoteString() and dbQuoteIdentifier() are available again, for compatibility with clients that use getMethod() to access them (#218).
dbListFields().
dbReadTable() now has row.names = FALSE as default and also supports row.names = NULL (#186).
SQL() function gains an optional names argument which can be used to assign names to SQL strings.
dbListConnections() is soft-deprecated by documentation.
dbListResults() is deprecated by documentation (#58).
dbGetException() is soft-deprecated by documentation (#51).
print.list.pairs() has been removed.
dbDataType() for AsIs object (#198, @yutannihilation).
dbQuoteString() and dbQuoteIdentifier() to ignore invalid UTF-8 strings (r-dbi/DBItest#156).
sqlInterpolate() now supports both named and positional variables (#216, @hannesmuehleisen).
DESCRIPTION (@wibeasley, #207).
dbQuoteString() and dbQuoteIdentifier().
DBItest.
dbGetQuery() now accepts an n argument and forwards it to dbFetch(). No warning about pending rows is issued anymore (#76).
slots argument of setClass()) (#169, @mvkorpel).
dbReadTable() for backends that do not provide their own implementation (#171).
Interface changes
dbDriver() and dbUnloadDriver() by documentation (#21).
sqlInterpolate() and sqlParseVariables() to be more consistent with the rest of the interface, and added .dots argument to sqlParseVariables. DBI drivers are now expected to implement sqlParseVariables(conn, sql, ..., .dots) and sqlInterpolate(conn, sql, ...) (#147).
Interface enhancements
valueClass = "logical" for those generics where the return value is meaningless, to allow backends to return invisibly (#135).
dbReadTable().
dbQuoteString() and dbQuoteIdentifier() (#77).
tryCatch() call in dbGetQuery() (#113).
Documentation improvements
DBItest, only those where the behavior is not finally decided don't do this yet.
max.connections requirement from documentation (#56).
dbBind() documentation and example (#136).
omegahat.org URL to omegahat.net, the particular document still doesn't exist below the new domain.
Internal
tic package for building documentation.
Interface changes
dbDataType() maps character values to "TEXT" by default (#102).
dbQuoteString() doesn't call encodeString() anymore: Neither SQLite nor Postgres understand e.g. \n in a string literal, and all of SQLite, Postgres, and MySQL accept an embedded newline (#121).
Interface enhancements
dbSendStatement() generic, forwards to dbSendQuery() by default (#20, #132).
dbExecute(), calls dbSendStatement() by default (#109, @bborgesr).
dbWithTransaction() that calls dbBegin() and dbCommit(), and dbRollback() on failure (#110, @bborgesr).
dbBreak() function which allows aborting from within dbWithTransaction() (#115, #133).
dbFetch() and dbQuoteString() methods.
Documentation improvements:
dbConnect() documentation (#118).
dbDataType() documentation.
staticdocs is now uploaded to http://rstats-db.github.io/DBI for each build of the "production" branch (#131).
Internal
contains argument instead of representation() to denote base classes (#93).
show() implementations silently ignore all errors. Some DBI drivers (e.g., RPostgreSQL) might fail to implement dbIsValid() or the other methods used.
New package maintainer: Kirill Müller.
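The dbExecute() and dbWithTransaction() entries above can be sketched as follows. This is only an illustration, assuming the RSQLite backend is installed; the accounts table and its values are made up.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "accounts", data.frame(id = 1:2, balance = c(100, 50)))

# dbExecute() runs a data manipulation statement and returns the number
# of affected rows.
n_updated <- dbExecute(
  con, "UPDATE accounts SET balance = balance + 10 WHERE id = 1"
)

# Both updates are committed together if the block finishes without error.
dbWithTransaction(con, {
  dbExecute(con, "UPDATE accounts SET balance = balance - 30 WHERE id = 1")
  dbExecute(con, "UPDATE accounts SET balance = balance + 30 WHERE id = 2")
})

# An error inside the block causes a rollback; the error is then re-raised,
# so it is caught here.
failed <- tryCatch(
  dbWithTransaction(con, {
    dbExecute(con, "UPDATE accounts SET balance = 0")
    stop("simulated failure")
  }),
  error = function(e) "rolled back"
)

balances <- dbGetQuery(con, "SELECT id, balance FROM accounts ORDER BY id")
dbDisconnect(con)
```

After the failed block, the balances should still reflect only the committed transfer, since the update to zero was rolled back.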
dbGetInfo() gains a default method that extracts the information from dbGetStatement(), dbGetRowsAffected(), dbHasCompleted(), and dbGetRowCount(). This means that most drivers should no longer need to implement dbGetInfo() (which may be deprecated anyway at some point) (#55).
dbDataType() and dbQuoteString() are now properly exported.
The default implementation for dbDataType() (powered by dbiDataType()) now also supports difftime and AsIs objects and lists of raw (#70).
Default dbGetQuery() method now always calls dbFetch(), in a tryCatch() block.
New generic dbBind() for binding values to a parameterised query.
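A parameterised query with dbBind() might look like the sketch below. It assumes the RSQLite backend, which accepts `?` as a positional placeholder; placeholder syntax differs between backends, so treat this as one possible form rather than a portable recipe.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# Prepare the query once with a positional placeholder.
res <- dbSendQuery(con, "SELECT COUNT(*) AS n FROM mtcars WHERE cyl = ?")

# Bind a value and fetch, then rebind the same result set with a new value.
dbBind(res, list(4))
n_four <- dbFetch(res)$n

dbBind(res, list(8))
n_eight <- dbFetch(res)$n

dbClearResult(res)
dbDisconnect(con)
```

Because values are sent separately from the SQL text, no manual quoting of the bound values is needed.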
DBI gains a number of SQL generation functions. These make it easier to write backends by implementing common operations that are slightly tricky to do absolutely correctly.
sqlCreateTable() and sqlAppendTable() create tables from a data frame and insert rows into an existing table. These will power most implementations of dbWriteTable(). sqlAppendTable() is useful for databases that support parameterised queries.
sqlRownamesToColumn() and sqlColumnToRownames() provide a standard way of translating row names to and from the database.
sqlInterpolate() and sqlParseVariables() allow databases without native parameterised queries to use parameterised queries to avoid SQL injection attacks.
sqlData() is a new generic that converts a data frame into a data frame suitable for sending to the database. This is used to (e.g.) ensure all character vectors are encoded as UTF-8, or to convert R variable types (like factor) to types supported by the database.
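Two of these SQL generation helpers can be tried without a real database by using the dummy connection returned by ANSI(). The sketch below is illustrative only: the people table and its columns are made up, and the exact SQL text produced may vary between DBI versions and backends.

```r
library(DBI)

# ANSI() gives a dummy ANSI-compliant connection; no database is needed.
con <- ANSI()

# Build a CREATE TABLE statement from the column types of a data frame.
people <- data.frame(name = "Ada", age = 36L)
create_sql <- sqlCreateTable(con, "people", people, row.names = FALSE)

# Substitute a value for the ?name placeholder. The value is quoted for
# the connection, so the embedded single quote is escaped rather than
# terminating the string literal.
query <- sqlInterpolate(
  con, "SELECT * FROM people WHERE name = ?name",
  name = "O'Brien"
)

print(create_sql)
print(query)
```

Both functions return SQL objects, which can be passed on to dbExecute() or dbGetQuery() on a real connection.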
The sqlParseVariablesImpl() is now implemented purely in R, with full test coverage (#83, @hannesmuehleisen).
dbiCheckCompliance() has been removed, the functionality is now available in the DBItest package (#80).
Added default show() methods for driver, connection and results.
New concrete ANSIConnection class and ANSI() function to generate a dummy ANSI compliant connection useful for testing.
Default dbQuoteString() and dbQuoteIdentifier() methods now use encodeString() so that special characters like \n are correctly escaped. dbQuoteString() converts NA to (unquoted) NULL.
The initial DBI proposal and DBI version 1 specification are now included as a vignette. These are there mostly for historical interest.
The new DBItest package is described in the vignette.
Deprecated print.list.pairs().
Removed unused dbi_dep().
Actually export dbIsValid() :/
dbGetQuery() uses dbFetch() in the default implementation.
dbIsValid() returns a logical value describing whether a connection or result set (or other object) is still valid (#12).
dbQuoteString() and dbQuoteIdentifier() to implement database specific quoting mechanisms.
dbFetch() added as alias to fetch() to provide consistent name. Implementers should define methods for both fetch() and dbFetch() until fetch() is deprecated in 2015. For now, the default method for dbFetch() calls fetch().
dbBegin() begins a transaction (#17). If not supported, DB specific methods should throw an error (as should dbCommit() and dbRollback()).
dbGetStatement(), dbGetRowsAffected(), dbHasCompleted(), and dbGetRowCount() gain default methods that extract the appropriate elements from dbGetInfo(). This means that most drivers should no longer need to implement these methods (#13).
dbGetQuery() gains a default method for DBIConnection which uses dbSendQuery(), fetch() and dbClearResult().
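The send, fetch, and clear pattern behind the default dbGetQuery() method can be written out by hand as a rough sketch. The helper name get_query_by_hand is hypothetical, the example uses dbFetch() rather than the older fetch(), and it assumes the RSQLite backend; the actual default method in DBI may differ in detail.

```r
library(DBI)

# Hypothetical helper mirroring the send/fetch/clear pattern.
get_query_by_hand <- function(con, sql) {
  res <- dbSendQuery(con, sql)
  # Always release the result set, even if fetching fails.
  on.exit(dbClearResult(res))
  dbFetch(res)
}

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

sql <- "SELECT mpg, cyl FROM mtcars WHERE cyl = 6"
by_hand <- get_query_by_hand(con, sql)
built_in <- dbGetQuery(con, sql)

dbDisconnect(con)
```

For one-off queries dbGetQuery() is the more convenient form; the explicit pattern is mainly useful when results are fetched in chunks.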
The following functions are soft-deprecated. They are going away, and developers who use the DBI should begin preparing. The formal deprecation process will begin in July 2015, when these functions will emit warnings on use.
fetch() is replaced by dbFetch().
make.db.names(), isSQLKeyword() and SQLKeywords(): a black list based approach is fundamentally flawed; instead quote strings and identifiers with dbQuoteIdentifier() and dbQuoteString().
dbGetDBIVersion() is deprecated since it's now just a thin wrapper around packageVersion("DBI").
dbSetDataMappings() (#9) and dbCallProc() (#7) are deprecated as no implementations were ever provided.
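The quoting approach recommended above in place of make.db.names() can be illustrated with the dummy ANSI() connection. This is a small sketch of the default methods only; a real backend may quote differently.

```r
library(DBI)

con <- ANSI()

# A reserved word is usable as an identifier once it is quoted, so no
# keyword list or name mangling is needed.
ident <- dbQuoteIdentifier(con, "select")

# Embedded single quotes are doubled, and NA becomes an unquoted NULL.
strings <- dbQuoteString(con, c("it's", NA))

sql <- paste0(
  "SELECT ", as.character(ident),
  " FROM t WHERE note = ", as.character(strings)[1]
)
print(sql)
```

The table name t and the note column are placeholders for illustration.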
dbiCheckCompliance() makes it easier for implementors to check that their package is in compliance with the DBI specification.
All examples now use the RSQLite package so that you can easily try out the code samples (#4).
dbDriver() gains a more effective search mechanism that doesn't rely on packages being loaded (#1).
DBI has been converted to use roxygen2 for documentation, and now most functions have their own documentation files. I would love your feedback on how we could make the documentation better!
Code cleanups contributed by Matthias Burger: avoid partial argument name matching and use TRUE/FALSE, not T/F.
Change behavior of make.db.names.default to quote SQL keywords if allow.keywords is FALSE. Previously, SQL keywords would be name mangled with underscores and a digit. Now they are quoted using '"'.
Changed license from GPL to LGPL
Fixed a trivial typo in documentation
Removed the "valueClass" from some generic functions, namely, dbListConnections, dbListResults, dbGetException, dbGetQuery, and dbGetInfo. The reason is that methods for these generics could potentially return different classes of objects (e.g., the call dbGetInfo(res) could return a list of name-value pairs, while dbGetInfo(res, "statement") could be a character vector).
Added 00Index to inst/doc
Added dbGetDBIVersion() (simple wrapper to package.description).