Provides a data.table backend for 'dplyr'. The goal of 'dtplyr' is to allow you to write 'dplyr' code that is automatically translated to the equivalent, but usually much faster, data.table code.
dtplyr is the data.table backend for dplyr. It provides S3 methods for data.table objects so that dplyr works the way you expect.
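As a rough illustration of what "works the way you expect" means, the sketch below runs an ordinary dplyr pipeline on a data.table. It assumes that data.table, dplyr and dtplyr are all installed and that the installed dtplyr accepts data.table objects directly, as described above. The table `dt` and its columns are made up for the example. The result is converted with as.data.frame() so the code does not depend on the exact class returned.

```r
library(data.table)
library(dplyr)
library(dtplyr)

# Hypothetical example data: a grouping column and a numeric column.
dt <- data.table(g = c("a", "a", "b"), x = c(1, 2, 3))

# Ordinary dplyr verbs applied to a data.table.
result <- dt %>%
  filter(x > 1) %>%
  group_by(g) %>%
  summarise(total = sum(x))

# Convert to a plain data frame so the result can be inspected uniformly.
result_df <- as.data.frame(result)
```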
dtplyr will always be a bit slower than data.table, because it creates copies of objects rather than mutating in place (that's the dplyr philosophy). Currently, dtplyr is quite a lot slower than bare data.table because the methods aren't quite smart enough. I hope interested dplyr & data.table users from the community will help me to improve the performance.
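The difference between copying and mutating in place can be seen with data.table alone. This is only a sketch of the two styles, not of how dtplyr is implemented internally; the object names are invented for the example.

```r
library(data.table)

original <- data.table(x = 1:3)

# Copying style: duplicate the table first, then add the column to the duplicate.
copied <- copy(original)[, y := x * 2L]

# The original table has not gained the new column.
original_untouched <- !("y" %in% names(original))

# In-place style: `:=` adds the column by reference, without copying the table.
in_place <- data.table(x = 1:3)
in_place[, y := x * 2L]
```

Both styles give the same columns; the copying style pays for an extra duplicate of the data, which is the overhead the paragraph above refers to.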
dtplyr was extracted out of dplyr so it could evolve independently of (i.e. more rapidly than!) dplyr. It also makes dplyr a little simpler, and it's easier to keep track of issues by backend.
You can install from CRAN with:
install.packages("dtplyr")
Or try the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("hadley/dtplyr")
Maintenance release for CRAN checks.
inner_join(), left_join(), right_join(), and full_join(): new suffix argument which allows you to control what suffix duplicated variable names receive, as introduced in dplyr 0.5 (#40, @christophsax).
Joins use extended merge.data.table() and the on argument, introduced in data.table 1.9.6. Avoids copy and allows joins by different keys (#20, #21, @christophsax).
distinct() gains .keep_all argument (#30, #31).
Slightly improve test coverage (#6).
Install devtools from GitHub on Travis (#32).
Joins return data.table. Right and full join are now implemented (#16, #19).
Remove warnings from tests (#4).
Extracted from dplyr at revision e5f2952923028803.
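The join and distinct() entries in the notes above can be sketched as follows. This assumes data.table, dplyr and dtplyr are installed and that the dplyr verbs accept data.table objects directly. The tables `orders`, `refunds` and `events` and their columns are hypothetical. Results are converted with as.data.frame() so the example does not rely on the exact class returned.

```r
library(data.table)
library(dplyr)
library(dtplyr)

# Hypothetical tables: the key columns have different names,
# and both tables carry a non-key column called `value`.
orders  <- data.table(id = c(1L, 2L, 3L), value = c(10, 20, 30))
refunds <- data.table(order_id = c(1L, 3L), value = c(5, 7))

# Join by different keys and choose the suffixes for the duplicated `value` column.
joined <- left_join(
  orders, refunds,
  by = c("id" = "order_id"),
  suffix = c("_order", "_refund")
)
joined_df <- as.data.frame(joined)

# Keep one row per distinct `grp` while retaining the other columns.
events <- data.table(grp = c("a", "a", "b"), n = c(1L, 2L, 3L))
first_per_grp <- as.data.frame(distinct(events, grp, .keep_all = TRUE))
```

Without .keep_all = TRUE, distinct() would return only the grp column.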