Methods for dividing data into subsets, applying analytical methods to the subsets, and recombining the results. Comes with a generic MapReduce interface as well. Works with key-value pairs stored in memory, on local disk, or on HDFS, in the latter case using the R and Hadoop Integrated Programming Environment (RHIPE).
datadr is an R package that leverages RHIPE to provide a simple interface to division and recombination (D&R) methods for large complex data.
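The D&R workflow described above (divide, apply, recombine) can be illustrated with a minimal sketch in base R. This is not datadr's API; it only assumes the built-in `iris` data set and standard functions (`split()`, `lapply()`, `rbind()`), and the choice of a per-species linear model is purely illustrative. The names of the split list play the role of keys and the subsets play the role of values.

```r
# Minimal base R sketch of divide / apply / recombine (NOT the datadr API).

# Division: split the data into subsets by a conditioning variable.
# names(subsets) act as keys, the per-species data frames as values.
subsets <- split(iris, iris$Species)

# Apply: run the same analytic method on every subset independently.
fits <- lapply(subsets, function(d) {
  coef(lm(Sepal.Length ~ Petal.Length, data = d))
})

# Recombination: row-bind the per-subset results into one data frame.
coefs <- do.call(rbind, fits)
result <- data.frame(
  Species = rownames(coefs),
  coefs,
  row.names = NULL,
  check.names = FALSE
)

print(result)
```

Because each subset is processed independently, the apply step is the part that a backend such as local disk or Hadoop can distribute; the in-memory version here only shows the shape of the computation.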
To get started, see the package documentation and function reference located here.
Visualization tools based on D&R can be found here.
    # from CRAN:
    install.packages("datadr")

    # from github:
    devtools::install_github("delta-rho/datadr")
This software is currently under the BSD license. Please read the license document.
datadr development is sponsored by:
FEATURES / CHANGES
- `data.table` (0.8.6)
- `control` option to `makeExtractable()` (0.8.5)
- `removeData()` method for local disk connections (0.8.4)
- `drLM()` recombination method for fitting linear models (0.8.0)

FIXES
- `combRbind()` (0.8.0)

FEATURES / CHANGES
- `combDdf()` recombination method
- `kvApply()` handles output
- `drAggregate()` so first argument is data to be consistent
- `kvPair()` method for specifying a key-value pair
- `drPersist()` method to make transformations persistent
- `to_ddf()` for converting dplyr grouped tbls to ddfs
- `drQuantile()` and `drHexbin()`

FIXES
- `drJoin()` to validate that input data sources are ddo's
- `drRead.table()` not overwriting output for local disk case
- `drQuantile()`
- `divide()` filtering on conditioning variables
- `_rh_meta` to `_meta`

FEATURES / CHANGES
- `by` argument in `drQuantile()` and `drAggregate()` to be a vector of column names
- `output` ability to `drAggregate` for returning a ddf when `by` is specified

BUG FIXES
- `drRead.table()` for reading local files
- `overwrite` parameter when using local `drRead.table()`
- `drRead.table()` with RHIPE / Hadoop backend
- `drRead.table()` for HDFS

FEATURES / CHANGES
- `addTransform()` method to specify transformations to be applied to ddo/ddf objects with deferred evaluation (see https://github.com/delta-rho/datadr/issues/24 for more information)
- `drGetGlobals()` to properly traverse environments of user-defined transformation functions and find all global variables and all package dependencies
- `packages` argument to MapReduce-inducing functions to allow manual specification of package dependencies required by user defined transformations
- `options(defaultLocalDiskControl = ...)`, etc. so that you do not always need to specify `control=` in all MapReduce-inducing operations
- `drGLM()` and `drBLB()` methods to work with new transformation approach
- `kvPair()` and classes for making dealing with key-value pairs a bit more aesthetic