Data Manipulation with Parallelism and Shared Memory Matrices

Provides a new form of data frame backed by shared memory matrices and a way to manipulate them. Upon creation these data frames are shared across multiple local nodes to allow for simple parallel processing.


multiplyr provides a simple interface for manipulating data, combined with easy parallel processing. It is intended to work very similarly to the dplyr package (and eventually to be almost interchangeable with it), since many people are already familiar with dplyr.

dat <- Multiplyr (x=1:100, G=rep(c("A", "B", "C", "D"), each=25), alloc=2)
 
# Group data (A, B, C, D)
dat %>% group_by (G)
 
# Create a new variable (y) with random data, the same length as x
dat %>% mutate (y=rnorm(length(x)))
 
# Remove any rows where y < 0 (filter keeps the rows that match the condition)
dat %>% filter (y>=0)
 
# Summarise to give 4 rows (A, B, C, D), with number of rows in each group
dat %>% summarise (N=length(x))
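
To make concrete what sharing data across multiple local nodes is for, here is a minimal sketch of the per-group row count from the last step above, done by hand with base R's parallel package. It is not how multiplyr is implemented: each worker here receives a copy of its group's rows, whereas multiplyr keeps the data in shared memory matrices. The names df, pieces, counts and result are purely illustrative.

```r
library(parallel)

# Same toy data as above, held in an ordinary data frame
df <- data.frame(x = 1:100, G = rep(c("A", "B", "C", "D"), each = 25))

# Start two local worker processes
cl <- makeCluster(2)

# Split the rows by group, then count each group's rows on the workers
pieces <- split(df, df$G)
counts <- parLapply(cl, pieces, function(piece) nrow(piece))

# Always shut the workers down when finished
stopCluster(cl)

# Combine into one row per group, with the number of rows in each
result <- data.frame(G = names(pieces),
                     N = unlist(counts, use.names = FALSE))
print(result)
```

The cluster setup, splitting, distribution and recombination written out here are the kind of bookkeeping that the dplyr-style verbs above are meant to hide.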

Run the following code once multiplyr is installed for more details:

vignette ("basics")

Install latest version from CRAN:

install.packages ("multiplyr")

Install latest stable development version:

# install.packages("devtools")
devtools::install_github("jeblundell/multiplyr", ref="stable", build_vignettes = TRUE)

Branches:

  • master: represents the version currently in CRAN
  • stable: the latest commit from develop that passes all tests
  • develop: current state of development

News

  • Fixed "no function to return from" bug
  • Group boundaries are now cached and distributed by shared memory matrix
  • Fixed group_by bug, e.g. when using nycflights13 data
  • arrange now supports desc(...)
  • Implemented add_rownames, between, cumall, cumany, cummean, first, lag, lead, n, n_distinct, n_groups, nth
  • Extended bigmemory::morder functionality to allow mixes of ascending/descending (bm_morder and bm_mpermute)
  • Made preparations for multiple data frames on same cluster
  • Added URL/BugReports to package description
  • Added multiplyr.cores option (uses environment variable R_MULTIPLYR_CORES)
  • Removed dependency on lazyeval
  • First version of package. See basics vignette for a good overview
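
The multiplyr.cores option and the R_MULTIPLYR_CORES environment variable mentioned above can both be set with base R. The following is a sketch only: the news entry does not say when the environment variable is read or which setting takes precedence, so the ordering shown (environment variable set before the package is loaded) is an assumption, as is the value 2.

```r
# Assumption: setting the environment variable before multiplyr is loaded
# lets the package pick it up as the default number of cores
Sys.setenv(R_MULTIPLYR_CORES = "2")

# Alternatively, set the option directly for the current R session
options(multiplyr.cores = 2)

# Inspect what is currently configured
env_cores <- Sys.getenv("R_MULTIPLYR_CORES")
opt_cores <- getOption("multiplyr.cores")
print(env_cores)
print(opt_cores)
```

To make the setting persistent, the environment variable could instead go in a .Renviron file, which R reads at startup.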

0.1.1 by Jim Blundell, a year ago


http://github.com/jeblundell/multiplyr/


Report a bug at https://github.com/jeblundell/multiplyr/issues/new


Browse source code at https://github.com/cran/multiplyr


Authors: Jim Blundell [aut, cre, cph]



GPL-3 license


Imports bigmemory, bigmemory.sri, parallel, methods

Depends on magrittr

Suggests testthat, knitr, rmarkdown

