Provides a new form of data frame backed by shared memory matrices and a way to manipulate them. Upon creation these data frames are shared across multiple local nodes to allow for simple parallel processing.
multiplyr provides a simple interface for manipulating data, combined with easy parallel processing. It is intended to work very similarly to (and eventually almost interchangeably with) the dplyr package, which many people will already be familiar with.
    library(multiplyr)

    dat <- Multiplyr (x=1:100, G=rep(c("A", "B", "C", "D"), each=25), alloc=2)

    # Group data (A, B, C, D)
    dat %>% group_by (G)

    # Create a new variable (y) with random data, the same length as x
    dat %>% mutate (y=rnorm(length(x)))

    # Keep only rows where y < 0
    dat %>% filter (y<0)

    # Summarise to give 4 rows (A, B, C, D), with number of rows in each group
    dat %>% summarise (N=length(x))
Run the following code once multiplyr is installed for more details:
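The command itself is not shown here. As a minimal sketch, assuming the package was installed with its vignettes built, base R's `browseVignettes()` lists whatever vignettes multiplyr ships with. The `requireNamespace()` guard is an addition for illustration, not part of the package's own instructions.

```r
# List the vignettes bundled with multiplyr, if the package is installed.
# The guard avoids an error when multiplyr is not yet available.
if (requireNamespace("multiplyr", quietly = TRUE)) {
  browseVignettes("multiplyr")
} else {
  message("multiplyr is not installed yet")
}
```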
Install latest version from CRAN:
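A minimal sketch of the CRAN install using the standard `install.packages()` call. The explicit `repos` mirror and the `requireNamespace()` guard are assumptions added so the snippet also runs non-interactively. A plain `install.packages("multiplyr")` is the usual interactive form.

```r
# Install the released version from CRAN, skipping the step if it is
# already installed. The mirror URL is an assumed default; any CRAN
# mirror can be substituted.
if (!requireNamespace("multiplyr", quietly = TRUE)) {
  install.packages("multiplyr", repos = "https://cloud.r-project.org")
}
```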
Install latest stable development version:
    # install.packages("devtools")
    devtools::install_github("jeblundell/multiplyr", ref="stable", build_vignettes = TRUE)