Submit R Calculations to a 'SLURM' Cluster

Functions that simplify the R interface to the 'SLURM' cluster workload manager and automate the process of dividing a parallel calculation across cluster nodes.


Many computationally intensive processes in R involve the repeated evaluation of a function over many items or parameter sets. These so-called embarrassingly parallel calculations can be run serially with the lapply or Map functions, or in parallel on a single machine with mclapply or mcMap (from the parallel package).

The rslurm package simplifies the process of distributing this type of calculation across a computing cluster that uses the SLURM workload manager. Its main function, slurm_apply, automatically divides the computation over multiple nodes and writes the necessary submission scripts. It also includes functions to retrieve and combine the output from different nodes, as well as wrappers for common SLURM commands.

Development of this R package was supported by the National Socio-Environmental Synthesis Center (SESYNC) under funding received from the National Science Foundation DBI-1052875.

To illustrate a typical rslurm workflow, we use a simple function that takes a mean and standard deviation as parameters, generates a million normal deviates and returns the sample mean and standard deviation.

test_func <- function(par_mu, par_sd) {
    samp <- rnorm(10^6, par_mu, par_sd)
    c(s_mu = mean(samp), s_sd = sd(samp))
}

We then create a parameter data frame where each row is a parameter set and each column matches an argument of the function.

pars <- data.frame(par_mu = 1:10,
                   par_sd = seq(0.1, 1, length.out = 10))
head(pars, 3)
  par_mu par_sd
1      1    0.1
2      2    0.2
3      3    0.3

We can now pass that function and the parameters data frame to slurm_apply, specifying the number of cluster nodes to use and the number of CPUs per node. The latter argument (cpus_per_node) determines how many processes will be forked on each node, via the mc.cores argument of parallel::mcMap.

library(rslurm)
sjob <- slurm_apply(test_func, pars, jobname = "test_job", 
                    nodes = 2, cpus_per_node = 2)

The output of slurm_apply is a slurm_job object that stores a few pieces of information (job name and number of nodes) needed to retrieve the job's output.

Assuming the function is run on a machine with access to the cluster, it also prints a message confirming the job has been submitted to SLURM.

Submitted batch job 352375

Particular clusters may require the specification of additional SLURM options, such as time and memory limits for the job. Also, when running R on a local machine without direct cluster access, you may want to generate scripts to be copied to the cluster and run at a later time. These topics are covered in the sections that follow this basic example.

After the job has been submitted, you can call print_job_status to display its status (in queue, running or completed) or call cancel_slurm to cancel its execution. These functions are R wrappers for the SLURM command line functions squeue and scancel, respectively.
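For example, with the sjob object created above:

```r
# Display the job's status; wraps the SLURM squeue command
print_job_status(sjob)

# If necessary, cancel the job; wraps the SLURM scancel command
# cancel_slurm(sjob)
```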

Once the job completes, get_slurm_out reads and combines the output from all nodes.

res <- get_slurm_out(sjob, outtype = "table")
head(res, 3)
      s_mu       s_sd
1 1.000005 0.09987899
2 2.000185 0.20001108
3 3.000238 0.29988789

When outtype = "table", the outputs from each function evaluation are row-bound into a single data frame; this is an appropriate format when the function returns a simple vector. The default outtype = "raw" combines the outputs into a list and can thus handle arbitrarily complex return objects.

res_raw <- get_slurm_out(sjob, outtype = "raw")
res_raw[1:3]
[[1]]
      s_mu       s_sd 
1.00000506 0.09987899 
 
[[2]]
     s_mu      s_sd 
2.0001852 0.2000111 
 
[[3]]
     s_mu      s_sd 
3.0002377 0.2998879 

The files generated by slurm_apply are saved in a folder named _rslurm_[jobname] under the current working directory.

dir("_rslurm_test_job")
[1] "params.RData"    "results_0.RData" "results_1.RData" "slurm_0.out"    
[5] "slurm_1.out"     "slurm_run.R"     "submit.sh" 

The utility function cleanup_files deletes the temporary folder for the specified slurm_job.
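For example, once the results above have been retrieved:

```r
# Delete the _rslurm_test_job folder and all the files it contains
cleanup_files(sjob)
```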

In addition to slurm_apply, rslurm also defines a slurm_call function, which sends a single function call to the cluster. It is analogous in syntax to the base R function do.call, accepting a function and a named list of parameters as arguments.

sjob <- slurm_call(test_func, list(par_mu = 5, par_sd = 1))

Because slurm_call involves a single process on a single node, it does not recognize the nodes and cpus_per_node arguments; otherwise, it accepts the same additional arguments (detailed in the sections below) as slurm_apply.
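The result of a slurm_call job can be retrieved with get_slurm_out, just as for slurm_apply. A sketch, assuming the job above has completed:

```r
# With outtype = "raw", the result is a list with a single element
res <- get_slurm_out(sjob, outtype = "raw")
res[[1]]  # a named vector with elements s_mu and s_sd
```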

The function passed to slurm_apply can only receive atomic parameters stored within a data frame. Suppose we want instead to apply a function func to a list of complex R objects, obj_list. To use slurm_apply in this case, we can wrap func in an inline function that takes an integer parameter.

sjob <- slurm_apply(function(i) func(obj_list[[i]]), 
                    data.frame(i = seq_along(obj_list)),
                    add_objects = c("func", "obj_list"),
                    nodes = 2, cpus_per_node = 2)

The add_objects argument specifies the names of any R objects (besides the parameters data frame) that must be accessed by the function passed to slurm_apply. These objects are saved to a .RData file that is loaded on each cluster node prior to evaluating the function in parallel.

By default, all R packages attached to the current R session will also be attached (with library) on each cluster node, though this can be modified with the optional pkgs argument.
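For instance, to attach only a specific set of packages on each node (a sketch; the package name here is a placeholder):

```r
# Attach only the stats package on each node, rather than
# every package currently attached to the local session
sjob <- slurm_apply(test_func, pars, jobname = "test_pkgs",
                    nodes = 2, cpus_per_node = 2,
                    pkgs = c("stats"))
```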

The slurm_options argument allows you to set any of the command line options recognized by the SLURM sbatch command (see the sbatch documentation for the full list). It should be formatted as a named list, using the long name of each option (e.g. "time" rather than "t"). Flags, i.e. command line options that are toggled rather than set to a particular value, should be set to TRUE in slurm_options. For example, the following code:

sjob <- slurm_apply(test_func, pars, 
                    slurm_options = list(time = "1:00:00", share = TRUE))

sets the command line options --time=1:00:00 --share.

When working from an R session without direct access to the cluster, you can set submit = FALSE within slurm_apply. The function will create the _rslurm_[jobname] folder and generate the scripts and .RData files, without submitting the job. You may then copy those files to the cluster and submit the job manually by calling sbatch submit.sh from the command line.
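A minimal sketch of this workflow:

```r
# Generate the submission scripts without submitting the job
sjob <- slurm_apply(test_func, pars, jobname = "test_job",
                    nodes = 2, cpus_per_node = 2, submit = FALSE)

# Next, copy the _rslurm_test_job folder to the cluster and,
# from within that folder, run: sbatch submit.sh
```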

As mentioned above, the slurm_apply function creates a job-specific folder. This folder contains the parameters data frame and (if applicable) the objects specified as add_objects, both saved in .RData files. The function also generates an R script (slurm_run.R) to be run on each cluster node, as well as a Bash script (submit.sh) to submit the job to SLURM.

More specifically, the Bash script creates a SLURM job array, with each cluster node receiving a different value of the SLURM_ARRAY_TASK_ID environment variable. This variable is read by slurm_run.R, which allows each instance of the script to operate on a different parameter subset and write its output to a different results file. The R script calls parallel::mcMap to parallelize calculations on each node.

Both slurm_run.R and submit.sh are generated from templates, using the whisker package; these templates can be found in the rslurm/templates subfolder in your R package library. There are two templates for each script, one for slurm_apply and the other (with the word single in its title) for slurm_call.

While you should avoid changing any existing lines in the template scripts, you may want to add #SBATCH lines to the submit.sh templates in order to permanently set certain SLURM command line options and thus customize the package to your particular cluster setup.

News

rslurm 0.3.1

2016-06-18

  • Minor bug fix: specify full path of 'Rscript' when running batch scripts.

rslurm 0.3.0

2016-05-27

First version on CRAN

Major update to the package interface and implementation:

  • Added a submit argument to slurm_apply and slurm_call. If submit = FALSE, the submission scripts are created but not run. This is useful if the files need to be transferred from a local machine to the cluster and run at a later time.

  • Added new optional arguments to slurm_apply and slurm_call, allowing users to give informative names to SLURM jobs (jobname) and set any options understood by sbatch (slurm_options).

  • The data_file argument to slurm_apply and slurm_call is replaced with add_objects, which accepts a vector of R object names from the active workspace and automatically saves them in a .RData file to be loaded on each node.

  • slurm_apply and slurm_call now generate R and Bash scripts through whisker templates. Advanced users may want to edit those templates in the templates folder of the installed R package (e.g. to set default SBATCH options in submit.sh).

  • Files generated by the package (scripts, data files and output) are now saved in a subfolder named _rslurm_[jobname] in the current working directory.

  • Minor updates, including reformatting the output of print_job_status and removing this package's dependency on stringr.

rslurm 0.2.0

2015-11-23

  • Changed the slurm_apply function to use parallel::mcMap instead of mcmapply, which fixes a bug where list outputs (i.e. each function call returns a list) would be collapsed in a single list (rather than returned as a list of lists).

  • Changed the interface so that the output type (table or raw) is now an argument of get_slurm_out rather than of slurm_apply, and defaults to raw.

  • Added cpus_per_node argument to slurm_apply, indicating the number of parallel processes to be run on each node.

rslurm 0.1.3

2015-07-13

  • Added the slurm_call function, which submits a single function evaluation on the cluster, with syntax similar to the base function do.call.

  • get_slurm_out can now process the output even if some files are missing, in which case it issues a warning.

rslurm 0.1.2

2015-06-29

  • Added the optional argument pkgs to slurm_apply, indicating which packages should be loaded on each node (by default, all packages currently attached to the user's R session).

rslurm 0.1.1

2015-06-24

  • Added the optional argument output to slurm_apply, which can take the value table (each function evaluation returns a row, output is a data frame) or raw (each function evaluation returns an arbitrary R object, output is a list).

  • Fixed a bug in the chunk size calculation for slurm_apply.

rslurm 0.1.0

2015-06-16

  • First version of the package released on Github.


install.packages("rslurm")

Version: 0.3.1
Authors: Philippe Marchand [aut, cre], Mike Smorul [ctb]
URL: https://github.com/SESYNC-ci/rslurm
BugReports: https://github.com/SESYNC-ci/rslurm/issues
Source: https://github.com/cran/rslurm
License: GPL-3
Imports: parallel, whisker
Suggests: testthat, knitr, rmarkdown
Task views: High-Performance and Parallel Computing with R