Extensible, Parallelizable Implementation of the Random Forest Algorithm

Scalable decision tree training and prediction.


Changes in 0.1-4:

  • Sparse 'dgCMatrix' matrices accepted, if encoded in 'i/p' format.

  • Autocompression conserves space on a per-predictor basis.

  • Space-saving 'thinLeaves' option suppresses creation of summary data.

  • 'splitQuantile' option allows fine tuning of split-point placement for numerical predictors.

  • Improved scaling with row count.
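For readers unfamiliar with the 'i/p' encoding mentioned in the first item above: in the compressed sparse column layout used by R's Matrix package, the 'i' slot holds the zero-based row index of each nonzero entry, 'p' holds the offset at which each column begins, and 'x' holds the values themselves. The Python sketch below only illustrates that layout. The helper names `to_csc` and `column` are hypothetical and are not part of Rborist or the Matrix package.

```python
def to_csc(dense):
    """Encode a dense matrix (a list of rows) in compressed sparse column form.

    Returns (i, p, x):
      i -- zero-based row index of each nonzero, listed column by column
      p -- p[j] is the offset in i/x where column j starts; p[-1] is the nonzero count
      x -- the nonzero values, in the same order as i
    """
    n_row = len(dense)
    n_col = len(dense[0]) if n_row else 0
    i, p, x = [], [0], []
    for col in range(n_col):
        for row in range(n_row):
            value = dense[row][col]
            if value != 0:
                i.append(row)
                x.append(value)
        # Close the column by recording where the next one begins.
        p.append(len(i))
    return i, p, x


def column(i, p, x, col, n_row):
    """Recover one dense column from the (i, p, x) triple."""
    out = [0] * n_row
    for k in range(p[col], p[col + 1]):
        out[i[k]] = x[k]
    return out


dense = [
    [0, 5, 0],
    [3, 0, 0],
    [0, 6, 0],
]
i, p, x = to_csc(dense)
```

An all-zero column, such as the third one here, costs only one repeated entry in 'p'. That is why the layout suits predictors that are mostly zero.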

Changes in 0.1-2:

  • Improved scaling with predictor count.

  • Improved conformance with Caret package.

  • 'minNode' default lowered to reflect uniqueness of indices referenced within a node.

  • Name change: PreTrain deprecated in favor of PreFormat.

  • Minor reorganization to support sparse internal representation planned for next release.

Changes in 0.1-1:

  • Significant reductions in memory footprint.

  • Default predictor-selection mode changed to 'predFixed' (like 'mTry') for small predictor counts. 'predProb' remains the default at higher counts.

  • Binary classification now employs faster, weight-based algorithm.

  • Training produces rich internal state by default. In particular, quantile validation and prediction can be performed without having to train specially for them.

  • ForestFloorExport objects can be produced from training state for use by 'forestFloor' feature-analysis package.

  • PreTrain method produces pre-sorted predictor format, saving recomputation when retraining iteratively, such as during a Caret session.

  • OMP parallelization now performed per node/predictor pair, rather than per predictor.

  • Optional 'regMono' vector enforces monotonic constraints on numeric regressors.
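The two predictor-selection modes named in the list above can be contrasted with a small sketch. It assumes, as the option names suggest, that 'predFixed' draws a fixed number of distinct candidate predictors per split and that 'predProb' includes each predictor independently with a given probability. The Python functions below are hypothetical illustrations of those two sampling schemes, not the package's implementation.

```python
import random


def select_fixed(n_pred, pred_fixed, rng):
    """Fixed-count scheme: draw exactly pred_fixed distinct predictor indices."""
    return sorted(rng.sample(range(n_pred), pred_fixed))


def select_prob(n_pred, pred_prob, rng):
    """Probabilistic scheme: keep each predictor independently with probability pred_prob."""
    return [j for j in range(n_pred) if rng.random() < pred_prob]


rng = random.Random(0)  # seeded so repeated runs draw the same candidates
fixed = select_fixed(10, 3, rng)   # always three candidates
prob = select_prob(10, 0.3, rng)   # three candidates on average; the count varies
```

Under these assumptions the fixed-count scheme always yields the same number of candidates, while the probabilistic scheme yields a varying number and can yield none at all for a given split.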



0.1-8 by Mark Seligman, 2 years ago

http://www.suiji.org/arborist, https://github.com/suiji/Arborist

Browse source code at https://github.com/cran/Rborist

Authors: Mark Seligman

Documentation: PDF Manual

Task views: High-Performance and Parallel Computing with R, Machine Learning & Statistical Learning

License: MPL (>= 2) | GPL (>= 2) | file LICENSE

Depends on Rcpp

Suggests testthat, knitr, rmarkdown

Enhances forestFloor

Linking to Rcpp, RcppArmadillo

System requirements: gcc (release >= 4.9).
