Visually Exploring Random Forests

Graphic elements for exploring Random Forests using the 'randomForest' or 'randomForestSRC' package for survival, regression and classification forests and 'ggplot2' package plotting.

ggRandomForests: Visually Exploring Random Forests

ggRandomForests helps uncover variable associations in random forest models. The package is designed for use with the randomForest package (Liaw and Wiener 2002) or the randomForestSRC package (Ishwaran 2014, 2008, 2007) for survival, regression and classification random forests, and uses the ggplot2 package (Wickham 2009) for plotting diagnostic and variable association results. ggRandomForests is structured to extract data objects from randomForestSRC or randomForest objects, and provides S3 functions for printing and plotting these objects.

The randomForestSRC package provides a unified treatment of Breiman's (2001) random forests for a variety of data settings. Regression and classification forests are grown when the response is numeric or categorical (factor), while survival and competing risk forests (Ishwaran et al. 2008, 2012) are grown for right-censored survival data. Recently, support for the randomForest package (Liaw and Wiener 2002) for regression and classification forests has also been added.

Many of the figures created by the ggRandomForests package are also available directly from within the randomForestSRC or randomForest package. However, ggRandomForests offers the following advantages:

  • Separation of data and figures: ggRandomForests contains functions that operate either on the forest object directly, or on the output from randomForestSRC and randomForest post-processing functions (e.g. plot.variable, find.interaction) to generate intermediate ggRandomForests data objects. S3 functions are provided to further process these objects and plot results using the ggplot2 graphics package. Alternatively, users can use these data objects for additional custom plotting or analysis operations.

  • Each data object/figure is a single, self-contained object. This allows simple modification and manipulation of the data or ggplot2 objects to meet users' specific needs and requirements.

  • The use of ggplot2 for plotting. We chose the ggplot2 package for our figures to give users flexibility in modifying them. Each S3 plot function returns either a single ggplot2 object or a list of ggplot2 objects, so users can apply additional ggplot2 functions or themes to customize the figures to their liking.
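The workflow described in the list above can be sketched in a few lines. This is a minimal illustration rather than a prescribed recipe: it assumes the randomForestSRC, ggRandomForests and ggplot2 packages are installed, uses R's built-in airquality data as a stand-in dataset, and the plot title is an arbitrary choice made for the example.

```r
library(randomForestSRC)
library(ggRandomForests)
library(ggplot2)

# Grow a regression forest on the built-in airquality data (an assumption
# for illustration). Missing values are imputed, and importance = TRUE asks
# rfsrc to compute variable importance (VIMP) while growing the forest.
rf <- rfsrc(Ozone ~ ., data = airquality,
            na.action = "na.impute", importance = TRUE)

# Step 1: extract an intermediate ggRandomForests data object from the forest.
vimp_data <- gg_vimp(rf)
print(class(vimp_data))

# Step 2: the S3 plot method turns the data object into a ggplot2 object.
vimp_plot <- plot(vimp_data)

# Step 3: the figure can be modified with ordinary ggplot2 layers and themes.
vimp_plot +
  theme_bw() +
  labs(title = "Variable importance (airquality forest)")
```

Because vimp_data is kept separate from the figure, it can also be passed to other plotting or analysis code instead of the supplied plot method.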

Where possible, the package has recently been extended to support randomForest, Breiman and Cutler's Random Forests for Classification and Regression package. Methods are provided for all gg_* functions, but those that are not yet supported return an error message indicating where support is still lacking.
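As a hedged sketch of what this looks like in practice, the example below fits a randomForest regression forest and passes it to gg_error. It assumes the randomForest and ggRandomForests packages are installed and uses the built-in airquality data for illustration. The tryCatch wrapper is an illustrative pattern, not part of the package: it reports the error message if a given gg_* method turns out not to support randomForest objects in the installed version.

```r
library(randomForest)
library(ggRandomForests)

# randomForest() stops on missing values by default, so drop incomplete rows
# first (airquality is an assumed example dataset).
aq <- na.omit(airquality)
rf <- randomForest(Ozone ~ ., data = aq, importance = TRUE)

# Try to extract the error-rate data object. If this method is not supported
# for randomForest objects, report the package's message and continue.
err_data <- tryCatch(
  gg_error(rf),
  error = function(e) {
    message("gg_error is not available for this forest: ", conditionMessage(e))
    NULL
  }
)

# Plot only when a data object was returned.
if (!is.null(err_data)) {
  plot(err_data)
}
```

The same pattern can be applied to the other gg_* functions to find out which ones accept a randomForest object.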

Breiman, L. (2001). Random forests, Machine Learning, 45:5-32.

Ishwaran H. and Kogalur U.B. (2014). Random Forests for Survival, Regression and Classification (RF-SRC), R package version 1.5.5.

Ishwaran H. and Kogalur U.B. (2007). Random survival forests for R. R News 7(2), 25--31.

Ishwaran H., Kogalur U.B., Blackstone E.H. and Lauer M.S. (2008). Random survival forests. Ann. Appl. Statist. 2(3), 841--860.

Liaw A. and Wiener M. (2002). Classification and Regression by randomForest. R News 2(3), 18--22.

Wickham H. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer, New York.


Package: ggRandomForests Version: 2.0.1

ggRandomForests v2.0.1

ggRandomForests v2.0.0

  • Added initial support for the randomForest package
  • Updated cache files for randomForestSRC 2.2.0 release.
  • Remove regression vignettes to meet CRAN size limits. These remain available in the package source.
  • Minor bug and documentation fixes.

ggRandomForests v1.2.1

  • Update cached datasets for randomForestSRC 2.0.0 release.
  • Correct some vignette formatting errors (thanks to Joe Smith).

ggRandomForests v1.2.0

  • Convert to semantic versioning
  • Updates for release of ggplot2 2.0.0
  • Change from reshape2::melt dependence to tidyr::gather
  • Streamline tests for CRAN to reduce R CMD check times.

ggRandomForests v1.1.4

  • Fix combine.gg_partial bug when given a single-variable plot.variable object.

  • Move the dplyr dependency from "Imports" to "Suggests".

  • Add argument for a single-outcome gg_vimp plot for classification forests.

  • Improvements to gg_vimp arguments for consistency.

  • Add bootstrap confidence intervals to gg_rfsrc function.

  • Initial partial.rfsrc function to replace the randomForestSRC::plot.variable function.

  • Move cache data to randomForestSRC v1.6.1 to take advantage of rfsrc version checking between function calls.

  • Vignette updates for JSS submission of "ggRandomForests: Exploring Random Forest Survival".

  • Vignette updates for arXiv submission of "ggRandomForests: Random Forests for Regression".

  • Some optimizations to reduce package size.

  • Remove all tests from CRAN build to reduce R CMD check times.

  • Remove pdf vignette figure from CRAN build.

  • Return S3method calls to NAMESPACE for "S3 methods exported but not registered" for R V3.2+.

  • Misc Bug Fixes.

ggRandomForests v1.1.3

  • Update "ggRandomForests: Visually Exploring a Random Forest for Regression" vignette.
  • Further development of draft package vignette "Survival with Random Forests".
  • Rename vignettes to align with randomForestSRC package usage.
  • Add more tests and example functions.
  • Refactor gg_ functions into S3 methods to allow future implementation for other random forest packages.
  • Improved help files.
  • Updated DESCRIPTION file to remove redundant parts.
  • Misc Bug Fixes.

ggRandomForests v1.1.2

  • Add package vignette "ggRandomForests: Visually Exploring a Random Forest for Regression"
  • Add gg_partial_coplot, quantile_cuts and surface_matrix functions
  • Export the calc_roc and calc_auc functions.
  • Replace tidyr function dependency with reshape2 (melt instead of gather) due to lazy evaluation issues.
  • Reduce dplyr dependencies (remove select and %>% usage in favor of base equivalents; tbl_df is still used for printing).
  • Further development of package vignette "Survival with Random Forests"
  • Refactor cached example datasets for better documentation, estimates and examples.
  • Improved help files.
  • Updated DESCRIPTION file to remove redundant parts.
  • Misc Bug Fixes.

ggRandomForests v1.1.1

Maintenance release, mostly to fix gg_survival and gg_partial plots.

  • Fix the gg_survival functions to plot Kaplan-Meier estimates.
  • Fix the gg_partial functions for categorical variables.
  • Add some more S3 print functions.
  • Make the gg_* functions more consistent.
  • Further development of package vignette "Survival with Random Forests"
  • Modify the example cached datasets for better estimates and examples.
  • Improve help files.
  • Misc Bug Fixes.

ggRandomForests v1.1.0

  • Add panel option for gg_variable and gg_partial
  • Rework interactions plot
  • Add gg_coplot functions
  • Use "Imports" instead of "Depends"
  • Add version dependencies for randomForestSRC
  • Include package vignette "Random Forests for Survival"
  • Misc Bug Fixes

ggRandomForests v1.0.0

  • First CRAN release.

ggRandomForests v0.2

  • Initial useR!2014 release.
