Variable Selection Using Random Forests

Three steps variable selection procedure based on random forests. Initially developed to handle high dimensional data (for which number of variables largely exceeds number of observations), the package is very versatile and can treat most dimensions of data, for regression and supervised classification problems. First step is dedicated to eliminate irrelevant variables from the dataset. Second step aims to select all variables related to the response for interpretation purpose. Third step refines the selection by eliminating redundancy in the set of variables selected by the second step, for prediction purpose. Genuer, R. Poggi, J.-M. and Tuleau-Malot, C. (2015) <>.


News about the R package VSURF:

Main changes in Version 1.0.4 (2018-04-09)

  • skip all tests because randomForest package is being updated and behavior with set.seed() is changing (and hence crashing all tests of VSURF)
  • add PM10 data into the package

Main changes in Version 1.0.3 (2016-04-26)

  • fix VSURF_thres bug in order to always return importance for all variables (in case of very low sample size, NaN are sometimes produced for importance of some variables)
  • add R Journal references in help pages

Main changes in Version 1.0.2 (2015-10-09)

  • fix in tests: they do not pass on windows 32-bit, but do on all other platforms.

Main changes in Version 1.0.1 (2015-10-09)

  • tests were added in the package, testing basic features of the package.
  • para parameter of VSURF, VSURF_thres and VSURF_interp is now called parallel.
  • add imports to all necessary packages except base

Main changes in Version 1.0.0 (2015-05-15) WARNING: MAJOR UPDATE.

  • VSURF is now on github.
  • Change of name functions : VSURF.thres becomes VSURF_thres VSURF.interp becomes VSURF_interp VSURF.pred becomes VSURF_pred
  • VSURF.parallel, VSURF.thres.parallel and VSURF.interp.parallel are removed from the package.
  • VSURF, VSURF_thres and VSURF_interp can now be used to parallel executions, by fixing "para = TRUE".
  • Add a predict method for VSURF objects.
  • Change of name outputs in VSURF and VSURF_thres : ord.imp$x becomes imp.mean.dec ord.imp$ix becomes imp.mean.dec.ind becomes
  • The ntree value now affects all random forests of the procedure (in all three steps).
  • S3 methods are no longer exported.
  • Add several parameters to plot function, allowing to plot each plot individually from a VSURF object.
  • Bug fix: in a VSURF run in parallel, there is no error anymore if the function is called for data with only one variable
  • Plot functions do not need anymore to have the data on which the VSURF object was built to be loaded. However, it is still mandatory if argument var.names is set to TRUE.
  • Dependencies replaced by Imports.

Main changes in Version 0.8.2 (2014-05-12):

  • Bug fix: the random seed is now correctly set when using the VSURF.parallel function with clusterType="FORK"
  • Bug fix: VSURF function can now handle categorical input variables. Thanks to ONF researchers for pointing this out.

Main changes in Version 0.8.1 (2014-01-28):

  • Bug fix: error of the model was wrongly updated in the VSURF.pred function. Thanks a lot to Dustin Fife.

Main changes in Version 0.8 (2013-11-23):

  • Parallel version of VSURF added. Three new main functions: VSURF.parallel, VSURF.thres.parallel and VSURF.interp.parallel. Default for these functions is to run VSURF in parallel on a (local) SOCKET cluster with "number of cores detected by R" minus one core
  • Addition of a print function for VSURF results
  • Bug fix: the fact that the prediction step can not run does not give an error anymore

Main changes in Version 0.7.6 (2013-11-13):

  • Plot functions changed. It is now possible to plot VI mean and VI standard deviation against variables names with function plot.VSURF.thres. By default index variables are plotted, and var.names argument is added to all plot functions
  • Bug concerning ylab and yticks fixed
  • Addition of a warning when using formula method of VSURF, VSURF.thres, VSURF.interp and VSURF.pred: indices of selected variables must be reordered to get indices of the original dataset

Main changes in Version 0.7.5 (2013-10-04):

  • Only plot functions changed. It is now possible to choose the number of variables for the VI mean and the VI standard deviation plots. It is also possible to choose if VI mean and/or VI standard deviation plots must be plooted by function plot.VSURF.thres

Main changes in Version 0.7 (2013-09-10):

  • The package can be used with the formula-type call. Hence the package can handle missing values (NA) (only when used with the formula-type call, as randomForest does)
  • Update of the plot.VSURF function: the two last graphs show the variables names on the x-axis
  • New functions (intermediate plots): plot.VSURF.thres, plot.VSURF.interp, plot.VSURF.pred
  • Functions renamed: VSURF.interp.tune -> tune.VSURF.interp, VSURF.thres.tune -> tune.thres.tune and S3method are now used for these functions

Main changes in Version 0.6 (2013-07-23):

  • New functions: plot.VSURF, summary.VSURF, VSURF.interp.tune, VSURF.thres.tune.
  • Addition of a dataset: toys
  • Addition of new outputs of existing functions.
  • Bug fix: min.thres component of VSURF results list returns now the threshold value.

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.


1.1.0 by Robin Genuer, 2 years ago

Report a bug at

Browse source code at

Authors: Robin Genuer [aut, cre] , Jean-Michel Poggi [aut] , Christine Tuleau-Malot [aut]

Documentation:   PDF Manual  

GPL (>= 2) license

Imports doParallel, foreach, parallel, Rborist, randomForest, ranger, rpart

Suggests testthat

Imported by armada.

See at CRAN