Variable Selection for Gaussian Model-Based Clustering

Variable selection for Gaussian model-based clustering as implemented in the 'mclust' package. The methodology finds the (locally) optimal subset of variables in a data set that carry group/cluster information. A greedy or headlong search can be used, in either a forward-backward or a backward-forward direction, with or without sub-sampling at the hierarchical clustering stage used to initialise the 'mclust' models. By default the algorithm uses a sequential search, but parallelisation is also available.
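As a rough illustration of the search described above, the following R sketch runs a greedy forward search on simulated data. The data set, the seed, and the variable names are made up for this example; the argument names (G, search, direction, verbose) are those mentioned in the release notes below and in the package documentation as I understand it, so check them against the reference manual for your installed version.

```r
# Minimal sketch, assuming mclust and clustvarsel are installed.
library(mclust)
library(clustvarsel)

set.seed(1)  # arbitrary seed, only for reproducibility of this toy example
n <- 200

# Hypothetical data: x1 and x2 separate two groups, x3 and x4 are pure noise.
X <- data.frame(
  x1 = c(rnorm(n / 2, mean = -2), rnorm(n / 2, mean = 2)),
  x2 = c(rnorm(n / 2, mean = 2), rnorm(n / 2, mean = -2)),
  x3 = rnorm(n),
  x4 = rnorm(n)
)

# Greedy search starting in the forward direction, sequential (the default).
# G is a vector of the numbers of clusters to consider, not a maximum.
out <- clustvarsel(X, G = 1:3, search = "greedy", direction = "forward",
                   verbose = FALSE)

print(out)    # steps of the search
out$subset    # column positions of the selected variables
out$model     # final fitted model (stored in the object since version 2.3.3)
```

Switching to search = "headlong" or direction = "backward" changes the search strategy, and the 'parallel' argument requests parallel evaluation of the greedy search. No expected output is shown because the selected subset depends on the simulated data.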


Version 2.3.3 (2018-11)
o Added the final estimated model to the 'clustvarsel' object.
o Fixed a bug that stopped execution in the greedy backward search when no variables could be removed.

Version 2.3.2 (2018-04)
o Package version accompanying the JSS paper.
o Bug fixes for the extreme case in which no clustering variable is selected by the greedy forward/backward search.

Version 2.3.1 (2017-06)
o Fixed a bug where an "if" statement was evaluated with a condition of length greater than 1.

Version 2.3 (2017-01)
o Added optional argument 'verbose' to clustvarsel() for printing information on each step of the search.
o New print method for 'clustvarsel' objects.
o A parallel cluster is automatically stopped unless a registered parallel back end is supplied through the 'parallel' argument of the clustvarsel() function call.
o Added the "A quick tour of clustvarsel" vignette.

Version 2.2 (2015-11)
o Reformatted summary output from clustvarsel.
o Added and updated references in the main help page.

Version 2.1 (2014-10)
o Version associated with the JSS paper submission.
o Clusters are now explicitly stopped when parallel computing is used.
o The argument name 'data = ...' is now given explicitly in the hc() function call so that the code works with mclust version 4.4 and later.
o Other bug fixes and improvements.

Version 2.0 (2013-10)
o Partial rewrite of the package.
o "greedy" search has options for forward and backward direction.
o "headlong" search has an option only for forward direction in this release.
o G is not the maximum number of clusters; it must be a vector of the numbers of clusters to consider.
o No separate code for the samp and nosamp versions of each search algorithm.
o New argument hcModel to control the initial hierarchical clustering.
o Subset selection is included in the regression of the proposed variable on the variables already included.
o "greedy" search algorithms can be executed either sequentially or using the parallel computing facilities available in R.
o This version of the package requires R (>= 3.0.0) and mclust (>= 4.0).

Version 1.3 (2009-08)
o Last version on CRAN available for R-2.14.x and mclust version 3.5.



2.3.3 by Luca Scrucca, a year ago


Authors: Nema Dean [aut], Adrian E. Raftery [aut], Luca Scrucca [aut, cre]


Task views: Chemometrics and Computational Physics, Cluster Analysis & Finite Mixture Models, Multivariate Statistics

GPL (>= 2) license

Imports stats, Matrix, BMA, foreach, iterators

Depends on mclust

Suggests MASS, parallel, doParallel, knitr, rmarkdown
