Last updated on 2021-11-23 by Martin Maechler
Robust (or "resistant") methods for statistical modelling have been available in S from the very beginning in the 1980s, and then in R in package stats. Examples are median(), mean(*, trim = .), mad(), IQR(), and also fivenum(), the statistic behind boxplot() in package graphics, or lowess() (and loess()) for robust nonparametric regression, which was complemented by runmed() in 2003.
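As a rough illustration of how these base R functions behave, the sketch below applies a few of them to a small made-up sample with one gross outlier. The data values are invented for this example and do not come from the task view.

```r
# Toy sample with one gross outlier (values chosen for illustration only)
x <- c(9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 55)

mean(x)               # classical mean, pulled towards the outlier
mean(x, trim = 0.1)   # 10% trimmed mean: drops the smallest and the largest value
median(x)             # median
sd(x)                 # classical scale estimate, inflated by the outlier
mad(x)                # median absolute deviation (scaled for consistency at the normal)
IQR(x)                # interquartile range
fivenum(x)            # Tukey's five-number summary, the statistic behind boxplot()
boxplot.stats(x)$out  # observations flagged by the boxplot rule
```

The trimmed mean and the median stay close to 10, whereas the untrimmed mean does not.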
Much further important functionality has been made available in the recommended (and hence present in all R versions) package MASS (by Bill Venables and Brian Ripley, see the book Modern Applied Statistics with S). Most importantly, they provide rlm() for robust regression and cov.rob() for robust multivariate scatter and covariance.
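A minimal sketch of the two MASS functions, using the stackloss data set that ships with R. The choice of data set and the object names (fit.ls, fit.rlm, rc) are assumptions made for this example only.

```r
library(MASS)  # recommended package, shipped with R

# stackloss is a classical data set from package 'datasets'
fit.ls  <- lm(stack.loss ~ ., data = stackloss)   # least squares
fit.rlm <- rlm(stack.loss ~ ., data = stackloss)  # M-estimation (Huber psi by default)
cbind(LS = coef(fit.ls), robust = coef(fit.rlm))

# Final IWLS weights: values well below 1 mark downweighted observations
round(fit.rlm$w, 2)

set.seed(1)                      # cov.rob() uses random subsampling
rc <- cov.rob(stackloss[, 1:3])  # robust location and scatter (default method: MVE)
rc$center
rc$cov
```

Inspecting the weights is often a quick way to see which observations the robust fit treats as unusual.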
This task view is about R add-on packages providing newer or faster, more efficient algorithms and, notably, (robustifications of) new models.
Please send suggestions for additions and extensions to the task view maintainer.
An international group of scientists working in the field of robust statistics has made efforts (since October 2005) to coordinate several of the scattered developments and to make the important ones available through a set of R packages complementing each other. These should build on a basic package with "Essentials", coined robustbase, with (potentially many) other packages building on top and extending the essential functionality to particular models or applications. Since 2020 and the 2nd edition of Robust Statistics: Theory and Methods, RobStatTM covers its estimators and examples, notably by importing from robustbase and rrcov. Further, there is the quite comprehensive package robust, a version of the robust library of S-PLUS, available as an R package and now GPL-licensed thanks to Insightful and Kjell Konis. Originally, there was much overlap between robustbase and robust; now robust depends on robustbase and rrcov. Here, robust provides convenient routines for the casual user, while robustbase and rrcov contain the underlying functionality and provide the more advanced statistician with a large range of options for robust modeling.
lmrob() (robustbase) and lmRob() (robust), where the former uses the latest of the fast-S algorithms and heteroscedasticity and autocorrelation corrected (HAC) standard errors, and the latter makes use of the M-S algorithm of Maronna and Yohai (2000), automatically when there are factors among the predictors (where S-estimators (and hence MM-estimators) based on resampling typically fail badly). The
ltsReg() and lmrob.S() functions are available in robustbase, but rather for comparison purposes.
rlm() from MASS was the first widely available implementation of robust linear models, and also one of the very first MM-estimation implementations. robustreg provides very simple M-estimates for linear regression (in pure R). Note that Koenker's quantile regression package quantreg contains L1 (aka LAD, least absolute deviations) regression as a special case, doing so also for nonparametric regression via splines. Package mblm's function
mblm() fits median-based (Theil-Sen or Siegel's repeated medians) simple linear models.
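To make the idea behind median-based simple regression concrete, here is a hand-rolled Theil-Sen sketch in base R. It is not the code used by mblm(). The function name theil_sen and the data are hypothetical, and the intercept rule (median of y - slope * x) is one common convention assumed here.

```r
# Hand-rolled Theil-Sen estimator for a simple linear model y = a + b * x.
# Illustrative sketch only; this is not the implementation in package mblm.
theil_sen <- function(x, y) {
  idx <- combn(length(x), 2)                 # all index pairs i < j
  dx  <- x[idx[2, ]] - x[idx[1, ]]
  dy  <- y[idx[2, ]] - y[idx[1, ]]
  keep <- dx != 0                            # skip pairs with tied x values
  slope <- median(dy[keep] / dx[keep])       # median of the pairwise slopes
  intercept <- median(y - slope * x)         # median offset given that slope
  c(intercept = intercept, slope = slope)
}

x <- 1:10
y <- 2 + 3 * x   # exact line, made-up data
y[10] <- 100     # one gross outlier

theil_sen(x, y)  # recovers the line through the nine clean points
coef(lm(y ~ x))  # least squares, for comparison
```

With a single outlier among ten points, most pairwise slopes are unaffected, which is why the median slope stays on the clean line while least squares does not.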
glmRob() (robust). Robust ordinal regression is provided by rorutadis (UTADIS). drgee fits "Doubly Robust" Generalized Estimating Equations (GEEs); complmrob does robust linear regression with compositional data as covariates. multinomRob fits overdispersed multinomial regression models for count data.
nlrob(). robustgam fits robust GAMs, i.e., robust Generalized Additive Models.
The rrcov package, which builds ("Depends") on robustbase, provides nice S4 class based methods, more methods for robust multivariate variance-covariance estimation, and adds robust PCA methodology. It is extended by rrcovNA, providing versions for dealing with missing (NA) data, and by rrcovHD, providing robust multivariate methods for High Dimensional data.
X <- stackloss; pc.rob <- princomp(X, covmat= MASS::cov.rob(X))
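The one-liner above can be extended into a small comparison of classical and robust PCA. This is only a sketch: the object name pc.cl and the fixed seed are assumptions added here for reproducibility.

```r
library(MASS)  # for cov.rob()

X <- stackloss
set.seed(1)                                 # cov.rob() relies on random subsampling
pc.cl  <- princomp(X)                       # classical PCA
pc.rob <- princomp(X, covmat = cov.rob(X))  # PCA based on a robust covariance estimate

# Standard deviations of the principal components, side by side
rbind(classical = pc.cl$sdev, robust = pc.rob$sdev)
loadings(pc.rob)
```

princomp() accepts the list returned by cov.rob() directly because it contains the cov, center and n.obs components.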
fastmcd(), and similarly for
covOGK(). On the other hand, robust's covRob() has automatically chosen methods, notably pairwiseQC() for large dimensionality p. Package robustX for experimental, or other not yet established procedures, contains BACON() and covNNC(), the latter providing the nearest neighbor variance estimation (NNVE) method of Wang and Raftery (2002), also available (slightly less optimized) in covRobust.
pam() (package cluster), implementing "partitioning around medoids", is partly robust (medoids instead of very unrobust k-means) but is not good enough, as, e.g., the k clusters could consist of k-1 outliers and one cluster for the bulk of the remaining data.
Large Data Sets:
BACON() (in robustX) should be applicable for larger (n,p) than traditional robust covariance based outlier detectors.
Descriptive Statistics / Exploratory Data Analysis:
boxplot.stats(), etc. mentioned above
runmed() provides most robust running median filtering.
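A short sketch of running median filtering on a made-up series with two isolated spikes. The data and the window width k = 3 are assumptions chosen for illustration.

```r
# Made-up series: a steady trend with two isolated spikes
y <- c(1, 2, 3, 4, 50, 6, 7, 8, 9, -40, 11, 12, 13)

r <- runmed(y, k = 3)          # running median with window width 3
r                              # the isolated spikes are removed
stats::filter(y, rep(1/3, 3))  # running mean for comparison (NA at both ends)
```

A single spike can never be the median of a window of three, so the running median suppresses it completely, while the running mean smears it over neighbouring positions.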
vcov(lmrob()) also uses a version of HAC standard errors for its robustly estimated linear models. See also the CRAN task view Econometrics.
Robust Methods for Bioinformatics:
Robust Methods for Survival Analysis:
Robust Methods for Surveys:
Collections of Several Methodologies:
Other Approaches to robust and resistant methodology: