Last updated on 2021-11-23 by Martin Maechler

Robust (or "resistant") methods for statistical modelling have been available in S from the very beginning in the 1980s, and then in R in package stats. Examples are median(), mean(*, trim = . ), mad(), IQR(), or also fivenum() (the statistic behind boxplot() in package graphics), or lowess() (and loess()) for robust nonparametric regression, which were complemented by runmed() in 2003. Much further important functionality has been made available in the recommended (and hence present in all R versions) package MASS (by Bill Venables and Brian Ripley, see the book Modern Applied Statistics with S). Most importantly, it provides rlm() for robust regression and cov.rob() for robust multivariate scatter and covariance.
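As a minimal sketch of the base R functions named above, the following compares classical and resistant summaries on a small made-up sample with one gross outlier. The data vector x is purely hypothetical and only serves as an illustration.

```r
# Hypothetical sample: nine well-behaved values plus one gross outlier.
x <- c(9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 100)

mean(x)               # classical mean, pulled towards the outlier
mean(x, trim = 0.1)   # 10% trimmed mean: drops the smallest and the largest value
median(x)             # median
sd(x)                 # classical standard deviation, inflated by the outlier
mad(x)                # median absolute deviation (scaled for consistency at the normal)
IQR(x)                # interquartile range
fivenum(x)            # Tukey's five-number summary
```

The resistant location estimates stay close to 10, whereas the mean and the standard deviation are dominated by the single value 100.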

An international group of scientists working in the field of robust statistics has made efforts (since October 2005) to coordinate several of the scattered developments and make the important ones available through a set of R packages complementing each other. These should build on a basic package with "Essentials", coined robustbase, with (potentially many) other packages building on top and extending the essential functionality to particular models or applications. Since 2020 and the 2nd edition of Robust Statistics: Theory and Methods, RobStatTM covers its estimators and examples, notably by importing from robustbase and rrcov. Further, there is the quite comprehensive package robust, a version of the robust library of S-PLUS, as an R package now GPL-licensed thanks to Insightful and Kjell Konis. Originally, there was much overlap between robustbase and robust; now robust depends on robustbase and rrcov, where robust provides convenient routines for the casual user while robustbase and rrcov contain the underlying functionality and provide the more advanced statistician with a large range of options for robust modeling.

We structure the packages roughly into the following topics, and typically will first mention functionality in packages robustbase, rrcov and robust.

Regression

• Linear Regression:

lmrob() (robustbase) and lmRob() (robust): the former uses the latest of the fast-S algorithms and heteroscedasticity and autocorrelation corrected (HAC) standard errors, while the latter makes use of the M-S algorithm of Maronna and Yohai (2000), automatically when there are factors among the predictors (where S-estimators (and hence MM-estimators) based on resampling typically fail badly). The ltsReg() and lmrob.S() functions are available in robustbase, but rather for comparison purposes. rlm() from MASS was the first widely available implementation for robust linear models, and also one of the very first MM-estimation implementations. robustreg provides very simple M-estimates for linear regression (in pure R). Note that Koenker's quantile regression package quantreg contains L1 (aka LAD, least absolute deviations) regression as a special case, doing so also for nonparametric regression via splines. Package mblm's function mblm() fits median-based (Theil-Sen or Siegel's repeated) simple linear models.
• Generalized Linear Models (GLMs) for Regression:

GLMs are provided both via glmrob() (robustbase) and glmRob() (robust). Robust ordinal regression is provided by rorutadis (UTADIS). drgee fits "Doubly Robust" Generalized Estimating Equations (GEEs); complmrob does robust linear regression with compositional data as covariates. multinomRob fits overdispersed multinomial regression models for count data.
• Mixed-Effects (Linear and Nonlinear) Regression:

Quantile regression (and hence L1 or LAD) for mixed effect models is available in package lqmm. Rank-based mixed effect fitting is available from package rlme, whereas an MM-like approach for robust linear mixed effects modeling is available from package robustlmm. More recently, skewlmm provides robust linear mixed-effects models (LMM) via scale mixtures of skew-normal distributions.
• Nonlinear / Smooth (Nonparametric Function) Regression:

Robust Nonlinear model fitting is available through robustbase's nlrob(). robustgam fits robust GAMs, i.e., robust Generalized Additive Models.
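To illustrate the linear regression entry, here is a small sketch using only rlm() from the recommended package MASS and the built-in stackloss data. It is not meant as a recommendation among the estimators listed above; the object names fit.ls, fit.m and fit.mm are chosen for this illustration only.

```r
library(MASS)  # recommended package, shipped with R

set.seed(1)    # the MM fit starts from a resampling-based S-estimate

# Classical least squares and two robust fits on the built-in stackloss data.
fit.ls <- lm(stack.loss ~ ., data = stackloss)
fit.m  <- rlm(stack.loss ~ ., data = stackloss)                 # Huber M-estimator (default)
fit.mm <- rlm(stack.loss ~ ., data = stackloss, method = "MM")  # MM-estimator

# Compare the coefficient estimates side by side.
round(cbind(LS = coef(fit.ls), M = coef(fit.m), MM = coef(fit.mm)), 3)

# Final IWLS weights of the M fit; weights below 1 mark downweighted observations.
round(fit.m$w, 2)
```

Inspecting the weights is often a quick first check of which observations a robust fit treats as unusual.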

Multivariate Analysis:

• Here, the rrcov package which builds ("Depends") on robustbase provides nice S4 class based methods, more methods for robust multivariate variance-covariance estimation, and adds robust PCA methodology.
• 'rrcov' is extended by rrcovNA, providing robust multivariate methods for incomplete or missing (NA) data, and by rrcovHD, providing robust multivariate methods for High Dimensional data.
• Specialized robust PCA packages are pcaPP (via Projection Pursuit), rpca (incl "sparse") and rospca. Historically, note that robust PCA can be performed by using standard R's princomp(), e.g., X <- stackloss; pc.rob <- princomp(X, covmat= MASS::cov.rob(X))
• Here, robustbase contains a slightly more flexible version, covMcd(), than robust's fastmcd(), and similarly for covOGK(). OTOH, robust's covRob() has automatically chosen methods, notably pairwiseQC() for large dimensionality p. Package robustX for experimental, or other not yet established procedures, contains BACON() and covNCC(), the latter providing the nearest neighbor variance estimation (NNVE) method of Wang and Raftery (2002), also available (slightly less optimized) in covRobust.
• RobRSVD provides a robust Regularized Singular Value Decomposition.
• mvoutlier (building on robustbase) provides several methods for outlier identification in high dimensions.
• GSE estimates multivariate location and scatter in the presence of missing data.
• RSKC provides Robust Sparse K-means Clustering.
• robustDA for robust mixture Discriminant Analysis (RMDA) builds a mixture model classifier with noisy class labels.
• robcor computes robust pairwise correlations based on scale estimates, particularly on FastQn().
• covRobust provides the nearest neighbor variance estimation (NNVE) method of Wang and Raftery (2002).
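The princomp() one-liner mentioned above can be expanded into a short sketch that also extracts robust distances from the same cov.rob() fit. Only stats and MASS are used; the names rc, pc.cl and rd are introduced here for illustration.

```r
library(MASS)  # recommended package, provides cov.rob()

set.seed(1)    # cov.rob() uses random subsampling; fix the seed for reproducibility
X <- stackloss

rc     <- cov.rob(X)                # robust location ($center) and scatter ($cov)
pc.cl  <- princomp(X)               # classical PCA, for comparison
pc.rob <- princomp(X, covmat = rc)  # PCA based on the robust scatter estimate

# Standard deviations of the components under both fits.
rbind(classical = pc.cl$sdev, robust = pc.rob$sdev)

# Robust Mahalanobis distances; large values point to potential outliers.
rd <- sqrt(mahalanobis(X, center = rc$center, cov = rc$cov))
round(rd, 2)
```

The dedicated robust PCA packages listed above offer more than this, so the sketch is best read as the "historical" route through standard R.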

Clustering (Multivariate):

• We are not considering cluster-resistant variance (/standard error) estimation (aka "sandwich"). Rather, we consider e.g. model based and hierarchical clustering methodology with a particular emphasis on robustness: Note that cluster's pam(), implementing "partitioning around medoids", is partly robust (medoids instead of very unrobust k-means) but is not good enough, as e.g., the k clusters could consist of k-1 outliers and one cluster for the bulk of the remaining data.
• "Truly" robust clustering is provided by packages genie, Gmedian, otrimle (trimmed MLE model-based) and notably tclust (robust trimmed clustering).
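The caveat about pam() can be made concrete with a small constructed example. The data below are hypothetical (two tight one-dimensional groups plus one far outlier), and the sketch only uses pam() from the recommended package cluster.

```r
library(cluster)  # recommended package, shipped with R

# Hypothetical one-dimensional data: two tight groups and one gross outlier.
x <- c(seq(0, 0.9, by = 0.1),     # group A, 10 points near 0
       seq(10, 10.9, by = 0.1),   # group B, 10 points near 10
       1000)                      # a single far outlier

fit <- pam(matrix(x, ncol = 1), k = 2)

fit$medoids            # the two medoids chosen by pam()
table(fit$clustering)  # cluster sizes
```

With k = 2, the outlier ends up as a cluster of its own and the two genuine groups are merged, because isolating the outlier reduces the total distance to the medoids far more than separating the groups does. This is the situation that trimming-based approaches such as tclust are designed to avoid.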

Large Data Sets:

• BACON() (in robustX) should be applicable for larger (n,p) than traditional robust covariance based outlier detectors.

Descriptive Statistics / Exploratory Data Analysis:

• boxplot.stats(), etc., mentioned above

Time Series:

• R's runmed() provides most robust running median filtering.
• Package robfilter contains robust regression and filtering methods for univariate time series, typically based on repeated (weighted) median regressions.
• The RobPer package provides several methods for robust periodogram estimation, notably for irregularly spaced time series.
• Peter Ruckdeschel has started to lead an effort for a robust time-series package, see robust-ts on R-Forge.
• Further, robKalman, "Routines for Robust Kalman Filtering --- the ACM- and rLS-filter", is being developed, see robkalman on R-Forge.
Note however that these (last two items) are not yet available from CRAN.
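For the running median mentioned first in this list, a minimal sketch on simulated data may help. The signal, the noise level and the spike positions are all assumptions made for this illustration; only runmed() and filter() from package stats are used.

```r
set.seed(42)
n <- 100
y <- sin(2 * pi * (1:n) / n) + rnorm(n, sd = 0.1)   # hypothetical smooth signal plus noise
y[c(20, 50, 80)] <- y[c(20, 50, 80)] + 5            # three isolated spikes

sm.med  <- runmed(y, k = 7)                          # running median, window width 7 (must be odd)
sm.mean <- stats::filter(y, rep(1/7, 7), sides = 2)  # running mean, for comparison

# At the spikes the running median stays near the signal, the running mean does not.
cbind(y = y, median = sm.med, mean = sm.mean)[c(20, 50, 80), ]
```

A single spike moves a window mean of width 7 by roughly 5/7, while the window median is essentially unaffected.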

Econometric Models:

• Econometricians tend to like HAC (heteroscedasticity and autocorrelation corrected) standard errors. For a broad class of models, these are provided by package sandwich; similarly clubSandwich and clusterSEs. Note that vcov(lmrob()) also uses a version of HAC standard errors for its robustly estimated linear models. See also the CRAN task view Econometrics.

Robust Methods for Bioinformatics:

• There are several packages in the Bioconductor project providing specialized robust methods. In addition, RobLoxBioC provides infinitesimally robust estimators for preprocessing omics data.

Robust Methods for Survival Analysis:

• Package coxrobust provides robust estimation in the Cox model.

Robust Methods for Surveys:

• On R-forge only, package rhte provides a robust Horvitz-Thompson estimator.

Geostatistics:

• Package georob aims at robust geostatistical analysis of spatial data, such as kriging and more.

Collections of Several Methodologies:

• WRS2 contains robust tests for ANOVA and ANCOVA and other functionality from Rand Wilcox's collection.
• walrus builds on WRS2's computations, providing a different user interface.
• robeth contains R functions interfacing to the extensive RobETH fortran library with many functions for regression, multivariate estimation and more.

Other Approaches to robust and resistant methodology:

• The package distr and its several child packages also allow exploring robust estimation concepts, see e.g., distr on R-Forge.
• Notably, based on these, the project robast aims for the implementation of R packages for the computation of optimally robust estimators and tests as well as the necessary infrastructure (mainly S4 classes and methods) and diagnostics; cf. M. Kohl (2005). It includes the R packages RandVar, RobAStBase, RobLox, RobLoxBioC, RobRex, and further ROptEst and ROptRegTS.
• RobustAFT computes Robust Accelerated Failure Time Regression for Gaussian and logWeibull errors.
• robumeta for robust variance meta-regression; metaplus adds robustness via t- or mixtures of normal distributions.
• ssmrob provides robust estimation and inference in sample selection models.

Packages

MASS — 7.3-55

Support Functions and Datasets for Venables and Ripley's MASS

robustbase — 0.93-9

Basic Robust Statistics

robust — 0.6-1

Port of the S+ "Robust Library"

rrcov — 1.6-0

Scalable Robust Estimators with High Breakdown Point

clubSandwich — 0.5.5

Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections

cluster — 2.1.2

"Finding Groups in Data": Cluster Analysis Extended Rousseeuw et al.

clusterSEs — 2.6.5

Calculate Cluster-Robust p-Values and Confidence Intervals

complmrob — 0.7.0

Robust Linear Regression with Compositional Data as Covariates

covRobust — 1.1-3

Robust Covariance Estimation via Nearest Neighbor Cleaning

coxrobust — 1.0

Robust Estimation in Cox Model

distr — 2.8.0

Object Oriented Implementation of Distributions

drgee — 1.1.10

Doubly Robust Generalized Estimating Equations

genie — 1.0.5

Fast, Robust, and Outlier Resistant Hierarchical Clustering

georob — 0.3-14

Robust Geostatistical Analysis of Spatial Data

Gmedian — 1.2.6

Geometric Median, k-Medians Clustering and Robust Median PCA

GSE — 4.2

Robust Estimation in the Presence of Cellwise and Casewise Contamination and Missing Data

lqmm — 1.5.6

Linear Quantile Mixed Models

mblm — 0.12.1

Median-Based Linear Models

metaplus — 1.0-2

Robust Meta-Analysis and Meta-Regression

multinomRob — 1.8-6.1

Robust Estimation of Overdispersed Multinomial Regression Models

mvoutlier — 2.1.1

Multivariate Outlier Detection Based on Robust Methods

otrimle — 2.0

Robust Model-Based Clustering

OutlierDM — 1.1.1

Outlier Detection for Multi-replicated High-throughput Data

pcaPP — 1.9-74

Robust PCA by Projection Pursuit

quantreg — 5.87

Quantile Regression

RandVar — 1.2.1

Implementation of Random Variables

rlme — 0.5

Rank-Based Estimation and Prediction in Random Effects Nested Models

RobAStBase — 1.2.1

Robust Asymptotic Statistics

robcor — 0.1-6

Robust Correlations

robeth — 2.7-6

R Functions for Robust Statistics

robfilter — 4.1.2

Robust Time Series Filters

RobLox — 1.2.0

Optimally Robust Influence Curves and Estimators for Location and Scale

RobLoxBioC — 1.2.0

Infinitesimally Robust Estimators for Preprocessing -Omics Data

RobPer — 1.2.2

Robust Periodogram and Periodicity Detection Methods

RobRex — 1.2.0

Optimally Robust Influence Curves for Regression and Scale

RobRSVD — 1.0

Robust Regularized Singular Value Decomposition

RobStatTM — 1.0.3

Robust Statistics: Theory and Methods

robumeta — 2.0

Robust Variance Meta-Regression

RobustAFT — 1.4-5

Truncated Maximum Likelihood Fit and Robust Accelerated Failure Time Regression for Gaussian and Log-Weibull Case

robustDA — 1.2

Robust Mixture Discriminant Analysis

robustgam — 0.1.7

Robust Estimation for Generalized Additive Models

robustlmm — 2.4-5

Robust Linear Mixed Effects Models

robustreg — 0.1-11

Robust Regression Functions

robustX — 1.2-5

'eXtra' / 'eXperimental' Functionality for Robust Statistics

ROptEst — 1.2.1

Optimally Robust Estimation

ROptRegTS — 1.2.0

Optimally Robust Estimation for Regression-Type Models

rospca — 1.0.4

Robust Sparse PCA using the ROSPCA Algorithm

rpca — 0.2.3

RobustPCA: Decompose a Matrix into Low-Rank and Sparse Components

rrcovHD — 0.2-7

Robust Multivariate Methods for High Dimensional Data

rrcovNA — 0.4-15

Scalable Robust Estimators with High Breakdown Point for Incomplete Data

RSKC — 2.4.2

Robust Sparse K-Means

sandwich — 3.0-1

Robust Covariance Matrix Estimators

skewlmm — 1.0.0

Scale Mixture of Skew-Normal Linear Mixed Models

ssmrob — 1.0

Robust Estimation and Inference in Sample Selection Models

tclust — 1.4-2

Robust Trimmed Clustering

walrus — 1.0.3

Robust Statistical Methods

WRS2 — 1.1-3

A Collection of Robust Statistical Methods