Robust Outliers Detection

Detects outliers using robust methods, namely the Median Absolute Deviation (MAD) for univariate outliers (Leys, Ley, Klein, Bernard, & Licata, 2013) and the Mahalanobis-Minimum Covariance Determinant (MMCD) for multivariate outliers (Leys, Klein, Dominicy, & Ley, 2018). The better-known but less robust Mahalanobis distance method is also provided, for comparison purposes only.


Routliers is designed to help researchers detect univariate and multivariate outliers using robust methods:

  • The MAD method for detecting univariate outliers (see Leys et al., 2013).
  • The MMCD method for detecting multivariate outliers (see Leys et al., 2018).
  • The Mahalanobis distance method for detecting multivariate outliers, available to facilitate comparisons with the MMCD method. However, this method is less robust than the MMCD method and should be avoided.
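The decision rules behind the two robust methods can be summarized as follows. This is a sketch based on the standard definitions, not a transcription of the package code: the consistency constant b = 1.4826 and the 0.99 chi-square quantile are assumptions inferred from the example output further down (a printed MAD of 0.59304 = 1.4826 × 0.4, and a printed limit of 9.21034, which is the 0.99 quantile of χ² with 2 degrees of freedom). The MCD estimates are the mean and covariance of the h observations whose covariance matrix has the smallest determinant, possibly with a consistency correction.

$$\mathrm{MAD} = b \cdot \operatorname{median}_i\big(\lvert x_i - \operatorname{median}_j(x_j)\rvert\big), \qquad b = 1.4826$$

$$x_i \text{ is flagged if } \lvert x_i - \operatorname{median}(x)\rvert > 3 \cdot \mathrm{MAD}$$

$$d_i^2 = (\mathbf{x}_i - \hat{\boldsymbol\mu}_{\mathrm{MCD}})^\top \, \hat{\boldsymbol\Sigma}_{\mathrm{MCD}}^{-1} \, (\mathbf{x}_i - \hat{\boldsymbol\mu}_{\mathrm{MCD}}), \qquad \mathbf{x}_i \text{ is flagged if } d_i^2 > \chi^2_{p;\,0.99}$$

The classical Mahalanobis method uses the same distance with the ordinary sample mean and covariance, which are themselves pulled toward the outliers.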

Functions are available to (a) detect and (b) plot univariate and multivariate outliers.

Installation

A development version of this package is available on GitHub. To install it, use the install_github function from devtools.

You can use the following code to install the development version of Routliers:

library(devtools)
install_github("mdelacre/Routliers")

Note that one of the dependencies of Routliers (MASS) has its own dependencies, so you may be prompted to install additional packages before Routliers can be installed.

Examples of using the package

The Routliers package allows you to easily count, detect and plot outliers using robust methods. The most important functions in the package are outliers_mad and outliers_mcd, which detect univariate and multivariate outliers, respectively. By default, the MAD function assumes that the data follow a normal distribution and treats every value lying more than 3 MADs away from the median as an outlier. The MCD function, by default, selects h = n/2 observations from which the empirical mean and empirical covariance matrix are computed (see Leys et al., 2018).
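The default MAD rule can be reproduced with base R as a quick sanity check. This is a minimal sketch, assuming b = 1.4826 (the normal-consistency constant, which is also the default `constant` of `stats::mad`) and the threshold of 3 MADs stated above; `mad_limits` is a hypothetical helper, not a Routliers function.

```r
# Hypothetical helper, not part of Routliers: limits of the MAD rule.
# Assumes b = 1.4826 and a threshold of 3 MADs around the median.
mad_limits <- function(x, threshold = 3, b = 1.4826) {
  m <- stats::median(x)
  s <- stats::mad(x, center = m, constant = b)  # b * median(|x - m|)
  c(lower = m - threshold * s, upper = m + threshold * s)
}

# Toy data: five typical values and one extreme value.
x <- c(2.1, 1.9, 2.0, 2.2, 1.8, 6.5)
lim <- mad_limits(x)
is_outlier <- x < lim[["lower"]] | x > lim[["upper"]]
x[is_outlier]  # 6.5
```

Here the median is 2.05 and the scaled MAD is 1.4826 × 0.15 = 0.22239, so only 6.5 falls outside the acceptable range.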

Here is an example of detecting extreme levels of anxiety and depression (based on the HSCL-25; Derogatis, Lipman, Rickels, Uhlenhuth, & Covi, 1974) in a sample of 2077 participants after the Brussels attacks, using the outliers_mad function. The first 10 items of the HSCL-25 are averaged to compute the level of anxiety, and the last 15 items are averaged to compute the level of depression. The MAD function is then applied to the averaged scores.

library(Routliers)
data(Attacks)
# Anxiety: mean of the first 10 HSCL-25 items
anxiety <- rowMeans(
  Attacks[, c("hsc1", "hsc2", "hsc3", "hsc4", "hsc5",
              "hsc6", "hsc7", "hsc8", "hsc9", "hsc10")]
)
res1 <- outliers_mad(x = anxiety)
res1
#> Call:
#> outliers_mad.default(x = anxiety)
#> 
#> Median:
#> [1] 1.7
#> 
#> MAD:
#> [1] 0.59304
#> 
#> Limits of acceptable range of values:
#> [1] -0.07912  3.47912
#> 
#> Number of detected outliers
#>  extremely low extremely high          total 
#>              0             25             25
# Depression: mean of the last 15 HSCL-25 items
depression <- rowMeans(
  Attacks[, c("hsc11", "hsc12", "hsc13", "hsc14", "hsc15",
              "hsc16", "hsc17", "hsc18", "hsc19", "hsc20",
              "hsc21", "hsc22", "hsc23", "hsc24", "hsc25")]
)
res2 <- outliers_mad(x = depression)
res2
#> Call:
#> outliers_mad.default(x = depression)
#> 
#> Median:
#> [1] 1.733333
#> 
#> MAD:
#> [1] 0.59304
#> 
#> Limits of acceptable range of values:
#> [1] -0.04578667  3.51245333
#> 
#> Number of detected outliers
#>  extremely low extremely high          total 
#>              0             17             17
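The printed limits follow directly from the printed median and MAD under the default rule (median ± 3 MAD). A quick arithmetic check, using the values copied from the outputs above and assuming that the printed depression median 1.733333 is exactly 26/15 (a mean of 15 integer item scores):

```r
# Values copied from the printed outputs above.
mad_val <- 0.59304                       # same MAD for both scores
mad_raw <- mad_val / 1.4826              # unscaled median absolute deviation
mad_raw                                  # 0.4
lim_anx <- 1.7 + c(-3, 3) * mad_val      # anxiety: median -/+ 3 MAD
lim_dep <- 26 / 15 + c(-3, 3) * mad_val  # depression: median -/+ 3 MAD
lim_anx                                  # -0.07912  3.47912
lim_dep                                  # -0.04578667  3.51245333
```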

Here is an example of detecting extremely unusual combinations of anxiety and depression scores, using the outliers_mcd function.

resMCD <- outliers_mcd(x = data.frame(anxiety,depression))
resMCD
#> Call:
#> outliers_mcd.default(x = data.frame(anxiety, depression))
#> 
#> Limit distance of acceptable values from the centroid :
#> [1] 9.21034
#> 
#> Number of detected outliers:
#> total 
#>   214

It is also possible to plot the outliers, using the plot_outliers_mad and plot_outliers_mcd functions:

plot_outliers_mad(res2, x = depression) 

The plot function for the MCD method is especially useful, as it compares the regression line with and without outliers (showing whether outliers have a strong impact on the regression line):

plot_outliers_mcd(resMCD, x = data.frame(anxiety,depression))
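The idea behind this comparison can be sketched in base R: fit the same linear model with and without the flagged rows and draw both lines. This is an illustration on toy data with planted (not detected) outliers, and it is not how plot_outliers_mcd is implemented.

```r
# Toy data: 100 points with a positive slope, of which the first four are
# replaced by high-leverage outliers (planted, not detected).
set.seed(42)
d <- data.frame(x = rnorm(100))
d$y <- 0.5 * d$x + rnorm(100, sd = 0.5)
out <- 1:4
d[out, ] <- data.frame(x = c(3, 3.5, 4, 4.5), y = c(-4, -4.5, -5, -5.5))
keep <- !(seq_len(nrow(d)) %in% out)

fit_all   <- lm(y ~ x, data = d)                 # all observations
fit_clean <- lm(y ~ x, data = d, subset = keep)  # outliers removed

# Outliers drawn as filled points; solid line = all data, dashed = without outliers.
plot(y ~ x, data = d, pch = ifelse(keep, 1, 19))
abline(fit_all, lty = 1)
abline(fit_clean, lty = 2)
legend("topleft", legend = c("all points", "without outliers"), lty = c(1, 2))
rbind(all = coef(fit_all), without_outliers = coef(fit_clean))
```

With only four extreme points, the slope fitted on all observations is dragged well below the slope of the remaining 96 points, which is the kind of impact this plot is meant to reveal.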

News

Routliers 0.0.0.9000

  • Added a NEWS.md file to track changes to the package.


install.packages("Routliers")

Version 0.0.0.3 by Marie Delacre


Report a bug at https://github.com/mdelacre/Routliers/issues


Browse source code at https://github.com/cran/Routliers


Authors: Marie Delacre [aut, cre], Olivier Klein [aut]

Documentation: PDF Manual

License: MIT + file LICENSE

Imports: MASS, stats, graphics, ggplot2

Suggests: knitr, rmarkdown, testthat

