Robust Analysis of High Dimensional Data

A collection of methods for the robust analysis of univariate and multivariate functional data, possibly in high-dimensional cases, and hence with attention to computational efficiency and simplicity of use.


The roahd package (Robust Analysis of High-dimensional Data) provides a set of statistical tools for the exploration and robustification of univariate and multivariate functional datasets through depth-based statistical methods.

The functions were implemented with special attention to efficiency, so that they can also be used profitably for the analysis of high-dimensional datasets.

(For a full description of the package, please refer to the vignette.)

A simple S3 representation of a functional data object, fData, encapsulates the important features of univariate functional datasets (such as the grid of the independent variable, the pointwise observations, etc.):

# Grid representing the independent variable
grid = seq( 0, 1, length.out = 100 )
 
# Pointwise-measurements of the functional dataset
Data = matrix( c( sin( 2 * pi * grid ),
                  cos ( 2 * pi * grid ),
                  sin( 2 * pi * grid + pi / 4 ) ), ncol = 100, byrow = TRUE )
 
# S3 object encapsulating the univariate functional dataset            
fD = fData( grid, Data )
 
# S3 representation of a multivariate functional dataset
mfD = mfData( grid, list( 'comp1' = Data, 'comp2' = Data ) )

This representation also makes it possible to call customised functions with simple syntax, which simplifies exploratory analysis:

# Algebra of fData objects
fD + 1 : 100
fD * 4
 
# Sum of two fData objects defined on the same grid
fD_1 = fD
fD_2 = fD * 2
fD_1 + fD_2
 
# Subsetting fData objects (the result is again an fData object)
fD[ 1, ]
fD[ 1, 2 : 4]
 
# Sample mean and (depth-based) median(s)
mean( fD )
mean( fD[ 1, 10 : 20 ] )
median_fData( fD, type = 'MBD' )
 
# Plotting functions
plot( fD )
plot( mean( fD ), add = TRUE )
 
plot( fD[ 2:3, ] )

Part of the package is specifically devoted to computing depths and other statistical indexes for functional data:

  • Band Depths and Modified Band Depths,
  • Modified band depths for multivariate functional data,
  • Epigraph and Hypograph indexes,
  • Spearman and Kendall's correlation indexes for functional data.
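
To make the notion of depth concrete, here is a minimal base-R sketch of the modified band depth computed directly from its definition, with bands delimited by pairs of curves. The name mbd_naive is hypothetical and is not part of roahd; the package's own routines are optimised and may follow different conventions (for example in the handling of ties), so this is an illustration only.

```r
# Naive modified band depth (bands of J = 2 curves), straight from the definition.
# X: numeric matrix with one curve per row and one grid point per column.
# mbd_naive is a hypothetical helper, not a roahd function.
mbd_naive <- function(X) {
  n <- nrow(X)
  pairs <- combn(n, 2)            # all pairs of curves, one pair per column
  depth <- numeric(n)
  for (p in seq_len(ncol(pairs))) {
    # Pointwise envelope of the band delimited by the two curves
    lower <- pmin(X[pairs[1, p], ], X[pairs[2, p], ])
    upper <- pmax(X[pairs[1, p], ], X[pairs[2, p], ])
    # inside[i, j] is TRUE when curve i lies within the band at grid point j
    inside <- sweep(X, 2, lower, ">=") & sweep(X, 2, upper, "<=")
    # Fraction of the grid that each curve spends inside this band
    depth <- depth + rowMeans(inside)
  }
  depth / ncol(pairs)             # average over all bands
}

# Same three curves as in the fData example above
grid <- seq(0, 1, length.out = 100)
X <- rbind(sin(2 * pi * grid),
           cos(2 * pi * grid),
           sin(2 * pi * grid + pi / 4))
mbd_naive(X)
```

The loop visits every pair of curves, so the cost grows quadratically with the number of observations; this is acceptable for small examples but not for large datasets.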

These depths and indexes are also the core of the visualization and robustification tools, such as the functional boxplot (fbplot) and the outliergram (outliergram), which allow amplitude and shape outliers to be visualized and identified.

Thanks to the functions for simulating synthetic functional datasets, both the fbplot and outliergram procedures can be auto-tuned to the dataset at hand in order to control the true positive rate of outliers.

News

Changelog

Here is a list of what has changed in this update of roahd:

  1. Removed the check for grid uniformity in the fData() and mfData() constructors

  2. Added the possibility to subset fData objects along the time grid with logical vectors

  3. Fixes in methods BD, BD_relative, HI and EI: the previous implementation relied on arguments from the popular reference "Exact fast computation of band depth for large functional datasets: How quickly can one million curves be ranked?" by Sun, Genton and Nychka, which are wrong in the case of BD and HI/EI. The implementation now sticks to the definition, at the cost of a higher computational burden and therefore a longer computation time.
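
For the band depth, sticking to the definition amounts to checking every pair of curves explicitly. The base-R sketch below illustrates that idea under stated assumptions: bd_naive is a hypothetical name, the code is not taken from roahd, and details such as tie handling or normalisation may differ from the package's BD.

```r
# Naive band depth (bands of J = 2 curves), straight from the definition.
# X: numeric matrix with one curve per row and one grid point per column.
# bd_naive is a hypothetical helper, not a roahd function.
bd_naive <- function(X) {
  n <- nrow(X)
  pairs <- combn(n, 2)            # all pairs of curves, one pair per column
  depth <- numeric(n)
  for (p in seq_len(ncol(pairs))) {
    # Pointwise envelope of the band delimited by the two curves
    lower <- pmin(X[pairs[1, p], ], X[pairs[2, p], ])
    upper <- pmax(X[pairs[1, p], ], X[pairs[2, p], ])
    inside <- sweep(X, 2, lower, ">=") & sweep(X, 2, upper, "<=")
    # A curve counts for this band only if it stays inside over the whole grid
    depth <- depth + apply(inside, 1, all)
  }
  depth / ncol(pairs)             # proportion of bands containing each curve
}

# Three constant curves: the middle one lies inside every band
Xc <- rbind(rep(1, 4), rep(2, 4), rep(3, 4))
bd_naive(Xc)
```

The number of bands is n(n - 1)/2 for n curves, which is the source of the higher computational burden mentioned above.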


install.packages("roahd")

1.3 by Nicholas Tarabelloni, 25 days ago


Browse source code at https://github.com/cran/roahd


Authors: Nicholas Tarabelloni [aut, cre], Ana Arribas-Gil [aut], Francesca Ieva [aut], Anna Maria Paganoni [aut], Juan Romo [aut]




Task views: Robust Statistical Methods, Functional Data Analysis


GPL-3 license


Imports scales, robustbase

Suggests testthat, knitr, rmarkdown

