A collection of methods for the robust analysis of univariate and multivariate functional data, possibly in high-dimensional cases, and hence with attention to computational efficiency and simplicity of use.
The roahd package (Robust Analysis of High-dimensional Data) provides a set of statistical tools for the exploration and robustification of univariate and multivariate functional datasets through depth-based statistical methods.
Special attention was paid to the efficiency of the implementation, so that the functions can also be used profitably for the analysis of high-dimensional datasets.
For a full description of the package, please refer to the vignette.
fData and mfData objects
A simple S3 representation of functional data objects, fData, encapsulates the important features of a univariate functional dataset (such as the grid of the independent variable, the pointwise observations, etc.); mfData plays the same role for multivariate functional datasets:
library( roahd )
# Grid representing the independent variable
grid = seq( 0, 1, length.out = 100 )
# Pointwise measurements of the functional dataset
Data = matrix( c( sin( 2 * pi * grid ), cos( 2 * pi * grid ), sin( 2 * pi * grid + pi / 4 ) ), ncol = 100, byrow = TRUE )
# S3 object encapsulating the univariate functional dataset
fD = fData( grid, Data )
# S3 representation of a multivariate functional dataset
mfD = mfData( grid, list( 'comp1' = Data, 'comp2' = Data ) )
This representation also enables simple calls to customised functions that ease exploratory analysis:
# Algebra of fData objects
fD + 1 : 100
fD * 4
fD_1 = fData( grid, Data )
fD_2 = fData( grid, 2 * Data )
fD_1 + fD_2
# Subsetting fData objects (providing other fData objects)
fD[ 1, ]
fD[ 1, 2 : 4 ]
# Sample mean and (depth-based) median(s)
mean( fD )
mean( fD[ 1, 10 : 20 ] )
median_fData( fD, type = 'MBD' )
# Plotting functions
plot( fD )
plot( mean( fD ), add = TRUE )
plot( fD[ 2 : 3, ] )
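To illustrate the S3 mechanics that make calls such as `fD + 1 : 100`, `fD[ 1, 2 : 4 ]` and `mean( fD )` possible, the base-R sketch below defines a hypothetical class `toy_fdata` (not part of roahd) with a constructor, a `[` method, an `Ops` group method and a `mean` method. It is only an illustration of the idea under simplifying assumptions (one row per observation, arithmetic only with a scalar or a vector of one value per grid point); roahd's actual `fData` class stores more information and supports more operations.

```r
# Hypothetical minimal S3 class for univariate functional data.
# It is NOT roahd's fData; it only illustrates the S3 mechanics.
toy_fdata <- function(grid, values) {
  values <- as.matrix(values)
  # One row per observation, one column per grid point
  stopifnot(ncol(values) == length(grid))
  structure(list(grid = grid, values = values), class = "toy_fdata")
}

# Subsetting: rows select observations, columns select grid points;
# the result is again a toy_fdata object
`[.toy_fdata` <- function(x, i, j) {
  if (missing(i)) i <- seq_len(nrow(x$values))
  if (missing(j)) j <- seq_along(x$grid)
  toy_fdata(x$grid[j], x$values[i, j, drop = FALSE])
}

# Arithmetic with a scalar or a vector of length P (one value per grid point),
# applied to every observation. Only binary operations with the toy_fdata
# object on the left-hand side are supported in this sketch.
Ops.toy_fdata <- function(e1, e2) {
  stopifnot(inherits(e1, "toy_fdata"), !inherits(e2, "toy_fdata"))
  P <- length(e1$grid)
  stopifnot(length(e2) %in% c(1, P))
  # Replicate e2 row-wise so that it lines up with every observation
  shift <- matrix(e2, nrow = nrow(e1$values), ncol = P, byrow = TRUE)
  toy_fdata(e1$grid, get(.Generic)(e1$values, shift))
}

# Pointwise (cross-sectional) mean, returned as a one-observation toy_fdata
mean.toy_fdata <- function(x, ...) {
  toy_fdata(x$grid, matrix(colMeans(x$values), nrow = 1))
}

# Usage, on the same data as above
grid <- seq(0, 1, length.out = 100)
Data <- matrix(c(sin(2 * pi * grid), cos(2 * pi * grid),
                 sin(2 * pi * grid + pi / 4)),
               ncol = 100, byrow = TRUE)
fD_toy  <- toy_fdata(grid, Data)
shifted <- fD_toy + 1:100      # adds the t-th value to every curve at grid point t
scaled  <- fD_toy * 4
sub     <- fD_toy[2:3, 10:20]  # observations 2-3 on grid points 10-20
m       <- mean(fD_toy)
```

Because every method returns another `toy_fdata`, results can be chained (for example `mean(fD_toy[2:3, ])`), which is the same design convenience the roahd calls above rely on.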
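Part of the package is devoted specifically to the computation of depths and other statistical indices for functional data, such as the band depth (BD), the modified band depth (MBD) and the epigraph and hypograph indices (EI, HI).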
These depths and indices are also at the core of visualization and robustification tools such as the functional boxplot (fbplot) and the outliergram (outliergram), which allow the visualization and identification of amplitude and shape outliers.
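The base-R sketch below illustrates the idea behind depth-based amplitude-outlier detection with a functional boxplot. It computes the modified band depth (with bands formed by pairs of curves) directly from its definition on a uniform grid, takes the pointwise envelope of the 50% deepest curves as the central region, inflates it by a factor of 1.5 (the classical boxplot choice, assumed here) and flags every curve that leaves the resulting fences at some grid point. The names `mbd` and `fbplot_sketch` are hypothetical; this is not roahd's implementation of `MBD` or `fbplot`.

```r
# Modified band depth (bands from pairs of curves), computed from its definition
# on a uniform grid: for each curve, the average over grid points of the share of
# pairs (i, j) whose band [min(x_i, x_j), max(x_i, x_j)] contains the curve.
mbd <- function(values) {
  n <- nrow(values)
  n_pairs <- choose(n, 2)
  sapply(seq_len(n), function(k) {
    # Number of sample curves strictly below / strictly above curve k at each grid point
    below <- colSums(sweep(values, 2, values[k, ], `<`))
    above <- colSums(sweep(values, 2, values[k, ], `>`))
    # A pair fails to enclose curve k at t only if both members are below or both above
    enclosing <- n_pairs - choose(below, 2) - choose(above, 2)
    mean(enclosing) / n_pairs
  })
}

# Functional boxplot rule (sketch): envelope of the deepest half, inflated fences
fbplot_sketch <- function(values, inflation = 1.5) {
  depth <- mbd(values)
  n <- nrow(values)
  # 50% central region: envelope of the ceiling(n / 2) deepest curves
  central <- order(depth, decreasing = TRUE)[seq_len(ceiling(n / 2))]
  lower <- apply(values[central, , drop = FALSE], 2, min)
  upper <- apply(values[central, , drop = FALSE], 2, max)
  # Fences: central region widened by `inflation` times its pointwise width
  spread <- upper - lower
  lower_fence <- lower - inflation * spread
  upper_fence <- upper + inflation * spread
  # Flag curves that leave the fences at any grid point
  flagged <- apply(values, 1, function(x) any(x < lower_fence | x > upper_fence))
  list(depth = depth, outliers = which(flagged))
}

# 19 vertically shifted sine curves plus one far-away curve (shift 10)
grid <- seq(0, 1, length.out = 50)
shifts <- c(seq(-1, 1, length.out = 19), 10)
values <- t(sapply(shifts, function(s) sin(2 * pi * grid) + s))
res <- fbplot_sketch(values)
res$outliers  # 20: only the curve shifted by 10 is flagged
```

Counting non-enclosing pairs per grid point keeps the depth computation at O(n²P) instead of looping over all pairs for every curve; ties are handled because a sample value equal to the curve's value is neither strictly below nor strictly above it.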
Thanks to the functions for simulating synthetic functional datasets, both the fbplot and outliergram procedures can be auto-tuned to the dataset at hand, in order to control the true positive rate of outlier detection.
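Auto-tuning of this kind relies on simulating functional datasets that contain no outliers. As an illustration, the sketch below assumes a Gaussian process with exponential covariance C(s, t) = alpha * exp(-beta * |s - t|) around a given centerline and draws N curves through a Cholesky factor of the covariance matrix. The function `simulate_gauss_fdata` and its parameters are hypothetical and written in base R; they are not roahd's simulation API.

```r
# Hypothetical sketch: N Gaussian curves on `grid` with mean `centerline`
# and exponential covariance alpha * exp(-beta * |s - t|)
simulate_gauss_fdata <- function(N, grid, centerline, alpha = 1, beta = 1) {
  # Covariance matrix evaluated on all pairs of grid points
  C <- alpha * exp(-beta * abs(outer(grid, grid, "-")))
  # Upper-triangular R such that t(R) %*% R == C
  R <- chol(C)
  # Rows of Z are independent standard normal vectors
  Z <- matrix(rnorm(N * length(grid)), nrow = N)
  # Each row of Z %*% R has covariance C; add the centerline to every row
  sweep(Z %*% R, 2, centerline, "+")
}

set.seed(1)
grid <- seq(0, 1, length.out = 100)
sim <- simulate_gauss_fdata(N = 500, grid = grid,
                            centerline = sin(2 * pi * grid),
                            alpha = 0.2, beta = 0.5)
dim(sim)  # 500 100
```

A tuning loop could then, for example, apply a boxplot rule to many such outlier-free datasets and adjust the inflation factor until the average share of flagged curves matches a target rate; this describes the general approach under assumptions, not roahd's exact algorithm.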
Changes in this update of roahd:
Removed the check for grid uniformity in the fData() and mfData() constructors.
Added the possibility to subset fData objects in time with logical vectors.
Fixed methods BD, BD_relative, HI and EI: the previous implementation relied on arguments from the popular reference "Exact fast computation of band depth for large functional datasets: How quickly can one million curves be ranked?" by Sun, Genton and Nychka, which are incorrect in the case of BD and of HI/EI. The implementation now follows the definitions directly, at the cost of a higher computational burden and thus longer computation times.
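For reference, the brute-force base-R sketch below evaluates the standard definitions of band depth (bands formed by pairs of curves), epigraph index and hypograph index on a tiny dataset. The function names are hypothetical; the sample is assumed to include the curve being evaluated, and roahd's own conventions may differ in such details. The nested loops also make the cost of a definition-based approach visible: band depth takes O(n³P) operations here for n curves on P grid points.

```r
# Band depth with bands formed by pairs of curves: share of pairs (i, j)
# whose band contains curve k at every grid point
bd_by_definition <- function(values) {
  pairs <- combn(nrow(values), 2)
  sapply(seq_len(nrow(values)), function(k) {
    x <- values[k, ]
    inside <- apply(pairs, 2, function(p) {
      lo <- pmin(values[p[1], ], values[p[2], ])
      hi <- pmax(values[p[1], ], values[p[2], ])
      all(lo <= x & x <= hi)
    })
    mean(inside)
  })
}

# Epigraph index: 1 minus the share of curves lying entirely above curve k
ei_by_definition <- function(values) {
  sapply(seq_len(nrow(values)), function(k) {
    1 - mean(apply(values, 1, function(y) all(y >= values[k, ])))
  })
}

# Hypograph index: share of curves lying entirely below curve k
hi_by_definition <- function(values) {
  sapply(seq_len(nrow(values)), function(k) {
    mean(apply(values, 1, function(y) all(y <= values[k, ])))
  })
}

# Four curves on a three-point grid; the last one crosses the others
values <- rbind(c(0, 0, 0),
                c(1, 1, 1),
                c(2, 2, 2),
                c(0, 2, 1))
bd <- bd_by_definition(values)  # c(0.5, 2/3, 0.5, 2/3)
ei <- ei_by_definition(values)  # c(0, 0.5, 0.75, 0.5)
hi <- hi_by_definition(values)  # c(0.25, 0.5, 1, 0.5)
```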