A collection of methods for the robust analysis of univariate and multivariate functional data, possibly in high-dimensional cases, and hence with attention to computational efficiency and simplicity of use.
The roahd package (Robust Analysis of High-dimensional Data) provides a set of statistical tools for the exploration and robustification of univariate and multivariate functional datasets through depth-based statistical methods.
The functions were implemented with special attention to efficiency, so that they can also be used profitably for the analysis of high-dimensional datasets.
(For a full-featured description of the package, please turn to the Vignette)
An S3 representation of functional data objects encapsulates the important features of univariate functional datasets (such as the
grid of the independent variable and the pointwise observations):
    library( roahd )

    # Grid representing the independent variable
    grid = seq( 0, 1, length.out = 100 )

    # Pointwise measurements of the functional dataset (one curve per row)
    Data = matrix( c( sin( 2 * pi * grid ),
                      cos( 2 * pi * grid ),
                      sin( 2 * pi * grid + pi / 4 ) ),
                   ncol = 100, byrow = TRUE )

    # S3 object encapsulating the univariate functional dataset
    fD = fData( grid, Data )

    # S3 representation of a multivariate functional dataset
    mfD = mfData( grid, list( 'comp1' = Data, 'comp2' = Data ) )
This representation also enables simple calls to customised functions that simplify exploratory analysis:
    # Algebra of fData objects
    fD + 1 : 100
    fD * 4
    fD + fD    # sum of two fData objects defined on the same grid

    # Subsetting fData objects (returning other fData objects)
    fD[ 1, ]
    fD[ 1, 2 : 4 ]

    # Sample mean and (depth-based) median(s)
    mean( fD )
    mean( fD[ 1, 10 : 20 ] )
    median_fData( fD, type = 'MBD' )

    # Plotting functions
    plot( fD )
    plot( mean( fD ), add = TRUE )
    plot( fD[ 2 : 3, ] )
Part of the package is specifically devoted to the computation of depths and other statistical indexes for functional data.
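As a minimal sketch of what such computations look like, the snippet below calls some of the depth and index functions on a small fData object. It assumes roahd is installed and that BD, MBD, MEI and MHI accept an fData object and return one value per curve; check the package documentation for the exact signatures and options.

```r
library(roahd)

# Small univariate functional dataset: three curves on a common grid
grid <- seq(0, 1, length.out = 100)
Data <- matrix(c(sin(2 * pi * grid),
                 cos(2 * pi * grid),
                 sin(2 * pi * grid + pi / 4)),
               ncol = 100, byrow = TRUE)
fD <- fData(grid, Data)

# Band depth and modified band depth, one value per curve
bd  <- BD(fD)
mbd <- MBD(fD)

# Modified epigraph and hypograph indexes, one value per curve
mei <- MEI(fD)
mhi <- MHI(fD)
```

Curves with larger depth values are more central with respect to the rest of the dataset.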
These are also the core of visualization/robustification tools such as the
functional boxplot (fbplot) and the outliergram (outliergram), which allow
the visualization and identification of amplitude/shape outliers.
Thanks to the functions for the simulation of synthetic functional datasets,
the fbplot and outliergram procedures can be auto-tuned to the dataset
at hand, in order to control the true positive outliers rate.
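The following sketch shows how simulation and auto-tuning might be combined. It assumes roahd is installed, and that the adjust argument of fbplot and outliergram takes a list with elements such as N_trials, trial_size and TPR. The parameter values are arbitrary illustrations, so treat this as a starting point to verify against the help pages rather than a recommended configuration.

```r
library(roahd)

set.seed(1)  # reproducible simulation

N <- 50
P <- 100
grid <- seq(0, 1, length.out = P)

# Exponential covariance function from the package's simulation tools
Cov <- exp_cov_function(grid, alpha = 0.3, beta = 0.4)

# Synthetic Gaussian functional data around a sinusoidal centerline
Data <- generate_gauss_fdata(N, centerline = sin(2 * pi * grid), Cov = Cov)
fD <- fData(grid, Data)

# Tuning settings (illustrative values, not recommendations)
tuning <- list(N_trials = 10, trial_size = N, TPR = 0.01)

# Functional boxplot and outliergram with the inflation factor tuned by simulation
fb <- fbplot(fD, adjust = tuning, display = FALSE)
og <- outliergram(fD, adjust = tuning, display = FALSE)

# Indices of the curves flagged as amplitude and shape outliers
fb$ID_outliers
og$ID_outliers
```

Setting display = FALSE skips the plots and returns only the computed quantities, which is convenient when the tuned factor is needed for later calls.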
Here's a list of what has changed in this update of roahd:
Removed the check for grid uniformity in the fData() and mfData() constructors
Added the possibility to subset fData in time with logical vectors
Fixes in methods BD, BD_relative, HI and EI: the previous computational technique was based on arguments from the popular reference "Exact fast computation of band depth for large functional datasets: How quickly can one million curves be ranked?" by Sun, Genton and Nychka, which are wrong in the cases of BD and HI/EI. The implementation now follows the definitions directly, at the cost of a higher computational burden (and thus a longer computation time).
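To give a sense of what computing a band depth directly from its definition involves, here is a minimal base-R sketch. It is not roahd's implementation: the function name band_depth_naive is hypothetical, and the sketch assumes bands delimited by pairs of curves (J = 2), counting the bands that a curve itself delimits.

```r
# Band depth (J = 2) computed straight from the definition:
# the fraction of pairwise bands that entirely contain each curve.
# Data is an N x P matrix with one curve per row.
band_depth_naive <- function(Data) {
  N <- nrow(Data)
  pairs <- combn(N, 2)          # all pairs of curves, one pair per column
  counts <- numeric(N)
  for (p in seq_len(ncol(pairs))) {
    # Pointwise lower and upper envelopes of the band for this pair
    lower <- pmin(Data[pairs[1, p], ], Data[pairs[2, p], ])
    upper <- pmax(Data[pairs[1, p], ], Data[pairs[2, p], ])
    for (i in seq_len(N)) {
      # A curve counts only if it lies inside the band at every grid point
      if (all(Data[i, ] >= lower & Data[i, ] <= upper)) {
        counts[i] <- counts[i] + 1
      }
    }
  }
  counts / ncol(pairs)
}

# Three constant curves at levels 1, 2 and 3: the middle one is the deepest
toy <- rbind(rep(1, 4), rep(2, 4), rep(3, 4))
depths <- band_depth_naive(toy)
depths  # 2/3, 1, 2/3
```

The nested loops visit every curve for every pair of curves, so the cost grows roughly cubically with the number of curves, which is the kind of computational burden a definition-based approach accepts in exchange for correctness.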