Tidy Anomaly Detection
The 'anomalize' package enables a "tidy" workflow for detecting anomalies in data. The main functions are time_decompose(), anomalize(), and time_recompose(). Combined, they make it simple to decompose time series, detect anomalies, and create bands separating the "normal" data from the anomalous data at scale (i.e. for multiple time series). The time_decompose() function removes trend and seasonal components; its methods are seasonal decomposition of time series by Loess ("stl") and seasonal decomposition by piecewise medians ("twitter"). The anomalize() function implements two methods for detecting anomalies in the residuals: the interquartile range ("iqr") and the generalized extreme studentized deviate test ("gesd"). These methods are based on those used in the 'forecast' package and the Twitter 'AnomalyDetection' package. Refer to the associated functions for specific references for these methods.
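The interquartile-range idea applied to residuals can be sketched in a few lines. This is a minimal, language-agnostic illustration in Python, not the 'anomalize' API: the function name iqr_anomalies and the factor parameter are hypothetical, and the residuals are assumed to have already had trend and seasonality removed.

```python
from statistics import quantiles

def iqr_anomalies(residuals, factor=3.0):
    """Flag residuals lying more than `factor` IQRs outside the quartiles.

    `factor` is an illustrative assumption, not a documented package default.
    """
    # Quartiles of the residuals (inclusive method treats the data as a full population).
    q1, _, q3 = quantiles(residuals, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    # A point is anomalous when it falls outside the [lower, upper] band.
    flags = [r < lower or r > upper for r in residuals]
    return flags, lower, upper

# Example: one large residual among otherwise small ones.
flags, lower, upper = iqr_anomalies([0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 10.0])
```

The returned lower and upper limits play the role of the bands separating "normal" from anomalous values; a smaller factor gives a narrower band and flags more points.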
Nearest Neighbor Observation Imputation and Evaluation Tools
Performs nearest neighbor-based imputation using one or more alternative approaches to processing multivariate data. These include methods based on canonical correlation analysis, canonical correspondence analysis, and a multivariate adaptation of the random forest classification and regression techniques of Leo Breiman and Adele Cutler. Additional methods are also offered. The package includes functions for comparing the results from running alternative techniques, detecting imputation targets that are notably distant from reference observations, detecting and correcting for bias, bootstrapping and building ensemble imputations, and mapping results.
Fast and Robust Hierarchical Clustering with Noise Points Detection
A retake on the Genie algorithm
(Gagolewski, 2021
Multivariate Outlier Detection and Imputation for Incomplete Survey Data
Algorithms for multivariate outlier detection when missing values
occur. Algorithms are based on Mahalanobis distance or data depth.
Imputation is based on the multivariate normal model or uses nearest
neighbour donors. The algorithms take sample designs, in particular
weighting, into account. The methods are described in Bill and Hulliger
(2016)
Visualization of a Correlation Matrix
Provides a visual exploratory tool for correlation matrices that supports automatic variable reordering to help detect hidden patterns among variables.
Detect and Check for Separation and Infinite Maximum Likelihood Estimates
Provides pre-fit and post-fit methods for detecting separation and infinite maximum likelihood estimates in generalized linear models with categorical responses. The pre-fit methods apply to binomial-response generalized linear models such as logit, probit and cloglog regression, and can be directly supplied as fitting methods to the glm() function. They solve the linear programming problems for the detection of separation developed in Konis (2007, <https://ora.ox.ac.uk/objects/uuid:8f9ee0d0-d78e-4101-9ab4-f9cbceed2a2a>) using 'ROI' <https://cran.r-project.org/package=ROI> or 'lpSolveAPI' <https://cran.r-project.org/package=lpSolveAPI>. The post-fit methods apply to models with categorical responses, including binomial-response generalized linear models and multinomial-response models, such as baseline category logits and adjacent category logits models; for example, the models implemented in the 'brglm2' <https://cran.r-project.org/package=brglm2> package. The post-fit methods successively refit the model with an increasing number of iteratively reweighted least squares iterations, and monitor the ratio of the estimated standard error for each parameter to its value at the first iteration. According to the results in Lesaffre & Albert (1989, <https://www.jstor.org/stable/2345845>), divergence of those ratios indicates data separation.
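The post-fit idea of watching standard-error ratios grow can be illustrated with a small sketch. This is an assumption-laden toy in Python, not the package's implementation: it fits a logit model with an intercept and a single predictor by hand-written iteratively reweighted least squares, and the names slope_std_error and se_ratios are hypothetical.

```python
import math

def _information(x, y, b0, b1):
    """Entries of X'WX and the score X'(y - p) for logit(p) = b0 + b1 * x."""
    s00 = s01 = s11 = g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)  # IRLS working weight
        s00 += w
        s01 += w * xi
        s11 += w * xi * xi
        g0 += yi - p
        g1 += (yi - p) * xi
    return s00, s01, s11, g0, g1

def slope_std_error(x, y, n_iter):
    """Run n_iter IRLS (Fisher scoring) steps from zero; return the slope's SE."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        s00, s01, s11, g0, g1 = _information(x, y, b0, b1)
        det = s00 * s11 - s01 * s01
        # beta += (X'WX)^{-1} * score, using the closed-form 2x2 inverse
        b0 += (s11 * g0 - s01 * g1) / det
        b1 += (s00 * g1 - s01 * g0) / det
    s00, s01, s11, _, _ = _information(x, y, b0, b1)
    # Var(b1) is the (2, 2) entry of (X'WX)^{-1}
    return math.sqrt(s00 / (s00 * s11 - s01 * s01))

def se_ratios(x, y, max_iter=10):
    """Ratio of the slope SE after k iterations to the SE after 1 iteration."""
    first = slope_std_error(x, y, 1)
    return [slope_std_error(x, y, k) / first for k in range(1, max_iter + 1)]

x = [1, 2, 3, 4, 5, 6]
separated = se_ratios(x, [0, 0, 0, 1, 1, 1])    # x <= 3 vs x >= 4 splits y perfectly
overlapping = se_ratios(x, [0, 0, 1, 0, 1, 1])  # no perfect split exists
```

For the perfectly separated responses the ratios keep growing with the iteration count, while for the overlapping responses they settle near a constant, which is the divergence signal described above.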
Scalable Robust Estimators with High Breakdown Point
Robust Location and Scatter Estimation and Robust
Multivariate Analysis with High Breakdown Point:
principal component analysis (Filzmoser and Todorov (2013),
Airborne LiDAR Data Manipulation and Visualization for Forestry Applications
Airborne LiDAR (Light Detection and Ranging) interface for data manipulation and visualization. Read/write 'las' and 'laz' files, computation of metrics in an area-based approach, point filtering, artificial point reduction, classification from geographic data, normalization, individual tree segmentation and other manipulations.
Vector Generalized Linear and Additive Models
An implementation of about 6 major classes of
statistical regression models. The central algorithm is
Fisher scoring and iteratively reweighted least squares.
At the heart of this package are the vector generalized linear
and additive model (VGLM/VGAM) classes. VGLMs can be loosely
thought of as multivariate GLMs. VGAMs are data-driven
VGLMs that use smoothing. The book "Vector Generalized
Linear and Additive Models: With an Implementation in R"
(Yee, 2015)
Signal Detection Analysis
Exploring time series for signal detection. It is specifically designed
to detect possible outbreaks using infectious disease surveillance data
at the European Union / European Economic Area or country level.
Automatic detection tools used are presented in the paper
"Monitoring count time series in R: aberration detection in public health surveillance",
by Salmon (2016)