anomalize — by Matt Dancho, 3 years ago

Tidy Anomaly Detection

The 'anomalize' package enables a "tidy" workflow for detecting anomalies in data. The main functions are time_decompose(), anomalize(), and time_recompose(). When combined, it's quite simple to decompose time series, detect anomalies, and create bands separating the "normal" data from the anomalous data at scale (i.e. for multiple time series). Time series decomposition is used to remove trend and seasonal components via the time_decompose() function, and methods include seasonal decomposition of time series by Loess ("stl") and seasonal decomposition by piecewise medians ("twitter"). The anomalize() function implements two methods for anomaly detection of residuals: the interquartile range ("iqr") and the generalized extreme Studentized deviate ("gesd"). These methods are based on those used in the 'forecast' package and the Twitter 'AnomalyDetection' package. Refer to the associated functions for specific references for these methods.

deadwood — by Marek Gagolewski, 4 months ago

Outlier Detection via Trimming of Mutual Reachability Minimum Spanning Trees

Implements an anomaly detection algorithm based on mutual reachability minimum spanning trees: 'deadwood' trims protruding tree segments and marks small debris as outliers; see Gagolewski (2026) <https://deadwood.gagolewski.com/>. More precisely, the use of a mutual reachability distance pulls peripheral points farther away from each other. Tree edges with weights beyond the detected elbow point are removed. All the resulting connected components whose sizes are smaller than a given threshold are deemed anomalous. The 'Python' version of 'deadwood' is available via 'PyPI'.
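The three steps described above (mutual reachability distance, edge trimming, small-component labelling) can be sketched in plain Python. This is a rough illustration under stated assumptions, not the 'deadwood' implementation: all function names are hypothetical, the neighbour count `k` and the size threshold `min_size` are arbitrary, and a fixed cutoff `max_weight` stands in for the elbow-point detection, which is not reproduced here.

```python
import math


def mutual_reachability(points, k):
    """d_mr(i, j) = max(d(i, j), core(i), core(j)), where core(i) is the
    distance from point i to its k-th nearest neighbour."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    core = [sorted(row)[k] for row in d]  # index 0 is the point itself
    return [[0.0 if i == j else max(d[i][j], core[i], core[j])
             for j in range(n)] for i in range(n)]


def minimum_spanning_tree(w):
    """Prim's algorithm on a dense weight matrix; returns (u, v, weight)."""
    n = len(w)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, w[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and w[u][v] < best[v]:
                best[v], parent[v] = w[u][v], u
    return edges


def small_component_outliers(points, k=2, max_weight=1.0, min_size=3):
    """Drop MST edges heavier than `max_weight` (an assumed fixed cutoff,
    not a detected elbow) and flag points in components below `min_size`."""
    n = len(points)
    tree = minimum_spanning_tree(mutual_reachability(points, k))
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for u, v, weight in tree:
        if weight <= max_weight:
            root[find(u)] = find(v)
    sizes = {}
    for i in range(n):
        sizes[find(i)] = sizes.get(find(i), 0) + 1
    return [sizes[find(i)] < min_size for i in range(n)]


# Hypothetical data: a tight cluster plus one distant point.
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05), (5, 5)]
flags = small_component_outliers(points)
```

Because the distant point has a large core distance, every edge touching it is heavy under the mutual reachability distance, so it ends up alone once the heavy edges are removed.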

modi — by Beat Hulliger, 10 months ago

Multivariate Outlier Detection and Imputation for Incomplete Survey Data

Algorithms for multivariate outlier detection when missing values occur. Algorithms are based on Mahalanobis distance or data depth. Imputation is based on the multivariate normal model or uses nearest neighbour donors. The algorithms take sample designs, in particular weighting, into account. The methods are described in Bill and Hulliger (2016).
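As a rough illustration of distance-based detection with survey weights, the sketch below computes squared Mahalanobis distances from a weighted mean and covariance for two-dimensional, fully observed data. It is an assumption-laden simplification: the function name is hypothetical, the estimates are not robust, and the handling of missing values that the package addresses is left out.

```python
def weighted_mahalanobis_2d(rows, weights):
    """Squared Mahalanobis distance of each (x, y) row from the weighted
    mean, using the weighted covariance (normalised by the weight total).
    Assumes complete data; missing values are not handled in this sketch."""
    total = sum(weights)
    mx = sum(w * x for (x, _), w in zip(rows, weights)) / total
    my = sum(w * y for (_, y), w in zip(rows, weights)) / total
    sxx = sum(w * (x - mx) ** 2 for (x, _), w in zip(rows, weights)) / total
    syy = sum(w * (y - my) ** 2 for (_, y), w in zip(rows, weights)) / total
    sxy = sum(w * (x - mx) * (y - my) for (x, y), w in zip(rows, weights)) / total
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in rows:
        dx, dy = x - mx, y - my
        # Quadratic form with the closed-form inverse of a 2x2 matrix.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Hypothetical survey rows: five points near a line and one far off it.
rows = [(1, 1), (2, 2.1), (3, 2.9), (4, 4.2), (5, 5), (3, 9)]
weights = [1, 1, 1, 1, 1, 1]
d2 = weighted_mahalanobis_2d(rows, weights)
suspect = max(range(len(rows)), key=d2.__getitem__)
```

Observations with large distances are candidates for outliers; in practice the cutoff is usually taken from a chi-squared quantile, which is omitted here.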

pcadapt — by Florian Privé, a year ago

Fast Principal Component Analysis for Outlier Detection

Methods to detect genetic markers involved in biological adaptation. 'pcadapt' provides statistical tools for outlier detection based on Principal Component Analysis. Implements the method described in (Luu, 2016) and later revised in (Privé, 2020).

Distance — by Laura Marshall, a year ago

Distance Sampling Detection Function and Abundance Estimation

A simple way of fitting detection functions to distance sampling data for both line and point transects. Adjustment term selection, left and right truncation as well as monotonicity constraints and binning are supported. Abundance and density estimates can also be calculated (via a Horvitz-Thompson-like estimator) if survey area information is provided. See Miller et al. (2019) for more information on methods and <https://distancesampling.org/resources/vignettes.html> for example analyses.

tesseract — by Jeroen Ooms, 5 months ago

Open Source OCR Engine

Bindings to 'Tesseract': a powerful optical character recognition (OCR) engine that supports over 100 languages. The engine is highly configurable in order to tune the detection algorithms and obtain the best possible results.

cpm — by Gordon J. Ross, 6 years ago

Sequential and Batch Change Detection Using Parametric and Nonparametric Methods

Sequential and batch change detection for univariate data streams, using the change point model framework. Functions are provided to allow nonparametric distribution-free change detection in the mean, variance, or general distribution of a given sequence of observations. Parametric change detection methods are also provided for Gaussian, Bernoulli and Exponential sequences. Both the batch (Phase I) and sequential (Phase II) settings are supported, and the sequences may contain either a single or multiple change points. A full description of this package is available in Ross, G.J. (2015) - "Parametric and nonparametric sequential change detection in R" available at <https://www.jstatsoft.org/article/view/v066i03>.
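The batch (Phase I) idea can be sketched as follows: split the sequence at every candidate point, compute a two-sample statistic for each split, and take the split where it is largest. The Python below is a minimal illustration for a single change in the mean using a pooled-variance t statistic. The function name is hypothetical, it does not call the package, and the decision threshold that a real test would compare the maximum against is omitted.

```python
import math


def batch_mean_change(xs):
    """Return (k, statistic) for the split xs[:k] | xs[k:] that maximises
    the absolute two-sample t statistic. No significance threshold is
    applied in this sketch."""
    n = len(xs)
    best_k, best_stat = None, 0.0
    for k in range(2, n - 1):  # keep at least two observations per side
        left, right = xs[:k], xs[k:]
        mean_l, mean_r = sum(left) / k, sum(right) / (n - k)
        ss = (sum((x - mean_l) ** 2 for x in left)
              + sum((x - mean_r) ** 2 for x in right))
        pooled = ss / (n - 2)
        if pooled == 0:
            continue  # degenerate split: both segments are constant
        stat = abs(mean_l - mean_r) / math.sqrt(pooled * (1 / k + 1 / (n - k)))
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


# Hypothetical stream whose mean shifts after the fifth observation.
xs = [0.1, -0.2, 0.0, 0.2, -0.1, 5.1, 4.9, 5.0, 5.2, 4.8]
k, stat = batch_mean_change(xs)
```

In the sequential (Phase II) setting the same maximisation would be repeated as each new observation arrives, with the maximum compared against a time-varying threshold.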

ggvis — by Hadley Wickham, 4 months ago

Interactive Grammar of Graphics

An implementation of an interactive grammar of graphics, taking the best parts of 'ggplot2', combining them with the reactive framework of 'shiny' and drawing web graphics using 'vega'.

cld2 — by Jeroen Ooms, a year ago

Google's Compact Language Detector 2

Bindings to Google's C++ library Compact Language Detector 2 (see <https://github.com/cld2owners/cld2#readme> for more information). Probabilistically detects over 80 languages in plain text or HTML. For mixed-language input it returns the top three detected languages and their approximate proportion of the total classified text bytes (e.g. 80% English and 20% French out of 1000 bytes). There is also a 'cld3' package on CRAN which uses a neural network model instead.

bSims — by Peter Solymos, a year ago

Agent-Based Bird Point Count Simulator

A highly scientific and utterly addictive bird point count simulator to test statistical assumptions, aid survey design, and have fun while doing it (Solymos 2024). The simulations follow time-removal and distance sampling models based on Matsuoka et al. (2012), Solymos et al. (2013), and Solymos et al. (2018), and sound attenuation experiments by Yip et al. (2017).