Found 961 packages in 0.01 seconds

adbcdrivermanager — by Dewey Dunnington, 2 months ago

'Arrow' Database Connectivity ('ADBC') Driver Manager

Provides a developer-facing interface to 'Arrow' Database Connectivity ('ADBC') for the purposes of driver development, driver testing, and building high-level database interfaces for users. 'ADBC' <https://arrow.apache.org/adbc/> is an API standard for database access libraries that uses 'Arrow' for result sets and query parameters.

anomalize — by Matt Dancho, a year ago

Tidy Anomaly Detection

The 'anomalize' package enables a "tidy" workflow for detecting anomalies in data. The main functions are time_decompose(), anomalize(), and time_recompose(). Combined, they make it simple to decompose time series, detect anomalies, and create bands separating the "normal" data from the anomalous data at scale (i.e. for multiple time series). The time_decompose() function removes trend and seasonal components; its methods include seasonal decomposition of time series by Loess ("stl") and seasonal decomposition by piecewise medians ("twitter"). The anomalize() function implements two methods for detecting anomalies in the residuals: the interquartile range ("iqr") and the generalized extreme studentized deviate test ("gesd"). These methods are based on those used in the 'forecast' package and the Twitter 'AnomalyDetection' package. Refer to the associated functions for specific references for these methods.
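As a rough illustration of the interquartile-range idea mentioned above, the following Python sketch flags residuals that fall outside fences built from the quartiles. It is not the 'anomalize' implementation: the function name `iqr_anomalies` and the fence multiplier `k = 1.5` (the conventional Tukey value) are assumptions made for this example, and the residuals are invented.

```python
from statistics import quantiles

def iqr_anomalies(residuals, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is the conventional Tukey fence, assumed here for
    illustration; it is not taken from the package.
    """
    # Quartiles with linear interpolation between order statistics.
    q1, _, q3 = quantiles(residuals, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [r < lower or r > upper for r in residuals]

# Invented residuals left over after removing trend and seasonality.
residuals = [0.1, -0.2, 0.0, 0.3, -0.1, 5.0, 0.2, -0.3]
flags = iqr_anomalies(residuals)
print([r for r, f in zip(residuals, flags) if f])
```

Only the value 5.0 lies outside the fences for this input; a larger `k` makes the rule more conservative.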

yaImpute — by Jeffrey S. Evans, 7 months ago

Nearest Neighbor Observation Imputation and Evaluation Tools

Performs nearest neighbor-based imputation using one or more alternative approaches to processing multivariate data. These include methods based on canonical correlation analysis, canonical correspondence analysis, and a multivariate adaptation of the random forest classification and regression techniques of Leo Breiman and Adele Cutler. Additional methods are also offered. The package includes functions for comparing the results from running alternative techniques, detecting imputation targets that are notably distant from reference observations, detecting and correcting for bias, bootstrapping and building ensemble imputations, and mapping results.

genieclust — by Marek Gagolewski, 8 months ago

Fast and Robust Hierarchical Clustering with Noise Points Detection

A retake on the Genie algorithm (Gagolewski, 2021) - a robust hierarchical clustering method (Gagolewski, Bartoszuk, Cena, 2016). Now faster and more memory efficient; determining the whole hierarchy for datasets of 10M points in low-dimensional Euclidean spaces or 100K points in high-dimensional ones takes only 1-2 minutes. Allows clustering with respect to mutual reachability distances so that it can act as a noise point detector or a robustified version of 'HDBSCAN*' (one that is able to detect a predefined number of clusters and hence does not depend on the somewhat fragile 'eps' parameter). The package also features an implementation of inequality indices (the Gini and Bonferroni indices), external cluster validity measures (e.g., the normalised clustering accuracy and partition similarity scores such as the adjusted Rand, Fowlkes-Mallows, adjusted mutual information, and the pair sets index), and internal cluster validity indices (e.g., the Calinski-Harabasz, Davies-Bouldin, Ball-Hall, Silhouette, and generalised Dunn indices). See also the 'Python' version of 'genieclust' available on 'PyPI', which supports sparse data, more metrics, and even larger datasets.
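To make the notion of an inequality index concrete, here is a minimal Python sketch of a Gini index for non-negative values such as cluster sizes. It is an illustration only, not the 'genieclust' code: the function name `gini_index` is hypothetical, and the normalisation by n - 1 (so that the index reaches 1 when a single element holds everything) is an assumption of this sketch.

```python
def gini_index(values):
    """Gini index of non-negative values, normalised to [0, 1].

    Uses the sorted-data identity
        sum_{i<j} |x_i - x_j| = sum_i (2i - n - 1) * x_(i)
    and divides by (n - 1) * sum(x); the n - 1 normalisation is an
    assumption of this sketch.
    """
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n < 2 or x[0] < 0 or total <= 0:
        raise ValueError("need at least two non-negative values with a positive sum")
    weighted = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1))
    return weighted / ((n - 1) * total)

print(gini_index([25, 25, 25, 25]))  # perfectly balanced sizes
print(gini_index([97, 1, 1, 1]))     # one dominant cluster
```

Balanced inputs give an index of 0, while inputs dominated by one element approach 1.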

modi — by Beat Hulliger, 2 years ago

Multivariate Outlier Detection and Imputation for Incomplete Survey Data

Algorithms for multivariate outlier detection when missing values occur. Algorithms are based on Mahalanobis distance or data depth. Imputation is based on the multivariate normal model or uses nearest neighbour donors. The algorithms take sample designs, in particular weighting, into account. The methods are described in Bill and Hulliger (2016).
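For readers unfamiliar with the Mahalanobis distance, the Python sketch below computes it for complete two-dimensional data using the ordinary sample mean and covariance. This is a deliberately simplified illustration: unlike the algorithms described above, it handles neither missing values nor sampling weights, and it is not robust. The function name `mahalanobis_2d` and the data are invented for the example.

```python
from math import sqrt

def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) point from the sample mean.

    Simplifying assumptions: complete data, no weights, classical
    (non-robust) mean and covariance.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Unbiased sample covariance entries.
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the closed-form inverse of the 2x2 covariance.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(sqrt(d2))
    return distances

points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
for point, d in zip(points, mahalanobis_2d(points)):
    print(point, round(d, 3))
```

Because the classical mean and covariance are themselves pulled toward the outlying point, its distance is understated, which is one motivation for the robust estimators used in practice.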

outliertree — by David Cortes, 8 months ago

Explainable Outlier Detection Through Decision Tree Conditioning

Outlier detection method that flags suspicious values within observations, contrasting them against the normal values in a user-readable format, potentially describing conditions within the data that make a given outlier more rare. The full procedure is described in Cortes (2020). Loosely based on the 'GritBot' <https://www.rulequest.com/gritbot-info.html> software.

Distance — by Laura Marshall, 6 months ago

Distance Sampling Detection Function and Abundance Estimation

A simple way of fitting detection functions to distance sampling data for both line and point transects. Adjustment term selection, left and right truncation as well as monotonicity constraints and binning are supported. Abundance and density estimates can also be calculated (via a Horvitz-Thompson-like estimator) if survey area information is provided. See Miller et al. (2019) for more information on methods and <https://examples.distancesampling.org/> for example analyses.
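The Horvitz-Thompson-like idea can be sketched in a few lines of Python: each detected group is weighted by the inverse of its estimated detection probability, and the total is scaled from the covered area up to the study area. This is a generic illustration under stated assumptions, not the package's code; the function name `abundance_estimate`, its arguments, and the numbers are hypothetical, and the detection probabilities are taken as given rather than estimated from a fitted detection function.

```python
def abundance_estimate(detections, covered_area, study_area):
    """Horvitz-Thompson-like abundance estimate.

    detections: iterable of (group_size, detection_probability) pairs,
    where each probability is assumed to be already estimated.
    Returns (study_area / covered_area) * sum(size / probability).
    """
    if covered_area <= 0 or study_area <= 0:
        raise ValueError("areas must be positive")
    total = 0.0
    for size, p in detections:
        if not 0 < p <= 1:
            raise ValueError("detection probability must be in (0, 1]")
        total += size / p
    return study_area / covered_area * total

# Ten single animals, each detected with probability 0.5,
# on 2 units of covered area within a 10-unit study area.
print(abundance_estimate([(1, 0.5)] * 10, covered_area=2.0, study_area=10.0))
```

Dividing the result by the study area gives the corresponding density estimate.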

detectseparation — by Ioannis Kosmidis, 3 years ago

Detect and Check for Separation and Infinite Maximum Likelihood Estimates

Provides pre-fit and post-fit methods for detecting separation and infinite maximum likelihood estimates in generalized linear models with categorical responses. The pre-fit methods apply to binomial-response generalized linear models such as logit, probit and cloglog regression, and can be directly supplied as fitting methods to the glm() function. They solve the linear programming problems for the detection of separation developed in Konis (2007, <https://ora.ox.ac.uk/objects/uuid:8f9ee0d0-d78e-4101-9ab4-f9cbceed2a2a>) using 'ROI' <https://cran.r-project.org/package=ROI> or 'lpSolveAPI' <https://cran.r-project.org/package=lpSolveAPI>. The post-fit methods apply to models with categorical responses, including binomial-response generalized linear models and multinomial-response models, such as baseline category logits and adjacent category logits models; for example, the models implemented in the 'brglm2' <https://cran.r-project.org/package=brglm2> package. The post-fit methods successively refit the model with an increasing number of iteratively reweighted least squares iterations, and monitor the ratio of the estimated standard error for each parameter to its value at the first iteration. According to the results in Lesaffre & Albert (1989, <https://www.jstor.org/stable/2345845>), divergence of those ratios indicates data separation.
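To show what separation means in the simplest setting, the Python sketch below checks a binary response against a single numeric predictor (with an intercept implied). It is a toy special case, not the linear programming approach of Konis (2007) or the package's interface; the function name `separation_status` is hypothetical.

```python
def separation_status(x, y):
    """Classify separation for one predictor x and a 0/1 response y.

    Returns "complete", "quasi-complete", or "none". This covers only
    the single-predictor case; the general case needs linear programming.
    """
    zeros = [xi for xi, yi in zip(x, y) if yi == 0]
    ones = [xi for xi, yi in zip(x, y) if yi == 1]
    if not zeros or not ones:
        raise ValueError("both response classes must be present")
    if min(x) == max(x):
        return "none"  # constant predictor: no direction can separate
    # Complete: a threshold strictly splits the two classes.
    if max(zeros) < min(ones) or max(ones) < min(zeros):
        return "complete"
    # Quasi-complete: the classes touch only at the threshold value.
    if max(zeros) <= min(ones) or max(ones) <= min(zeros):
        return "quasi-complete"
    return "none"

print(separation_status([1, 2, 3, 4], [0, 0, 1, 1]))
print(separation_status([1, 2, 2, 3], [0, 0, 1, 1]))
print(separation_status([1, 3, 2, 4], [0, 0, 1, 1]))
```

In the first two cases the maximum likelihood estimate of the slope in a logistic regression is infinite, which is the situation the package is designed to detect.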

rrcov — by Valentin Todorov, 4 days ago

Scalable Robust Estimators with High Breakdown Point

Robust Location and Scatter Estimation and Robust Multivariate Analysis with High Breakdown Point: principal component analysis (Filzmoser and Todorov (2013)), linear and quadratic discriminant analysis (Todorov and Pires (2007)), multivariate tests (Todorov and Filzmoser (2010)), outlier detection (Todorov et al. (2010)). See also Todorov and Filzmoser (2009), Todorov and Filzmoser (2010) and Boudt et al. (2019).

cld2 — by Jeroen Ooms, a month ago

Google's Compact Language Detector 2

Bindings to Google's C++ library Compact Language Detector 2 (see <https://github.com/cld2owners/cld2#readme> for more information). Probabilistically detects over 80 languages in plain text or HTML. For mixed-language input it returns the top three detected languages and their approximate proportions of the total classified text bytes (e.g. 80% English and 20% French out of 1000 bytes). There is also a 'cld3' package on CRAN which uses a neural network model instead.