Found 57 packages in 0.04 seconds

modelStudio — by Hubert Baniecki, 2 years ago

Interactive Studio for Explanatory Model Analysis

Automate the explanatory analysis of machine learning predictive models. Generate advanced interactive model explanations in the form of a serverless HTML site with only one line of code. This tool is model-agnostic and therefore compatible with most black-box predictive models and frameworks. The main function computes various (instance- and model-level) explanations and produces a customisable dashboard, which consists of multiple panels for plots with their short descriptions. The dashboard can easily be saved and shared with others. modelStudio facilitates the process of Interactive Explanatory Model Analysis introduced in Baniecki et al. (2023).
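
A minimal sketch of the advertised one-line workflow, assuming the usual 'DALEX' + 'modelStudio' pairing; the GLM and the built-in titanic_imputed data are illustrative choices, not part of the package description:

    library(DALEX)
    library(modelStudio)

    # Fit any predictive model, e.g. a logistic regression on titanic_imputed
    model <- glm(survived ~ ., data = titanic_imputed, family = "binomial")

    # Wrap it in a model-agnostic explainer
    explainer <- explain(model,
                         data = titanic_imputed[, -8],
                         y    = titanic_imputed$survived)

    # The advertised one-liner: compute explanations and open the HTML dashboard
    modelStudio(explainer)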

kernelshap — by Michael Mayer, 3 months ago

Kernel SHAP

Efficient implementation of Kernel SHAP, see Lundberg and Lee (2017), and Covert and Lee (2021) <http://proceedings.mlr.press/v130/covert21a>. Furthermore, for up to 14 features, exact permutation SHAP values can be calculated. The package plays well with meta-learning packages like 'tidymodels', 'caret', or 'mlr3'. Visualizations can be done using the R package 'shapviz'.
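
A hedged sketch of a typical call, assuming the kernelshap(object, X, bg_X) interface; the linear model on iris is only an illustration:

    library(kernelshap)

    fit <- lm(Sepal.Length ~ ., data = iris)

    # SHAP values for 20 rows, with the full data as background sample
    s <- kernelshap(fit, X = iris[1:20, -1], bg_X = iris[, -1])
    s

    # Optional visualization via 'shapviz' (assumed to accept kernelshap objects)
    library(shapviz)
    sv_importance(shapviz(s))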

PvSTATEM — by Tymoteusz Kwiecinski, 12 days ago

Reading, Quality Control and Preprocessing of MBA (Multiplex Bead Assay) Data

Speeds up the process of loading raw data from MBA (Multiplex Bead Assay) examinations, performs quality control checks, and automatically normalises the data, preparing it for more advanced downstream tasks. The main objective of the package is to create a simple environment for a user who does not necessarily have experience with the R language. The package is developed within the project of the same name, 'PvSTATEM', an international project aiming at malaria elimination.

survex — by Mikołaj Spytek, a year ago

Explainable Machine Learning in Survival Analysis

Survival analysis models are commonly used in medicine and other areas. Many of them are too complex to be interpreted by humans. Exploration and explanation are needed, but standard methods do not give a broad enough picture. 'survex' provides easy-to-apply methods for explaining survival models, both complex black-boxes and simpler statistical models. They include methods specific to survival analysis such as SurvSHAP(t) introduced in Krzyzinski et al. (2023) and SurvLIME described in Kovalev et al. (2020), as well as extensions of existing methods described in Biecek et al. (2021).
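
A rough sketch of the DALEX-style workflow, assuming survex's explain() and predict_parts() interface; the Cox model on the veteran data is illustrative:

    library(survex)
    library(survival)

    # A classical Cox model on the built-in veteran data
    cph <- coxph(Surv(time, status) ~ ., data = veteran,
                 model = TRUE, x = TRUE)

    # Wrap the model in an explainer, then compute SurvSHAP(t) for one patient
    exp_cph <- explain(cph)
    predict_parts(exp_cph, veteran[1, -c(3, 4)], type = "survshap")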

treeshap — by Mateusz Krzyzinski, 10 months ago

Compute SHAP Values for Your Tree-Based Models Using the 'TreeSHAP' Algorithm

An efficient implementation of the 'TreeSHAP' algorithm introduced by Lundberg et al. (2020). It is capable of calculating SHAP (SHapley Additive exPlanations) values for tree-based models in polynomial time. Currently supported models include 'gbm', 'randomForest', 'ranger', 'xgboost', and 'lightgbm'.
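
A hedged sketch of the unify-then-explain pattern, assuming treeshap's ranger.unify() and treeshap() functions; the regression forest is illustrative:

    library(treeshap)
    library(ranger)

    data <- iris[, 1:4]
    model <- ranger(Sepal.Length ~ ., data = data)

    # Translate the fitted forest into treeshap's unified representation
    unified <- ranger.unify(model, data[, -1])

    # SHAP values for the first three observations
    shaps <- treeshap(unified, data[1:3, -1])
    shaps$shaps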

gips — by Adam Przemysław Chojecki, a year ago

Gaussian Model Invariant by Permutation Symmetry

Find the permutation symmetry group such that the covariance matrix of the given data is approximately invariant under it. Discovering such a permutation decreases the number of observations needed to fit a Gaussian model, which is of great use when the number of observations is smaller than the number of variables. Even if that is not the case, the covariance matrix found with 'gips' approximates the actual covariance with less statistical error. The methods implemented in this package are described in Graczyk et al. (2022).
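
A speculative sketch, assuming gips' two main calls, gips() and find_MAP(); the mtcars subset is illustrative and small enough for a brute-force search:

    library(gips)

    Z <- as.matrix(mtcars[, 1:4])   # n = 32 observations of p = 4 variables
    S <- cov(Z)

    g <- gips(S, number_of_observations = nrow(Z))

    # Search for the a posteriori most likely permutation symmetry group
    g_map <- find_MAP(g, optimizer = "brute_force")
    summary(g_map)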

FuzzyResampling — by Maciej Romaniuk, 2 months ago

Resampling Methods for Triangular and Trapezoidal Fuzzy Numbers

The classical (i.e. Efron's, see Efron and Tibshirani (1994, ISBN:978-0412042317), "An Introduction to the Bootstrap") bootstrap is widely used for both real (i.e. "crisp") and fuzzy data. The main aim of the algorithms implemented in this package is to overcome the problem of repeating only a few distinct values and to create fuzzy numbers that are "similar" (but not identical) to the values from the initial sample. To do this, different characteristics of triangular/trapezoidal numbers are preserved (like the value, the ambiguity, etc.; see Grzegorzewski et al., Grzegorzewski et al. (2020), Grzegorzewski et al. (2020), Grzegorzewski and Romaniuk (2022), Romaniuk and Hryniewicz (2019)). Some additional procedures related to these resampling methods are also provided, like calculation of Bertoluzza et al.'s distance (aka the mid/spread distance; see Bertoluzza et al. (1995), "On a new class of distances between fuzzy numbers") and estimation of the p-value of the one- and two-sample bootstrapped tests for the mean (see Lubiano et al. (2016)). Additionally, there are procedures which randomly generate trapezoidal fuzzy numbers using some well-known statistical distributions.
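
A tentative sketch; the function names (VAMethod, BertoluzzaDistance) and the row layout of trapezoidal numbers are recalled assumptions, so check them against the package documentation:

    library(FuzzyResampling)

    # Two trapezoidal fuzzy numbers, one per row: left end of support,
    # left end of core, right end of core, right end of support (assumed layout)
    initial <- matrix(c(0.0, 0.5, 1.0, 1.5,
                        1.0, 1.5, 2.0, 2.5), nrow = 2, byrow = TRUE)

    # Resampling that preserves the value and ambiguity (the VA method)
    secondary <- VAMethod(initial)

    # Mid/spread (Bertoluzza) distance between the two original numbers
    BertoluzzaDistance(initial[1, ], initial[2, ])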

fstcore — by Mark Klik, a year ago

R Bindings to the 'Fstlib' Library

The 'fstlib' library provides multithreaded serialization of compressed data frames using the 'fst' format. The 'fst' format allows for random access of stored data and compression with the 'LZ4' and 'ZSTD' compressors.
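
fstcore itself only ships the bindings; typical reading and writing goes through the companion 'fst' package. A minimal sketch, with file name and data invented for illustration:

    library(fst)

    df <- data.frame(x = runif(1e6),
                     y = sample(letters, 1e6, replace = TRUE))

    # Multithreaded, compressed serialization to the 'fst' format
    write_fst(df, "df.fst", compress = 50)

    # Random access: read only rows 1000-1999 without scanning the rest
    part <- read_fst("df.fst", from = 1000, to = 1999)

    # Thread count used by the underlying 'fstlib' library
    fstcore::threads_fstlib()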

CEC — by Simon Garnier, a month ago

Cross-Entropy Clustering

Splits data into Gaussian-type clusters using the Cross-Entropy Clustering ('CEC') method. This method allows for the simultaneous use of various types of Gaussian mixture models, for the reduction of unnecessary clusters, and for the discovery of new clusters by splitting them. 'CEC' is based on the work of Spurek, P. and Tabor, J. (2014).
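
A hedged sketch, assuming cec(x, centers) as the main entry point; starting with deliberately too many clusters illustrates the reduction behaviour:

    library(CEC)

    x <- as.matrix(iris[, 1:2])

    # Ask for more clusters than likely needed; CEC can drop unnecessary ones
    res <- cec(x, centers = 5)

    table(res$cluster)   # cluster sizes after any reduction
    plot(res)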

WienR — by Raphael Hartmann, a year ago

Derivatives of the First-Passage Time Density and Cumulative Distribution Function, and Random Sampling from the (Truncated) First-Passage Time Distribution

First, we provide functions to calculate the partial derivatives of the first-passage time diffusion probability density function (PDF) and cumulative distribution function (CDF) with respect to the first-passage time t (only for the PDF), the upper barrier a, the drift rate v, the relative starting point w, the non-decision time t0, the inter-trial variability of the drift rate sv, the inter-trial variability of the relative starting point sw, and the inter-trial variability of the non-decision time st0. In addition, the PDF and CDF themselves are provided. Most calculations are done on the logarithmic scale for numerical stability. Since the PDF, CDF, and their derivatives are represented as infinite series, the user can control the approximation errors with the argument 'precision'. For the numerical integration we used the C library cubature by Johnson, S. G. (2005-2013) <https://github.com/stevengj/cubature>. Numerical integration is required whenever sv, sw, and/or st0 is not zero. Note that numerical integration slows the computation, and the precision can no longer be guaranteed; therefore, whenever numerical integration is used, an estimate of the approximation error is provided in the output list. Note: the large number of contributors (ctb) is due to the many C/C++ code chunks copied from the GNU Scientific Library (GSL).

Second, we provide methods to sample from the first-passage time distribution, with or without user-defined truncation from above. The first method is a new adaptive rejection sampler building on the works of Gilks and Wild (1992) and Hartmann and Klauer (in press). The second method is a rejection sampler provided by Drugowitsch (2016). The third method is an inverse transformation sampler. The fourth method is a "pseudo" adaptive rejection sampler that builds on the first method. For more details see the corresponding help files.
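
A hedged sketch of the two advertised tasks, assuming WienR's WienerPDF() and sampWiener() interfaces; parameter values are arbitrary:

    library(WienR)

    # Density of the first-passage time at t = 0.8 for the upper boundary,
    # with boundary separation a, drift rate v, and relative starting point w
    WienerPDF(t = 0.8, response = "upper", a = 1, v = 0.5, w = 0.5)

    # Draw 1000 first-passage times and responses with the same parameters
    x <- sampWiener(N = 1000, a = 1, v = 0.5, w = 0.5)
    hist(x$q)   # assumed: sampled times are returned in component q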