diversityForest — by Roman Hornung, 9 months ago

Innovative Complex Split Procedures in Random Forests Through Candidate Split Sampling

Implementation of three methods based on the diversity forest (DF) algorithm (Hornung, 2022), a split-finding approach that enables complex split procedures in random forests. The package includes:

1. Interaction forests (IFs) (Hornung & Boulesteix, 2022): Model quantitative and qualitative interaction effects using bivariable splitting. They come with the Effect Importance Measure (EIM), which can be used to identify variable pairs that have well-interpretable quantitative and qualitative interaction effects with high predictive relevance.
2. Two random forest-based variable importance measures (VIMs) for multi-class outcomes: the class-focused VIM, which ranks covariates by their ability to distinguish individual outcome classes from the others, and the discriminatory VIM, which measures overall covariate influence irrespective of class-specific relevance.
3. The basic form of diversity forests, which uses conventional univariable, binary splitting (Hornung, 2022).

Except for the multi-class VIMs, all methods support categorical, metric, and survival outcomes. The package includes visualization tools for interpreting the identified covariate effects. Built as a fork of the 'ranger' R package (main author: Marvin N. Wright), which provides an efficient C++ implementation of random forests.

party — by Torsten Hothorn, a year ago

A Laboratory for Recursive Partytioning

A computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees, which embed tree-structured regression models into a well-defined theory of conditional inference procedures. This non-parametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored, and multivariate response variables, and to arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman's random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression) employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available. The methods are described in Hothorn et al. (2006), Zeileis et al. (2008) and Strobl et al. (2007).

h2o — by Tomas Fryda, 2 years ago

R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

survcompare — by Diana Shamsutdinova, 8 months ago

Nested Cross-Validation to Compare Cox-PH, Cox-Lasso, Survival Random Forests

Performs repeated nested cross-validation for Cox Proportional Hazards, Cox Lasso, Survival Random Forest, and their ensemble. Returns internally validated concordance index, time-dependent area under the curve, Brier score, calibration slope, and a statistical test of whether the non-linear ensemble outperforms the baseline Cox model. In this way, it helps researchers to quantify the gain from using a more complex survival model, or to justify its redundancy. Equally, it shows the performance value of the non-linear and interaction terms, and may highlight the need for further feature transformation. Further details can be found in Shamsutdinova, Stamate, Roberts, & Stahl (2022) "Combining Cox Model and Tree-Based Algorithms to Boost Performance and Preserve Interpretability for Health Outcomes", where the method is described as Ensemble 1.
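As an illustration of the concordance index named above, here is a minimal Python sketch of Harrell's C for right-censored data. It is not the package's R interface; the function name `concordance_index` and the toy data are hypothetical, and ties in risk score are assumed to count as half-concordant.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- higher score means higher predicted risk (earlier event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # assumption: ties count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four subjects, the third one censored.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
c_perfect = concordance_index(times, events, [4.0, 3.0, 2.0, 1.0])
c_reversed = concordance_index(times, events, [1.0, 2.0, 3.0, 4.0])
c_constant = concordance_index(times, events, [1.0, 1.0, 1.0, 1.0])
```

A score ranking that matches the event order gives the maximum value, a reversed ranking gives the minimum, and an uninformative constant score lands halfway between them.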

trtf — by Torsten Hothorn, a year ago

Transformation Trees and Forests

Recursive partytioning of transformation models with a corresponding random forest for conditional transformation models, as described in 'Transformation Forests' (Hothorn and Zeileis, 2021) and 'Top-Down Transformation Choice' (Hothorn, 2018).

bonsai — by Emil Hvitfeldt, 8 months ago

Model Wrappers for Tree-Based Models

Bindings for additional tree-based model engines for use with the 'parsnip' package. Models include gradient boosted decision trees with 'LightGBM' (Ke et al., 2017), conditional inference trees and conditional random forests with 'partykit' (Hothorn and Zeileis, 2015; Hothorn et al., 2006), and accelerated oblique random forests with 'aorsf' (Jaeger et al., 2022).

yaImpute — by Jeffrey S. Evans, a year ago

Nearest Neighbor Observation Imputation and Evaluation Tools

Performs nearest neighbor-based imputation using one or more alternative approaches to processing multivariate data. These include methods based on canonical correlation analysis, canonical correspondence analysis, and a multivariate adaptation of the random forest classification and regression techniques of Leo Breiman and Adele Cutler. Additional methods are also offered. The package includes functions for comparing the results from running alternative techniques, detecting imputation targets that are notably distant from reference observations, detecting and correcting for bias, bootstrapping and building ensemble imputations, and mapping results.

missRanger — by Michael Mayer, a year ago

Fast Imputation of Missing Values

Alternative implementation of the beautiful 'MissForest' algorithm used to impute mixed-type data sets by chaining random forests, introduced by Stekhoven, D.J. and Buehlmann, P. (2012). Under the hood, it uses the lightning fast random forest package 'ranger'. Between the iterative model fitting steps, we offer the option of using predictive mean matching. Firstly, this avoids imputation with values not already present in the original data (like a value 0.3334 in a 0-1 coded variable). Secondly, predictive mean matching tries to raise the variance in the resulting conditional distributions to a realistic level. This allows, e.g., multiple imputation by repeating the call to missRanger(). Out-of-sample application is supported as well.
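The predictive mean matching step described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the package's R code: the function `pmm_impute`, its parameter `k`, and the toy numbers are all hypothetical. Each missing case takes its model prediction, finds the `k` observed cases with the closest predictions, and borrows the actual observed value of one of them at random.

```python
import random


def pmm_impute(pred_missing, pred_observed, y_observed, k=3, rng=None):
    """Predictive mean matching: replace each prediction by a donor's observed value."""
    if rng is None:
        rng = random.Random(0)
    imputed = []
    for p in pred_missing:
        # Rank observed cases by how close their prediction is to p.
        by_distance = sorted(
            range(len(pred_observed)),
            key=lambda i: abs(pred_observed[i] - p),
        )
        # Draw one donor among the k nearest and copy its observed value.
        donor = rng.choice(by_distance[:k])
        imputed.append(y_observed[donor])
    return imputed


# Toy 0-1 coded variable (hypothetical data): raw predictions such as 0.3334
# are never imputed directly, only observed values 0 or 1 are.
y_observed = [0, 0, 1, 1, 1]
pred_observed = [0.1, 0.3, 0.6, 0.8, 0.9]
pred_missing = [0.3334, 0.85]

imputed = pmm_impute(pred_missing, pred_observed, y_observed, k=3,
                     rng=random.Random(42))
nearest_only = pmm_impute(pred_missing, pred_observed, y_observed, k=1)
```

Because the donor is drawn at random among several candidates when `k` is greater than 1, repeated runs with different seeds can give different imputations, which is what makes the multiple-imputation use mentioned above possible.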

meta — by Guido Schwarzer, 5 months ago

General Package for Meta-Analysis

User-friendly general package providing standard methods for meta-analysis and supporting Schwarzer, Carpenter, and Rücker, "Meta-Analysis with R" (2015):

- common effect and random effects meta-analysis;
- several plots (forest, funnel, Galbraith / radial, L'Abbe, Baujat, bubble);
- three-level meta-analysis model;
- generalised linear mixed model;
- logistic regression with penalised likelihood for rare events;
- Hartung-Knapp method for random effects model;
- Kenward-Roger method for random effects model;
- prediction interval;
- statistical tests for funnel plot asymmetry;
- trim-and-fill method to evaluate bias in meta-analysis;
- meta-regression;
- cumulative meta-analysis and leave-one-out meta-analysis;
- import data from 'RevMan 5';
- produce forest plot summarising several (subgroup) meta-analyses.
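To make the difference between common effect and random effects meta-analysis concrete, here is a minimal Python sketch of inverse-variance pooling. It is not the package's R interface: the function `pool_effects` and the toy inputs are hypothetical, and the between-study variance is estimated with the DerSimonian-Laird method, which is an assumption made for this example rather than something stated in the description above.

```python
import math


def pool_effects(effects, variances):
    """Inverse-variance pooling under a common effect and a random effects model."""
    # Common effect model: weight each study by 1 / within-study variance.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    common = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures heterogeneity around the common effect estimate.
    q = sum(wi * (yi - common) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1

    # DerSimonian-Laird estimate of the between-study variance tau^2
    # (assumed estimator for this sketch), truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects model: add tau^2 to every within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    random_eff = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum_w_re

    return {
        "common": common,
        "se_common": math.sqrt(1.0 / sum_w),
        "tau2": tau2,
        "random": random_eff,
        "se_random": math.sqrt(1.0 / sum_w_re),
    }


# Two precise but conflicting studies (hypothetical numbers).
heterogeneous = pool_effects([0.0, 1.0], [0.01, 0.01])
# Three studies that agree exactly (hypothetical numbers).
homogeneous = pool_effects([0.2, 0.2, 0.2], [0.01, 0.04, 0.02])
```

When the studies agree, the estimated between-study variance is zero and both models give the same pooled estimate; when they conflict, the random effects standard error is wider than the common effect one.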

icRSF — by Hui Xu, 8 years ago

A Modified Random Survival Forest Algorithm

Implements a modification of the Random Survival Forests algorithm for obtaining variable importance in high-dimensional datasets. The proposed algorithm is appropriate for settings in which a silent event is observed through sequentially administered, error-prone self-reports or laboratory-based diagnostic tests. It incorporates a formal likelihood framework that accommodates such outcome data, and it modifies the original Random Survival Forests algorithm by introducing a new splitting criterion based on a likelihood ratio test statistic.