Last updated on 2020-10-28 by Torsten Hothorn
Several add-on packages implement ideas and methods developed at the borderline between computer science and statistics - this field of research is usually referred to as machine learning. The packages can be roughly structured into several topics, a selection of which is covered below.
ctree() is based on non-parametric conditional inference procedures for testing independence between response and each input variable, whereas mob() can be used to partition parametric models. Extensible tools for visualizing binary trees and node distributions of the response are available in packages party and partykit as well.
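As an illustration, a minimal sketch of fitting a conditional inference tree with partykit::ctree() (assuming partykit is installed; the iris data and formula are only illustrative) might look as follows:

    library(partykit)
    ct <- ctree(Species ~ ., data = iris)   # conditional inference tree
    plot(ct)                                # extensible tree visualization
    predict(ct, newdata = head(iris))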
svm() from e1071 offers an interface to the LIBSVM library, and package kernlab implements a flexible framework for kernel learning (including SVMs, RVMs and other kernel learning algorithms). An interface to the SVMlight implementation (only for one-against-all classification) is provided in package klaR.
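A minimal sketch of fitting a support vector classifier with e1071::svm() (assuming e1071 is installed; data and tuning values are arbitrary):

    library(e1071)
    fit <- svm(Species ~ ., data = iris, kernel = "radial", cost = 1)
    table(predicted = predict(fit, iris), observed = iris$Species)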
The relevant dimension in kernel feature spaces can be estimated using rdetools, which also offers procedures for model selection and prediction. The tune() function (e1071) can be used for hyper parameter tuning, and the errorest() function (ipred) for error rate estimation. The cost parameter C for support vector machines can be chosen utilizing the functionality of package svmpath.
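For example, SVM hyper parameters could be tuned roughly as follows (a sketch assuming e1071 is installed; the parameter grids are arbitrary):

    library(e1071)
    tuned <- tune(svm, Species ~ ., data = iris,
                  ranges = list(cost = 2^(-1:3), gamma = 2^(-2:1)))
    summary(tuned)      # cross-validated error for each parameter combination
    tuned$best.model    # SVM refitted with the best cost/gamma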
Functions for ROC analysis and other visualisation techniques for comparing candidate classifiers are available from package ROCR.
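A minimal sketch using the example data shipped with ROCR (assuming the package is installed):

    library(ROCR)
    data(ROCR.simple)                      # predicted scores and true labels
    pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
    plot(performance(pred, "tpr", "fpr"))  # ROC curve
    performance(pred, "auc")@y.values[[1]] # area under the curve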
Packages hdi and stabs implement stability selection for a range of models; hdi also offers other inference procedures in high-dimensional models. The stats::termplot() function can be used to plot the terms in a model whose predict method supports type="terms".
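For example, termplot() works directly on standard linear and generalized linear model fits (a minimal sketch; the mtcars data and formula are only illustrative):

    fit <- lm(mpg ~ wt + hp, data = mtcars)
    termplot(fit, partial.resid = TRUE)   # one plot per model term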
The effects package provides graphical and tabular effect displays for models with a linear predictor (e.g., linear and generalized linear models). Friedman's partial dependence plots (PDPs), which are low-dimensional graphical renderings of the prediction function, are implemented in a few packages. gbm, randomForest and randomForestSRC provide their own functions for displaying PDPs, but these are limited to models fit with those packages (the function partialPlot from randomForest is more limited since it only allows for one predictor at a time). Packages pdp, plotmo, and ICEbox are more general and allow for the creation of PDPs for a wide variety of machine learning models (e.g., random forests, support vector machines, etc.); both pdp and plotmo support multivariate displays (plotmo is limited to two predictors while pdp uses trellis graphics to display PDPs involving three predictors). By default, plotmo fixes the background variables at their medians (or first level for factors), which is faster than constructing PDPs but incorporates less information. ICEbox focuses on constructing individual conditional expectation (ICE) curves, a refinement of Friedman's PDPs. ICE curves, as well as centered ICE curves, can also be constructed with the partial() function from the pdp package. ggRandomForests provides ggplot2-based tools for the graphical exploration of random forest models (e.g., variable importance plots and PDPs) from the randomForest and randomForestSRC packages.
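A minimal sketch of a PDP and centered ICE curves with pdp::partial() (assuming pdp and randomForest are installed; the choice of data and predictor is arbitrary):

    library(randomForest)
    library(pdp)
    set.seed(101)
    rf <- randomForest(mpg ~ ., data = mtcars)
    pd <- partial(rf, pred.var = "wt", train = mtcars)   # partial dependence of mpg on wt
    plotPartial(pd)
    ice <- partial(rf, pred.var = "wt", train = mtcars,
                   ice = TRUE, center = TRUE)            # centered ICE curves
    plotPartial(ice)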
Related packages, listed by title and maintainer:

Regularization for semiparametric additive hazards regression (Anders Gorst-Rasmussen)
Bayesian Structure Learning in Graphical Models using Birth-Death MCMC (Reza Mohammadi)
Wrapper Algorithm for All Relevant Feature Selection (Miron Bartosz Kursa)
Classification, Regression and Feature Evaluation (Marko Robnik-Sikonja)
Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien (David Meyer)
Fuzzy Rule-Based Systems for Classification and Regression Tasks (Christoph Bergmeir)
Lasso and Elastic-Net Regularized Generalized Linear Models (Trevor Hastie)
L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model (Mee Young Park)
Regularization Paths for Regression Models with Grouped Covariates (Patrick Breheny)
R Interface for the 'H2O' Scalable Machine Learning Platform (Erin LeDell)
Linear Predictive Models Based on the 'LIBLINEAR' C/C++ Library (Thibault Helleputte)
Regularization Paths for SCAD and MCP Penalized Regression Models (Patrick Breheny)
High Performance Implementation of the Naive Bayes Algorithm (Michal Majka)
Feed-Forward Neural Networks and Multinomial Log-Linear Models (Brian Ripley)
One Rule Machine Learning Classification Algorithm with Enhancements (Holger von Jouanne-Diedrich)
OPUS Miner Algorithm for Filtered Top-k Association Discovery (Christoph Bergmeir)
Penalized Classification using Fisher's Linear Discriminant (Daniela Witten)
Plot a Model's Residuals, Response, and Partial Dependence Plots (Stephen Milborrow)
Breiman and Cutler's Random Forests for Classification and Regression (Andy Liaw)
Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC) (Udaya B. Kogalur)
Extensible, Parallelizable Implementation of the Random Forest Algorithm (Mark Seligman)
Relevant Dimension Estimation (RDE) in Feature Spaces (Jan Saputra Mueller)
R Version of GENetic Optimization Using Derivatives (Jasjeet Singh Sekhon)
Continuous Optimization using Memetic Algorithms with Local Search Chains (MA-LS-Chains) in R (Christoph Bergmeir)
Data Analysis Using Rough Set and Fuzzy Rough Set Theories (Christoph Bergmeir)
Neural Networks using the Stuttgart Neural Network Simulator (SNNS) (Christoph Bergmeir)
Maximum Likelihood Shrinkage using Generalized Ridge or Least Angle Regression Methods (Bob Obenchain)
Shrinkage Discriminant Analysis and CAT Score Variable Selection (Korbinian Strimmer)
Bayesian Graphical Estimation using Spike-and-Slab Priors (Reza Mohammadi)