Meta-package for statistical and machine learning with a unified interface for model fitting, prediction, performance assessment, and presentation of results. Approaches for model fitting and prediction of numerical, categorical, or censored time-to-event outcomes include traditional regression models, regularization methods, tree-based methods, support vector machines, neural networks, ensembles, data preprocessing, filtering, and model tuning and selection. Performance metrics are provided for model assessment and can be estimated with independent test sets, split sampling, cross-validation, or bootstrap resampling. Resample estimation can be executed in parallel for faster processing and nested in cases of model tuning and selection. Modeling results can be summarized with descriptive statistics; calibration curves; variable importance; partial dependence plots; confusion matrices; and ROC, lift, and other performance curves.
MachineShop is a meta-package for statistical and machine learning
with a unified interface for model fitting, prediction, performance
assessment, and presentation of results. Support is provided for
predictive modeling of numerical, categorical, and censored
time-to-event outcomes and for resample (bootstrap, cross-validation,
and split training-test sets) estimation of model performance. This
vignette introduces the package interface with a survival data analysis
example, followed by supported methods of variable specification;
applications to other response variable types; available performance
metrics, resampling techniques, and graphical and tabular summaries; and
modeling strategies.
    # Current release from CRAN
    install.packages("MachineShop")

    # Development version from GitHub
    # install.packages("devtools")
    devtools::install_github("brian-j-smith/MachineShop", ref = "develop")

    # Development version with vignettes
    devtools::install_github("brian-j-smith/MachineShop", ref = "develop", build_vignettes = TRUE)
Once installed, the following R
commands will load the package and
display its help system documentation. Online documentation and examples
are available at the MachineShop main
website.
    library(MachineShop)

    # Package help summary
    ?MachineShop

    # Vignette
    RShowDoc("Introduction", package = "MachineShop")
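To make the unified interface described above concrete, here is a minimal sketch of a fit, predict, assess, and resample workflow. It assumes a recent MachineShop release in which `fit()`, `response()`, `predict()`, `performance()`, `resample()`, `CVControl()`, and `LMModel` are available under these names. The `mtcars` dataset, the formula, the split size, and the number of folds are illustrative choices, not taken from the package documentation.

```r
library(MachineShop)

# Illustrative split of a built-in dataset into training and test sets
set.seed(123)
train_rows <- sample(nrow(mtcars), 22)
trainset <- mtcars[train_rows, ]
testset <- mtcars[-train_rows, ]

# Fit a linear model through the common fit() interface
model_fit <- fit(mpg ~ wt + hp, data = trainset, model = LMModel)

# Observed and predicted responses on the held-out test set
obs <- response(model_fit, newdata = testset)
pred <- predict(model_fit, newdata = testset)

# Default performance metrics for a numeric response
print(performance(obs, pred))

# Cross-validation estimate of performance on the full dataset
# (5 folds chosen only to keep the example fast)
res <- resample(mpg ~ wt + hp, data = mtcars, model = LMModel,
                control = CVControl(folds = 5))
print(summary(res))
```

Swapping `LMModel` for another model constructor should leave the rest of the calls unchanged, provided the package that implements that model is installed.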
auc, fnr, fpr, rpp, tnr, tpr.
SurvMatrix classes for predicted survival events and probabilities to eliminate the need for separate times arguments in calibration, confusion, metrics, and performance functions.
Rename MLControl argument surv_times to times.
case_weight and case_strata variables.
BARTModel.
accuracy, f_score, kappa2, npv, ppv, pr_auc, precision, recall, roc_index, sensitivity, specificity.
cindex, gini, mae, mse, msle, r2, rmse, rmsle.
performance and metric methods for ConfusionMatrix.
Replace MLModel slot and constructor argument nvars with design.
BARTMachineModel, LARSModel.
gini, multi-class pr_auc and roc_auc, multivariate rmse, msle, rmsle.
MLMetric class for performance metrics.
as.data.frame method for ModelFrame.
expand.model function.
Add label slot to MLModel.
metricinfo/modelinfo support for mixed argument types.
Rename calibration argument n to breaks.
Rename modelmetrics function to performance.
Rename ModelMetrics/Diff classes to Performance/Diff.
Rename MLModelTune slot resamples to performance.
AdaBagModel, AdaBoostModel, BlackBoostModel, EarthModel, FDAModel, GAMBoostModel, GLMBoostModel, MDAModel, NaiveBayesModel, PDAModel, RangerModel, RPartModel, TreeModel.
modelmetrics function.
accuracy, brier, cindex, cross_entropy, f_score, kappa2, mae, mse, npv, ppv, pr_auc, precision, r2, recall, roc_auc, roc_index, sensitivity, specificity, weighted_kappa2.
Add cutoff argument to confusion function.
modelinfo and metricinfo functions.
modelmetrics method for Resamples.
ModelMetrics class with print and summary methods.
response method for recipe.
Calibration constructor.
Confusion constructor.
Lift constructor.
Change calibration arguments to observed and predicted responses.
Change confusion arguments to observed and predicted responses.
Change lift arguments to observed and predicted responses.
Change metrics and stats function arguments to accept function names.
Extend Resamples to arguments with multiple models.
Change CoxModel, GLMModel, and SurvRegModel constructor definitions so that model control parameters are specified directly instead of with a separate control argument/structure.
Change predict(..., times = numeric()) function calls to survival model fits to return predicted values in the same direction as survival times.
Change predict(..., times = numeric()) function calls to CForestModel fits to return predicted means instead of medians.
Change tune function argument metrics to be defined in terms of a user-specified metric or metrics.
cutoff, cutoff_index, na.rm, and summary.
Linear models (LMModel), linear discriminant analysis (LDAModel), and quadratic discriminant analysis (QDAModel).
strata argument of ModelFrame or the role of "case_strata" for recipe variables.
"case_weight" for recipe variables.
prepper due to its relocation from rsample to recipes.
K-nearest neighbors (KNNModel), stacked regression models (StackedModel), super learner models (SuperModel), and extreme gradient boosting (XGBModel).
Training resubstitution (TrainControl) and split training and test sets (SplitControl).
ModelFrame class for general model formula and dataset specification.
modelmetrics().
Extend predict() to automatically preprocess recipes and to use training data as the newdata default.
Extend tune() to lists of models.
Extend summary() argument stats to functions.
GBMModel and GLMNetModel.
Change MLControl argument na.rm default from FALSE to TRUE.
Remove na.rm argument from modelmetrics().