Meta-package for statistical and machine learning with a common interface for model fitting, prediction, performance assessment, and presentation of results. Supports predictive modeling of numerical, categorical, and censored time-to-event outcomes and resample (bootstrap and cross-validation) estimation of model performance.

`MachineShop` is a meta-package for statistical and machine learning
with a unified interface for model fitting, prediction, performance
assessment, and presentation of results. Support is provided for
predictive modeling of numerical, categorical, and censored
time-to-event outcomes and for resample (bootstrap, cross-validation,
and split training-test sets) estimation of model performance. This
vignette introduces the package interface with a survival data analysis
example, followed by supported methods of variable specification;
applications to other response variable types; available performance
metrics, resampling techniques, and graphical and tabular summaries; and
modeling strategies.
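To make the idea of resample estimation concrete, the following is a minimal base-**R** sketch of k-fold cross-validation. It does not use `MachineShop`; the helper `cv_rmse()`, the `lm()` model, and the built-in `mtcars` data are illustrative assumptions, not part of the package. Each fold is held out once while the model is fit to the remaining rows, which yields one root mean squared error (RMSE) estimate per fold.

```r
# Hypothetical helper (not a MachineShop function): k-fold cross-validation
# estimate of RMSE for a linear model, written in base R only.
cv_rmse <- function(formula, data, folds = 5, seed = 123) {
  set.seed(seed)  # note: changes the global random number stream
  # Randomly assign each row to one of `folds` groups of near-equal size
  fold_id <- sample(rep(seq_len(folds), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  rmse <- numeric(folds)
  for (k in seq_len(folds)) {
    train <- data[fold_id != k, , drop = FALSE]
    test <- data[fold_id == k, , drop = FALSE]
    fit <- lm(formula, data = train)       # fit on the other k - 1 folds
    pred <- predict(fit, newdata = test)   # predict the held-out fold
    rmse[k] <- sqrt(mean((test[[response]] - pred)^2))
  }
  rmse  # one performance estimate per held-out fold
}

res <- cv_rmse(mpg ~ wt + hp, data = mtcars, folds = 5)
summary(res)
```

Bootstrap and split training-test estimation follow the same pattern and differ only in how the training and held-out rows are chosen.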

- Unified and concise interface for model fitting, prediction, and performance assessment.
- Current support for 49 established models from 25 **R** packages.
- Ensemble modeling with stacked regression and super learners.
- Modeling of response variable types: binary factors, multi-class nominal and ordinal factors, numeric vectors and matrices, and censored time-to-event survival.
- Model specification with traditional formulas and with flexible pre-processing recipes.
- Resample estimation of predictive performance, including cross-validation, bootstrap resampling, and split training-test set validation.
- Parallel execution of resampling algorithms.
- Choices of performance metrics: accuracy, areas under ROC and precision recall curves, Brier score, coefficient of determination (R²), concordance index, cross entropy, F score, Gini coefficient, unweighted and weighted Cohen’s kappa, mean absolute error, mean squared error, mean squared log error, positive and negative predictive values, precision and recall, and sensitivity and specificity.
- Graphical and tabular performance summaries: calibration curves, confusion matrices, partial dependence plots, performance curves, lift curves, and variable importance.
- Model tuning over automatically generated grids of parameter values and randomly sampled grid points.
- Model selection and comparisons for any combination of models and model parameter values.
- User-definable models and performance metrics.
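As an illustration of what several of the listed metrics measure, here is a base-**R** sketch for a binary outcome using the standard textbook definitions. The observed classes and predicted probabilities are made-up toy values, and the 0.5 cutoff and the choice of `"yes"` as the event level are assumptions for this example; the package's own metric functions may use different argument conventions.

```r
# Toy data (made up): observed classes and predicted probabilities of "yes"
observed <- factor(c("no", "yes", "yes", "no", "yes", "no"), levels = c("no", "yes"))
prob_yes <- c(0.20, 0.90, 0.60, 0.40, 0.30, 0.10)

y <- as.numeric(observed == "yes")  # 1 = event ("yes"), 0 = non-event

# Brier score: mean squared difference between event indicator and probability
brier_score <- mean((y - prob_yes)^2)

# Cross entropy (log loss): mean negative log-likelihood of the observed classes
cross_ent <- -mean(y * log(prob_yes) + (1 - y) * log(1 - prob_yes))

# Classify with an assumed 0.5 cutoff, then tabulate the 2 x 2 confusion matrix
predicted <- factor(ifelse(prob_yes > 0.5, "yes", "no"), levels = c("no", "yes"))
conf <- table(Predicted = predicted, Observed = observed)

sens <- conf["yes", "yes"] / sum(conf[, "yes"])  # true positive rate
spec <- conf["no", "no"] / sum(conf[, "no"])     # true negative rate

c(brier = brier_score, cross_entropy = cross_ent,
  sensitivity = sens, specificity = spec)
```

The Brier score and cross entropy are computed from the predicted probabilities directly, whereas sensitivity and specificity depend on the cutoff used to convert probabilities into classes.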

```r
# Current release from CRAN
install.packages("MachineShop")

# Development version from GitHub
# install.packages("devtools")
devtools::install_github("brian-j-smith/MachineShop", ref = "develop")

# Development version with vignettes
devtools::install_github("brian-j-smith/MachineShop", ref = "develop", build_vignettes = TRUE)
```

Once installed, the following **R** commands will load the package and
display its help system documentation. Online documentation and examples
are available at the MachineShop main website.

```r
library(MachineShop)

# Package help summary
?MachineShop

# Vignette
RShowDoc("Introduction", package = "MachineShop")
```

- Implement metrics: `auc`, `fnr`, `fpr`, `rpp`, `tnr`, `tpr`.
- Implement performance curves, including ROC and precision recall.
- Implement `SurvMatrix` classes for predicted survival events and probabilities to eliminate the need for separate `times` arguments in calibration, confusion, metrics, and performance functions.
- Add calibration curves for predicted survival means.
- Add lift curves for predicted survival probabilities.
- Add recipe support for survival and matrix outcomes.
- Rename `MLControl` argument `surv_times` to `times`.
- Fix identification of recipe `case_weight` and `case_strata` variables.
- Launch package website.
- Bring Introduction vignette up to date with package features.

- Implement model: `BARTModel`.
- Implement model tuning over automatically generated grids of parameter values and random sampling of grid points.
- Add metrics for predicted survival times: `accuracy`, `f_score`, `kappa2`, `npv`, `ppv`, `pr_auc`, `precision`, `recall`, `roc_index`, `sensitivity`, `specificity`.
- Add metrics for predicted survival means: `cindex`, `gini`, `mae`, `mse`, `msle`, `r2`, `rmse`, `rmsle`.
- Add `performance` and metric methods for `ConfusionMatrix`.
- Add confusion matrices for predicted survival times.
- Standardize predict functions to return mean survival when times are not specified.
- Replace `MLModel` slot and constructor argument `nvars` with `design`.

- Implement models: `BARTMachineModel`, `LARSModel`.
- Implement performance metrics: `gini`, multi-class `pr_auc` and `roc_auc`, multivariate `rmse`, `msle`, `rmsle`.
- Implement smooth calibration curves.
- Implement `MLMetric` class for performance metrics.
- Add `as.data.frame` method for `ModelFrame`.
- Add `expand.model` function.
- Add `label` slot to `MLModel`.
- Expand `metricinfo`/`modelinfo` support for mixed argument types.
- Rename `calibration` argument `n` to `breaks`.
- Rename `modelmetrics` function to `performance`.
- Rename `ModelMetrics/Diff` classes to `Performance/Diff`.
- Change `MLModelTune` slot `resamples` to `performance`.

- Implement models: `AdaBagModel`, `AdaBoostModel`, `BlackBoostModel`, `EarthModel`, `FDAModel`, `GAMBoostModel`, `GLMBoostModel`, `MDAModel`, `NaiveBayesModel`, `PDAModel`, `RangerModel`, `RPartModel`, `TreeModel`.
- Implement user-specified performance metrics in `modelmetrics` function.
- Implement metrics: `accuracy`, `brier`, `cindex`, `cross_entropy`, `f_score`, `kappa2`, `mae`, `mse`, `npv`, `ppv`, `pr_auc`, `precision`, `r2`, `recall`, `roc_auc`, `roc_index`, `sensitivity`, `specificity`, `weighted_kappa2`.
- Add `cutoff` argument to `confusion` function.
- Add `modelinfo` and `metricinfo` functions.
- Add `modelmetrics` method for `Resamples`.
- Add `ModelMetrics` class with `print` and `summary` methods.
- Add `response` method for `recipe`.
- Export `Calibration` constructor.
- Export `Confusion` constructor.
- Export `Lift` constructor.
- Extend `calibration` arguments to observed and predicted responses.
- Extend `confusion` arguments to observed and predicted responses.
- Extend `lift` arguments to observed and predicted responses.
- Extend `metrics` and `stats` function arguments to accept function names.
- Extend `Resamples` to arguments with multiple models.
- Change `CoxModel`, `GLMModel`, and `SurvRegModel` constructor definitions so that model control parameters are specified directly instead of with a separate `control` argument/structure.
- Change `predict(..., times = numeric())` function calls to survival model fits to return predicted values in the same direction as survival times.
- Change `predict(..., times = numeric())` function calls to `CForestModel` fits to return predicted means instead of medians.
- Change `tune` function argument `metrics` to be defined in terms of a user-specified metric or metrics.
- Deprecate `MLControl` arguments `cutoff`, `cutoff_index`, `na.rm`, and `summary`.

- Implement linear models (`LMModel`), linear discriminant analysis (`LDAModel`), and quadratic discriminant analysis (`QDAModel`).
- Implement confusion matrices.
- Support matrix response variables.
- Support user-specified stratification variables for resampling via the `strata` argument of `ModelFrame` or the role of `"case_strata"` for recipe variables.
- Support user-specified case weights for model fitting via the role of `"case_weight"` for recipe variables.
- Provide fallback for models with undefined variable importance.
- Update the importing of `prepper` due to its relocation from `rsample` to `recipes`.

- Implement partial dependence, calibration, and lift estimation and plotting.
- Implement k-nearest neighbors model (`KNNModel`), stacked regression models (`StackedModel`), super learner models (`SuperModel`), and extreme gradient boosting (`XGBModel`).
- Implement resampling constructors for training resubstitution (`TrainControl`) and split training and test sets (`SplitControl`).
- Implement `ModelFrame` class for general model formula and dataset specification.
- Add multi-class Brier score to `modelmetrics()`.
- Extend `predict()` to automatically preprocess recipes and to use training data as the `newdata` default.
- Extend `tune()` to lists of models.
- Extend `summary()` argument `stats` to functions.
- Fix survival probability calculations in `GBMModel` and `GLMNetModel`.
- Change `MLControl` argument `na.rm` default from `FALSE` to `TRUE`.
- Remove `na.rm` argument from `modelmetrics()`.

- Initial public release.