A Common API to Modeling and Analysis Functions

A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or computational engines (e.g. 'R', 'Spark', 'Stan', etc.).


One issue with R functions that do the same thing is that they often have different interfaces and arguments. For example, to fit a random forest classification model, we might have:

# From randomForest
rf_1 <- randomForest(x, y, mtry = 12, ntree = 2000, importance = TRUE)
# From ranger
rf_2 <- ranger(
  y ~ ., 
  data = dat, 
  mtry = 12, 
  num.trees = 2000, 
  importance = 'impurity'
)
# From sparklyr
rf_3 <- ml_random_forest(
  dat,
  intercept = FALSE, 
  response = "y", 
  features = names(dat)[names(dat) != "y"], 
  col.sample.rate = 12,
  num.trees = 2000
)

Note that the model syntax differs substantially between these calls, as do the argument names (and formats). This is a pain if you switch between implementations.

In this example,

  • the type of model is "random forest"
  • the mode of the model is "classification" (as opposed to regression, etc.).
  • the computational engine is the name of the R package.
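The three pieces listed above (type, mode, engine) can be pictured as fields of a single specification object that exists before any model is fit. The sketch below is a hypothetical base-R illustration, not parsnip's actual internal representation; the constructor name new_model_spec, its fields, and the class "model_spec_sketch" are assumptions made only for this example.

```r
# Hypothetical constructor (not parsnip's real class structure): bundles the
# model type, its main arguments, the mode, and the engine into one S3 object.
new_model_spec <- function(type, args = list(), mode = "unknown",
                           engine = NULL) {
  allowed_modes <- c("unknown", "classification", "regression")
  # Reject modes outside the small set this sketch supports
  stopifnot(mode %in% allowed_modes)
  structure(
    list(type = type, args = args, mode = mode, engine = engine),
    class = c(paste0(type, "_spec"), "model_spec_sketch")
  )
}

# Print method so the three components are easy to read at a glance
print.model_spec_sketch <- function(x, ...) {
  cat("Model type:", x$type, "\n")
  cat("Mode:", x$mode, "\n")
  cat("Engine:", if (is.null(x$engine)) "<not set>" else x$engine, "\n")
  invisible(x)
}

# The random forest example expressed as a specification
spec <- new_model_spec(
  type = "rand_forest",
  args = list(mtry = 12, trees = 2000),
  mode = "classification",
  engine = "ranger"
)
print(spec)
```

Nothing in this object runs a model; it only records what should be fit, which is the separation of definition and evaluation discussed next.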

The idea of parsnip is to:

  • Separate the definition of a model from its evaluation.
  • Decouple the model specification from the implementation (whether the implementation is in R, Spark, or something else). For example, the user would call rand_forest instead of ranger::ranger or functions from other specific packages.
  • Harmonize the argument names (e.g. n.trees, ntrees, trees) so that users only need to remember a single name. This also helps across model types: trees is the same argument for random forests as well as for boosting and bagging.
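Harmonizing names implies a translation step from the common name to whatever each engine expects. A minimal base-R sketch of that idea follows, assuming a simple lookup table; arg_map and translate_args are hypothetical names, not parsnip's own functions. The engine-side names (ntree for randomForest, num.trees for ranger) are the ones used in the calls shown earlier.

```r
# Hypothetical lookup table: harmonized argument name -> engine-specific name.
arg_map <- list(
  randomForest = c(mtry = "mtry", trees = "ntree"),
  ranger       = c(mtry = "mtry", trees = "num.trees")
)

# Rename harmonized arguments to the names a given engine expects.
translate_args <- function(args, engine) {
  map <- arg_map[[engine]]
  if (is.null(map)) stop("No argument map for engine: ", engine)
  unknown <- setdiff(names(args), names(map))
  if (length(unknown) > 0) {
    stop("Unknown argument(s): ", paste(unknown, collapse = ", "))
  }
  # Look up each harmonized name and replace it with the engine's name
  names(args) <- unname(map[names(args)])
  args
}

harmonized <- list(mtry = 12, trees = 2000)
str(translate_args(harmonized, "ranger"))
str(translate_args(harmonized, "randomForest"))
```

The translated list could then be handed to the engine with do.call(), for example do.call(ranger::ranger, c(list(formula = y ~ ., data = dat), translate_args(harmonized, "ranger"))); that last step needs the ranger package and a data set, so it is not run here.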

Using the example above, the parsnip approach would be:

library(parsnip)
rand_forest(mtry = 12, trees = 2000) %>%
  set_engine("ranger", importance = 'impurity') %>%
  fit(y ~ ., data = dat)
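In the pipeline above, the harmonized main arguments (mtry, trees) belong to the model specification, while ranger-specific options such as importance travel with set_engine. The following base-R sketch mimics that split under stated assumptions: set_engine_sketch and the plain-list layout are hypothetical and do not reproduce parsnip's internals.

```r
# Hypothetical stand-in for set_engine(): records the engine and any
# engine-specific arguments without touching the main arguments.
set_engine_sketch <- function(spec, engine, ...) {
  spec$engine <- engine
  spec$engine_args <- list(...)   # e.g. importance = "impurity" for ranger
  spec
}

# A bare specification: model type plus harmonized main arguments only
spec <- list(type = "rand_forest", args = list(mtry = 12, trees = 2000))

ranger_spec <- set_engine_sketch(spec, "ranger", importance = "impurity")
spark_spec  <- set_engine_sketch(spec, "spark")

# Swapping engines leaves the harmonized main arguments unchanged
identical(ranger_spec$args, spark_spec$args)  # TRUE
```

Keeping engine-only options out of the main argument list is what lets the same specification be reused with a different engine.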

The engine can easily be changed, and the mode can be determined when fit is called. To use Spark, the change is simple (dat would then need to be a Spark data frame):

rand_forest(mtry = 12, trees = 2000) %>%
  set_engine("spark") %>%
  fit(y ~ ., data = dat)
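One way to picture "the mode can be determined when fit is called" is a rule that inspects the outcome: a factor suggests classification and a numeric vector suggests regression. This is a hedged base-R sketch; resolve_mode is a hypothetical helper, and parsnip's actual rule may differ in its details.

```r
# Hypothetical helper: if the specification's mode is still "unknown",
# infer one from the outcome's type at fit time.
resolve_mode <- function(mode, y) {
  if (mode != "unknown") return(mode)   # an explicit choice is kept as-is
  if (is.factor(y)) return("classification")
  if (is.numeric(y)) return("regression")
  stop("Cannot infer a mode for an outcome of class ", class(y)[1])
}

resolve_mode("unknown", factor(c("a", "b", "a")))
resolve_mode("unknown", c(1.5, 2.3, 0.7))
resolve_mode("classification", factor(c("a", "b")))
```

Deferring this decision is what allows the same rand_forest(...) specification to be written once and fit to either a factor or a numeric outcome.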

To install it from CRAN, use:

install.packages("parsnip")

parsnip 0.0.1

First CRAN release


  • The engine, and any associated arguments, are now specified using set_engine. There is no longer an engine argument.


  • Arguments to modeling functions are now captured as quosures.
  • others has been replaced by ...
  • Data descriptor names have been changed and are now functions. The descriptor definitions for "cols" and "preds" have been switched.


  • regularization was changed to penalty in a few models to be consistent with this change.
  • If a mode is not chosen in the model specification, it is assigned at the time of fit (#51).
  • The underlying modeling packages are now loaded by namespace. There are some exceptions, noted in the documentation for each model. For example, for some predict methods, the earth package needs to be attached to be fully operational.


  • To be consistent with snake_case, newdata was changed to new_data.
  • A predict_raw method was added.


  • A package dependency changed.


  • The fit interface was previously used to cover both the x/y interface as well as the formula interface. Now, fit is the formula interface and fit_xy is for the x/y interface.
  • Added a NEWS.md file to track changes to the package.
  • predict methods were overhauled to be consistent.
  • MARS was added.
