A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or computational engines (e.g. 'R', 'Spark', 'Stan').
One issue with different functions available in R that do the same thing is that they can have different interfaces and arguments. For example, to fit a random forest classification model, we might have:
    # From randomForest
    rf_1 <- randomForest(x, y, mtry = 12, ntree = 2000, importance = TRUE)
    # From ranger
    rf_2 <- ranger(y ~ ., data = dat, mtry = 12, num.trees = 2000,
                   importance = 'impurity')
    # From sparklyr
    rf_3 <- ml_random_forest(dat, intercept = FALSE, response = "y",
                             features = names(dat)[names(dat) != "y"],
                             col.sample.rate = 12, num.trees = 2000)
Note that the model syntax is very different and that the argument names (and formats) are also different. This is a pain if you go between implementations.
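In this example, the type of model is a random forest, its mode is classification, and the computational engine is the package that fits it (randomForest, ranger, or Spark via sparklyr).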
The idea of parsnip is to:

- decouple the model specification from the package that fits it, so that the user calls rand_forest instead of ranger::ranger or other specific packages;
- harmonize the argument names (e.g. n.trees, ntrees, trees) so that users can remember a single name. This will help across model types too, so that trees will be the same argument across random forest as well as boosting or bagging.

Using the example above, the parsnip approach would be:
    library(parsnip)

    rand_forest(mtry = 12, trees = 2000) %>%
      set_engine("ranger", importance = 'impurity') %>%
      fit(y ~ ., data = dat)
The engine can easily be changed, and the mode can be determined when fit is called. To use Spark, the change is simple:
    rand_forest(mtry = 12, trees = 2000) %>%
      set_engine("spark") %>%
      fit(y ~ ., data = dat)
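The argument harmonization can be sketched in base R as a lookup table from unified names to engine-specific names. This is a hypothetical illustration, not parsnip's actual implementation. The names arg_map, translate_args() and build_call() are made up, and the table only holds the mappings visible in the randomForest, ranger and sparklyr calls earlier in this section.

```r
# Hypothetical lookup table (not parsnip internals): unified name -> engine name.
# Entries come from the randomForest, ranger and sparklyr calls shown earlier.
arg_map <- list(
  randomForest = c(mtry = "mtry", trees = "ntree"),
  ranger       = c(mtry = "mtry", trees = "num.trees"),
  spark        = c(mtry = "col.sample.rate", trees = "num.trees")
)

# Hypothetical helper: rename unified arguments for the chosen engine.
translate_args <- function(args, engine) {
  map <- arg_map[[engine]]
  if (is.null(map)) stop("unknown engine: ", engine)
  unknown <- setdiff(names(args), names(map))
  if (length(unknown) > 0) {
    stop("no translation for: ", paste(unknown, collapse = ", "))
  }
  names(args) <- unname(map[names(args)])
  args
}

# Hypothetical helper: build (but do not evaluate) the engine-specific call.
build_call <- function(fn, args, engine) {
  as.call(c(as.name(fn), translate_args(args, engine)))
}

spec <- list(mtry = 12, trees = 2000)
translate_args(spec, "spark")
build_call("ranger", spec, "ranger")
```

The mapping lives in data rather than inside each fitting function. Under this design, supporting another engine would only need a new table entry, while users keep writing trees everywhere.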
To install it, use:
    require(devtools)
    install_github("tidymodels/parsnip")
First CRAN release
- The engine, and any engine-specific arguments, are now specified using set_engine(). There is no engine argument.
- The others argument has been replaced by the ... argument.
- regularization was changed to penalty in a few models to be consistent with this change.
- In some predict methods, the earth package will need to be attached to be fully operational.
- To be consistent with snake_case, newdata was changed to new_data.
- A predict_raw method was added.
- The fit interface was previously used to cover both the x/y interface and the formula interface. Now, fit is the formula interface and fit_xy is for the x/y interface.
- A NEWS.md file was added to track changes to the package.
- The predict methods were overhauled to be consistent.
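The fit and fit_xy functions now separate two interfaces. The base R sketch below illustrates the difference; it is not parsnip code. It turns a formula and a data frame into the predictor matrix x and outcome vector y that an x/y interface would take directly. The helper formula_to_xy() is hypothetical, and the built-in mtcars data with lm() and lm.fit() are used purely for demonstration.

```r
# Hypothetical helper (not part of parsnip): convert formula + data into x/y form.
formula_to_xy <- function(formula, data) {
  mf <- model.frame(formula, data = data)   # evaluate the formula against the data
  x <- model.matrix(attr(mf, "terms"), mf)  # expand predictors into a numeric matrix
  x <- x[, colnames(x) != "(Intercept)", drop = FALSE]  # keep predictor columns only
  list(x = x, y = model.response(mf))       # outcome from the left-hand side
}

xy <- formula_to_xy(mpg ~ wt + hp, data = mtcars)

# Formula interface: the fitting function parses the formula itself.
formula_fit <- lm(mpg ~ wt + hp, data = mtcars)

# x/y interface: the caller supplies the matrix, including an explicit intercept.
xy_fit <- lm.fit(cbind("(Intercept)" = 1, xy$x), xy$y)

all.equal(coef(formula_fit), xy_fit$coefficients)
```

Both calls estimate the same coefficients. The difference is only who builds the design matrix: the formula interface does it for you, while the x/y interface expects it already built.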