Prediction Rule Ensembles

Derives prediction rule ensembles (PREs). Largely follows the procedure for deriving PREs as described in Friedman & Popescu (2008; ), with adjustments and improvements. The main function pre() derives prediction rule ensembles consisting of rules and/or linear terms for continuous, binary, count, multinomial, and multivariate continuous responses. Function gpe() derives generalized prediction ensembles, consisting of rules, hinge and linear functions of the predictor variables.


pre: an R package for deriving prediction rule ensembles

pre is an R package for deriving prediction rule ensembles for binary, multinomial, (multivariate) continuous, count and survival outcome variables. Input variables may be numeric, ordinal and categorical. An extensive description of the implementation and functionality is provided in Fokkema (2017). The package largely implements the algorithm for deriving prediction rule ensembles as described in Friedman & Popescu (2008), with several adjustments:

  1. The package is completely R based, allowing users better access to the results and more control over the parameters used for generating the prediction rule ensemble.
  2. The unbiased tree induction algorithms of Hothorn, Hornik, & Zeileis (2006) is used for deriving prediction rules, by default. Alternatively, the (g)lmtree algorithm of Zeileis, Hothorn, & Hornik (2008) can be employed, or the classification and regression tree (CART) algorithm of Breiman, Friedman, Olshen, & Stone (1984).
  3. The package supports a wider range of response variables.
  4. The package allows for plotting the final rule ensemble as a collection of simple decision trees.
  5. The initial ensembles may be generated as in bagging, boosting and/or random forests.
  6. Hinge functions of predictor variables may be included as baselearners, like in the multivariate adaptive regression splines method of Friedman (1991), using the gpe() function.

Note that pre is under development, and much work still needs to be done.

Example: Prediction rule ensemble for predicting ozone levels

To get a first impression of how pre works, we will fit a prediction rule ensemble to predict Ozone levels using the airquality dataset. We can fit a prediction rule ensemble using the pre() function:

library(pre)
set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airquality[complete.cases(airquality), ])

We can print the resulting ensemble (alternatively, we could use the print method):

airq.ens
#> Final ensemble with cv error within 1se of minimum: 
#>   lambda =  2.331694
#>   number of terms = 13
#>   mean cv error (se) = 302.4644 (79.28454)
#> 
#>   cv error type : Mean-Squared Error
#> 
#>          rule  coefficient                          description
#>   (Intercept)   72.9680699                                    1
#>       rule191  -15.6401487              Wind > 5.7 & Temp <= 87
#>       rule173   -8.6645924              Wind > 5.7 & Temp <= 82
#>       rule204    8.1715564         Wind <= 10.3 & Solar.R > 148
#>        rule42   -7.6928586              Wind > 6.3 & Temp <= 84
#>        rule10   -6.8032890              Temp <= 84 & Temp <= 77
#>       rule192   -4.6926624  Wind > 5.7 & Temp <= 87 & Day <= 23
#>        rule93    3.1468762              Temp > 77 & Wind <= 8.6
#>        rule51   -2.6981570              Wind > 5.7 & Temp <= 84
#>        rule25   -2.4481192              Wind > 6.3 & Temp <= 82
#>        rule28   -2.1119330              Temp <= 84 & Wind > 7.4
#>        rule74   -0.8276940              Wind > 6.9 & Temp <= 84
#>       rule200   -0.4479854                       Solar.R <= 201
#>       rule166   -0.1202175              Wind > 6.9 & Temp <= 82

We can plot the baselarners in the ensemble using the plot method (note that only the nine most important baselearners are requested here):

plot(airq.ens, nterms = 9, cex = .5)

We can obtain the estimated coefficients for each of the baselearners using the coef method:

coefs <- coef(airq.ens)
coefs[1:10,]
#>            rule coefficient                         description
#> 201 (Intercept)   72.968070                                   1
#> 166     rule191  -15.640149             Wind > 5.7 & Temp <= 87
#> 149     rule173   -8.664592             Wind > 5.7 & Temp <= 82
#> 178     rule204    8.171556        Wind <= 10.3 & Solar.R > 148
#> 39       rule42   -7.692859             Wind > 6.3 & Temp <= 84
#> 10       rule10   -6.803289             Temp <= 84 & Temp <= 77
#> 167     rule192   -4.692662 Wind > 5.7 & Temp <= 87 & Day <= 23
#> 84       rule93    3.146876             Temp > 77 & Wind <= 8.6
#> 48       rule51   -2.698157             Wind > 5.7 & Temp <= 84
#> 23       rule25   -2.448119             Wind > 6.3 & Temp <= 82

We can assess the importance of input variables as well as baselearners using the importance() function:

importance(airq.ens, round = 4)

We can generate predictions for new observations using the predict method:

predict(airq.ens, newdata = airquality[1:4,])
#>        1        2        3        4 
#> 31.10390 20.82041 20.82041 21.26840

We can obtain partial dependence plots to assess the effect of single predictor variables on the outcome using the singleplot() function:

singleplot(airq.ens, varname = "Temp")

We can obtain partial dependence plots to assess the effects of pairs of predictor variables on the outcome using the pairplot() function:

pairplot(airq.ens, varnames = c("Temp", "Wind"))
#> Loading required namespace: akima
#> NOTE: function pairplot uses package 'akima', which has an ACM license. See also https://www.acm.org/publications/policies/software-copyright-notice.

We can assess the expected prediction error of the prediction rule ensemble through cross validation (10-fold, by default) using the cvpre() function:

set.seed(43)
airq.cv <- cvpre(airq.ens)
#> $MSE
#>       MSE        se 
#> 332.48191  72.23573 
#> 
#> $MAE
#>       MAE        se 
#> 13.186533  1.200747
airq.cv$accuracy
#> $MSE
#>       MSE        se 
#> 332.48191  72.23573 
#> 
#> $MAE
#>       MAE        se 
#> 13.186533  1.200747

We can assess the presence of input variable interactions using the interact() and bsnullinteract() funtions:

set.seed(44)
nullmods <- bsnullinteract(airq.ens)
int <- interact(airq.ens, nullmods = nullmods, c("Temp", "Wind", "Solar.R", "Month"))

We can check assess correlations between the baselearners using the corplot() function:

corplot(airq.ens)

Including hinge functions (multivariate adaptive regression splines)

More complex prediction ensembles can be obtained using the gpe() function. The abbreviation gpe stands for generalized prediction ensembles, which may also include hinge functions of the predictor variables as described in Friedman (1991), in addition to rules and/or linear terms. Addition of such hinge functions may further improve predictive accuracy. See the following example:

set.seed(42)
airq.gpe <- gpe(Ozone ~ ., data = airquality[complete.cases(airquality),], 
    base_learners = list(gpe_trees(), gpe_linear(), gpe_earth()))
airq.gpe
#> 
#> Final ensemble with cv error within 1se of minimum: 
#>   lambda =  2.44272
#>   number of terms = 14
#>   mean cv error (se) = 296.5474 (74.18922)
#> 
#>   cv error type : Mean-squared Error
#> 
#>                                   description  coefficient
#>                                   (Intercept)  67.22404190
#>                                    Temp <= 77  -7.72729559
#>                       Temp <= 84 & Wind > 7.4  -0.03574864
#>                  Wind <= 10.3 & Solar.R > 148   6.29678603
#>                       Wind > 5.7 & Temp <= 82  -6.51245200
#>                       Wind > 5.7 & Temp <= 84  -7.58076900
#>                       Wind > 5.7 & Temp <= 87  -9.64346611
#>           Wind > 5.7 & Temp <= 87 & Day <= 23  -4.38371322
#>                       Wind > 6.3 & Temp <= 82  -4.18790433
#>                       Wind > 6.3 & Temp <= 84  -4.88269073
#>                       Wind > 6.9 & Temp <= 82  -0.12188611
#>                       Wind > 6.9 & Temp <= 84  -0.93529314
#>   eTerm(Solar.R * h(6.3 - Wind), scale = 150)   1.42794086
#>        eTerm(h(6.9 - Wind) * Day, scale = 16)   1.35764132
#>   eTerm(Solar.R * h(9.7 - Wind), scale = 410)   9.84395780
#> 
#>   'h' in the 'eTerm' indicates the hinge function

References

Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Chapman&Hall/CRC.

Fokkema, M. (2017). Pre: An r package for fitting prediction rule ensembles. arXiv:1707.07149. Retrieved from https://arxiv.org/abs/1707.07149

Friedman, J. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19(1), 1–67.

Friedman, J., & Popescu, B. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916–954. Retrieved from http://www.jstor.org/stable/30245114

Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651–674.

Zeileis, A., Hothorn, T., & Hornik, K. (2008). Model-based recursive partitioning. Journal of Computational and Graphical Statistics, 17(2), 492–514.

News

Changes in Version 0.6 (2018-08-03)

  • Added support for survival responses (i.e., family = "cox") in pre()

  • Added summary methods for pre and gpe.

  • Extended support to all response variable types available in pre() for functions plot(), importance() and cvpre().

  • plot.pre now allows for specifying separate plotting colors for rules with positive and negative coefficients.

  • coef and print methods for pre now return descriptions for the intercept (and factor variables), thanks to suggestion by Stephen Milborrow.

  • Bug fix in pre(): ordered factors no longer yield error. Implemented new argument 'ordinal' in pre(), which specifies how ordered factors should be processed.

  • Bug fix in cvpre(): pclass argument now processed correctly.

  • Bug fix in cvpre(): previously, SDs insteas of SEs were returned for binary classification. Accurate standard errors are returned now.

  • Bugs fixed in coef.pre(), print.pre(), plot.pre() and importance() when tree.unbiased = FALSE, thanks to a bug report by Stephen Milborrow.

Changes in Version 0.5 (2018-05-07)

  • Function pre() now also supports multinomial and multivariate gaussian response variables.

  • Function pre() now has argument 'tree.unbiased'; if set to FALSE, the CART algorithm (as implemented in package 'rpart') is employed for rule induction.

  • Argument 'maxdepth' of function pre() allows for specifying varying maximum depth across trees, through specifying a vector of length ntrees, or a random number generating function. See ?maxdepth.sampler for details.

Changes in Version 0.4 (2017-08-31)

  • Added dataset 'carrillo'

  • By default, a gradient boosting approach is now taken for all response types. That is, partykit::ctree() and a learning rate of .01 is employed by default. Alternatively, glmtree() can be employed for tree induction by sprecifying use.grad = FALSE.

  • The 'family' argument in pre() now takes character strings as well as glm family objects.

  • Functions pairplot() and interact() now use HCL instead of highly saturated HSV colors as default plotting colors.

  • Bug fixed in plot.pre: Node directions are now in accordance with rule definition.

  • Bug fixed in predict.pre: No error printed when response variable is not supplied.

Changes in Version 0.3 (2017-08-03):

  • Function gpe() added, which fits general prediction ensembles. By default, it fits an ensemble of rules, linear and hinge functions. Function gpe() allows for specifying custom baselearner generating functions and a custom fitting function for the final model.

  • Numerous bugs fixed, yielding faster computation times and clearer plots with more customization options.

  • Added support for count responses. Function pre() now has a 'family' argument, which should be set to 'poisson' for count outcomes (the 'family' argument is set automatically to 'gaussian' for numeric response variables and to 'binomial' for binary response variables (factors)).

  • A gradient boosting approach for binary outcomes is applied, by default, substantially reducing computation times. This can be turned off through the 'use.grad' argument in function pre().

  • The default of the 'learnrate' argument of function pre() has been changed to .01, by default. Before, it was .01 for continuous outcomes, but 0 for binary outcomes, to reduce computation time. With gradient boosting implemented, computation time is much reduced.

  • Argument 'tree.control' in function pre() allows for passing arguments to partykit tree fitting functions.

  • Arguments for the cv.glmnet() function are directly passed through better use of ... . Most importantly, this means that argument 'mod.sel.crit' cannot be used anymore and should be referred to as 'type.measure' (which will be directly passed to cv.glmnet). Similarly, 'thres' and 'standardize' are not explicit arguments of function pre() anymore and can now be directly passed to cv.glmnet() using ... .

  • Better use of sample weights: weights specified with the 'weights' argument in pre() are now used as weights in the subsampling procedure, instead of as observation weights in the tree-fitting procedure.

  • Added corplot() function, which shows the correlation between the baselearners in the ensemble.

  • Function pairplot() returns a heatmap by default, a 3D or contour plot can also be requested.

  • Appearance of plot resulting from interaction() improved.

Changes in Version 0.2 (2017-04-25):

  • Added print() and plot() method for objects of class pre.

  • Added support for using functions like factor() and log() in formula statement of function pre(). (thanks to Bill Venables for suggesting this)

  • Added support for parallel computating in functions pre(), cvpre(), bsnullinteract() and interact().

  • Winsorizing points used for the linear terms are reported in the description of the base learners returned by coef() and importance(). (Thanks to Rishi Sadhir for suggesting this)

  • Added README file.

  • Legend included in plot for interaction test statistics.

  • Fixed importance() function to allow for selecting final ensemble with different value than 'lambda.1se'.

  • Cleaned up all occurrences of set.seed()

  • Fixed cvpre() function: penalty.par.val argument now included

  • Many minor bug fixes.

Changes in Version 0.1 (2016-12-23):

  • First CRAN release.

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.

install.packages("pre")

0.6.0 by Marjolein Fokkema, 8 months ago


https://github.com/marjoleinF/pre


Report a bug at https://github.com/marjoleinF/pre/issues


Browse source code at https://github.com/cran/pre


Authors: Marjolein Fokkema [aut, cre] , Benjamin Christoffersen [aut]


Documentation:   PDF Manual  


GPL-2 | GPL-3 license


Imports earth, Formula, glmnet, graphics, methods, partykit, rpart, stringr, survival

Suggests akima, datasets, doParallel, foreach, glmertree, grid, mlbench, testthat, mboost


Suggested by plotmo.


See at CRAN