Creates a classifier for binary outcomes using the Adaptive Boosting
(AdaBoost) algorithm on decision stumps, with a fast C++ implementation.
For a description of AdaBoost, see Freund and Schapire (1997).
A machine learning package for building and testing classifiers using AdaBoost on decision stumps.
Feature vectors may combine continuous (numeric) and categorical (string, factor) elements. Methods for classifier assessment, prediction, and cross-validation are also included. The advantage of this type of classifier is that it is non-linear yet more interpretable than random forests, neural networks, and other non-linear classifiers.
See jadonwagstaff.github.io/sboost for a description of how the classifier works and what makes it more interpretable than others.
For the original paper describing AdaBoost, see:
Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55(1), 119-139 (1997)
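For orientation, one round of textbook AdaBoost on a single numeric decision stump can be sketched in plain R. This illustrates the algorithm described in the paper above and is not sboost's C++ code; the function name `adaboost_round`, the -1/+1 outcome coding, and the fixed split point are assumptions made for the example.

```r
# One textbook AdaBoost round on a single numeric feature.
# Hypothetical helper for illustration; not part of sboost.
adaboost_round <- function(x, y, w, split) {
  pred <- ifelse(x < split, -1, 1)      # decision stump: a single threshold
  err <- sum(w[pred != y])              # weighted error of the stump
  alpha <- 0.5 * log((1 - err) / err)   # the stump's vote (its weight)
  w_new <- w * exp(-alpha * y * pred)   # raise weights of misclassified rows
  list(alpha = alpha, weights = w_new / sum(w_new))
}

x <- c(1, 2, 3, 4, 5)
y <- c(-1, -1, 1, -1, 1)   # outcomes coded as -1 / +1
w <- rep(1 / 5, 5)         # start with uniform weights

round1 <- adaboost_round(x, y, w, split = 2.5)
```

Only the fourth row is misclassified by this stump, so its weight grows while the others shrink, and the next stump is pushed to get that row right.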
Install this package from the CRAN repository.
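With a standard R setup, and assuming the package is published on CRAN under the name `sboost`, this is typically:

```r
# Install the released version from CRAN.
install.packages("sboost")
```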
Alternatively, use devtools to install the development version of this package.
To install devtools in R, run:
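Using base R's `install.packages`, this is typically:

```r
# Install devtools from CRAN.
install.packages("devtools")
```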
After devtools is installed, to install the sboost package in R, run:
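A likely form of this command is shown here. The repository path `jadonwagstaff/sboost` is an assumption inferred from the documentation URL above and is not stated in this text, so confirm it before running.

```r
# Assumption: the development version lives at github.com/jadonwagstaff/sboost.
devtools::install_github("jadonwagstaff/sboost")
```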
sboost - The main machine learning algorithm; uses categorical or continuous features to build a classifier that predicts a binary outcome. Run ?sboost::sboost in R to see its documentation.
validate - Uses k-fold cross-validation on a training set to validate the classifier.
assess - Shows performance of a classifier on a set of feature vectors and outcomes.
predict - Outputs predictions of a classifier on a set of feature vectors.
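A minimal sketch of how these four functions might fit together. It assumes the `mushrooms` data set that ships with sboost (outcome in the first column, with "p" as the positive class) and the argument names `iterations`, `positive`, and `k_fold` as given in the package help pages; check `?sboost::sboost` and `?sboost::validate` if your installed version differs.

```r
library(sboost)

# Assumption: mushrooms is bundled with sboost; column 1 is the outcome.
features <- mushrooms[-1]
outcomes <- mushrooms[1]

# Build a classifier from 5 decision stumps.
classifier <- sboost(features, outcomes, iterations = 5, positive = "p")

# Estimate out-of-sample performance with 3-fold cross-validation.
validation <- validate(features, outcomes, iterations = 5,
                       k_fold = 3, positive = "p")

# Measure performance on a labeled set (here, the training data itself).
assessment <- assess(classifier, features, outcomes)

# Predict outcomes for a set of feature vectors.
predictions <- predict(classifier, features)
```

Printing `classifier` shows one row per stump, which is where the interpretability claim comes from: each row names a feature, a split, and a vote.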
The classifier output from sboost() now includes a right_categories column, which is similar to left_categories but is associated with the outcomes in the right column. When assessing a classifier of this new form, if a categorical input is found in neither right_categories nor left_categories (i.e. it did not appear in the training data), the vote for that feature is now 0. (Previously, an input not found in left_categories was assumed to be associated with the right outcome.)
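The new voting rule can be illustrated with a small base R sketch. The helper `stump_vote` and its arguments are hypothetical and exist only for this example; the sign convention (positive outcome counts as +vote, the other as -vote) is also an assumption, not a description of sboost's internals.

```r
# Hypothetical helper, not part of sboost: signed vote of one categorical
# stump for a single input value.
stump_vote <- function(value, left_categories, right_categories,
                       left, right, vote, positive) {
  if (value %in% left_categories) {
    outcome <- left
  } else if (value %in% right_categories) {
    outcome <- right
  } else {
    return(0)  # category never seen in training: the stump abstains
  }
  if (outcome == positive) vote else -vote
}

seen_left   <- stump_vote("red", c("red", "blue"), "green",
                          left = "p", right = "e", vote = 0.8, positive = "p")
seen_right  <- stump_vote("green", c("red", "blue"), "green",
                          left = "p", right = "e", vote = 0.8, positive = "p")
unseen      <- stump_vote("purple", c("red", "blue"), "green",
                          left = "p", right = "e", vote = 0.8, positive = "p")
```

Under the earlier behavior described above, the unseen value "purple" would have been treated like a right-side category instead of contributing nothing.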
There is a new optional parameter in the sboost() and validate() functions called verbose. The default value for verbose is FALSE, and there is no change from previous versions when verbose = FALSE. If verbose is set to TRUE, a progress bar will appear in the console for each classifier that is created.
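A short sketch of the parameter in use, assuming the bundled `mushrooms` data set and the `iterations`, `positive`, and `k_fold` argument names from the package help pages:

```r
library(sboost)

# verbose = TRUE shows a progress bar while the classifier is built.
classifier <- sboost(mushrooms[-1], mushrooms[1], iterations = 5,
                     positive = "p", verbose = TRUE)

# validate() builds several classifiers, so a bar appears for each one.
validation <- validate(mushrooms[-1], mushrooms[1], iterations = 5,
                       k_fold = 3, positive = "p", verbose = TRUE)
```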