fastAdaboost: A Fast Implementation of Adaboost

Implements Adaboost on top of a C++ backend, which makes it blazingly fast and especially useful for large, in-memory data sets. The package uses decision trees as weak classifiers; once the classifiers have been trained, they can be used to predict new data. Currently, only binary classification tasks are supported. The package implements the Adaboost.M1 algorithm and the real Adaboost (SAMME.R) algorithm.


fastAdaboost is a blazingly fast implementation of Adaboost for R. It uses C++ code in the backend to provide an implementation that is about 100 times faster than the native R-based libraries currently available, which is especially useful when your data set is large. fastAdaboost presently works only for binary classification tasks. It implements Freund and Schapire's Adaboost.M1 and Zhu et al.'s SAMME.R (real Adaboost) algorithms.

Install

The package has not been submitted to CRAN yet; install the development version from GitHub:

devtools::install_github("souravc83/fastAdaboost")
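
Installing from GitHub requires the devtools package, plus a working C++ toolchain since the package compiles Rcpp code. A minimal sketch that installs devtools from CRAN first if it is missing (the remotes package would work the same way):

# install devtools from CRAN if it is not already available,
# then pull the development version of fastAdaboost from GitHub
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("souravc83/fastAdaboost")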

Quick Demo

library("fastAdaboost")
set.seed(9999)
 
num_each <- 1000
fakedata <- data.frame( X=c(rnorm(num_each,0,1),rnorm(num_each,1.5,1)), Y=c(rep(0,num_each),rep(1,num_each) ) )
fakedata$Y <- factor(fakedata$Y)
#run adaboost
test_adaboost <- adaboost(Y~X, fakedata, 10)
#print(test_adaboost)
pred <- predict( test_adaboost, newdata=fakedata)
print(paste("Adaboost Error on fakedata:",pred$error))
#> [1] "Adaboost Error on fakedata: 0.1225"
print(table(pred$class,fakedata$Y))
#>    
#>       0   1
#>   0 848  93
#>   1 152 907
 
test_real_adaboost <- real_adaboost(Y~X, fakedata, 10)
pred_real <- predict(test_real_adaboost,newdata=fakedata)
print(paste("Real Adaboost Error on fakedata:", pred_real$error))
#> [1] "Real Adaboost Error on fakedata: 0.1105"
print(table(pred_real$class,fakedata$Y))
#>    
#>       0   1
#>   0 906 127
#>   1  94 873
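
Note that the demo above evaluates on the same data it was trained on, so the reported errors are optimistic. A minimal sketch of a held-out evaluation, using only the adaboost() and predict() calls shown above (the 75/25 split and the train_idx variable are illustrative, not part of the package):

# illustrative 75/25 train/test split of the fake data
train_idx  <- sample(nrow(fakedata), size = 0.75 * nrow(fakedata))
train_data <- fakedata[train_idx, ]
test_data  <- fakedata[-train_idx, ]
 
holdout_fit  <- adaboost(Y ~ X, train_data, 10)
holdout_pred <- predict(holdout_fit, newdata = test_data)
print(paste("Adaboost held-out error:", holdout_pred$error))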

Performance Benchmarking

How fast is fastAdaboost compared to native R implementations? I used the microbenchmark package to compare the running times of fastAdaboost with those of adabag, one of the most popular native R libraries implementing the Adaboost algorithm. The benchmark below indicates that fastAdaboost is roughly 45-50 times faster than the R-based implementation, which is a huge benefit when data sizes are large.

library(microbenchmark)
library(adabag)
library(MASS)
 
#using fastAdaboost
data(bacteria)
print(
  microbenchmark(
    boost_obj <- adaboost(y ~ ., bacteria, 10),
    pred <- predict(boost_obj, bacteria)
  )
)
#> Unit: milliseconds
#>                                        expr      min       lq    mean
#>  boost_obj <- adaboost(y ~ ., bacteria, 10) 58.01665 58.69384 60.6658
#>        pred <- predict(boost_obj, bacteria) 26.91593 27.41415 29.5689
#>    median       uq      max neval cld
#>  59.20298 60.13180 74.54155   100   b
#>  27.91902 32.50484 37.58375   100  a
 
#using adabag
print(
  microbenchmark(
    adabag_obj <- boosting(y ~ ., bacteria, boos = F, mfinal = 10),
    pred_adabag <- predict(adabag_obj, bacteria)
  )
)
#> Unit: milliseconds
#>                                                            expr        min
#>  adabag_obj <- boosting(y ~ ., bacteria, boos = F, mfinal = 10) 2497.55208
#>                    pred_adabag <- predict(adabag_obj, bacteria)   34.50564
#>          lq       mean     median         uq       max neval cld
#>  2659.99737 2848.80065 2809.39769 2988.49017 3629.1527   100   b
#>    35.72336   45.21379   37.16913   42.22947  242.7932   100  a
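
The two calls above are timed in separate microbenchmark runs; if you want a single head-to-head comparison of the training calls, microbenchmark also accepts both expressions in one invocation. A sketch, assuming the same bacteria data and 10 boosting iterations (times = 10 keeps the slow adabag runs manageable):

# head-to-head timing of the two training calls (sketch)
print(
  microbenchmark(
    fastAdaboost = adaboost(y ~ ., bacteria, 10),
    adabag       = boosting(y ~ ., bacteria, boos = F, mfinal = 10),
    times = 10
  )
)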


Package details

Version: 1.0.0
Authors: Sourav Chatterjee [aut, cre]
License: MIT + file LICENSE
Imports: Rcpp, rpart
Suggests: testthat, knitr, MASS
LinkingTo: Rcpp
GitHub: https://github.com/souravc83/fastAdaboost
Bug reports: https://github.com/souravc83/fastAdaboost/issues
Source code: https://github.com/cran/fastAdaboost