Extremely Fast Implementation of a Naive Bayes Classifier

This is an extremely fast implementation of a Naive Bayes classifier. This package is currently the only package that supports a Bernoulli distribution, a Multinomial distribution, and a Gaussian distribution, making it suitable for binary features, frequency counts, and numerical features. It also supports a mix of different event models. Only numerical variables are allowed; however, categorical variables can be transformed into dummy variables and used with the Bernoulli distribution. The implementation is largely based on the paper "A comparison of event models for Naive Bayes anti-spam e-mail filtering" by K.M. Schneider (2003). Any issues can be submitted to: <https://github.com/mskogholt/fastNaiveBayes/issues>.


fastNaiveBayes


Overview

This is an extremely fast implementation of a Naive Bayes classifier. This package is currently the only package that supports a Bernoulli distribution, a Multinomial distribution, and a Gaussian distribution, making it suitable for binary features, frequency counts, and numerical features. It also supports a mix of different event models. Only numerical variables are allowed; however, categorical variables can be transformed into dummy variables and used with the Bernoulli distribution.
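To make the Bernoulli event model concrete, here is a minimal base R sketch of a Bernoulli Naive Bayes classifier with Laplace smoothing. It is an illustration only and not the package's internal code; the function names `bernoulli_nb_fit` and `bernoulli_nb_predict` and the toy data are hypothetical.

```r
# Illustrative Bernoulli Naive Bayes in base R (hypothetical helper names;
# this is not how fastNaiveBayes is implemented internally).
bernoulli_nb_fit <- function(x, y, laplace = 1) {
  x <- as.matrix(x)
  y <- as.factor(y)
  # Per class, count how often each binary feature equals 1.
  present <- rowsum(x, y)
  n_class <- as.vector(table(y)[rownames(present)])
  # Laplace-smoothed estimate of P(feature = 1 | class); one row per class.
  probs <- (present + laplace) / (n_class + 2 * laplace)
  list(probs = probs,
       priors = n_class / sum(n_class),
       levels = rownames(present))
}

bernoulli_nb_predict <- function(model, newdata) {
  newdata <- as.matrix(newdata)
  # Log-likelihood per class: sum over features of
  # x * log(p) + (1 - x) * log(1 - p).
  scores <- newdata %*% t(log(model$probs)) +
    (1 - newdata) %*% t(log(1 - model$probs))
  # Add the log prior of each class to its column.
  scores <- sweep(scores, 2, log(model$priors), "+")
  # Pick the class with the highest log posterior for every row.
  factor(model$levels[max.col(scores, ties.method = "first")],
         levels = model$levels)
}

# Toy data: four observations, three binary features, two classes.
x <- matrix(c(1, 1, 0,
              1, 0, 0,
              0, 1, 1,
              0, 0, 1), ncol = 3, byrow = TRUE)
y <- c("a", "a", "b", "b")

model <- bernoulli_nb_fit(x, y, laplace = 1)
pred <- bernoulli_nb_predict(model, x)
print(pred)
```

Working in log space avoids numerical underflow when many features are multiplied together, and expressing the likelihood as two matrix products keeps the prediction step vectorised.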

This implementation offers a huge performance gain compared to other implementations in R. Execution times were compared on a data set of tweets, and this package was found to be around 283 to 34,841 times faster for the Bernoulli event model and 17 to 60 times faster for the Multinomial event model. For the Gaussian event model, this package was found to be between 2.8 and 1,679 times faster. See the vignette for more details. The implementation is largely based on the paper "A comparison of event models for Naive Bayes anti-spam e-mail filtering" by K.M. Schneider (2003).

Any issues can be submitted to: https://github.com/mskogholt/fastNaiveBayes/issues.

Installation

Install the package with:

install.packages("fastNaiveBayes")

Or install the development version using devtools with:

# Requires the devtools package: install.packages("devtools")
devtools::install_github("mskogholt/fastNaiveBayes")

Usage

library(fastNaiveBayes)

# Build a binary target from mtcars: 'High' if mpg > 25, otherwise 'Low'
cars <- mtcars
y <- as.factor(ifelse(cars$mpg > 25, 'High', 'Low'))
x <- cars[, 2:ncol(cars)]

# Detect a suitable event model for each column, then fit the mixed model
dist <- fastNaiveBayes::fastNaiveBayes.detect_distribution(x, nrows = nrow(x))
print(dist)
mod <- fastNaiveBayes.mixed(x, y, laplace = 1)
pred <- predict(mod, newdata = x)
mean(pred != y)  # misclassification rate on the training data

# Bernoulli only: convert the selected columns to factors and dummy-encode them
vars <- c(dist$bernoulli, dist$multinomial)
newx <- x[, vars]
for (i in seq_len(ncol(newx))) {
  newx[[i]] <- as.factor(newx[[i]])
}
new_mat <- model.matrix(y ~ . - 1, cbind(y, newx))
mod <- fastNaiveBayes.bernoulli(new_mat, y, laplace = 1)
pred <- predict(mod, newdata = new_mat)
mean(pred != y)

# Let the function construct a sparse matrix:
mod <- fastNaiveBayes.bernoulli(new_mat, y, laplace = 1, sparse = TRUE)
pred <- predict(mod, newdata = new_mat)
mean(pred != y)

# Or pass a sparse matrix directly:
new_mat <- Matrix::Matrix(as.matrix(new_mat), sparse = TRUE)
mod <- fastNaiveBayes.bernoulli(new_mat, y, laplace = 1)
pred <- predict(mod, newdata = new_mat)
mean(pred != y)

# Multinomial only: use the count-like columns as they are
vars <- c(dist$bernoulli, dist$multinomial)
newx <- x[, vars]
mod <- fastNaiveBayes.multinomial(newx, y, laplace = 1)
pred <- predict(mod, newdata = newx)
mean(pred != y)

# Gaussian only: 'hp' plus the columns detected as Gaussian
vars <- c('hp', dist$gaussian)
newx <- x[, vars]
mod <- fastNaiveBayes.gaussian(newx, y)
pred <- predict(mod, newdata = newx)
mean(pred != y)

News

fastNaiveBayes 1.1.2

New Features

  • Added a threshold in all predict functions to ensure a minimum probability
  • Added tweets and tweetsDTM datasets as example data and for time comparisons
  • Changed Gaussian model to achieve a huge speed-up
  • Removed inefficiencies for both the Bernoulli and Multinomial models. Much faster now.
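
The idea behind a minimum-probability threshold can be sketched in base R. This is only an illustration of the concept, not the package's code; the threshold value and the variable names below are assumptions.

```r
# A zero probability turns into -Inf once logs are taken, which then
# dominates every other term in the class score.
probs <- c(0.6, 0.4, 0)
log_raw <- log(probs)
print(log_raw)

# Assumed threshold value, chosen here purely for illustration.
threshold <- .Machine$double.eps

# Clamp every probability to at least the threshold before taking logs.
clamped <- pmax(probs, threshold)
log_clamped <- log(clamped)
print(log_clamped)
```

With the clamp in place every log-probability is finite, so a single unseen feature value cannot rule out a class on its own.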

Bug Fixes

  • Fixed errors that were thrown for 2x1 matrices

Other Changes

  • Removed std_threshold from the Gaussian model; it is no longer necessary since the introduction of the threshold feature listed under New Features
  • Changed comparison to other packages in vignette

fastNaiveBayes 1.1.1

New Features

  • Detect distribution. Automatically determine the distributions of a matrix for use with mixed Naive Bayes model
  • A threshold for the standard deviation in the Gaussian event model. This ensures that probabilities are real numbers and not NaN values caused by a standard deviation of 0.
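
The NaN problem with a zero standard deviation can be reproduced in base R. The sketch below is illustrative only: `gauss_density` is a hypothetical helper that writes out the Gaussian density by hand, and the value chosen for `std_threshold` is an assumption.

```r
# Hypothetical helper: Gaussian density written out explicitly.
gauss_density <- function(x, mu, sigma) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / (sqrt(2 * pi) * sigma)
}

# A feature that is constant within a class has standard deviation 0.
feature <- c(3, 3, 3)
mu <- mean(feature)
sigma <- sd(feature)

# 0 / 0 in the exponent yields NaN.
unsafe <- gauss_density(3, mu, sigma)
print(unsafe)

# Assumed threshold value: floor the standard deviation before use.
std_threshold <- 0.01
sigma_safe <- max(sigma, std_threshold)
safe <- gauss_density(3, mu, sigma_safe)
print(safe)
```

Flooring the standard deviation keeps the density finite for features that do not vary within a class.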

Bug Fixes

Other Changes

  • Expanded unit tests.
  • Changed comparison to other packages in vignette
  • Small change to the Bernoulli predict function

fastNaiveBayes 1.0.1

Bug Fixes

  • Fixed bug in Gaussian predict function.

Other Changes

  • Changed Readme
  • Changed description
  • Added unit tests and Travis-ci

fastNaiveBayes 1.0.0

Initial Release of package


2.1.0 by Martin Skogholt, 3 months ago


https://github.com/mskogholt/fastNaiveBayes


Report a bug at https://github.com/mskogholt/fastNaiveBayes/issues


Browse source code at https://github.com/cran/fastNaiveBayes


Authors: Martin Skogholt




GPL-3 license


Imports Matrix, stats

Suggests knitr, rmarkdown, testthat

