Preprocessing Algorithms for Imbalanced Datasets

Class imbalance usually degrades classifier performance, so the data should be treated before a classification algorithm is applied. This package implements recent resampling algorithms from the literature: (Barua et al. 2014), (Das et al. 2015), (Zhang et al. 2014), (Gao et al. 2014) and (Almogahed et al. 2014). It also includes a useful interface to perform oversampling.



imbalance provides a set of tools to work with imbalanced datasets: novel oversampling algorithms, filtering of instances and evaluation of synthetic instances.

Installation

You can install imbalance from GitHub with:

# install.packages("devtools")
devtools::install_github("ncordon/imbalance")

Examples

Run the pdfos algorithm on the imbalanced newthyroid1 dataset and plot a comparison between attributes.

library("imbalance")
data(newthyroid1)
 
newSamples <- pdfos(newthyroid1, numInstances = 80)
# Join new samples with old imbalanced dataset
newDataset <- rbind(newthyroid1, newSamples)
# Plot a visual comparison between both datasets
plotComparison(newthyroid1, newDataset, attrs = names(newthyroid1)[1:3], cols = 2, classAttr = "Class")

Filter the synthetic samples with neater and plot the comparison again:

filteredSamples <- neater(newthyroid1, newSamples, iterations = 500)
#> [1] "10 samples filtered by NEATER"
filteredNewDataset <- rbind(newthyroid1, filteredSamples)
plotComparison(newthyroid1, filteredNewDataset, attrs = names(newthyroid1)[1:3])

Run ADASYN through the oversample wrapper provided by the package and compare the imbalance ratio of the dataset before and after oversampling:

data(glass0)
imbalanceRatio(glass0)
#> [1] 0.4861111
newDataset <- oversample(glass0, method = "ADASYN")
imbalanceRatio(newDataset)
#> [1] 0.9722222
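
The values printed above are consistent with an imbalance ratio defined as the minority class size divided by the majority class size, so a perfectly balanced dataset scores 1. The following base R sketch illustrates that reading. It is an assumption, not the package's implementation: the helper `imbalance_ratio_sketch` and the toy data frame are hypothetical, and the class counts (70 and 144) were chosen only so that the result matches the first value printed above.

```r
# Hypothetical helper (not part of imbalance): ratio of the smallest
# class count to the largest class count.
imbalance_ratio_sketch <- function(dataset, classAttr = "Class") {
  counts <- table(dataset[[classAttr]])
  as.numeric(min(counts) / max(counts))
}

# Toy dataset with 70 minority and 144 majority instances (assumed counts).
toy <- data.frame(
  x = seq_len(214),
  Class = factor(c(rep("positive", 70), rep("negative", 144)))
)

ratio <- imbalance_ratio_sketch(toy)
print(ratio)
#> [1] 0.4861111
```

Under this reading, oversampling moves the ratio toward 1 because only the minority count grows while the majority count stays fixed.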

To install the released version from CRAN:

install.packages("imbalance")

1.0.0 by Ignacio Cordón, a year ago


Homepage: http://github.com/ncordon/imbalance


Report a bug at http://github.com/ncordon/imbalance/issues


Browse source code at https://github.com/cran/imbalance


Authors: Ignacio Cordón [aut, cre], Salvador García [aut], Alberto Fernández [aut], Francisco Herrera [aut]




License: GPL (>= 2) | file LICENSE


Imports: bnlearn, KernelKnn, ggplot2, utils, stats, mvtnorm, Rcpp, smotefamily, FNN, C50

Suggests: testthat, knitr, rmarkdown

LinkingTo: Rcpp, RcppArmadillo


Imported by smartdata.

Suggested by randomForestSRC.

