Preprocessing Algorithms for Imbalanced Datasets

Class imbalance usually degrades classifier performance, so it is important to preprocess the data before training a classifier. This package includes recent resampling algorithms from the literature: (Barua et al. 2014); (Das et al. 2015); (Zhang et al. 2014); (Gao et al. 2014); (Almogahed et al. 2014). It also includes a useful interface to perform oversampling.


imbalance provides a set of tools to work with imbalanced datasets: novel oversampling algorithms, filtering of instances and evaluation of synthetic instances.
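To make the idea of oversampling concrete, here is a minimal base-R sketch of the simplest possible strategy: duplicating randomly chosen minority-class rows. This is not one of the algorithms in imbalance, which generate new synthetic instances rather than copies. The function name `naive_oversample`, its parameters, and the toy data frame are hypothetical and used only for illustration.

```r
# Hypothetical helper, NOT part of the imbalance package:
# duplicate randomly chosen minority-class rows (sampling with replacement).
naive_oversample <- function(dataset, num_instances, class_attr = "Class") {
  counts <- table(dataset[[class_attr]])
  # The minority class is the label with the fewest rows
  minority_label <- names(counts)[which.min(counts)]
  minority_rows <- which(dataset[[class_attr]] == minority_label)
  # Sample positions rather than values, so a single minority row also works
  picked <- minority_rows[sample(seq_along(minority_rows), num_instances,
                                 replace = TRUE)]
  dataset[picked, , drop = FALSE]
}

set.seed(1)
# Toy imbalanced dataset: 3 positive rows, 9 negative rows
toy <- data.frame(
  x = rnorm(12),
  Class = factor(c(rep("positive", 3), rep("negative", 9)))
)
new_rows <- naive_oversample(toy, num_instances = 6)
balanced <- rbind(toy, new_rows)
table(balanced$Class)
```

The sketch follows the same pattern as the package examples: generate new minority rows, then join them to the original dataset with rbind.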


You can install imbalance from GitHub with:

# install.packages("devtools")


Run the pdfos algorithm on the newthyroid1 imbalanced dataset and plot a comparison between attributes.

library("imbalance")
# Generate 80 synthetic instances with pdfos
newSamples <- pdfos(newthyroid1, numInstances = 80)
# Join new samples with old imbalanced dataset
newDataset <- rbind(newthyroid1, newSamples)
# Plot a visual comparison between both datasets
plotComparison(newthyroid1, newDataset, attrs = names(newthyroid1)[1:3], cols = 2, classAttr = "Class")

After filtering examples with neater:

filteredSamples <- neater(newthyroid1, newSamples, iterations = 500)
#> [1] "10 samples filtered by NEATER"
filteredNewDataset <- rbind(newthyroid1, filteredSamples)
plotComparison(newthyroid1, filteredNewDataset, attrs = names(newthyroid1)[1:3])

Run the ADASYN method using the oversample wrapper provided by the package, comparing the imbalance ratio of the dataset before and after oversampling:

imbalanceRatio(glass0)
#> [1] 0.4861111
newDataset <- oversample(glass0, method = "ADASYN")
imbalanceRatio(newDataset)
#> [1] 0.9722222
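The two values printed above are consistent with an imbalance ratio defined as the size of the minority class divided by the size of the majority class, so that 1 means a perfectly balanced dataset. The base-R sketch below computes that quantity. It rests on that assumed definition and is not the package's implementation. `ratio_sketch` is a hypothetical name, and the toy class counts (70 and 144) are illustrative, chosen because 70/144 reproduces the first value shown.

```r
# Hypothetical helper, NOT the package's implementation:
# ratio of minority-class size to majority-class size.
ratio_sketch <- function(dataset, class_attr = "Class") {
  counts <- table(dataset[[class_attr]])
  min(counts) / max(counts)
}

# Toy dataset with an illustrative 70 / 144 class split
toy <- data.frame(
  Class = factor(c(rep("positive", 70), rep("negative", 144)))
)
ratio_sketch(toy)  # equals 70 / 144
```

Under this definition, adding 70 more minority rows to the toy data would give 140/144, which matches the second value shown.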




1.0.0 by Ignacio Cordón, a year ago


Authors: Ignacio Cordón [aut, cre], Salvador García [aut], Alberto Fernández [aut], Francisco Herrera [aut]


License: GPL (>= 2) | file LICENSE

Imports: bnlearn, KernelKnn, ggplot2, utils, stats, mvtnorm, Rcpp, smotefamily, FNN, C50

Suggests: testthat, knitr, rmarkdown

Linking to: Rcpp, RcppArmadillo

Imported by smartdata.

Suggested by randomForestSRC.
