Preprocessing Algorithms for Imbalanced Datasets

Class imbalance usually damages the performance of classifiers, so it is important to treat the data before applying a classification algorithm. This package includes recent resampling algorithms from the literature: (Barua et al. 2014); (Das et al. 2015); (Zhang et al. 2014); (Gao et al. 2014); (Almogahed et al. 2014). It also includes a convenient interface to perform oversampling.


imbalance provides a set of tools to work with imbalanced datasets: novel oversampling algorithms, filtering of instances and evaluation of synthetic instances.


You can install imbalance from GitHub with:

# install.packages("devtools")


Run the pdfos algorithm on the imbalanced newthyroid1 dataset and plot a comparison between attributes.

library(imbalance)

# Generate 80 synthetic minority instances with pdfos
newSamples <- pdfos(newthyroid1, numInstances = 80)
# Join new samples with old imbalanced dataset
newDataset <- rbind(newthyroid1, newSamples)
# Plot a visual comparison between both datasets
plotComparison(newthyroid1, newDataset, attrs = names(newthyroid1)[1:3], cols = 2, classAttr = "Class")

After filtering examples with neater:

filteredSamples <- neater(newthyroid1, newSamples, iterations = 500)
#> [1] "10 samples filtered by NEATER"
filteredNewDataset <- rbind(newthyroid1, filteredSamples)
plotComparison(newthyroid1, filteredNewDataset, attrs = names(newthyroid1)[1:3])
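
The examples above share one workflow: generate new minority instances, then append them to the original data with rbind. That workflow can be tried without the package. The sketch below swaps in plain random oversampling with a little Gaussian jitter, which is not the pdfos algorithm and does no filtering like neater. The helper `jitter_oversample`, its `noise` parameter and the toy data are hypothetical and only illustrate the shape of the inputs and outputs.

```r
# Hypothetical stand-in for an oversampler (NOT pdfos): resample minority
# rows with replacement and add small Gaussian noise to numeric columns.
jitter_oversample <- function(dataset, numInstances, classAttr = "Class", noise = 0.05) {
  counts <- table(dataset[[classAttr]])
  minority <- names(counts)[which.min(counts)]
  pool <- dataset[dataset[[classAttr]] == minority, , drop = FALSE]

  # Draw rows from the minority pool
  picked <- pool[sample(nrow(pool), numInstances, replace = TRUE), , drop = FALSE]

  # Perturb numeric attributes, scaled by each column's spread in the pool
  is_num <- vapply(picked, is.numeric, logical(1))
  for (col in setdiff(names(picked)[is_num], classAttr)) {
    picked[[col]] <- picked[[col]] + rnorm(numInstances, sd = noise * sd(pool[[col]]))
  }
  rownames(picked) <- NULL
  picked
}

# Toy imbalanced dataset (assumed for illustration): 10 positive, 40 negative
set.seed(1)
toy <- data.frame(
  a = rnorm(50),
  b = runif(50),
  Class = factor(c(rep("positive", 10), rep("negative", 40)))
)

newSamples <- jitter_oversample(toy, numInstances = 20)
# Join new samples with the old imbalanced dataset, as in the examples above
newDataset <- rbind(toy, newSamples)
print(table(newDataset$Class))
```

As with the package functions, the helper returns only the new instances, so the caller decides whether to filter them before joining them to the original data.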

Run ADASYN through the oversample wrapper provided by the package and compare the imbalance ratio of the dataset before and after oversampling:

imbalanceRatio(glass0)
#> [1] 0.4861111
newDataset <- oversample(glass0, method = "ADASYN")
imbalanceRatio(newDataset)
#> [1] 0.9722222
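
The two values printed above are consistent with dividing the number of minority-class instances by the number of majority-class instances. The base R sketch below reproduces that calculation on a toy data frame. `ratio_sketch` is a hypothetical helper, not a function of the package, and the 70/144 class split is an assumption chosen because it matches the first printed value.

```r
# Hypothetical helper (not part of imbalance): minority count / majority count
ratio_sketch <- function(dataset, classAttr = "Class") {
  counts <- table(dataset[[classAttr]])
  min(counts) / max(counts)
}

# Toy two-class data frame; the 70/144 split is assumed for illustration
toy <- data.frame(
  x = seq_len(214),
  Class = factor(c(rep("positive", 70), rep("negative", 144)))
)
ratio_before <- ratio_sketch(toy)

# Append 70 more minority rows (plain duplicates here, not synthetic samples)
extra <- toy[toy$Class == "positive", , drop = FALSE]
ratio_after <- ratio_sketch(rbind(toy, extra))

print(c(before = ratio_before, after = ratio_after))
```

A ratio close to 1 means the two classes are nearly balanced; a ratio close to 0 means the minority class is heavily outnumbered.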



Install the released version from CRAN with install.packages("imbalance"). Maintainer: Ignacio Cordón.

Report a bug at

Browse source code at

Authors: Ignacio Cordón [aut, cre], Salvador García [aut], Alberto Fernández [aut], Francisco Herrera [aut]

Documentation: PDF Manual

License: GPL (>= 2) | file LICENSE

Imports: bnlearn, KernelKnn, ggplot2, utils, stats, mvtnorm, Rcpp, smotefamily, FNN, C50

Suggests: testthat, knitr, rmarkdown

Linking to: Rcpp, RcppArmadillo

Suggested by: randomForestSRC
