Preprocessing Algorithms for Imbalanced Datasets

Class imbalance usually damages the performance of classifiers, so it is important to treat the data before applying a classification algorithm. This package includes recent resampling algorithms from the literature: (Barua et al. 2014), (Das et al. 2015), (Zhang et al. 2014), (Gao et al. 2014) and (Almogahed et al. 2014). It also includes a useful interface to perform oversampling.



imbalance provides a set of tools to work with imbalanced datasets: novel oversampling algorithms, filtering of instances and evaluation of synthetic instances.

Installation

You can install imbalance from GitHub with:

# install.packages("devtools")
devtools::install_github("ncordon/imbalance")

Examples

Run the pdfos algorithm on the imbalanced newthyroid1 dataset and plot a comparison between attributes.

library("imbalance")
data(newthyroid1)
 
# Generate 80 synthetic minority instances with PDFOS
newSamples <- pdfos(newthyroid1, numInstances = 80)
# Join new samples with old imbalanced dataset
newDataset <- rbind(newthyroid1, newSamples)
# Plot a visual comparison between both datasets
plotComparison(newthyroid1, newDataset, attrs = names(newthyroid1)[1:3], cols = 2, classAttr = "Class")

Filter the synthetic samples with neater and plot the comparison again:

filteredSamples <- neater(newthyroid1, newSamples, iterations = 500)
#> [1] "10 samples filtered by NEATER"
filteredNewDataset <- rbind(newthyroid1, filteredSamples)
plotComparison(newthyroid1, filteredNewDataset, attrs = names(newthyroid1)[1:3])

Run ADASYN through the oversample wrapper provided by the package and compare the imbalance ratio of the dataset before and after oversampling:

imbalanceRatio(glass0)
#> [1] 0.4861111
newDataset <- oversample(glass0, method = "ADASYN")
imbalanceRatio(newDataset)
#> [1] 0.9722222
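
The two ratios above are consistent with defining the imbalance ratio as the size of the minority class divided by the size of the majority class. The following base R sketch illustrates that reading; `imbalance_ratio_sketch` is a hypothetical helper written for this example, not the package's implementation, and the class counts (70 minority, 144 majority) are assumed for illustration because they give the same 0.4861111 and 0.9722222 values.

```r
# Hypothetical helper (not part of imbalance): minority size / majority size.
imbalance_ratio_sketch <- function(labels) {
  counts <- table(labels)
  min(counts) / max(counts)
}

# Assumed toy labels: 70 minority and 144 majority instances.
labels <- factor(c(rep("positive", 70), rep("negative", 144)))
ratio_before <- imbalance_ratio_sketch(labels)

# Adding 70 synthetic minority labels doubles the minority class.
labels_after <- factor(c(as.character(labels), rep("positive", 70)))
ratio_after <- imbalance_ratio_sketch(labels_after)

print(ratio_before)
print(ratio_after)
```

Under this definition a ratio close to 1 means the two classes are nearly balanced, which is the effect oversampling aims for.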


install.packages("imbalance")

1.0.0 by Ignacio Cordón, a year ago


http://github.com/ncordon/imbalance


Report a bug at http://github.com/ncordon/imbalance/issues


Browse source code at https://github.com/cran/imbalance


Authors: Ignacio Cordón [aut, cre] , Salvador García [aut] , Alberto Fernández [aut] , Francisco Herrera [aut]




License: GPL (>= 2) | file LICENSE


Imports bnlearn, KernelKnn, ggplot2, utils, stats, mvtnorm, Rcpp, smotefamily, FNN, C50

Suggests testthat, knitr, rmarkdown

Linking to Rcpp, RcppArmadillo


Imported by smartdata.

