Utilities for Multi-Label Learning

Multi-label learning strategies and other procedures to support multi-label classification in R. The package provides a set of multi-label procedures such as sampling methods, transformation strategies, threshold functions, pre-processing techniques and evaluation metrics. A complete overview of the topic can be found in Zhang, M. and Zhou, Z. (2014) and Gibaja, E. and Ventura, S. (2015).


The utiml package is a framework to support multi-label processing, like Mulan on Weka.

The main methods available in this package are organized into the following groups:

  • Classification methods
  • Evaluation methods
  • Pre-process utilities
  • Sampling methods
  • Threshold methods

Installation

The installation process is similar to other packages available on CRAN:

install.packages("utiml")

This will also install mldr. To run the examples in this document, you also need to install the packages:

# Base classifiers (SVM and Random Forest)
install.packages(c("e1071", "randomForest"))

# Alternatively, install utiml directly from GitHub (requires devtools)
devtools::install_github("rivolli/utiml")
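Before running the examples, it can help to check which of these packages are already available. The following is a small base-R sketch that reports missing packages without installing anything. The variable names `pkgs`, `installed` and `missing_pkgs` are illustrative only.

```r
# Packages used by the examples in this document
pkgs <- c("utiml", "e1071", "randomForest")

# requireNamespace() loads the namespace when the package is available
# and returns FALSE (instead of raising an error) when it is not
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages needed by the examples are installed.")
}
```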

Multi-label Classification

library(utiml)
 
# Create two partitions (train and test) of toyml multi-label dataset
ds <- create_holdout_partition(toyml, c(train=0.65, test=0.35))
 
# Create a Binary Relevance Model using e1071::svm method
brmodel <- br(ds$train, "SVM", seed=123)
 
# Predict
prediction <- predict(brmodel, ds$test)
 
# Show the predictions
head(as.bipartition(prediction))
head(as.ranking(prediction))
 
# Apply a threshold
newpred <- rcut_threshold(prediction, 2)
 
# Evaluate the models
result <- multilabel_evaluate(ds$test, prediction, "bipartition")
thresres <- multilabel_evaluate(ds$test, newpred, "bipartition")
 
# Print the result
print(round(cbind(Default=result, RCUT=thresres), 3))
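To make the rank-cut step in the example more concrete, here is a minimal base-R sketch of the general idea: for every instance, the k highest-scoring labels are marked as relevant. It is an illustration under assumptions, not utiml's implementation. The `scores` matrix is made up and `rank_cut` is a hypothetical helper name.

```r
# Toy score matrix: 3 instances (rows) x 4 labels (columns); values are made up
scores <- matrix(
  c(0.9, 0.2, 0.6, 0.1,
    0.3, 0.8, 0.4, 0.7,
    0.5, 0.1, 0.2, 0.6),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("i", 1:3), paste0("y", 1:4))
)

# Rank cut: for each instance, mark the k highest-scoring labels as relevant
rank_cut <- function(scores, k) {
  bip <- t(apply(scores, 1, function(row) {
    # rank(-row) gives rank 1 to the highest score; ties are broken by position
    as.integer(rank(-row, ties.method = "first") <= k)
  }))
  dimnames(bip) <- dimnames(scores)
  bip
}

bipartition <- rank_cut(scores, k = 2)
print(bipartition)
```

With k = 2, every row of the resulting bipartition contains exactly two relevant labels, whatever the absolute score values are.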
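
The next example uses an Ensemble of Classifier Chains with thresholds calibrated on a validation partition:

library(utiml)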
 
# Create three partitions (train, val, test) of emotions dataset
partitions <- c(train = 0.6, val = 0.2, test = 0.2)
ds <- create_holdout_partition(emotions, partitions, method="iterative")
 
# Create an Ensemble of Classifier Chains using Random Forest (randomForest package)
eccmodel <- ecc(ds$train, "RF", m=3, cores=parallel::detectCores(), seed=123)
 
# Predict
val <- predict(eccmodel, ds$val, cores=parallel::detectCores())
test <- predict(eccmodel, ds$test, cores=parallel::detectCores())
 
# Apply a threshold
thresholds <- scut_threshold(val, ds$val, cores=parallel::detectCores())
new.val <- fixed_threshold(val, thresholds)
new.test <- fixed_threshold(test, thresholds)
 
# Evaluate the models
measures <- c("subset-accuracy", "F1", "hamming-loss", "macro-based") 
 
result <- cbind(
  Test = multilabel_evaluate(ds$test, test, measures),
  TestWithThreshold = multilabel_evaluate(ds$test, new.test, measures),
  Validation = multilabel_evaluate(ds$val, val, measures),
  ValidationWithThreshold = multilabel_evaluate(ds$val, new.val, measures)
)
 
print(round(result, 3))
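The example above calibrates thresholds on a validation partition and then applies them to the test partition. The base-R sketch below illustrates that workflow with a simple per-label grid search, together with two of the reported measures (Hamming loss and subset accuracy). It is not the algorithm used by the package. The synthetic data and the helper names `make_split`, `calibrate`, `apply_thresholds`, `hamming_loss` and `subset_accuracy` are assumptions made for illustration.

```r
set.seed(123)

# Synthetic data (made up): 2 labels, scores are higher on average
# for relevant labels, with Gaussian noise, clipped to [0, 1]
make_split <- function(n) {
  truth <- matrix(rbinom(n * 2, 1, 0.4), ncol = 2,
                  dimnames = list(NULL, c("y1", "y2")))
  scores <- pmin(pmax(0.3 + 0.3 * truth + rnorm(n * 2, sd = 0.15), 0), 1)
  list(truth = truth, scores = scores)
}
val_split <- make_split(100)
test_split <- make_split(100)

# Fraction of instance-label pairs that are predicted incorrectly
hamming_loss <- function(truth, pred) mean(truth != pred)

# Fraction of instances whose whole label set is predicted correctly
subset_accuracy <- function(truth, pred) mean(rowSums(truth != pred) == 0)

# For each label, pick the candidate threshold with the lowest validation error
calibrate <- function(scores, truth, grid = seq(0.05, 0.95, by = 0.05)) {
  vapply(seq_len(ncol(scores)), function(j) {
    errors <- vapply(grid, function(thr) {
      mean((scores[, j] >= thr) != truth[, j])
    }, numeric(1))
    grid[which.min(errors)]
  }, numeric(1))
}

# Apply one threshold per label (column) and return a 0/1 matrix
apply_thresholds <- function(scores, thresholds) {
  sweep(scores, 2, thresholds, ">=") * 1L
}

thresholds <- calibrate(val_split$scores, val_split$truth)
default_pred <- apply_thresholds(test_split$scores, c(0.5, 0.5))
tuned_pred <- apply_thresholds(test_split$scores, thresholds)

result <- rbind(
  "hamming-loss" = c(
    Default = hamming_loss(test_split$truth, default_pred),
    Tuned = hamming_loss(test_split$truth, tuned_pred)
  ),
  "subset-accuracy" = c(
    Default = subset_accuracy(test_split$truth, default_pred),
    Tuned = subset_accuracy(test_split$truth, tuned_pred)
  )
)
print(round(result, 3))
```

Choosing the thresholds on validation data rather than on the test data keeps the test measures an honest estimate of performance.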

More examples and details are available in the function documentation and the package vignettes.

News

Changelog

New multi-label transformation methods, including pairwise and multiclass approaches, plus some fixes from the previous version.

  • lcard threshold calibration
  • Use categorical attributes in multilabel datasets and methods
  • LIFT multi-label classification method
  • RPC multi-label classification method
  • CRL multi-label classification method
  • LP multi-label classification method
  • RAkEL multi-label classification method
  • BASELINE multi-label classification method
  • PPT multi-label classification method
  • PS multi-label classification method
  • EPS multi-label classification method
  • HOMER multi-label classification method
  • Add Empty Model as base method to fix training labels with few examples
  • multilabel_confusion_matrix accepts a data.frame or matrix with the predictions
  • Change EBR and ECC to use threshold calibration
  • Include empty.prediction configuration to enable/disable empty predictions
  • Majority Ensemble Predictions Votes
  • Majority Ensemble Predictions Probability
  • Error message when the base method is not found
  • Base methods support any attribute names
  • Normalize data ignores attributes with a single value
  • MBR supports labels without positive examples
  • Fix average precision and coverage measures to support instances without labels
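As an illustration of the multiclass transformation approaches mentioned in this list, the following base-R sketch shows the general label powerset (LP) idea: each distinct label combination is treated as one class of a single multiclass problem. This is a sketch of the concept only, not the package's implementation. The `labels` matrix is made up and `decode` is a hypothetical helper.

```r
# Toy label matrix (made up): 5 instances, 3 labels
labels <- matrix(
  c(1, 0, 1,
    0, 1, 0,
    1, 0, 1,
    0, 0, 0,
    0, 1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("y1", "y2", "y3"))
)

# Label powerset: each distinct label combination becomes one class
lp_class <- factor(apply(labels, 1, paste, collapse = ""))
print(table(lp_class))

# Decode a (predicted) class back into a named 0/1 label vector
decode <- function(cls, label_names) {
  bits <- as.integer(strsplit(as.character(cls), "")[[1]])
  setNames(bits, label_names)
}
print(decode("101", colnames(labels)))
```

Any multiclass base classifier can then be trained on `lp_class`. The trade-off is that only label combinations seen during training can be predicted.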

First release of utiml:

  • Classification methods: Binary Relevance (BR); BR+; Classifier Chains; ConTRolled Label correlation exploitation (CTRL); Dependent Binary Relevance (DBR); Ensemble of Binary Relevance (EBR); Ensemble of Classifier Chains (ECC); Meta-Binary Relevance (MBR or 2BR); Nested Stacking (NS); Pruned and Confident Stacking Approach (Prudent); and, Recursive Dependent Binary Relevance (RDBR)
  • Evaluation methods: Create a multi-label confusion matrix and multi-label measures
  • Pre-process utilities: fill sparse data; normalize data; remove attributes; remove labels; remove skewness labels; remove unique attributes; remove unlabeled instances; and, replace nominal attributes
  • Sampling methods: Create subsets of multi-label dataset; create holdout and k-fold partitions; and, stratification methods
  • Threshold methods: Fixed threshold; MCUT; PCUT; RCUT; SCUT; and, subset correction
  • Synthetic dataset: toyml
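The k-fold partitioning listed under sampling methods can be pictured with the base-R sketch below. It covers only plain random partitioning of instance indices, whereas stratification methods also take the label distribution into account. The values of `n` and `k` and all variable names are arbitrary choices for illustration, not part of the package.

```r
set.seed(42)
n <- 23  # number of instances (arbitrary)
k <- 5   # number of folds

# Assign each instance to a fold at random, keeping fold sizes balanced
fold_id <- sample(rep(seq_len(k), length.out = n))
folds <- split(seq_len(n), fold_id)

# Fold i is the test set; the remaining folds form the training set
for (i in seq_len(k)) {
  test_idx <- folds[[i]]
  train_idx <- setdiff(seq_len(n), test_idx)
  cat(sprintf("fold %d: %d train / %d test\n",
              i, length(train_idx), length(test_idx)))
}
```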


0.1.3 by Adriano Rivolli, 3 months ago


https://github.com/rivolli/utiml


Report a bug at https://github.com/rivolli/utiml


Browse source code at https://github.com/cran/utiml


Authors: Adriano Rivolli [aut, cre]




License: GPL | file LICENSE


Imports stats, utils

Depends on mldr

Suggests C50, e1071, FSelector, infotheo, kknn, knitr, parallel, randomForest, rJava, rmarkdown, rpart, RWeka, testthat, xgboost

