Utilities for Multi-Label Learning

Multi-label learning strategies and other procedures to support multi-label classification in R. The package provides a set of multi-label procedures such as sampling methods, transformation strategies, threshold functions, pre-processing techniques and evaluation metrics. For a complete overview of the topic, see Zhang, M. and Zhou, Z. (2014), A Review on Multi-Label Learning Algorithms, and Gibaja, E. and Ventura, S. (2015), A Tutorial on Multi-label Learning.

The utiml package is a framework to support multi-label processing in R, similar to what Mulan provides for Weka.

The main methods available in this package are organized into the following groups:

  • Classification methods
  • Evaluation methods
  • Pre-process utilities
  • Sampling methods
  • Threshold methods


The installation process is the same as for other packages available on CRAN:

install.packages("utiml")

This will also install mldr. To run the examples in this document, you also need to install the packages:

# Base classifiers (SVM and Random Forest)
install.packages(c("e1071", "randomForest"))

Install via GitHub (development version)

# Requires the devtools package
devtools::install_github("rivolli/utiml")

Multi-label Classification

Running Binary Relevance Method

library(utiml)

# Create two partitions (train and test) of the toyml multi-label dataset
ds <- create_holdout_partition(toyml, c(train=0.65, test=0.35))
# Create a Binary Relevance model using the e1071::svm method
brmodel <- br(ds$train, "SVM", seed=123)
# Predict
prediction <- predict(brmodel, ds$test)
# Show the predictions
head(as.bipartition(prediction))
head(as.ranking(prediction))
# Apply a threshold
newpred <- rcut_threshold(prediction, 2)
# Evaluate the models
result <- multilabel_evaluate(ds$test, prediction, "bipartition")
thresres <- multilabel_evaluate(ds$test, newpred, "bipartition")
# Print the result
print(round(cbind(Default=result, RCUT=thresres), 3))
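To make the rank-cut step in the example above more concrete, the following base R sketch mimics the idea on a small hand-written score matrix: for each instance, the k highest-scored labels are marked as relevant. The helper `rank_cut` and the toy scores are hypothetical illustrations and not part of utiml; the package's own `rcut_threshold` may differ in details such as how ties are handled.

```r
# Hypothetical toy scores: 3 instances (rows) x 4 labels (columns)
scores <- matrix(
  c(0.9, 0.2, 0.6, 0.1,
    0.3, 0.8, 0.4, 0.7,
    0.5, 0.1, 0.2, 0.6),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("i", 1:3), paste0("y", 1:4))
)

# Illustrative helper (not a utiml function): mark the k top-scored
# labels of each instance as relevant (1) and the rest as irrelevant (0).
# Ties are broken by label order, which is an assumption of this sketch.
rank_cut <- function(scores, k) {
  bipartition <- t(apply(scores, 1, function(row) {
    as.integer(rank(-row, ties.method = "first") <= k)
  }))
  dimnames(bipartition) <- dimnames(scores)
  bipartition
}

bip <- rank_cut(scores, 2)
print(bip)
```

With k = 2 every instance ends up with exactly two predicted labels, regardless of how high or low its scores are, which is the main difference from a fixed score cutoff.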

Running Ensemble of Classifier Chains

# Create three partitions (train, val, test) of emotions dataset
partitions <- c(train = 0.6, val = 0.2, test = 0.2)
ds <- create_holdout_partition(emotions, partitions, method="iterative")
# Create an Ensemble of Classifier Chains using Random Forest (randomForest package)
eccmodel <- ecc(ds$train, "RF", m=3, cores=parallel::detectCores(), seed=123)
# Predict
val <- predict(eccmodel, ds$val, cores=parallel::detectCores())
test <- predict(eccmodel, ds$test, cores=parallel::detectCores())
# Apply a threshold
thresholds <- scut_threshold(val, ds$val, cores=parallel::detectCores())
new.val <- fixed_threshold(val, thresholds)
new.test <- fixed_threshold(test, thresholds)
# Evaluate the models
measures <- c("subset-accuracy", "F1", "hamming-loss", "macro-based")
result <- cbind(
  Test = multilabel_evaluate(ds$test, test, measures),
  TestWithThreshold = multilabel_evaluate(ds$test, new.test, measures),
  Validation = multilabel_evaluate(ds$val, val, measures),
  ValidationWithThreshold = multilabel_evaluate(ds$val, new.val, measures)
)
print(round(result, 3))
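The example above applies one threshold per label and then compares predictions with the true labels. As a rough illustration of those two steps, the base R sketch below thresholds a toy score matrix label by label and computes Hamming loss and subset accuracy by hand. All values and the helper `apply_label_thresholds` are hypothetical; the sketch assumes that a score at or above the threshold counts as relevant, which may not match the convention used by utiml's `fixed_threshold`.

```r
# Hypothetical scores and true labels: 3 instances x 2 labels
scores <- matrix(
  c(0.90, 0.40,
    0.20, 0.60,
    0.55, 0.30),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("i", 1:3), c("y1", "y2"))
)
truth <- matrix(
  c(1L, 0L,
    0L, 1L,
    1L, 1L),
  nrow = 3, byrow = TRUE,
  dimnames = dimnames(scores)
)

# Hypothetical per-label thresholds, e.g. tuned on a validation set
thresholds <- c(y1 = 0.50, y2 = 0.35)

# Illustrative helper (not a utiml function): compare every column of
# scores with its own threshold and return a 0/1 matrix
apply_label_thresholds <- function(scores, thresholds) {
  bip <- sweep(scores, 2, thresholds, FUN = ">=")
  storage.mode(bip) <- "integer"
  bip
}

bip <- apply_label_thresholds(scores, thresholds)

# Hamming loss: fraction of instance-label pairs predicted incorrectly
hamming_loss <- mean(bip != truth)
# Subset accuracy: fraction of instances whose whole label set is correct
subset_accuracy <- mean(rowSums(bip != truth) == 0)

print(bip)
print(c(hamming_loss = hamming_loss, subset_accuracy = subset_accuracy))
```

Subset accuracy is the stricter of the two measures, since a single wrong label makes the whole instance count as an error, while Hamming loss credits every correctly predicted label.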

More examples and details are available in the function documentation and the package vignettes.

How to cite?

@article{RJ-2018-041,
  author = {Adriano Rivolli and Andre C. P. L. F. de Carvalho},
  title = {{The utiml Package: Multi-label Classification in R}},
  year = {2018},
  journal = {{The R Journal}},
  doi = {10.32614/RJ-2018-041},
  url = {https://doi.org/10.32614/RJ-2018-041},
  pages = {24--37},
  volume = {10},
  number = {2}
}



utiml 0.1.5 (current)

Minor changes

  • cv method also returns the prediction

Bug fixes

  • macro-AUC for constant score predictions
  • validation fold
  • set.seed suppress warnings

utiml 0.1.4

New Features

  • MLKNN algorithm
  • ranking-loss baseline
  • label problem evaluation measures
  • kfold built-in method
  • The foodtruck dataset
  • ESL algorithm

Minor changes

  • confusion matrix in matrix format

Bug fixes

  • Stratification sampling to support instances without labels
  • Fixed threshold with multiple values
  • Update documentation

utiml 0.1.3

Major changes

  • Change multilabel_evaluate to also return the label measures

Bug fixes

  • Fix bug in brplus caused by the new features using different levels
  • Fix baseline using hamming-loss to prevent empty label prediction
  • Fix empty prediction when all labels have the same probability

Minor changes

  • Fix typos in documentation

utiml 0.1.2

Major changes

  • Rename the base.method parameter to base.algorithm

Bug fixes

  • Fix bug in homer to deal with labels without instances and to predict instances based on the meta-label scores
  • Refactor merge_mlconfmat
  • Ensure reproducibility in all cases

utiml 0.1.1

New multi-label transformation methods including pairwise and multiclass approaches. Some fixes from previous version.

Major changes

  • lcard threshold calibration
  • Use categorical attributes in multilabel datasets and methods
  • LIFT multi-label classification method
  • RPC multi-label classification method
  • CRL multi-label classification method
  • LP multi-label classification method
  • RAkEL multi-label classification method
  • BASELINE multi-label classification method
  • PPT multi-label classification method
  • PS multi-label classification method
  • EPS multi-label classification method
  • HOMER multi-label classification method

Minor changes

  • Add Empty Model as base method to fix training labels with few examples
  • multilabel_confusion_matrix accepts a data.frame or matrix with the predictions
  • Change EBR and ECC to use threshold calibration
  • Include empty.prediction configuration to enable/disable empty predictions

Bug fixes

  • Majority Ensemble Predictions Votes
  • Majority Ensemble Predictions Probability
  • Error message when the base method is not found
  • Base method supports any attribute names
  • Normalize data ignores attributes with a single value
  • MBR supports labels without positive examples
  • Fix average precision and coverage measures to support instances without labels

utiml 0.1.0

First release of utiml:

  • Classification methods: Binary Relevance (BR); BR+; Classifier Chains; ConTRolled Label correlation exploitation (CTRL); Dependent Binary Relevance (DBR); Ensemble of Binary Relevance (EBR); Ensemble of Classifier Chains (ECC); Meta-Binary Relevance (MBR or 2BR); Nested Stacking (NS); Pruned and Confident Stacking Approach (Prudent); and, Recursive Dependent Binary Relevance (RDBR)
  • Evaluation methods: Create a multi-label confusion matrix and multi-label measures
  • Pre-process utilities: fill sparse data; normalize data; remove attributes; remove labels; remove skewness labels; remove unique attributes; remove unlabeled instances; and, replace nominal attributes
  • Sampling methods: Create subsets of multi-label dataset; create holdout and k-fold partitions; and, stratification methods
  • Threshold methods: Fixed threshold; MCUT; PCUT; RCUT; SCUT; and, subset correction
  • Synthetic dataset: toyml


0.1.7 by Adriano Rivolli, 8 months ago


Report a bug at https://github.com/rivolli/utiml

Browse source code at https://github.com/cran/utiml

Authors: Adriano Rivolli [aut, cre]


GPL-3 license

Imports stats, utils, methods

Depends on mldr, parallel, ROCR

Suggests C50, e1071, infotheo, kknn, knitr, randomForest, rmarkdown, markdown, rpart, testthat, xgboost
