Multi-label learning strategies and other procedures to support multi-label
classification in R. The package provides a set of multi-label procedures such as
sampling methods, transformation strategies, threshold functions, pre-processing
techniques and evaluation metrics. A complete overview of the topic can be found in
Zhang, M. and Zhou, Z. (2014).
The utiml package is a framework to support multi-label processing, similar to Mulan on Weka.
The main methods available in this package are organized into the following groups:
The installation process is similar to other packages available on CRAN:
install.packages("utiml")
This will also install mldr. To run the examples in this document, you also need to install the packages:
# Base classifiers (SVM and Random Forest)
install.packages(c("e1071", "randomForest"))
To install the development version from GitHub:
devtools::install_github("rivolli/utiml")
library(utiml)

# Create two partitions (train and test) of toyml multi-label dataset
ds <- create_holdout_partition(toyml, c(train=0.65, test=0.35))

# Create a Binary Relevance Model using e1071::svm method
brmodel <- br(ds$train, "SVM", seed=123)

# Predict
prediction <- predict(brmodel, ds$test)

# Show the predictions
head(as.bipartition(prediction))
head(as.ranking(prediction))

# Apply a threshold
newpred <- rcut_threshold(prediction, 2)

# Evaluate the models
result <- multilabel_evaluate(ds$test, prediction, "bipartition")
thresres <- multilabel_evaluate(ds$test, newpred, "bipartition")

# Print the result
print(round(cbind(Default=result, RCUT=thresres), 3))
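The thresholding step in the example above turns per-label scores into a 0/1 bipartition. As a rough illustration of the rank-cut idea (keep the k best-scored labels of each instance), here is a minimal base R sketch. The helper `rank_cut` and the score matrix are made up for this illustration and are not part of utiml; the package's own `rcut_threshold` may differ in details such as tie handling.

```r
# Hypothetical helper (not part of utiml): mark the k highest-scoring labels
# of each instance as positive (1) and all other labels as negative (0).
rank_cut <- function(scores, k) {
  stopifnot(is.matrix(scores), ncol(scores) >= 2, k >= 1, k <= ncol(scores))
  bipartition <- t(apply(scores, 1, function(row) {
    # ties.method = "first" gives a strict ordering, so exactly k labels pass
    as.integer(rank(-row, ties.method = "first") <= k)
  }))
  dimnames(bipartition) <- dimnames(scores)
  bipartition
}

# Made-up scores for 3 instances (rows) and 4 labels (columns)
scores <- matrix(
  c(0.9, 0.2, 0.6, 0.1,
    0.3, 0.8, 0.4, 0.7,
    0.5, 0.1, 0.2, 0.6),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("i", 1:3), paste0("y", 1:4))
)

top2 <- rank_cut(scores, k = 2)
print(top2)
```

With k = 2, every row of `top2` contains exactly two positive labels regardless of the absolute score values, which is why a rank-based cut can help when a fixed score cutoff would leave some instances without any predicted label.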
library(utiml)

# Create three partitions (train, val, test) of emotions dataset
partitions <- c(train = 0.6, val = 0.2, test = 0.2)
ds <- create_holdout_partition(emotions, partitions, method="iterative")

# Create an Ensemble of Classifier Chains using Random Forest (randomForest package)
eccmodel <- ecc(ds$train, "RF", m=3, cores=parallel::detectCores(), seed=123)

# Predict
val <- predict(eccmodel, ds$val, cores=parallel::detectCores())
test <- predict(eccmodel, ds$test, cores=parallel::detectCores())

# Apply a threshold
thresholds <- scut_threshold(val, ds$val, cores=parallel::detectCores())
new.val <- fixed_threshold(val, thresholds)
new.test <- fixed_threshold(test, thresholds)

# Evaluate the models
measures <- c("subset-accuracy", "F1", "hamming-loss", "macro-based")

result <- cbind(
  Test = multilabel_evaluate(ds$test, test, measures),
  TestWithThreshold = multilabel_evaluate(ds$test, new.test, measures),
  Validation = multilabel_evaluate(ds$val, val, measures),
  ValidationWithThreshold = multilabel_evaluate(ds$val, new.val, measures)
)

print(round(result, 3))
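Two of the measures requested in the example above, subset accuracy and Hamming loss, are simple to compute by hand from 0/1 matrices, which may help when reading the result table. The sketch below uses base R only; the helper names and the toy matrices are assumptions made for illustration and are not utiml functions or data.

```r
# Hypothetical helpers (not part of utiml). Both expect 0/1 matrices with
# instances in rows and labels in columns.
subset_accuracy <- function(truth, pred) {
  stopifnot(identical(dim(truth), dim(pred)))
  # An instance counts as correct only if every one of its labels matches
  mean(rowSums(truth != pred) == 0)
}

hamming_loss <- function(truth, pred) {
  stopifnot(identical(dim(truth), dim(pred)))
  # Fraction of individual instance-label assignments that are wrong
  mean(truth != pred)
}

# Made-up ground truth and predictions: 4 instances, 3 labels
truth <- matrix(c(1, 0, 1,
                  0, 1, 0,
                  1, 1, 0,
                  0, 0, 1), nrow = 4, byrow = TRUE)
pred  <- matrix(c(1, 0, 1,
                  0, 1, 1,
                  1, 0, 0,
                  0, 0, 1), nrow = 4, byrow = TRUE)

acc  <- subset_accuracy(truth, pred)  # 2 of 4 instances match exactly: 0.5
loss <- hamming_loss(truth, pred)     # 2 of 12 assignments differ: about 0.167
print(c(subset_accuracy = acc, hamming_loss = loss))
```

Subset accuracy is the stricter of the two: one wrong label makes the whole instance count as an error, while Hamming loss penalizes only that single label.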
More examples and details are available in the function documentation and the package vignettes.
@article{RJ-2018-041,
author = {Adriano Rivolli and Andre C. P. L. F. de Carvalho},
title = {{The utiml Package: Multi-label Classification in R}},
year = {2018},
journal = {{The R Journal}},
doi = {10.32614/RJ-2018-041},
url = {https://doi.org/10.32614/RJ-2018-041},
pages = {24--37},
volume = {10},
number = {2}
}
cv method also returns the prediction
multilabel_evaluation to also return the label measures
brplus because the new features were using different levels
baseline using hamming-loss to prevent empty label prediction
homer to deal with labels without instances and to predict instances based on the meta-label scores
New multi-label transformation methods including pairwise and multiclass approaches. Some fixes from the previous version.
multilabel_confusion_matrix accepts a data.frame or matrix with the predictions
First release of utiml:
Binary Relevance (BR); BR+; Classifier Chains; ConTRolled Label correlation exploitation (CTRL); Dependent Binary Relevance (DBR); Ensemble of Binary Relevance (EBR); Ensemble of Classifier Chains (ECC); Meta-Binary Relevance (MBR or 2BR); Nested Stacking (NS); Pruned and Confident Stacking Approach (Prudent); and Recursive Dependent Binary Relevance (RDBR)