An implementation of several Hierarchical Ensemble Methods (HEMs) for Directed Acyclic Graphs (DAGs). The 'HEMDAG' package: 1) reconciles flat predictions with the topology of the ontology; 2) can enhance the predictions of virtually any flat learning method by taking into account the hierarchical relationships between ontology classes; 3) provides biologically meaningful predictions that always obey the true-path-rule, the biological and logical rule that governs the internal coherence of biomedical ontologies; 4) is specifically designed to exploit the hierarchical relationships of DAG-structured taxonomies, such as the Human Phenotype Ontology (HPO) or the Gene Ontology (GO), but can be safely applied to tree-structured taxonomies as well (such as FunCat), since trees are DAGs; 5) scales well with both the complexity of the taxonomy and the number of examples; 6) provides several utility functions to process and analyze graphs; 7) provides several performance metrics to evaluate HEM algorithms. (Marco Notaro, Max Schubach, Peter N. Robinson and Giorgio Valentini, 2017)
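To make the true-path-rule concrete, the sketch below shows one simple way to reconcile flat scores with a DAG: a single top-down pass that caps each class score at the lowest corrected score among its parents, so that no class is ever scored higher than any of its ancestors. It is written in Python purely for illustration and is not the package's R API. The function name htd_correct, the toy DAG and the scores are all hypothetical, and the code assumes Python 3.9 or later for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter


def htd_correct(scores, parents):
    """Top-down correction: cap each score at the minimum corrected
    score of its parents, so a child never outranks an ancestor."""
    # With a node -> parents mapping, static_order() yields every
    # parent before any of its children.
    order = TopologicalSorter(parents).static_order()
    corrected = {}
    for node in order:
        # Root classes have no parents, so their bound defaults to 1.0.
        bound = min((corrected[p] for p in parents.get(node, ())), default=1.0)
        corrected[node] = min(scores[node], bound)
    return corrected


# Hypothetical toy DAG: class "c" has two parents, "a" and "b".
parents = {"root": [], "a": ["root"], "b": ["root"], "c": ["a", "b"]}
# Hypothetical flat scores; "c" violates the rule (0.8 > 0.4 for "a").
flat = {"root": 0.9, "a": 0.4, "b": 0.7, "c": 0.8}

corrected = htd_correct(flat, parents)
print(corrected)
```

In this toy case only "c" changes: its score is lowered to that of its weakest parent, "a". Because "c" has two parents, the example also shows why a DAG needs the minimum over all parents, whereas a tree would need only one comparison.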
Do.GPAV.holdout;
precision.at.all.recall.levels.single.class (labels are all negatives/positives);
precision.at.given.recall.levels.over.classes (labels in a fold are all negatives/positives);
do.stratified.cv.data.single.class (sampling of the labels with just one positive/negative);
compute.performance to the following high level functions:
Do.TPR.DAG and Do.TPR.DAG.holdout;
Do.HTD and Do.HTD.holdout;
Do.GPAV and Do.GPAV.holdout;
Do.heuristic.methods and Do.heuristic.methods.holdout;
lexicographical.topological.sort;
precrec package:
precision.at.all.recall.levels.single.class;
PXR.at.multiple.recall.levels.over.classes substituted with precision.at.given.recall.levels.over.classes;
.txt) or compressed (.gz);
CRAN Package Check Results: removed unneeded header and define from GPAV C++ source code;
GPAV algorithm (Burdakov et al., Journal of Computational Mathematics, 2006);
GPAV algorithm in the top-down step of the functions TPR.DAG, Do.TPR.DAG and Do.TPR.DAG.holdout;
help("HEMDAG-defunct");
C++ code of GPAV algorithm;
Improved performance metrics:
compute.Fmeasure.multilabel;
PXR.at.multiple.recall.levels.over.classes;
AUPRC, AUROC, FMM, PXR) can be computed either one-shot or averaged across folds;
Improved the high-level hierarchical ensemble functions:
metric: maximization by FMAX or PRC (see manual for further details);
do.stratified.cv.data.single.class;
Added TPR-DAG: function gathering several hierarchical ensemble variants;
Added Do.TPR.DAG: high-level function to run TPR-DAG cross-validated experiments;
Added Do.TPR.DAG.holdout: high-level function to run TPR-DAG holdout experiments;
The following TPR-DAG and DESCENS high-level functions were removed:
NOTE: all the removed functions can be run by appropriately setting the input parameters of the new high-level functions Do.TPR.DAG (for cross-validated experiments) and Do.TPR.DAG.holdout (for hold-out experiments);
DESCENS algorithm;
MAX, AND, OR (Obozinski et al., Genome Biology, 2008);
tupla.matrix function;
CITATION file;