Meta-Feature Extractor

Extracts meta-features from datasets to support the design of recommendation systems based on Meta-Learning. The meta-features, also called characterization measures, are able to characterize the complexity of datasets and to provide estimates of algorithm performance. The package contains not only the standard characterization measures, but also more recent ones. By making available a large set of meta-feature extraction functions, it supports tasks such as comprehensive data characterization, deep data exploration and a large number of Meta-Learning-based data analyses. These concepts are described in the paper: Fabio Pinto, Carlos Soares, and Joao Mendes-Moreira. Towards automatic generation of metafeatures. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 215-226, 2016.



Extracts meta-features from datasets to support the design of recommendation systems based on Meta-Learning (MtL). The meta-features, also called characterization measures, are able to characterize the complexity of datasets and to provide estimates of algorithm performance. The package contains not only the standard, but also more recent characterization measures. By making available a large set of meta-feature extraction functions, this package supports comprehensive data characterization, deep data exploration and a large number of MtL-based data analyses.

Measures

In MtL, meta-features are designed to extract general properties able to characterize datasets. The meta-feature values should provide relevant evidence about the performance of algorithms, allowing the design of MtL-based recommendation systems. Thus, these measures must be able to predict the performance of the algorithms under evaluation at a low computational cost. In this package, the meta-feature measures are divided into five groups:

  • General: General information about the dataset, also known as simple measures, such as the number of instances, attributes and classes.
  • Statistical: Standard statistical measures that describe the numerical properties of the data distribution.
  • Information-theoretic: Measures particularly appropriate for describing discrete (categorical) attributes and their relationship with the classes.
  • Model-based: Measures that extract characteristics such as the depth, shape and size of a Decision Tree (DT) model induced from the dataset.
  • Landmarking: Measures that represent the performance of simple and efficient learning algorithms.
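To make some of these groups concrete, the following base-R sketch computes a few measures by hand on iris: simple counts (General), the class entropy (Information-theoretic) and the leave-one-out accuracy of a 1-nearest-neighbour classifier (a Landmarking-style measure). It does not call mfe; the helper names general_counts, class_entropy and one_nn_accuracy are hypothetical, and mfe's own definitions (log base, normalization, validation scheme) may differ.

```r
# Hypothetical helpers, not part of mfe: hand-computed measures in the spirit
# of the General, Information-theoretic and Landmarking groups.

# General: simple counts describing the dataset
general_counts <- function(x, y) {
  c(instances = nrow(x), attributes = ncol(x), classes = nlevels(as.factor(y)))
}

# Information-theoretic: Shannon entropy of the class distribution, in bits
class_entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]                    # empty classes contribute nothing
  -sum(p * log2(p))
}

# Landmarking-style: leave-one-out accuracy of a 1-nearest-neighbour classifier
one_nn_accuracy <- function(x, y) {
  d <- as.matrix(dist(scale(x)))   # Euclidean distances on standardized attributes
  diag(d) <- Inf                   # an instance may not be its own neighbour
  nn <- apply(d, 1, which.min)     # index of each instance's nearest neighbour
  mean(y[nn] == y)                 # fraction whose neighbour shares its class
}

x <- iris[, 1:4]
y <- iris$Species
general_counts(x, y)
class_entropy(y)       # equals log2(3) for three balanced classes
one_nn_accuracy(x, y)
```

The landmarker illustrates the idea behind the group: a cheap learner's accuracy is used as a proxy for how hard the dataset is for more expensive algorithms.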

Installation

The installation process is similar to that of other packages available on CRAN:

install.packages("mfe")

The development version can be installed from GitHub using:

if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
}
devtools::install_github("rivolli/mfe")
library("mfe")

Example of use

The simplest way to extract meta-features is the metafeatures function. It can be called either with a formula (a symbolic description of the model) or with a data frame of attributes plus the class vector. Its main arguments are the dataset and the groups of measures to be extracted; by default, all measures are extracted. To extract only the measures of a specific group, use the function associated with that group. A simple example is given next:

## Extract all measures using formula
metafeatures(Species ~ ., iris)
 
## Extract all measures using data frame
metafeatures(iris[,1:4], iris[,5])
 
## Extract general, statistical and information-theoretic measures
metafeatures(Species ~ ., iris, groups=c("general", "statistical", "infotheo"))
 
## Extract the DT model based measures
model.based(Species ~ ., iris)
 
## Show the available groups
ls.metafeatures()

Several measures return more than one value. To aggregate the returned values, post-processing (summarization) functions can be applied through the summary argument. They can compute min, max, mean, median, kurtosis and standard deviation, among others (see the post.processing documentation for more details). The defaults are mean and sd. An example of their use follows:

## Extract all measures using min, median and max 
metafeatures(Species ~ ., iris, summary=c("min", "median", "max"))
                          
## Extract all measures using quantile
metafeatures(Species ~ ., iris, summary="quantile")
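The idea behind the summary argument can be illustrated without mfe: a multi-valued measure (here, hypothetically, the standard deviation of each attribute) is collapsed into a fixed number of values by applying summary functions to it. The helper summarize_measure below is a hypothetical sketch of this idea, not mfe's post.processing, which supports more functions and options.

```r
# A multi-valued measure: one standard deviation per numeric attribute
per_attr <- sapply(iris[, 1:4], sd)

# Hypothetical helper, not mfe's post.processing: apply each named summary
# function to the values and return one number per function
summarize_measure <- function(values, summary = c("mean", "sd")) {
  vapply(summary, function(f) do.call(f, list(values)), numeric(1))
}

summarize_measure(per_attr)                             # mean and sd
summarize_measure(per_attr, c("min", "median", "max"))
```

Because each measure is reduced to the same number of values regardless of how many attributes a dataset has, meta-features extracted from different datasets can be stacked into a single table.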

Developer notes

In the current version, the meta-feature extractor supports only classification problems. The authors plan to extend the package with clustering and regression measures and to support MtL evaluation measures. For more specific information on how to extract each group of measures, please refer to the documentation page of each function and the examples it contains. For a general overview of the mfe package, please have a look at the associated vignette.

To cite mfe in publications use:

  • Rivolli, A., Garcia, L. P. F., Soares, C., Vanschoren, J., and de Carvalho, A. C. P. L. F. (2018). Towards Reproducible Empirical Research in Meta-Learning. arXiv:1808.10406

To report bugs or request features, open an issue on the project's issue tracker at https://github.com/rivolli/mfe/issues.

News

Version 0.1.2 [current]

Minor changes

  • Changed the license

Bugfixes

  • Added support for datasets with unusual attribute names
  • Changed the comparison in the nrNorm statistical meta-feature

Version 0.1.1

Minor changes

  • Added new measures to all groups
  • Added support for new summarization techniques in the post.processing method
  • Increased the robustness of the Decision Tree algorithm on imbalanced datasets
  • Changed the Decision Tree algorithm to support more Landmarking measures
  • Added support for categorical attributes in the 1NN and eNN measures

Bugfixes

  • Fixed Decision Tree model errors related to a minority class with a single instance
  • Added support for datasets that have columns named with numbers
  • Fixed the harmonic and geometric means

First release of mfe:

  • General meta-features
  • Statistical meta-features
  • Discriminant meta-features
  • Information-theoretic meta-features
  • Model-based Decision Tree meta-features
  • Landmarking meta-features



Version 0.1.2 by Adriano Rivolli


https://github.com/rivolli/mfe


Report a bug at https://github.com/rivolli/mfe/issues


Browse source code at https://github.com/cran/mfe


Authors: Adriano Rivolli [aut, cre], Luis P. F. Garcia [aut], Andre C. P. L. F. de Carvalho [ths]


Documentation:   PDF Manual  


License: MIT + file LICENSE


Imports: cluster, e1071, infotheo, MASS, rpart, rrcov, stats, utils

Suggests: knitr, rmarkdown, testthat

