sortinghat

sortinghat is a classification framework in R that streamlines the evaluation of classifiers (classification models and algorithms) and seeks to determine the best classifiers on a variety of simulated and benchmark data sets. Several error-rate estimators are included to evaluate the performance of a classifier. This package is intended to complement the well-known 'caret' package.

Installation

You can install the stable version from CRAN:

install.packages('sortinghat', dependencies = TRUE)

To install the latest development version from GitHub instead, type:

library(devtools)
install_github('ramhiser/sortinghat')

Benchmarking

A primary goal of sortinghat is to enable rapid benchmarking across a variety of classification scenarios. To achieve this, we provide a large selection of both real and simulated data sets collected from the literature and around the Internet. With sortinghat, researchers can quickly replicate findings within the literature as well as rapidly prototype new classifiers.

The list of real and simulated data sets will continue to grow, and contributions via pull requests are greatly appreciated.

Data Sets

Benchmark data sets are useful for evaluating and comparing classifiers...

(Work in Progress: Version 0.2 will include a collection of benchmark data sets)

Simulated Data Sets

In addition to benchmark data sets, sortinghat provides a large collection of data-generating models for simulations based on studies in the literature. Thus far, we have added multivariate simulation models based on the following families of distributions:

  • Multivariate Normal
  • Multivariate Student's t
  • Multivariate Contaminated Normal
  • Multivariate Uniform

Moreover, data can be generated based on the well-known configurations from:

The simulated data sets listed above can be generated via the simdata function.
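To show what the sampling step behind such models involves, here is a minimal base-R sketch: multivariate normal draws via the Cholesky factor of the covariance matrix, multivariate t draws obtained by rescaling each normal row with a chi-squared variable, and a small two-class wrapper. The names rmvnorm_chol, rmvt_scale and sim_two_class are hypothetical illustrations, not sortinghat functions; the package's own simdata_ functions have their own arguments (see ?simdata). sortinghat also imports mvtnorm, which provides rmvnorm and rmvt for this kind of sampling.

```r
# Hypothetical sketch, not sortinghat's simdata API.

# Draw n rows from N_p(mu, Sigma). chol() returns upper-triangular R with
# t(R) %*% R == Sigma, so rows of Z %*% R (Z iid N(0, 1)) have covariance Sigma.
rmvnorm_chol <- function(n, mu, Sigma) {
  p <- length(mu)
  z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  sweep(z %*% chol(Sigma), 2, mu, "+")
}

# Multivariate t with df degrees of freedom and scale matrix Sigma:
# multiply each N(0, Sigma) row by sqrt(df / W), where W ~ chi-squared(df).
rmvt_scale <- function(n, mu, Sigma, df) {
  z <- rmvnorm_chol(n, rep(0, length(mu)), Sigma)
  w <- rchisq(n, df)
  sweep(z * sqrt(df / w), 2, mu, "+")  # length-n vector recycles down rows
}

# Two-class data set: shared covariance, class-specific means.
sim_two_class <- function(n_per_class, mu1, mu2, Sigma,
                          family = c("normal", "t"), df = 5) {
  family <- match.arg(family)
  draw <- switch(family,
    normal = function(mu) rmvnorm_chol(n_per_class, mu, Sigma),
    t      = function(mu) rmvt_scale(n_per_class, mu, Sigma, df))
  list(x = rbind(draw(mu1), draw(mu2)),
       y = factor(rep(c("1", "2"), each = n_per_class)))
}
```

For example, sim_two_class(50, c(0, 0), c(1, 1), diag(2), family = "t") returns a 100 x 2 matrix x and a factor y of class labels. Note that Sigma is the scale matrix of the t distribution; its covariance is df / (df - 2) * Sigma for df > 2.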

Error-Rate Estimation

Classifier superiority is often determined by the classification error rate (1 - accuracy). To assess classification efficacy, we use the following error-rate estimators:

Each of these error rates can be accessed via the errorest function, which acts as a wrapper around the error-rate estimators listed above.
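As a concrete picture of what a cross-validation error-rate estimate computes, here is a minimal base-R sketch of k-fold cross-validation with a deliberately simple nearest-centroid classifier. The names kfold_ids, nearest_centroid and cv_error are hypothetical; they are not sortinghat's errorest or cv_partition interfaces, whose actual arguments are documented in ?errorest and ?cv_partition.

```r
# Hypothetical sketch, not sortinghat's errorest/cv_partition API.

# Randomly assign each of n observations to one of k folds of near-equal size.
kfold_ids <- function(n, k) {
  sample(rep_len(seq_len(k), n))
}

# Assign each test row to the class whose training centroid is nearest
# in squared Euclidean distance.
nearest_centroid <- function(x_train, y_train, x_test) {
  x_train <- as.matrix(x_train)
  x_test <- as.matrix(x_test)
  y_train <- factor(y_train)  # drops classes absent from this training fold
  classes <- levels(y_train)
  # K x p matrix of class means
  centroids <- do.call(rbind, lapply(classes, function(cl)
    colMeans(x_train[y_train == cl, , drop = FALSE])))
  # n_test x K matrix of squared distances to each centroid
  d <- matrix(vapply(seq_along(classes), function(j)
    rowSums(sweep(x_test, 2, centroids[j, ])^2), numeric(nrow(x_test))),
    nrow = nrow(x_test))
  classes[max.col(-d, ties.method = "first")]
}

# k-fold CV error rate; assumes k <= nrow(x) so that no fold is empty.
cv_error <- function(x, y, k = 10, classifier = nearest_centroid) {
  x <- as.matrix(x)
  y <- as.character(y)
  folds <- kfold_ids(nrow(x), k)
  wrong <- 0
  for (f in seq_len(k)) {
    test <- folds == f
    pred <- classifier(x[!test, , drop = FALSE], y[!test],
                       x[test, , drop = FALSE])
    wrong <- wrong + sum(pred != y[test])
  }
  wrong / nrow(x)  # fraction of observations misclassified when held out
}
```

Each observation is held out exactly once, so the estimate is the overall misclassification rate on held-out data. Any function with the same (x_train, y_train, x_test) signature can be passed as classifier.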

News

sortinghat 0.1

  • Initial release of sortinghat

New Features

  • Simulated data sets and configurations are each available in functions prefixed with simdata_. The simdata function is a wrapper around each of these. See ?simdata for a list of all the available simulated data sets and the implementation details.

  • Several error-rate estimators are available, including cross-validation, .632, .632+, and others. The name of each estimator's function is prefixed with errorest_, and the errorest function is a wrapper around all of the implemented estimators. See ?errorest for a list of all available error-rate estimators and the implementation details.
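Of the estimators named above, .632 is the least self-explanatory. Efron's .632 estimator blends the optimistic apparent (resubstitution) error with the pessimistic leave-one-out bootstrap error, err_.632 = 0.368 * err_apparent + 0.632 * err_loo_boot, where the leave-one-out bootstrap error averages, for each observation, its misclassification rate over the bootstrap samples that did not contain it. The base-R sketch below uses a 1-nearest-neighbour classifier; the names one_nn and boot632_error are hypothetical and are not sortinghat's errorest_ functions.

```r
# Hypothetical sketch of Efron's .632 estimator, not sortinghat's API.

# 1-nearest-neighbour classifier using squared Euclidean distance.
one_nn <- function(x_train, y_train, x_test) {
  d <- outer(rowSums(x_test^2), rowSums(x_train^2), "+") -
    2 * x_test %*% t(x_train)
  y_train[max.col(-d, ties.method = "first")]
}

boot632_error <- function(x, y, classifier, B = 50) {
  x <- as.matrix(x)
  y <- as.character(y)
  n <- nrow(x)
  # Apparent error: train and test on the full data set.
  apparent <- mean(classifier(x, y, x) != y)
  loss_sum <- numeric(n)   # out-of-bag misclassifications per observation
  oob_count <- numeric(n)  # replicates in which each observation was out of bag
  for (b in seq_len(B)) {
    idx <- sample.int(n, n, replace = TRUE)
    oob <- setdiff(seq_len(n), idx)
    if (length(oob) == 0L) next
    pred <- classifier(x[idx, , drop = FALSE], y[idx], x[oob, , drop = FALSE])
    loss_sum[oob] <- loss_sum[oob] + (pred != y[oob])
    oob_count[oob] <- oob_count[oob] + 1
  }
  seen <- oob_count > 0  # skip observations that were never out of bag
  loo_boot <- mean(loss_sum[seen] / oob_count[seen])
  0.368 * apparent + 0.632 * loo_boot
}
```

With 1-nearest-neighbour the apparent error is zero, so the estimate reduces to 0.632 times the leave-one-out bootstrap error. This downward bias for heavily overfit classifiers is what the .632+ variant was designed to correct.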

Miscellaneous

  • cv_partition: Partitions data for cross-validation.

  • partition_data: Randomly partitions data sets into training and test data sets with a specified percentage in each.

  • which_min: Determines the index (location) of the minimum element in a vector. Ties can be broken in several ways, in particular at random. This function is intended to replace the base which.min function, which always returns the first minimum.

  • cov_intraclass: Constructs a p-dimensional intraclass covariance matrix.

  • cov_autocorrelation: Constructs a p-dimensional covariance matrix with an autocorrelation structure.

  • cov_block_autocorrelation: Constructs a p-dimensional block-diagonal covariance matrix with autocorrelated blocks. Based on Guo, Hastie, and Tibshirani (2007).
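The helpers in the list above are small enough to sketch directly. The base-R versions below illustrate the structures they describe: an intraclass (compound-symmetric) covariance matrix, an autocorrelation matrix with entries rho^|i - j|, a block-diagonal matrix of autocorrelated blocks, a random train/test split, and random tie-breaking for the index of the minimum. All function names and arguments here are hypothetical; sortinghat's own functions may parameterize these differently, so consult their help pages.

```r
# Hypothetical sketches, not sortinghat's own implementations.

# Intraclass: 1 on the diagonal, rho off the diagonal, scaled by sigma2.
cov_intraclass_sketch <- function(p, rho, sigma2 = 1) {
  S <- matrix(rho, nrow = p, ncol = p)
  diag(S) <- 1
  sigma2 * S
}

# Autocorrelation: entry (i, j) is sigma2 * rho^|i - j|.
cov_autocorr_sketch <- function(p, rho, sigma2 = 1) {
  sigma2 * rho^abs(outer(seq_len(p), seq_len(p), "-"))
}

# Block-diagonal matrix built from identical autocorrelated blocks.
cov_block_autocorr_sketch <- function(num_blocks, block_size, rho, sigma2 = 1) {
  block <- cov_autocorr_sketch(block_size, rho, sigma2)
  p <- num_blocks * block_size
  S <- matrix(0, p, p)
  for (b in seq_len(num_blocks)) {
    idx <- (b - 1) * block_size + seq_len(block_size)
    S[idx, idx] <- block
  }
  S
}

# Random split of row indices 1..n into training and test sets.
split_train_test <- function(n, train_frac = 2/3) {
  train <- sample.int(n, size = round(train_frac * n))
  list(train = sort(train), test = setdiff(seq_len(n), train))
}

# Index of the minimum, breaking ties uniformly at random (assumes no NAs).
which_min_random <- function(x) {
  idx <- which(x == min(x))
  # sample(idx, 1) on a single integer would sample from 1:idx, so guard it
  if (length(idx) == 1L) idx else sample(idx, 1L)
}
```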


Version 0.1 by John A. Ramey


http://github.com/ramhiser/sortinghat


Browse source code at https://github.com/cran/sortinghat


Authors: John A. Ramey




MIT + file LICENSE license


Imports MASS, bdsmatrix, mvtnorm

Suggests testthat

