Accurate calculations and visualization of precision-recall and ROC (Receiver Operating Characteristic) curves.
Saito and Rehmsmeier (2015)
The aim of the precrec package is to provide an integrated platform that enables robust performance evaluations of binary classifiers. Specifically, precrec offers accurate calculations of ROC (Receiver Operating Characteristic) and precision-recall curves. All the main calculations of precrec are implemented with C++/Rcpp.
precrec provides accurate precision-recall curves.
precrec also calculates AUC scores with high accuracy.
precrec calculates curves in a matter of seconds even for a fairly large dataset. It is much faster than most other tools that calculate ROC and precision-recall curves.
In addition to precision-recall and ROC curves, precrec offers basic evaluation measures.
precrec calculates confidence intervals when multiple test sets are given. It automatically shows confidence bands about the averaged curve in the corresponding plot.
precrec calculates partial AUCs for specified x and y ranges. It can also draw partial ROC and precision-recall curves for the specified ranges.
precrec provides several useful functions that most other evaluation tools lack.
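To make the quantities behind these curves concrete, here is a minimal base-R sketch. It is not precrec's C++/Rcpp implementation: the helper names pr_points() and roc_auc() are hypothetical, labels are assumed to be coded 1 (positive) and 0 (negative), and pr_points() assumes there are no tied scores.

```r
# Minimal base-R sketch, NOT precrec's implementation.
# Labels are coded 1 = positive, 0 = negative; helper names are hypothetical.

# Precision and recall obtained by using each score as a cut-off
# (assumes no tied scores; ties would need to be grouped into one cut-off)
pr_points <- function(scores, labels) {
  lab <- labels[order(scores, decreasing = TRUE)]
  tp <- cumsum(lab == 1)   # true positives at or above each cut-off
  fp <- cumsum(lab == 0)   # false positives at or above each cut-off
  data.frame(recall = tp / sum(labels == 1),
             precision = tp / (tp + fp))
}

# ROC AUC as the Mann-Whitney statistic: the probability that a random
# positive scores higher than a random negative (ties count as 1/2)
roc_auc <- function(scores, labels) {
  r <- rank(scores)        # average ranks, so ties contribute 1/2
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

scores <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4)
labels <- c(1, 1, 0, 1, 0, 0)
pr_points(scores, labels)
roc_auc(scores, labels)  # 8/9, about 0.889
```

The sketch only yields the discrete points; joining precision-recall points with straight lines generally misstates the curve, which is one reason a dedicated tool is preferable for the full curves and their AUCs.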
Install the release version of precrec from CRAN with install.packages("precrec").
Alternatively, you can install a development version of precrec from our GitHub repository. To install it:
1. Make sure you have a working development environment.
2. Install devtools from CRAN with install.packages("devtools").
3. Install precrec from the GitHub repository with the install_github function of devtools.
The precrec package provides the following six functions.
| Function | Description |
|---|---|
| evalmod | Main function to calculate evaluation measures |
| mmdata | Reformat input data for performance evaluation calculation |
| join_scores | Join scores of multiple models into a list |
| join_labels | Join observed labels of multiple test datasets into a list |
| create_sim_samples | Create random samples for simulations |
| format_nfold | Create n-fold cross validation dataset from data frame |
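A sketch of how several of these functions fit together, assuming precrec is installed. The simulation settings (4 datasets, 100 positives and 100 negatives each, the "good_er" simulated score type) are illustrative choices, not recommendations. With one model evaluated on several test datasets, evalmod averages the curves and calculates confidence intervals.

```r
library(precrec)

# Simulate 4 test datasets, each with 100 positives and 100 negatives,
# using the "good_er" (good early retrieval) simulated score type
samps <- create_sim_samples(4, 100, 100, "good_er")

# Reformat scores and labels, tagging each set with a model name and dataset ID
smdat <- mmdata(samps[["scores"]], samps[["labels"]],
                modnames = samps[["modnames"]], dsids = samps[["dsids"]])

# One model on several datasets: curves are averaged and
# confidence intervals are calculated
smcurves <- evalmod(smdat)
smcurves

# join_scores() collects score vectors of several models into one list
joined <- join_scores(c(0.9, 0.4, 0.7), c(0.2, 0.8, 0.5))
```

Passing the mmdata object to evalmod keeps model names and dataset IDs attached to the scores, which is what lets evalmod group datasets per model before averaging.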
The precrec package provides eight S3 generics for the S3 object created by the evalmod function. N.B. The R language specifies S3 objects and S3 generic functions as part of the most basic object-oriented system in R.
| S3 generic | Package | Description |
|---|---|---|
| print | base | Print the calculation results and the summary of the test data |
| as.data.frame | base | Convert a precrec object to a data frame |
| plot | graphics | Plot performance evaluation measures |
| autoplot | ggplot2 | Plot performance evaluation measures with ggplot2 |
| fortify | ggplot2 | Prepare a data frame for ggplot2 |
| auc | precrec | Make a data frame with AUC scores |
| part | precrec | Calculate partial curves and partial AUC scores |
| pauc | precrec | Make a data frame with pAUC scores |
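A short sketch of the precrec-specific generics applied to a single evaluation, assuming precrec is installed and using the bundled P10N10 test dataset. The range 0 to 0.25 passed to part is an arbitrary illustrative choice; xlim restricts the x axis of both curves (false positive rate for ROC, recall for precision-recall).

```r
library(precrec)

# Evaluate a single model on a single test dataset
data(P10N10)
sscurves <- evalmod(scores = P10N10$scores, labels = P10N10$labels)

# AUCs of the ROC and precision-recall curves as a data frame
aucs <- auc(sscurves)
aucs

# Partial curves for x values in [0, 0.25], then their partial AUCs
sscurves.part <- part(sscurves, xlim = c(0, 0.25))
paucs <- pauc(sscurves.part)
paucs

# Curve coordinates as a plain data frame
df <- as.data.frame(sscurves)
head(df)
```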
Introduction to precrec - a package vignette that contains the descriptions of the functions with several useful examples. View the vignette with vignette("introduction", package = "precrec") in R. The HTML version is also available on the GitHub Pages.
Help pages - all the functions, including the S3 generics except for print, have their own help pages. View them with help(package = "precrec") in R. The HTML version is also available on the GitHub Pages.
The following two examples show the basic usage of precrec.
The evalmod function calculates ROC and Precision-Recall curves and returns an S3 object.
```r
library(precrec)

# Load a test dataset
data(P10N10)

# Calculate ROC and Precision-Recall curves
sscurves <- evalmod(scores = P10N10$scores, labels = P10N10$labels)
```
The autoplot function outputs ROC and Precision-Recall curves by using the ggplot2 package.
```r
# The ggplot2 package is required
library(ggplot2)

# Show ROC and Precision-Recall plots
autoplot(sscurves)
```
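The basic evaluation measures mentioned in the feature list are requested through the mode argument of evalmod rather than through a separate function. A minimal sketch, assuming precrec is installed; the exact set of measures returned is best checked with help(evalmod).

```r
library(precrec)

data(P10N10)

# mode = "basic" computes threshold-wise basic measures
# instead of ROC and precision-recall curves
sspoints <- evalmod(mode = "basic", scores = P10N10$scores,
                    labels = P10N10$labels)
sspoints

# The results can be inspected as a data frame without ggplot2
basic_df <- as.data.frame(sspoints)
head(basic_df)
```

The same object can also be passed to autoplot for ggplot2 plots of the measures.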
Precrec: fast and accurate precision-recall and ROC curve calculations in R
Takaya Saito; Marc Rehmsmeier
Bioinformatics 2017; 33 (1): 145-147.
Classifier evaluation with imbalanced datasets - our web site that contains several pages with useful tips for performance evaluation on binary classifiers.
The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets - our paper that summarized potential pitfalls of ROC plots with imbalanced datasets and advantages of using precision-recall plots instead.
Fix a bug with as.data.frame when multiple datasets given
Add format_nfold function to convert a dataframe with n-fold data to a list
Add 'aucroc' mode for fast AUC (ROC)
Change how the 'show_cb' and 'raw_curves' options are treated
Improve as.data.frame with Rcpp
Create github pages with pkgdown
Add new measures
New generic function
Improved the testing environment
Improved several documents
The first release version of precrec
The package offers five functions