Estimate cutpoints that optimize a specified metric in binary classification tasks
and validate performance using bootstrapping. Some methods for more robust cutpoint
estimation are supported, e.g. a parametric method assuming normal distributions,
bootstrapped cutpoints, and smoothing of the metric values per cutpoint using
Generalized Additive Models. Various plotting functions are included. For an overview
of the package, see Thiele and Hirschfeld (2020).
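To make the idea of "a cutpoint that optimizes a metric" concrete, here is a minimal sketch in Python. It is an illustration only, not the package's R implementation: the function names `confusion_counts`, `sum_sens_spec`, and `best_cutpoint` are chosen for this example, and the sketch assumes that observations with a score greater than or equal to the cutpoint are predicted positive.

```python
def confusion_counts(scores, labels, cutpoint):
    """Count tp, fp, tn, fn when scores >= cutpoint are predicted positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= cutpoint
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sum_sens_spec(tp, fp, tn, fn):
    """Sum of sensitivity and specificity (assumes both classes are present)."""
    return tp / (tp + fn) + tn / (tn + fp)


def best_cutpoint(scores, labels, metric):
    """Try every observed score as a cutpoint and keep the one maximizing the metric."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: metric(*confusion_counts(scores, labels, c)))


# Toy data, invented for this example only.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 1, 0, 1, 1, 1, 1]
print(best_cutpoint(scores, labels, sum_sens_spec))
```

Evaluating the metric on the same data that produced the cutpoint is optimistic, which is why validation by bootstrapping is part of the description above.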
- sigfig argument to print.cutpointr to allow for specifying the number of significant digits to be printed
- add_metric() function to add further metrics to the output of cutpointr()
- roc01 metric function to calculate the distance of points on the ROC curve to the point (0,1) on ROC space
- plot_sensitivity_specificity() if boot_runs = 0
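The distance to the point (0,1) mentioned for roc01 can be written out as a short Python sketch. This is an illustration of the geometry, not the package's R code: the point (0,1) corresponds to a false positive rate of 0 and a sensitivity of 1, so the Euclidean distance is the square root of (1 - sensitivity)^2 + (1 - specificity)^2. The signature taking tp, fp, tn, and fn is an assumption made for this example.

```python
import math


def roc01(tp, fp, tn, fn):
    """Euclidean distance from (1 - specificity, sensitivity) to the point (0, 1)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt((1 - sensitivity) ** 2 + (1 - specificity) ** 2)


# A perfect classifier sits exactly on (0, 1), so its distance is 0.
print(roc01(tp=10, fp=0, tn=10, fn=0))
```

Smaller values are better, so a metric of this kind would be minimized rather than maximized.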
- (spar = NULL in maximize_spline_metric)
- cutpoint_nr
- boot column is now always returned, and is NA if no bootstrapping was run, so that the number of returned columns is constant
- use_midpoints is now also passed to method by cutpointr to allow for the calculation of midpoints within maximize_boot_metric and minimize_boot_metric, which previously happened in cutpointr, leading to slightly biased cutpoints in certain scenarios
- nknots is now calculated by stats::.nknots_smspl and spar = 1
- cutpoint_tol argument to define a tolerance around the optimized metric, so that multiple cutpoints in the vicinity of the target metric can be returned and to avoid not returning other "optimal" cutpoints due to floating-point problems
- break_ties = c
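The floating-point problem that motivates a tolerance around the optimized metric can be shown with a small Python sketch. It is illustrative only: the helper `cutpoints_within_tolerance` and its `tol` parameter are hypothetical names, and the sketch assumes the metric is maximized and the tolerance is an absolute difference on the metric scale.

```python
def cutpoints_within_tolerance(cutpoints, metric_values, tol=0.0):
    """Return all cutpoints whose metric lies within tol of the best (maximal) value."""
    best = max(metric_values)
    return [c for c, m in zip(cutpoints, metric_values) if best - m <= tol]


# 0.1 + 0.2 is not exactly 0.3 in floating-point arithmetic, so two cutpoints
# that are "equally optimal" on paper differ by a tiny amount.
cutpoints = [1, 2, 3]
metric_values = [0.3, 0.1 + 0.2, 0.25]

print(cutpoints_within_tolerance(cutpoints, metric_values))            # exact comparison
print(cutpoints_within_tolerance(cutpoints, metric_values, tol=1e-9))  # small tolerance
```

With an exact comparison only one of the two tied cutpoints survives; a small tolerance returns both, and a larger one also admits nearby, slightly worse cutpoints.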
- break_ties, the returned main metric is now not the optimal one but the one corresponding to the summarized cutpoint (and thus may be worse than the optimal one)
- maximize_gam_metric and minimize_gam_metric for smoothing via generalized additive models
- geom_ribbon now use size = 0 to plot no lines around the (transparent) areas
- plot_cutpointr
- plr (positive likelihood ratio), nlr (negative likelihood ratio), false_discovery_rate, and false_omission_rate
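For reference, the standard textbook definitions of these four quantities can be sketched in Python from the cells of a confusion matrix. This is not the package's R code, and the signatures are assumptions for this example; the sketch does not guard against division by zero (for example, a specificity of 1 makes the positive likelihood ratio undefined).

```python
def plr(tp, fp, tn, fn):
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    return (tp / (tp + fn)) / (fp / (fp + tn))


def nlr(tp, fp, tn, fn):
    """Negative likelihood ratio: (1 - sensitivity) / specificity."""
    return (fn / (tp + fn)) / (tn / (fp + tn))


def false_discovery_rate(tp, fp, tn, fn):
    """Share of positive predictions that are wrong: fp / (tp + fp)."""
    return fp / (tp + fp)


def false_omission_rate(tp, fp, tn, fn):
    """Share of negative predictions that are wrong: fn / (tn + fn)."""
    return fn / (tn + fn)


# Toy counts, invented for this example only.
counts = dict(tp=8, fp=1, tn=9, fn=2)
for metric in (plr, nlr, false_discovery_rate, false_omission_rate):
    print(metric.__name__, metric(**counts))
```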
- silent argument for roc().
- cutpointr_ now accepts functions instead of character strings as method or metric
- use_midpoints parameter. If TRUE (default FALSE) the returned optimal cutpoint will be the mean of the optimal cutpoint and the next lowest observation (for direction = ">=") or the next highest observation (for direction = "<=")
- sum_ppvnpv, prod_ppvnpv, and abs_d_ppvnpv to sum_ppv_npv, prod_ppv_npv, and abs_d_ppv_npv to match the naming scheme to the names of the metrics that optimize sensitivity and specificity
- summary_sd function now also returns 5% and 95% percentiles that are included in the output of summary
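The use_midpoints rule described in this list (the mean of the optimal cutpoint and its neighboring observation) can be sketched in Python as follows. This is an illustration rather than the package's R code: the function name `midpoint_cutpoint` is hypothetical, and returning the cutpoint unchanged when no neighboring observation exists is an assumption made for this example.

```python
def midpoint_cutpoint(optimal_cutpoint, observations, direction=">="):
    """Average the optimal cutpoint with its neighboring observed value.

    For direction ">=" the neighbor is the next lowest observation,
    for direction "<=" it is the next highest observation.
    """
    values = sorted(set(observations))
    if direction == ">=":
        lower = [v for v in values if v < optimal_cutpoint]
        neighbor = max(lower) if lower else optimal_cutpoint  # assumed fallback
    elif direction == "<=":
        higher = [v for v in values if v > optimal_cutpoint]
        neighbor = min(higher) if higher else optimal_cutpoint  # assumed fallback
    else:
        raise ValueError('direction must be ">=" or "<="')
    return (optimal_cutpoint + neighbor) / 2


observations = [1, 2, 4, 7]
print(midpoint_cutpoint(4, observations, direction=">="))  # between 2 and 4
print(midpoint_cutpoint(4, observations, direction="<="))  # between 4 and 7
```

The midpoint classifies the observed data exactly as the original cutpoint does, but lies between two observed values instead of on one of them.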
- minimize_boot_metric and maximize_boot_metric was changed from 200 to 50
- summary function now returns a data.frame instead of a list; the printing method for summary_cutpointr has also been slightly modified
- plot_sensitivity_specificity for plotting cutpoints vs. sensitivity and specificity on the y-axis
- oc_optimalCutpoints function
- ROCR and OptimalCutpoints by rewriting tests and storing benchmark results
- data argument. Thus, it can be used as before by specifying data, x, and class or alternatively without specifying data and directly supplying the vectors of predictions and outcomes as x and class.
- silent argument for optionally suppressing messages (e.g. which class is assumed to be the positive one)