Allows one to compute consonance (confidence)
intervals for various statistical tests, along with their corresponding
P-values, S-values, and likelihoods. The intervals can be plotted to
create consonance, surprisal, and likelihood functions, which show
what effect sizes are compatible with the test model at various
consonance levels rather than at a single interval estimate
such as 95%. These methods are discussed by Poole C. (1987)
Interval estimates, such as confidence (compatibility)
intervals, are now widely reported in many journals alongside the exact
P-value of a statistical test and the point estimate.
While this is a large improvement over what constituted statistical reporting in the past two decades, it is still largely inadequate.
Take, for example, the 95% compatibility interval. As many have stated before, there is nothing special about 95%, yet we rarely see intervals of any other level. Choosing to compute a 95% interval is as mindless as choosing a 5% alpha level for hypothesis testing. A single compatibility interval is only one slice of a wide range of compatibility intervals at different levels. Reporting only 95% intervals promotes cargo-cult statistics, since there is not much thought behind the choice. (1)
“*Cargo-cult statistics – the ritualistic miming of statistics rather than conscientious practice*.” - Stark & Saltelli, 2018
Thus, we propose that instead of only calculating one interval estimate, every interval associated with a compatibility level be calculated, along with its corresponding P-value and S-value, and plotted to form a function. (2-8)
This can be accomplished using the concurve package in R.
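To make the idea concrete, here is a minimal sketch in base R of what "every interval at every level" looks like for a one-sample t-test. It is not concurve's implementation and does not use the package's interface. The simulated sample `x` and the helper `consonance_table()` are hypothetical names introduced only for illustration. The sketch assumes the usual correspondence between an interval at level 1 - p and the P-value p at its limits, and the S-value defined as -log2(p).

```r
# Hypothetical sample; any numeric vector would do.
set.seed(1)
x <- rnorm(30, mean = 0.5, sd = 1)

# Illustrative helper (not part of concurve): compute the t-interval for the
# mean at many levels, with the P-value and S-value attached to each level.
consonance_table <- function(x, levels = seq(0.01, 0.99, by = 0.01)) {
  rows <- lapply(levels, function(lvl) {
    ci <- t.test(x, conf.level = lvl)$conf.int
    data.frame(
      level   = lvl,
      lower   = ci[1],
      upper   = ci[2],
      p_value = 1 - lvl,        # P-value of the parameter values at the limits
      s_value = -log2(1 - lvl)  # surprisal, in bits
    )
  })
  do.call(rbind, rows)
}

tab <- consonance_table(x)

# Trace the function from the widest interval's lower limit, up to the peak
# near the point estimate, and back down to the widest interval's upper limit.
plot(
  c(rev(tab$lower), tab$upper),
  c(rev(tab$p_value), tab$p_value),
  type = "l",
  xlab = "Mean",
  ylab = "P-value"
)
```

Reading a horizontal slice of the resulting curve at a P-value of 0.05 recovers the familiar 95% interval; every other height gives the interval at another level, which is the point of plotting the whole function rather than one slice.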
"Statistical software enables and promotes cargo-cult statistics. Marketing and adoption of statistical software are driven by ease of use and the range of statistical routines the software implements. Offering complex and “modern” methods provides a competitive advantage. And some disciplines have in effect standardised on particular statistical software, often proprietary software.
Statistical software does not help you know what to compute, nor how to interpret the result. It does not offer to explain the assumptions behind methods, nor does it flag delicate or dubious assumptions. It does not warn you about multiplicity or p-hacking. It does not check whether you picked the hypothesis or analysis after looking at the data, nor track the number of analyses you tried before arriving at the one you sought to publish – another form of multiplicity. The more “powerful” and “user-friendly” the software is, the more it invites cargo-cult statistics." - Stark & Saltelli, 2018